2020-09-01 12:26

Artificial Intelligence and Machine Learning in Radiology Current State and Considerations for Routine Clinical Implementation

Key Words: artificial intelligence, deep learning, machine learning, radiology, computer-assisted image processing, medical informatics, radiomics

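Abstract: Although artificial intelligence (AI) has been a focus of medical research for decades, over the last decade the field of radiology has seen tremendous innovation, and also public attention, owing to the development and application of machine learning techniques to create new algorithms. Interestingly, this innovation is being driven simultaneously by academia, by existing global medical device vendors, and, fueled by venture capital, by recently founded startups. Radiologists find themselves once again in a position to lead this innovation to improve clinical workflows and ultimately patient outcomes. However, although the end of today's radiologist profession has been proclaimed many times, routine clinical application of such AI algorithms in 2020 remains rare. The goal of this review article is to describe in detail the relevance of appropriate imaging data as a bottleneck for innovation, to provide insight into the many obstacles to technical implementation, and to offer additional perspectives to radiologists, who often view AI solely from their clinical role. As regulatory approval processes for such medical devices are currently under public discussion and the relevance of imaging data is changing, radiologists need to establish themselves as the leading gatekeepers for developments in their field and be aware of the many stakeholders and their sometimes conflicting interests.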

Particularly over the last decade, artificial intelligence (AI) algorithms have revolutionized our daily life and are increasingly being applied in routine clinical practice 1. Although improving diagnostic accuracy remains a key objective of such efforts, automation of repetitive clinical duties to save time and avoid reader fatigue is an increasing focus 2. Artificial intelligence has reshuffled the radiology device market: innovation is not only being driven by existing large corporations, but we have seen multiple startups fueled by venture capital introduce new concepts and software 3. The global market value for AI in medical imaging has been predicted to rise from $21.5 billion in 2018 to $264.9 billion by 2026 4.

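However, routine clinical application of AI remains scarce and is usually limited to large academic centers 5. Moreover, from a radiologist's perspective, little has changed: software that automatically detects pathology has existed for decades, although it was called computer-aided detection rather than AI 6. From a development perspective, however, software design has indeed evolved 7. The techniques currently used to train algorithms are machine learning (ML) and deep learning (DL). Rather than being given a set of rules for detecting pathology at the coding stage, such algorithms are trained on large annotated datasets 8. This has forced the US Food and Drug Administration (FDA) to develop entirely new concepts for regulating such software devices.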

The goal of this review article is to give an insight into important considerations on data acquisition and quality control for training of ML/DL algorithms, provide an overview on the current clinical applicability of AI algorithms, and discuss technical and regulatory challenges when establishing AI in routine radiology practice.

DATA AS THE BOTTLENECK FOR AI

The Relevance of Data Selection

Artificial intelligence and ML more specifically represent a great opportunity to improve health care, optimizing workflow, and increasing accuracy and patient care 7. Fields involving large amounts of data, which heavily rely on accurate prognostic models and pattern recognition, such as radiology, can benefit the most from the advantages AI has to offer. Although imaging data are commonly available in academic institutions and large public datasets are accessible, meticulously annotated imaging data that can be used to train, validate, and test AI algorithms remains a crucial bottleneck for manufacturers at any scale 9,10. In computer sciences, this is described as the “garbage in, garbage out” principle as flawed data input would result in nonsense output 11.

Large, publicly available datasets released by joint efforts of academic or even commercial institutions have been commonly used for development of DL-based AI algorithms in the past 12. However, a recent investigation into the quality of the commonly used ChestXray14 dataset revealed that included labels did not accurately reflect the image content 13. The reported positive predictive values of this recent study were 10% to 30% lower than described in the documentation 13. Inaccurate labels for a subset of cases with degenerative joint disease in the Musculoskeletal Radiology dataset were also observed in this study with a sensitivity of 60%. These findings emphasize the recommendation to either obtain data labeled by clinical experts specifically for a certain project or perform visual inspection of any included datasets before application to training algorithms 14.
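
The two metrics quoted above, positive predictive value and sensitivity, can be estimated for any dataset by having an expert re-read a sample of cases. The following is a minimal sketch of such a label audit, not the method of the cited study; the function name and the ten example cases are hypothetical.

```python
def label_quality(dataset_labels, expert_labels):
    """Compare dataset labels against expert re-reads (two lists of bools).

    Returns (ppv, sensitivity); a value is None when its denominator is zero.
    """
    pairs = list(zip(dataset_labels, expert_labels))
    tp = sum(1 for d, e in pairs if d and e)        # label and expert agree: present
    fp = sum(1 for d, e in pairs if d and not e)    # labeled present, expert says absent
    fn = sum(1 for d, e in pairs if not d and e)    # labeled absent, expert says present
    ppv = tp / (tp + fp) if (tp + fp) else None
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    return ppv, sensitivity


# Hypothetical audit of 10 cases: True = finding present.
dataset = [True, True, True, True, True, False, False, False, False, False]
expert = [True, True, True, False, False, True, True, False, False, False]
ppv, sens = label_quality(dataset, expert)
print(f"PPV={ppv:.2f}, sensitivity={sens:.2f}")
```

In practice the audited sample would need to be large enough, and drawn randomly enough, for these estimates to be meaningful.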

Getting Data Ready for AI

As a consequence, commercialization of datasets and proper preparation has evolved as a new branch of businesses 15. Such private companies serve both health care institutions wanting to monetize imaging data as well as software companies requiring high-quality datasets to train algorithms 16. When partnering with such a company, careful attention should be paid to ethical and legal matters 17,18. Depending on the clinical scenario in which the imaging data were acquired, individual patient consent may be required. Similar to clinical studies, evaluating existing data usually requires a retrospective study design, which often receives a waiver for informed consent by the local ethics committee. When evaluating imaging data from clinical trials, written informed consent is necessary and every primary investigator would have to approve data exchange, rendering this approach cumbersome.

A crucial next step is identification of suitable cases and proper deidentification 19. Any protected health information (PHI) has to be deleted from the image itself (eg, overlaying information) as well as the digital imaging and communications in medicine (DICOM) metadata. As clinicians may be inexperienced in such tasks, multiple societies including the Radiological Society of North America have provided publicly available tools for this process 20. Nevertheless, potentially identifying information may still be present in DICOM headers, or depending on the type of examination, image data itself may cause reidentification. For example, volumetric reconstructions of head and neck imaging data could potentially be used to allow for facial recognition 21. In a recent study, Schwarz et al 22 even demonstrated that reconstructed facial images from cranial MRI could be used to match 83% of anonymous study participants with their photographs. New algorithms for these potential reidentification issues are being developed and will have to be implemented in the future 23.
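
To make the metadata part of this step concrete, the sketch below removes PHI from a DICOM header that is assumed to have already been parsed into a Python dictionary keyed by attribute keyword. The keyword list is an illustrative subset and not complete, the function name and salt are hypothetical, and the sketch does nothing about burned-in overlays or facial reconstruction; a validated tool such as those mentioned above should be used for real data.

```python
import hashlib

# Illustrative subset of DICOM attribute keywords that carry PHI.
# This list is an assumption for the sketch and is NOT complete.
PHI_KEYWORDS = {
    "PatientName", "PatientBirthDate", "PatientAddress",
    "InstitutionName", "ReferringPhysicianName", "AccessionNumber",
}


def deidentify(header, salt):
    """Return a copy of a parsed header (dict) with PHI removed.

    PatientID is replaced by a salted hash so that examinations of the
    same patient stay linked without exposing the original identifier.
    """
    cleaned = {k: v for k, v in header.items() if k not in PHI_KEYWORDS}
    if "PatientID" in cleaned:
        digest = hashlib.sha256((salt + str(cleaned["PatientID"])).encode())
        cleaned["PatientID"] = digest.hexdigest()[:16]
    return cleaned


# Hypothetical header values for demonstration only.
header = {
    "PatientName": "DOE^JANE",
    "PatientID": "12345",
    "PatientBirthDate": "19700101",
    "Modality": "CT",
    "SliceThickness": "1.0",
}
clean = deidentify(header, salt="project-secret")
print(sorted(clean))
```

Technical attributes such as modality and slice thickness are kept because they are needed later to assess acquisition-related bias.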

Another potential PHI security risk is data storage itself. Although data in most health care institutions is stored locally (“on-premise”), AI software developers largely prefer cloud computing because its security standards are usually higher and because it facilitates access to data as well as new algorithms, provides data backup, and tends to be more cost-efficient 10,24. Furthermore, a recent investigation by Gillum et al demonstrated that an alarming amount of medical imaging data, especially in smaller health care facilities, is unprotected from external access 25. The authors were able to access medical records of more than 5 million patients in the United States alone, although this is likely a global issue.

Nevertheless, the current main limitation of cloud storage remains the required speed of the Internet connection. Especially when transferring large datasets or initiating large database queries, cloud computing may be substantially slower than local systems and could even fail 26. However, with the introduction of 5G mobile networks, cloud computing is expected to become more standard for imaging in health care 27.
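
A back-of-the-envelope calculation shows why connection speed matters here. The sketch below computes the ideal transfer time for a dataset of a given size; the dataset size and uplink speeds are hypothetical, and protocol overhead and latency are ignored, so real transfers will be slower.

```python
def transfer_time_seconds(size_gigabytes, bandwidth_megabits_per_s):
    """Ideal transfer time, ignoring protocol overhead and latency."""
    size_megabits = size_gigabytes * 8 * 1000  # 1 GB = 8000 megabits (decimal units)
    return size_megabits / bandwidth_megabits_per_s


# Hypothetical example: a 500 GB imaging dataset over two uplink speeds.
for mbps in (100, 1000):
    hours = transfer_time_seconds(500, mbps) / 3600
    print(f"{mbps} Mbit/s: {hours:.1f} h")
```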

Image Labeling and Defining the Ground-Truth

Image selection itself highlights one of the many potential biases when training AI algorithms: by modifying search or inclusion criteria at this stage, the base on which algorithms can learn to identify pathologies may be limited and unfit for clinical reality 28.

Furthermore, there are different ground-truth levels 6: for some diagnoses, the image itself may show sufficient findings (eg, intracranial hemorrhage). However, similar to clinical routine, the majority of findings are not definitive on a single imaging examination and may require additional imaging (eg, lung cancer on x-ray and subsequent CT) or correlation with clinical parameters (eg, pathology). Very large datasets in particular usually come without accompanying detailed imaging reports, or often only include final diagnoses 12. This information can then be parsed using natural language processing (NLP), which has its own limitations 29. This approach may be the most scalable, and often, relatively low-quality data are sufficient to train certain AI algorithms due to the amount of data used. However, the fact that findings in images are not necessarily directly linked to certain diagnoses creates a level of uncertainty, which may be of limited help to properly train algorithms to detect disease 30. Oakden-Rayner 13 demonstrated that images from the ChestXray14 dataset had been labeled as positive for emphysema, but upon review, subcutaneous emphysema instead of pulmonary emphysema was present in 86% of cases. Although such differences may be easily recognizable by a clinical expert, they may be overlooked by computer scientists, leading to improperly trained algorithms. Other authors have described relevant limitations of NLP in evaluation of imaging reports 29. Figure 1 shows an example of data preparation by clinical experts following NLP preparation. The fact that structured reporting is still rare in clinical routine compared with free-text reports adds to this problem, although its advantages have been demonstrated for various clinical conditions 31.
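
The emphysema example can be reproduced with a toy labeler. The sketch below is not how any of the cited datasets were labeled; it only illustrates, with hypothetical report sentences and a hypothetical exclusion list, how plain keyword matching confuses subcutaneous with pulmonary emphysema and misses negation.

```python
import re


def naive_label(report, keyword):
    """Flag a report as positive if the keyword appears anywhere."""
    return keyword in report.lower()


def stricter_label(report, keyword, excluded_prefixes=("subcutaneous", "no")):
    """Ignore mentions directly preceded by an excluding word.

    The prefix list is a hypothetical example; real reports need far more
    robust handling of negation, uncertainty, and anatomy.
    """
    words = re.findall(r"[a-z]+", report.lower())
    for i, word in enumerate(words):
        if word == keyword:
            previous = words[i - 1] if i > 0 else ""
            if previous not in excluded_prefixes:
                return True
    return False


reports = [
    "Extensive subcutaneous emphysema after chest tube placement.",
    "Centrilobular emphysema in both upper lobes.",
    "No emphysema.",
]
for text in reports:
    print(naive_label(text, "emphysema"), stricter_label(text, "emphysema"), text)
```

Even the stricter rule fails on phrasing it was not written for, which is one reason expert review of NLP-derived labels remains advisable.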

Therefore, particularly when developing AI algorithms to detect very specific pathologies with high sensitivity and specificity, datasets with manually labeled findings are preferred 32. Figure 2 demonstrates a case with manual segmentation by 2 independent clinical experts. Although this approach has become relatively standard outside the medical field 33, it remains rare in assessment of radiological imaging. However, it appears that human interaction may be more important than imaging expertise. In a recent study, segmentations of the liver on publicly available CT datasets were compared between nonexperts, engineers with domain knowledge, medical students, and radiologists 34. Accuracy was similar between these groups while experts were more time efficient. Heim et al concluded that such crowdsourcing may be a possible solution for cost-effective large-scale annotation of medical imaging data. However, such efforts cannot be used to test algorithms for the purposes of regulatory approval since this requires review by certified clinical experts.
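
When two readers segment the same case, their agreement is often summarized with an overlap score. The sketch below uses the Dice similarity coefficient, which is a common choice but an assumption here, since the text does not name a metric; the two small masks are hypothetical.

```python
def dice(mask_a, mask_b):
    """Dice similarity of two binary masks given as equally sized 2D lists."""
    a = [v for row in mask_a for v in row]
    b = [v for row in mask_b for v in row]
    if len(a) != len(b):
        raise ValueError("masks must have the same shape")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2 * intersection / total


# Hypothetical 3x3 segmentations of the same lesion by two readers.
reader_1 = [[0, 1, 1],
            [0, 1, 1],
            [0, 0, 0]]
reader_2 = [[0, 1, 1],
            [0, 1, 0],
            [0, 0, 0]]
print(f"Dice = {dice(reader_1, reader_2):.3f}")
```

A score of 1 means identical masks and 0 means no overlap; the same function could be used to compare nonexpert annotations against an expert reference.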

Considerations in Choosing the Right Dataset

In general, training imaging datasets are substantially larger (minimum 5-fold larger) than datasets used for validation and testing 35. As mentioned before, potential bias in developing an AI algorithm may occur at multiple stages. Datasets from different continents may show both population variability as well as disease prevalence bias, which has been shown particularly for Asian, European, and American populations 36. Technical parameters can also lead to crucial potential biases. It is commonly recommended that image datasets used for training should have been acquired from systems from different vendors 37. This is particularly relevant for multislice imaging systems (CT/MRI) in which differences in acquisition protocols may have more impact than in x-ray images. Finally, clinical experts and researchers may be unaware of certain biases, for example, differences in local practice. Certain imaging findings may be more common in a specific region and thus not be reported 38. In conclusion, using diverse training data from multiple geographical regions is ideal but often not performed in clinical AI research due to limited access 39.
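
The size relation described above can be sketched as a simple split routine. The 5:1:1 ratio below follows the "minimum 5-fold larger" guidance; splitting by patient rather than by image is an additional assumption of this sketch (a common way to keep images of one patient out of several subsets), and the patient IDs are hypothetical.

```python
import random


def split_by_patient(patient_ids, seed=0):
    """Split unique patient IDs into train/validation/test at roughly 5:1:1.

    Splitting by patient (not by image) keeps all images of one patient in
    a single subset; this is an assumption of the sketch, not a rule taken
    from the text.
    """
    unique = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique)
    n_holdout = len(unique) // 7  # size of the validation set and of the test set
    test = set(unique[:n_holdout])
    validation = set(unique[n_holdout:2 * n_holdout])
    train = set(unique[2 * n_holdout:])
    return train, validation, test


ids = [f"P{i:03d}" for i in range(70)]
train, validation, test = split_by_patient(ids)
print(len(train), len(validation), len(test))
```

To address the vendor and geography biases discussed above, the same routine could be applied separately per vendor or region so that each subset contains all groups.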

Subsequent validation should be performed with a representative, yet independent dataset to further improve the performance of the algorithm 35. Here, annotations of high quality are again crucial. The quality of the testing dataset is even more important than that of the training dataset, as it is used for performance testing and regulatory approval.

A potential solution to these data availability limitations may be federated learning. Introduced in 2017 by Google (Mountain View, CA), the general principle of this approach is that data remains at the imaging institution while the algorithm itself can be trained at different locations 40. In addition, this solution may mitigate both Internet bandwidth as well as data capacity limitations as data is not moved. Although different specifications of federated learning have been proposed, these techniques have not been evaluated extensively or at a large scale, despite promising initial results 41.
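
The general principle can be illustrated with a deliberately small example. The sketch below is a simplified federated-averaging loop for a one-parameter model y = w * x; it is not the implementation of the cited work, and the three sites and their data are hypothetical. Only the locally updated weight and the sample count leave each site.

```python
def local_update(w, data, lr=0.01, epochs=20):
    """Run gradient descent on one site's private (x, y) pairs for y = w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_global, sites):
    """One round: each site trains locally, the server averages the weights.

    The average is weighted by the number of samples at each site.
    """
    updates = [(local_update(w_global, data), len(data)) for data in sites]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


# Hypothetical data held at three institutions; the true relation is y = 3x.
sites = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(1.5, 4.5), (3.0, 9.0), (0.5, 1.5)],
    [(2.5, 7.5)],
]
w = 0.0
for _ in range(30):
    w = federated_round(w, sites)
print(round(w, 3))
```

Real deployments exchange the parameters of full neural networks and typically add safeguards such as secure aggregation, which this sketch omits.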

REGULATORY APPROVAL OF AI

The traditional framework for regulatory approval for medical devices employed by the FDA was focused on the risk-benefit balance and intended clinical use 42. Truly novel devices may require a de novo pathway, although the majority of currently approved AI software underwent the 510(k) process after demonstrating that the algorithms represent a modification of an existing, previously approved device. Harvey et al 43 very recently published a comprehensive in-depth review of the current FDA framework and legislature determining the future regulatory approval process for AI algorithms. The American College of Radiology under its Data Science Institute also published a regularly updated list of AI algorithms cleared by the FDA 44.

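However, the FDA recently recognized that the traditional regulatory process for medical devices is not ideal for evaluating AI/ML technologies 45, particularly once self-learning components become part of the software. The FDA therefore proposed a regulatory framework for modifications to AI/ML-based software in 2019, which is currently being developed further together with manufacturers and radiology leaders 43. Although there are different concepts for premarket and postmarket evaluation and long-term control of such AI software, so far only algorithms without self-learning capabilities have been approved by the FDA for clinical use. Nevertheless, the FDA has cleared several software products via the de novo pathway 44, which shows that the agency is working to bring novel devices with innovative features to market.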

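The FDA's traditional approval system for medical devices, which also covers AI applications, was not designed for these technologies. The FDA anticipates that many of these AI- and ML-driven software changes to a device may require premarket review 45. Under its new approach, the FDA asks manufacturers to commit to transparency and real-world performance monitoring for AI-based software as a medical device 42,45. Although the framework for this process is still under discussion 46, the proposed pathway for software as a medical device with self-learning capabilities will most likely assign modifications to such devices to one of the following categories 43: (1) modifications of algorithm performance only (eg, retraining on additional data such as a large chest CT dataset from another geographic region), (2) modifications of input only (eg, additional data types such as MRI correlation to improve the performance of an abdominal CT algorithm in detecting suspicious liver masses), or (3) modifications of intended use (eg, a different indication or patient population, such as training a lung nodule algorithm that detects early lung cancer to predict growth).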

The involvement of leading radiological societies, device manufacturers, and patient representative groups will be crucial in this effort to establish a rigorous framework that still drives innovation 18.

TECHNICAL DEPLOYMENT

Although we have already identified properly annotated data as an important bottleneck for development of advanced AI algorithms, technical integration is the leading bottleneck for implementation of such software into routine radiology practice 47. Interestingly, it appears to be often underestimated in complexity, in our opinion especially by smaller software companies 32.

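Complexity has many layers, and technical deployment is often the first one 47. To date, installation of software on a local server has usually been preferred. Clinicians and local IT professionals have confidence in local security measures, can control the local installation and its data access rights, usually receive on-site support from the manufacturer, and in the worst case can simply shut down the server. More importantly, data usually does not leave the network of the health care institution.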

Although this approach is still often found in clinical routine today, cloud-based solutions that upload deidentified datasets to secure remote servers where processing is performed are increasingly offered and preferred by manufacturers 24. There are many potential advantages to this solution:

1. Computational power required for advanced AI algorithms can easily exceed local capabilities and is available at much lower cost in a cloud setup. Although it may appear appealing to run all postprocessing including AI analysis on CT/MRI systems directly to avoid delays in data transfer, it could drastically slow down scanner performance. It would also directly connect scanners to the Internet, which may cause potential security issues, particularly with older systems. Furthermore, cloud-based solutions are limited by Internet connection upstream and downstream, but do rely on network performance alone, not scanner performance.

2. Security guidelines can be established, updated, and controlled much easier with a cloud-based access solution. In addition, this approach moves the burden of security to some extent away from the local institution to a usually globally acting provider with a much bigger security team. As previously mentioned, a recent investigation demonstrated unsecured external access to millions of PHI datasets at smaller local institutions 25. It also allows streamlining of required documentation, contracts, and local guidelines when establishing such solutions instead of locally developed frameworks.

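3. Especially in light of the abovementioned FDA efforts to establish new guidelines for self-learning AI algorithms, regular updates during the life cycle of an algorithm are crucial 43. In contrast to current software medical devices, updates may also focus on improving diagnostic accuracy by training such algorithms on larger datasets, or even through self-learning capabilities, rather than necessarily on adding new functionality. Distributing such more frequent updates does not scale with local on-premise installations. Conversely, the true power of AI algorithms in clinical radiology will unfold when results from locally running software are fed back to the manufacturer to improve the algorithm that is deployed globally 48.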

4. Associated costs will be lower, mainly expenses for local hardware, data storage, need for local employees, and regional support teams of the manufacturer. Furthermore, response times will be quicker as remote access can be routinely used for support.

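5. Establishing a cloud-based framework for running individual algorithms will drive innovation, because software developed by smaller companies or startups can then also be used according to the radiologist's preference. Accordingly, all major radiology hardware and software vendors have developed marketplace platforms that facilitate access to third-party devices. Although this approach has its own potential limitations, it will be key to enabling competition and market access, also for small companies developing algorithms for very specific pathologies, and will ultimately drive innovation.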

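Although cloud-based solutions also have clear disadvantages, foremost among them the time currently required for regular remote transfer of large datasets, we expect this issue to be mitigated in the future, analogous to Moore's law, which originally stated that the processing power of computers would double every 2 years 49. A growing number of health care institutions of different sizes have also adapted to this process and established local guidelines that allow routine implementation of cloud-based software solutions.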

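Of course, deciding how AI algorithms gain access to radiological data is only one part of the clinical implementation process, and imaging informatics will play a key role in facilitating clinical routine use 47. Although presenting results as an overlay directly on PACS images is a direct entry point into the radiologist's workplace, it allows only limited user interaction and stores data in a stand-alone format 50. Radiology information systems form a heterogeneous market globally, and complexity increases further when attempting to include information from hospital information systems 51. Supporting the radiologist's reading workflow by automatically displaying key information from other examinations of the same patient when measuring potentially suspicious lesions on a chest CT currently remains a challenge for all software algorithms, despite increasing efforts to improve data exchange.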

Furthermore, structured reporting is still rarely used in clinical routine, although it would help streamline inclusion of data generated by AI algorithms and has also shown clinical benefits over free-text reports 31. Structured reporting may also help generate annotated imaging data on a large scale and thus drive AI innovation itself 52. To revolutionize the radiological reading workflow with AI, clinicians will also be required to adapt their reporting modus operandi given that appropriate software solutions are made available 8.

Finally, the role of the radiologist in the AI era will evolve 50,53. With the ability to outsource repetitive, low-risk tasks to AI algorithms (eg, measuring pulmonary nodules), radiologists will need to be the gatekeeper for these results into clinical reports 30. Although such innovation may be able to drastically increase workflow efficiency, it is likely to ultimately increase caseload for radiologists as a fraction of the reimbursement will be redirected to costs for such software 15. At the same time, it may help to ultimately improve health care professional satisfaction in daily routine as it may allow focusing attention on more complex tasks 54. Finally, one can expect that new job categories will be created and in particular the role of radiology technicians could evolve 30, as evaluation of AI results may also be considered part of prereading postprocessing, which is currently often performed by imaging laboratories at larger hospitals 55.

CURRENT STATE OF ARTIFICIAL INTELLIGENCE IN RADIOLOGY

At the current time, over 45 AI algorithms assisting in medical imaging have been approved by the FDA 45. Although still in the early stage of clinical AI implementation, we are entering a more mature stage where an AI application will be integrated in clinical practice and not only influence the daily routine of physicians involved but also will have an effect on patients and outcomes 48. Table 1 provides an overview of all currently approved FDA imaging applications. Figures 3 and 4 demonstrate cases evaluated with commercially available AI algorithms. Besides these approved applications, there are many algorithms being tested at the moment in a research setting only 57.

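In the early days of AI applications for image recognition, it was assumed that AI would ultimately replace radiologists. In 2016, AI pioneer Geoffrey Hinton stated: “People should stop training radiologists now. It's just completely obvious that within 5 years deep learning is going to do better than radiologists” 58. However, the current state of AI in medicine shows that this was highly exaggerated and that current AI applications are far from replacing radiologists 59. The introduction of these new technologies has not always been, and will not be, without unintended negative consequences 8. As Bluemke 60 recently put it in an editorial, radiologists will inevitably need to adopt this new technology.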

Once confined to projection images, for example, chest x-rays, the field of radiology has become more complex seeing the rise of cross-sectional imaging techniques such as MRI and CT. These advances in imaging technologies have markedly increased the amount of data and workload. According to the Medicare claims in the United States, between 1998 and 2010, the number of imaging examinations increased on average by 26% per trainee, with growth largely accounted for by disproportionate increases in complex services (CT and MRI) 61. These increases were confirmed by the Royal College of Radiologists in a 2018 UK report on workforce 62. With the growing number of examinations comes an increasing number of errors. Errors and discrepancies in radiology practice have been reported with an estimated day-to-day rate of 3% to 5% of studies with the main contributing factors involving staff shortages, excess workload, and the use of inexperienced staff 61.

This increase in data provides both opportunity and challenges because of incremental complexity and amount. This offers the perfect working ground for AI applications taking over repetitive tasks, enabling the automatic extraction of information and automated analysis, ultimately assisting radiologists with the workload 48. One of the conditions for the clinical use of AI applications is thorough validation of the algorithms used, ensuring adequate accuracy when used in a clinical population. Park and Han 63 recently published a thorough guideline on methodological evaluation of clinical performance and impact of AI algorithms. Efforts by the radiological community to also help shape the regulatory approval process by the FDA with a clear focus on clinical performance under realistic conditions will be crucial 6.

Most AI applications reported in research are trained, validated, and tested in a research setting, isolated from clinical practice 32. Very few studies validating these applications use prospective data; most studies use retrospective, in silico (ie, via computer modeling or simulation), and previously assembled datasets to train and validate their algorithms 39,44. These datasets often do not represent a routine clinical setting, which might result in varying accuracies when used in clinical practice. Besides the initial hype, the field still lacks examples where AI has been successfully integrated in the complex radiological workflow and improved care, other than for more low-risk tasks such as categorization, detection, segmentation, or quantification 39,64–66. Nevertheless, promising results from smaller retrospective studies have been published regarding the appropriate categorization and thus diagnosis of various types of lesions 67–70. Although Liu et al 39 reported in their meta-analysis promising results regarding the overall diagnostic performance of DL algorithms compared with health care professionals, they raised multiple concerns regarding the reproducibility and data selection of many of the included studies. Prospective multicenter trials will be crucial to evaluate the true diagnostic performance and impact on outcome.

Many of the published studies on clinical use of AI still suffer from a common limitation: the black box mechanism of how these algorithms truly work. Users are forced to put their faith into the outcome of these algorithms, without precisely knowing how these predictions came to life 2. This requires a large amount of trust not only by physicians, but ultimately also patients once AI is routinely implemented.

With the increased trust physicians put into medical AI applications, security and safety of these algorithms become similarly important. Adversarial attacks on ML models have gained increasing attention in the past years. Several examples have shown that AI algorithms can be misled, resulting in intentionally wrong predictions. For example, Thys et al 71 demonstrated that a small adversarial patch can hamper the accuracy of object detection using neural networks. However, besides intentional changes of features, unintentional changes due to noise or changes in image protocol can also severely hamper the algorithm's accuracy, as discussed by Cabitza et al 72. Example studies have shown that hackers were able to manipulate imaging data undetected by radiologists to insert or remove pathologies 73. Multilayer security will be crucial in reliable implementation of AI into clinical routine.
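
One possible layer of such a security concept is an integrity check that detects whether image data were altered after acquisition. The sketch below uses a keyed hash (HMAC with SHA-256) from the Python standard library; the key, the byte values standing in for pixel data, and the idea of storing the fingerprint at acquisition are assumptions for illustration, and the check does not protect against adversarial patterns that are already present when the image is acquired.

```python
import hashlib
import hmac


def fingerprint(pixel_bytes, key):
    """Keyed SHA-256 fingerprint (HMAC) of raw image bytes."""
    return hmac.new(key, pixel_bytes, hashlib.sha256).hexdigest()


def is_unmodified(pixel_bytes, key, expected):
    """Constant-time comparison against the fingerprint stored at acquisition."""
    return hmac.compare_digest(fingerprint(pixel_bytes, key), expected)


key = b"hypothetical-site-key"
original = bytes([0, 12, 40, 200, 13, 7])      # stand-in for pixel data
stored = fingerprint(original, key)

tampered = bytes([0, 12, 41, 200, 13, 7])      # one value changed
print(is_unmodified(original, key, stored))    # True
print(is_unmodified(tampered, key, stored))    # False
```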

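A major advance of AI technology is the extraction and analysis of information that is invisible to the human eye or cannot be processed by the human brain. Although this may allow more accurate diagnosis and prognosis, it also comes with the potential drawback of overdiagnosis. Whereas AI may offer a great opportunity to handle the increased workload, it may also add to the workload through overdiagnosis of incidental findings 8. The growing number of available screening programs (eg, for breast, lung, and colorectal cancer) and of individuals screened represents an ideal application for automated AI detection algorithms. However, up to one third of lung and breast cancers detected through screening programs 61,74–76 may be overdiagnosed 77–79, meaning that these cancers would not have caused symptoms during the patient's lifetime, so that detecting and subsequently treating them is harmful. By extracting quantitative information from images, some of which is invisible to the human eye, AI may also capture disease that is clinically irrelevant. However, there are currently no appropriate protocols for handling such overdiagnosed cases. This applies not only to AI-based applications but also to radiomics applications that make use of AI. A recent study showed that radiomic variables are affected by preprocessing 80, underscoring the need for standardization before such algorithms are implemented in daily clinical practice.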

The next step for diagnostic applications is the development of prognostic AI applications. A combination of these 2 could help reduce overdiagnosis. These are often more complex, needing more data to capture the complex relationships. There are several examples of prognostic AI algorithms in a clinical setting, for example, using coronary CT angiography images to predict cardiac risk 81,82. Particularly in oncological imaging, prediction of tumor response and ultimately outcome will revolutionize the field 83. However, these efforts will also rely on the ability to extract more information from images or use advanced acquisition techniques such as dual-energy CT and photon-counting CT 84–89.

CONCLUSIONS

There are multiple obstacles to developing AI algorithms and ultimately employing them in radiological routine workflows. Quality and heterogeneity of data are an often overlooked topic in early, small-scale feasibility studies. However, as the field is evolving and the value of imaging data is of increasing focus, radiologists can be on the forefront of leading this innovation and guiding appropriate quality control frameworks. Radiologists will also need to motivate the industry to move away from siloed software solutions and facilitate interdevice communication so that AI algorithms can be put in position to control the increasingly overwhelming amounts of clinical data to improve workflows and ultimately outcomes. Academia and radiological societies will need to provide proper guidelines and constantly reemphasize the focus on peer-reviewed proof of clinical value despite media hype and potentially conflicting interests by medical device industry and investors.

Original article:

www.investigativeradiology.com
