2023.07.12 Beijing

A review of eye tracking for understanding and improving diagnostic interpretation

  • Tad T. Brunyé,
  • Trafton Drew,
  • Donald L. Weaver &
  • Joann G. Elmore 

Cognitive Research: Principles and Implications volume 4, Article number: 7 (2019)

Abstract

Inspecting digital imaging for primary diagnosis introduces perceptual and cognitive demands for physicians tasked with interpreting visual medical information and arriving at appropriate diagnoses and treatment decisions. The process of medical interpretation and diagnosis involves a complex interplay between visual perception and multiple cognitive processes, including memory retrieval, problem-solving, and decision-making. Eye-tracking technologies are becoming increasingly available in the consumer and research markets and provide novel opportunities to learn more about the interpretive process, including differences between novices and experts, how heuristics and biases shape visual perception and decision-making, and the mechanisms underlying misinterpretation and misdiagnosis. The present review provides an overview of eye-tracking technology, the perceptual and cognitive processes involved in medical interpretation, how eye tracking has been employed to understand medical interpretation and promote medical education and training, and some of the promises and challenges for future applications of this technology.

Significance

During patient examinations, image interpretation, and surgical procedures, physicians are constantly accumulating multisensory evidence when inspecting information and ultimately arriving at a diagnostic interpretation. Eye-tracking research has shed light on the dynamics of this interpretive process, including qualitative and quantitative differences that help distinguish and possibly predict successes and errors. This progress affords novel insights into how the interpretive process might be improved and sustained during education, training, and clinical practice. The present review details some of this research and emphasizes future directions that may prove fruitful for scientists, educators, and clinical practitioners interested in accelerating the transition from novice to expert, monitoring and maintaining competencies, developing algorithms to automate error detection and classification, and informing tractable remediation strategies to train the next generation of diagnosticians.

Introduction

Decades of research have demonstrated the involvement of diverse perceptual and cognitive processes during medical image interpretation and diagnosis (Bordage, 1999; Elstein, Shulman, & Sprafka, 1978; Gilhooly, 1990; Kundel & La Follette, 1972; Patel, Arocha, & Zhang, 2005). Broadly speaking, these include visual search and pattern matching, hypothesis generation and testing, and reasoning and problem-solving. As with many more general cognitive tasks, these processes interact dynamically over time via feed-forward and feed-back mechanisms to guide interpretation and decision-making (Brehmer, 1992; Newell, Lagnado, & Shanks, 2015). The reliable involvement of these processes has made them of interest as targets for both clinical research and the design of educational interventions to improve diagnostic decision-making (Crowley, Naus, Stewart, & Friedman, 2003; Custers, 2015; Nabil et al., 2013). Methodologies to investigate mental processes during interpretation and diagnosis have included think-aloud protocols (Lundgrén-Laine & Salanterä, 2010), knowledge and memory probes (Gilhooly, 1990; Patel & Groen, 1986), practical exercises (Bligh, Prideaux, & Parsell, 2001; Harden, Sowden, & Dunn, 1984), and tracking physicians’ interface navigation behavior while they inspect visual images (e.g., radiographs, histology slides) (Mercan et al., 2016; Mercan, Shapiro, Brunyé, Weaver, & Elmore, 2017).

Medical researchers have increasingly turned to eye-tracking technology to provide more detailed qualitative and quantitative assessments of how and where the eyes move during interpretation, extending research from other high-stakes domains such as air-traffic control (Martin, Cegarra, & Averty, 2011) and airport luggage screening (McCarley & Carruth, 2004; McCarley, Kramer, Wickens, Vidoni, & Boot, 2004). Studies in the medical domain have provided more nuanced understandings of visual interpretation and diagnostic decision-making in diverse medical specialties including radiology, pathology, pediatrics, surgery, and emergency medicine (Al-Moteri, Symmons, Plummer, & Cooper, 2017; Blondon & Lovis, 2015; van der Gijp et al., 2017). Eye tracking has the potential to revolutionize clinical practice and medical education, with far-reaching implications for the development of automated competency assessments (Bond et al., 2014; Krupinski, Graham, & Weinstein, 2013; Richstone et al., 2010; Tien et al., 2014), advanced clinical tutorials (e.g., watching an expert’s eye movements over an image; (Khan et al., 2012; O’Meara et al., 2015)), biologically inspired artificial intelligence to enhance computer-aided diagnosis (Buettner, 2013; Young & Stark, 1963), and the automated detection and mitigation of emergent interpretive errors during the diagnostic process (Ratwani & Trafton, 2011; Tourassi, Mazurowski, Harrawood, & Krupinski, 2010; Voisin, Pinto, Morin-Ducote, Hudson, & Tourassi, 2013).

Eye tracking: technologies and metrics

Modern eye tracking involves an array of infrared or near-infrared light sources and cameras that track the gaze behavior of one (monocular) or both (binocular) eyes (Holmqvist et al., 2011). In most modern systems, an array of non-visible light sources illuminates the eye and produces a corneal reflection (the first Purkinje image); the eye tracker monitors the relationship between this reflection and the center of the pupil to compute vectors that relate eye position to locations in the perceived world (Hansen & Ji, 2010). As the eyes move, the computed point of regard in space also moves. Eye trackers are available in several hardware configurations, including systems with a chin rest for head stabilization, remote systems that can accommodate a limited extent of head movements, and newer mobile eyewear-based systems. Each of these form factors has relative advantages and disadvantages for spatial accuracy (i.e., tracking precision), tracking speed, mobility, portability, and cost (Funke et al., 2016; Holmqvist, Nyström, & Mulvey, 2012). Figure 1 depicts a relatively mobile and contact-free eye-tracking system manufactured by SensoMotoric Instruments (SMI; Berlin, Germany), the Remote Eye-tracking Device – mobile (REDm).

Fig. 1

A remote eye-tracking system (SensoMotoric Instruments’ Remote Eye-tracking Device – mobile; SMI REDm) mounted to the bottom of a computer monitor. In this study, a participating pathologist is inspecting a digital breast biopsy (Brunyé, Mercan, et al., 2017)

Eye trackers provide several measures of visual behavior that are relevant for understanding the interpretive process; these are categorically referred to as movement measures, position measures, numerosity measures, and latency measures (Holmqvist et al., 2011). Before describing these, it is important to realize that the eye is constantly moving between points of fixation. Fixations are momentary pauses of eye gaze at a spatial location for a minimum amount of time (e.g., > 99 ms), and the movements between successive fixations are called saccades (Liversedge & Findlay, 2000). Movement measures quantify the patterns of eye movements through space during saccades, including the distance between successive fixations (saccade amplitude, in degrees) and the speed of saccades (typically average or peak velocity). Position measures quantify the location of the gaze in Cartesian coordinate space, such as the coordinate space of a computer monitor, or a real-world scene captured through a forward-view camera. Numerosity measures quantify the frequency with which the eyes fixate and saccade while perceiving a scene, such as how many fixations and saccades have occurred during a given time, and how those counts might vary as a function of position (and the visual information available at different positions). Finally, latency measures allow for an assessment of the temporal dynamics of fixations and saccades, including first and subsequent fixation durations and saccade duration. Table 1 provides an overview of commonly used eye-tracking measures, and current theoretical perspectives on their relationships to perceptual and cognitive processing.
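
To make the four families of measures concrete, the following is a minimal Python sketch of how a few of them could be derived from an already-detected fixation sequence. It is an illustration only, not a method taken from the reviewed studies: the `Fixation` record, the `summarize` helper, and the default screen geometry (38 pixels per cm, 60 cm viewing distance) are all hypothetical assumptions, and the visual-angle conversion assumes the saccade is roughly centered on the line of sight.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Fixation:
    """One fixation: screen position in pixels plus onset/offset times in ms."""
    x_px: float
    y_px: float
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def pixels_to_degrees(distance_px, px_per_cm, viewing_distance_cm):
    """Convert an on-screen distance to degrees of visual angle.

    Assumes the distance is roughly centered on the line of sight.
    """
    size_cm = distance_px / px_per_cm
    return math.degrees(2 * math.atan(size_cm / (2 * viewing_distance_cm)))


def summarize(fixations, px_per_cm=38.0, viewing_distance_cm=60.0):
    """Summarize movement, numerosity, and latency measures.

    The screen geometry defaults are illustrative assumptions.
    """
    amplitudes_deg = []
    saccade_durations_ms = []
    # Each saccade is treated as the gap between two successive fixations.
    for prev, nxt in zip(fixations, fixations[1:]):
        dist_px = math.hypot(nxt.x_px - prev.x_px, nxt.y_px - prev.y_px)
        amplitudes_deg.append(
            pixels_to_degrees(dist_px, px_per_cm, viewing_distance_cm)
        )
        saccade_durations_ms.append(nxt.start_ms - prev.end_ms)

    durations = [f.duration_ms for f in fixations]
    return {
        # Numerosity measures
        "fixation_count": len(fixations),
        "saccade_count": len(amplitudes_deg),
        # Latency measures
        "mean_fixation_duration_ms": mean(durations) if durations else None,
        "mean_saccade_duration_ms": (
            mean(saccade_durations_ms) if saccade_durations_ms else None
        ),
        # Movement measure
        "mean_saccade_amplitude_deg": (
            mean(amplitudes_deg) if amplitudes_deg else None
        ),
    }
```

Position measures are carried directly by the `x_px` and `y_px` fields; in practice they would typically be compared against regions of interest defined on the image.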

Table 1 A taxonomy relating commonly used eye-tracking metrics and their respective units to perceptual and cognitive processes of interest to researchers

Eye tracking in medical interpretation

Some of the earliest research using eye tracking during medical image interpretation was done during x-ray film inspection (Kundel & Nodine, 1978). In this task, radiologists search chest x-ray films for evidence of lung nodules; Kundel and Nodine were interested in whether radiologists were making errors of visual search versus errors of recognition and/or decision-making. A search error would be evidenced by a failure to fixate on a nodule, and a recognition or decision error would occur when a fixation on a nodule is not followed by a successful identification and diagnosis. To further differentiate errors of recognition versus decision-making, Kundel and Nodine distinguished trials where the radiologist fixated within 2.8° of a nodule for greater than or less than 600 ms. If the fixation occurred for less than 600 ms this was considered a recognition error, and if greater than 600 ms it was considered a decision error. The former was considered a failure to disembed the nodule from the background noise (despite fixating on it), and the latter was considered a successful recognition of a nodule without appropriately mapping it to diagnostic criteria. Their results demonstrated that about 30% of all errors were due to a failed search. About 25% of errors were due to a recognition failure, and the remaining 45% of errors were due to decision failure. Thus, interpretive errors were primarily driven by failures of recognition and decision-making, rather than failures of search (Kundel & Nodine, 1978). In other words, radiologists would fixate upon and process the critical visual information in a scene but fail to successfully map that information to known schemas and/or candidate diagnoses. A follow-up study confirmed that fixations over 300 ms did not improve recognition, but did improve decision accuracy; furthermore, fixations within 2° of the nodule were associated with higher recognition accuracy (Carmody, Nodine, & Kundel, 1980). 
These early studies suggest that eye tracking can be a valuable tool for helping dissociate putative sources of error during medical image interpretation (i.e., search, recognition, and decision-making), given that high-resolution foveal vision appears to be critical for diagnostic interpretation.
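
The decision rule described above for a missed nodule can be sketched in a few lines of Python. This is a hedged illustration rather than Kundel and Nodine's actual procedure: the function name, the `(distance_deg, duration_ms)` input format, the use of the longest single fixation (rather than, say, cumulative dwell time), and the treatment of a fixation of exactly 600 ms as a decision error are all assumptions made for the example.

```python
from enum import Enum


class MissType(Enum):
    SEARCH = "search error"
    RECOGNITION = "recognition error"
    DECISION = "decision error"


def classify_missed_target(fixations, radius_deg=2.8, threshold_ms=600.0):
    """Classify why a target (e.g., a nodule) was missed.

    fixations: iterable of (distance_deg, duration_ms) pairs, where
    distance_deg is the angular distance from the fixation to the target.
    Assumption: the longest single fixation near the target is compared
    with the threshold; a duration equal to the threshold counts as a
    decision error.
    """
    near_target = [dur for dist, dur in fixations if dist <= radius_deg]
    if not near_target:
        # The target was never fixated: the search itself failed.
        return MissType.SEARCH
    if max(near_target) < threshold_ms:
        # Fixated only briefly: the target was not disembedded from noise.
        return MissType.RECOGNITION
    # Fixated at length but still not reported: the decision failed.
    return MissType.DECISION
```

For example, `classify_missed_target([(1.0, 250), (2.0, 400)])` yields a recognition error under these assumptions, because the nodule was fixated but never for 600 ms or longer. The `radius_deg` and `threshold_ms` parameters are exposed because later studies used different values.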

Over the past four decades since this original research, eye tracking has been expanded to understanding diagnostic interpretation in several medical specializations, including radiology, breast pathology, general surgery, neurology, emergency medicine, anesthesiology, ophthalmology, and cardiology (Balslev et al., 2012; Berbaum et al., 2001; Brunyé et al., 2014; Giovinco et al., 2015; Henneman et al., 2008; Jungk, Thull, Hoeft, & Rau, 2000; Krupinski et al., 2006; Kundel, Nodine, Krupinski, & Mello-Thoms, 2008; Matsumoto et al., 2011; O’Neill et al., 2011; Sibbald, de Bruin, Yu, & van Merrienboer, 2015; Wood, Batt, Appelboam, Harris, & Wilson, 2014). In general, these eye-tracking studies have found evidence of reliable distinctions between three types of error-making in diagnostic interpretation: search errors, recognition errors, and decision errors. Each of these error types carries implications for diagnostic accuracy and, ultimately, patient quality of life and well-being. We review each of these in turn, below.

Search errors

A search error occurs when the eyes fail to fixate a critical region of a visual scene, rendering a feature undetected; these have also been labeled as scanning errors because the critical feature was not in the scan path (Cain, Adamo, & Mitroff, 2013). For example, a radiologist failing to fixate a lung nodule (Manning, Ethell, Donovan, & Crawford, 2006), a pathologist failing to fixate large nucleoli in pleomorphic cells (Brunyé, Mercan, Weaver, & Elmore, 2017), or a neuro-radiologist failing to fixate a cerebral infarction (Matsumoto et al., 2011). Theoretically, if the diagnostician has not fixated a diagnostically relevant region of a medical image then successful search has not occurred, and without it, recognition and decision-making are not possible.

Several perceptual and cognitive mechanisms have been proposed to account for why search errors occur, including low target prevalence, satisfaction of search, distraction, and resource depletion. Low target prevalence refers to a situation when a diagnostic feature is especially rare. For example, a malignant tumor appearing in a screening mammography examination has a very low prevalence rate at or below 1% of all cases reviewed (Gur et al., 2004). Low prevalence is associated with higher rates of search failure; previous research has shown that when target prevalence was decreased from 50% to 1%, detection rates fell from approximately 93% to 70% (Wolfe, Horowitz, & Kenner, 2005). Although much of the research on the low prevalence effect has focused on basic findings with naïve subjects, research has also shown that low prevalence influences diagnostic accuracy in a medical setting (Egglin & Feinstein, 1996; Evans, Birdwell, & Wolfe, 2013). Most notably, Evans and colleagues compared performance under typical laboratory conditions, where target prevalence is high (50% of cases), with performance when the same cases were inserted into regular workflow, where target prevalence is low (< 1% of cases); they found that false-negative rates were substantially elevated at low target prevalence (Evans et al., 2013). As a diagnostician searches a medical image, they must make a decision about when to terminate a search (Chun & Wolfe, 1996; Hong, 2005). In the case of low target prevalence, search termination is more likely to occur prior to detecting a target (Wolfe & Van Wert, 2010).

How exactly a search termination decision emerges during a diagnostician’s visual search process is unknown, though it is likely that there are multiple smaller decisions occurring during the search process: as the diagnostician detects individual targets in the medical image, they must decide whether it is the most diagnostically valuable target (and thus terminate search), or whether they believe there is a rare but more valuable target that might be found with continued search (Rich et al., 2008). The risk is that after finding a single target a diagnostician may terminate search prematurely and fail to detect a target with higher value for a correct diagnosis. This phenomenon was originally coined satisfaction of search, when radiologists would become satisfied with their interpretation of a medical image after identifying one lesion, at the expense of identifying a second more important lesion (Berbaum et al., 1990; Smith, 1967). These sorts of errors may be a consequence of Bayesian reasoning based on prior experience: the diagnostician may not deem additional search time justifiable for a target that is exceedingly unlikely to be found (Cain, Vul, Clark, & Mitroff, 2012). More recently, Berbaum and colleagues demonstrated that satisfaction of search alone may not adequately describe the search process (Berbaum et al., 2015; Krupinski, Berbaum, Schartz, Caldwell, & Madsen, 2017). Specifically, detecting a lung nodule on a radiograph did not adversely affect the subsequent detection of additional lung nodules; however, it did alter observers’ willingness to report the detected nodules. The authors suggest that detecting a target during search may not induce search termination, but rather change response thresholds during a multiple-target search.

Once a diagnostician finds one target, there is no guarantee that it is the critical feature that will assist in rendering an appropriate diagnosis. It is often the case that critical features are passed over because they are not only low prevalence but also low salience; in other words, they might not stand out visually (in terms of their brightness, contrast, or geometry; Itti & Koch, 2000) relative to background noise. Research with neurologists and pathologists has demonstrated that novice diagnosticians, such as medical residents, tend to detect features with high visual salience sooner and more often than experienced diagnosticians; this focus on highly salient visual features can be at the cost of neglecting the detection of critical features with relatively low visual salience (Brunyé et al., 2014; Matsumoto et al., 2011). In one study, not only did novice pathologists tend to fixate more on visually salient but diagnostically irrelevant regions, they also tended to re-visit those regions nearly three times as often as expert pathologists (Brunyé et al., 2014). As diagnosticians gain experience with a diverse range of medical images, features, and diagnoses, they develop more refined search strategies and richer knowledge that accurately guide visual attention toward diagnostically relevant image regions and away from irrelevant regions, as early as the initial holistic inspection of an image (Kundel et al., 2008). As described in Kundel and colleagues’ model, expert diagnosticians are likely to detect cancer on a mammogram before any visual scanning (search) takes place, referred to as an initial holistic, gestalt-like perception of a medical image (Kundel et al., 2008). 
This discovery led these authors to reconceptualize the expert diagnostic process as involving an initial recognition of a feature, followed by a search and diagnosis (Kundel & Nodine, 2010); this is in contrast to traditional conceptualizations suggesting that search always preceded recognition (Kundel & Nodine, 1978). Unlike experts, during the initial viewing of a medical image novices are more likely to be distracted by highly salient image features that are not necessary for diagnostic interpretation. The extent to which a medical image contains visually salient features that are irrelevant for accurate interpretation may make it more likely a novice pathologist or neurologist will be distracted by those features and possibly fail to detect critical but lower-salience image features. This might be especially the case when high-contrast histology stains or imaging techniques render diagnostically irrelevant (e.g., scar tissue) regions highly salient. Eye tracking is a critical tool for recognizing and quantifying attention toward distracting image regions and has been instrumental in identifying this source of search failure among relatively novice diagnosticians.

In a recent taxonomy of visual search errors, Cain and colleagues demonstrated that working memory resources are an important source of errors (Cain et al., 2013). Specifically, when an observer is searching for multiple features (targets), if they identify one feature they may maintain that feature in working memory while searching for another feature. This active maintenance of previously detected features may deplete working memory resources that could otherwise be used to search for lower-salience and prevalence targets. This is evidenced by high numbers of re-fixations in previously detected regions, suggesting an active “refreshing” of the contents of working memory to help maintain item memory (Cain & Mitroff, 2013). This proposal has not been examined with diagnosticians inspecting medical images, though it suggests that physicians with higher working memory capacity may show higher performance when searching for multiple features, offering an interesting avenue for future research. Together, resource depletion, low target prevalence, satisfaction of search, and distraction may account for search errors occurring across a range of disciplines involving medical image interpretation.

Recognition errors

Eye tracking has been instrumental in demonstrating that fewer than half of interpretive errors are attributed to failed search, suggesting that most interpretive errors arise during recognition and decision-making (Al-Moteri et al., 2017; Carmody et al., 1980; Nodine & Kundel, 1987; Samuel, Kundel, Nodine, & Toto, 1995). Recognition errors occur when the eyes fixate a feature, but the feature is not recognized correctly or not recognized as relevant or valuable for the search task. Recognition is an example of attentional mechanisms working together to dynamically guide attention toward features that may be of diagnostic relevance and mapping them to stored knowledge. One way of parsing eye movements into successful versus failed recognition of diagnostically relevant features is to assess fixation durations on critical image regions (Kundel & Nodine, 1978; Mello-Thoms et al., 2005). In this method, individual fixation durations are parsed into two categories using a quantitative threshold. For example, Kundel and Nodine used a 600-ms threshold, and Mello-Thoms and colleagues used a 1000-ms threshold; fixation durations shorter than the threshold indicated failed recognition, whereas durations lengthier than the threshold indicated successful recognition (Kundel & Nodine, 1978; Mello-Thoms et al., 2005). Thus, if a feature (e.g., a lung nodule) was fixated there was successful search, and if it was fixated for longer than the threshold there was successful recognition. Under the assumption that increased fixation durations indicate successful recognition, if a participant fixates on a particular region for longer than a given threshold then any subsequent diagnostic error must be due to failed decision-making.

Using fixation durations to identify successful recognition is an imperfect approach; it is important to note that lengthier fixation durations are also associated with difficulty disambiguating potential interpretations of a feature (Brunyé & Gardony, 2017). In other words, while previous research assumes that lengthy fixation durations indicate successful recognition, they can also indicate the perceptual uncertainty preceding incorrect recognition. This is because a strategic shift of attention toward a particular feature is evident in oculomotor processes, for instance with longer fixations, regardless of whether recognition has proceeded accurately (Heekeren, Marrett, & Ungerleider, 2008). Thus, one can only be truly certain that successful recognition has occurred (i.e., mapping a perceived feature to an accurate knowledge structure) if converging evidence is gathered during the interpretive process.

Consistent with this line of thinking, Manning and colleagues found that false-positives when examining chest radiographs were typically associated with longer cumulative dwell time than true-positives (Manning et al., 2006). Other methods such as think-aloud protocols and feature annotation may prove especially valuable to complement eye tracking in these situations: when a diagnostician recognizes a feature, they either say it aloud (e.g., “I see cell proliferation”) or annotate the feature with a text input (Pinnock, Young, Spence, & Henning, 2015). These explicit feature recognitions can then be assessed for their accuracy and predictive value toward accurate diagnosis.

In addition to measuring the ballistic movements of the eyes, eye trackers also provide continuous recordings of pupil diameter. Pupil diameter can be valuable for interpreting cognitive states and can be used to elucidate mental processes occurring during medical image interpretation. Pupil diameter is constantly changing as a function of both contextual lighting conditions and internal cognitive states. Alterations of pupil diameter reflecting cognitive state changes are thought to reflect modulation of the locus coeruleus-norepinephrine (LC-NE) system, which indexes shifts from exploration to exploitation states (Aston-Jones & Cohen, 2005; Gilzenrat, Nieuwenhuis, Jepma, & Cohen, 2010). Specifically, when the brain interprets a bottom-up signal (e.g., a salient region that attracts an initial fixation) as highly relevant to a task goal, it will send a top-down signal to selectively orient attention to that region. When that occurs, there is a transient increase in pupil diameter that is thought to reflect a shift from exploring the scene (i.e., searching) to exploiting perceived information that is relevant to the task (Privitera, Renninger, Carney, Klein, & Aguilar, 2010; Usher, Cohen, Servan-Schreiber, Rajkowski, & Aston-Jones, 1999). Recent research has demonstrated that during fixation on a scene feature, the time-course of pupil diameter changes can reveal information about an observer’s confidence in their recognition of the feature (Brunyé & Gardony, 2017). Specifically, features that are highly difficult to resolve and recognize cause a rapid pupil dilation response within a second of fixation on the feature. This opens an exciting avenue for using converging evidence, perhaps from fixation duration, pupil diameter, and think-aloud protocols, to more effectively disentangle the instances when lengthy fixations on image features are associated with successful or unsuccessful recognition. 
In the future, algorithms that can automatically detect instances of successful or failed recognition during fixation may prove particularly valuable for enabling computer-based feedback for trainees.
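
As a rough illustration of the pupil measure described above, the sketch below computes a baseline-corrected dilation in the first second after a fixation begins. The function name, the 200 ms baseline window, and the sample format are assumptions made for this example and are not taken from the cited studies. Because lighting also drives pupil diameter, a real analysis would need to control or model luminance.

```python
from statistics import mean

def pupil_response(samples, fixation_onset, baseline_window=0.2, response_window=1.0):
    """Peak pupil dilation after a fixation onset, relative to a pre-onset baseline.

    samples: list of (time_s, pupil_diameter_mm) tuples (hypothetical format).
    fixation_onset: time in seconds at which the fixation began.
    """
    # Mean diameter in the short window just before the fixation starts.
    baseline = [d for t, d in samples
                if fixation_onset - baseline_window <= t < fixation_onset]
    # All samples from fixation onset up to the end of the response window.
    response = [d for t, d in samples
                if fixation_onset <= t <= fixation_onset + response_window]
    if not baseline or not response:
        raise ValueError("not enough samples around the fixation onset")
    return max(response) - mean(baseline)

# Illustrative (made-up) samples: diameter rises after a fixation starting at 0.2 s.
samples = [(0.0, 3.0), (0.1, 3.0), (0.2, 3.1), (0.3, 3.1),
           (0.4, 3.3), (0.6, 3.5), (0.9, 3.4)]
dilation = pupil_response(samples, fixation_onset=0.2)
print(f"peak dilation: {dilation:.2f} mm")
```

A measure like this could be combined with fixation duration for each inspected feature, in the spirit of the converging-evidence approach described above.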

Decision errors

As observers gather information about a scene, including searching and recognizing features as relevant to task goals, they begin to formulate hypotheses regarding candidate diagnoses. In some cases, a hypothesis may exist prior to visual inspection of an image (Ledley & Lusted, 1959). The main function of examining a visual image and recognizing features is to develop and test diagnostic hypotheses (Sox, Blatt, Higgins, & Marton, 1988). Developing and testing hypotheses is a cyclical process that involves identifying features that allow the observer to select a set of candidate hypotheses, gathering data to test each hypothesis, and confirming or disconfirming a hypothesis. If the clinician has confirmed a hypothesis, the search may terminate; search may continue if the clinician identifies potential support for multiple hypotheses (e.g., diagnoses with overlapping features) and must continue in the service of differential diagnosis. If the clinician has disconfirmed one of several hypotheses but has not confirmed a single hypothesis, the cyclical process continues; the process also continues under conditions of uncertainty when no given hypotheses have been ruled in or out (Kassirer, Kopelman, & Wong, 1991). It is also important to keep in mind that several diagnoses fall on a spectrum with categorical delineations, with the goal of identifying the highest diagnostic category present in a given image. For instance, a breast pathologist examining histological features may categorize a case as benign, atypia, ductal (DCIS) or lobular carcinoma in situ, or invasive carcinoma (Lester & Hicks, 2016). Given that the most advanced diagnosis is the most important for prognosis and treatment, even if a less advanced hypothesis is supported (e.g., atypia), the pathologist will also spend time ruling out the more advanced diagnoses (e.g., carcinoma in situ, invasive). 
This may be especially the case when diagnostic features can only be perceived at high-power magnification levels, rendering the remainder of the image immediately imperceptible and making it necessary to zoom out to consider other regions.

In an ideal scenario, critical diagnostic features are detected during search and recognized, which leads the clinician to successfully develop and test hypotheses and produce an accurate diagnosis. In the real world, errors emerge at every step of that process. While decision-related errors may not be readily detected in existing eye-tracking metrics, some recent research suggests that relatively disorganized movements of the eyes over a visual image may indicate higher workload, decision uncertainty, and a higher likelihood of errors (Brunyé, Haga, Houck, & Taylor, 2017; Fabio et al., 2015). Specifically, tracking the entropy of eye movements can indicate relatively disordered search processes that do not follow a systematic pattern. In this case, entropy is conceptualized as the degree of energy dispersal of eye fixations across the screen in a relatively random pattern. Higher fixation entropy might indicate relative uncertainty in the diagnostic decision-making process. Furthermore, tonic pupil diameter increases can indicate a higher mental workload involved in a decision-making task (Mandrick, Peysakhovich, Rémy, Lepron, & Causse, 2016). No studies have examined the entropy of eye movements during medical image interpretation, and to our knowledge only one has examined pupil diameter (Mello-Thoms et al., 2005), revealing an exciting avenue for continuing research. Specifically, continuing research may find value in combining fixation entropy and pupil diameter to identify scenarios in which successful lesion detection and recognition has occurred, but the clinician is having difficulty arriving at an appropriate decision.
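
One simple way to put a number on how dispersed fixations are is the Shannon entropy of fixation locations binned on a coarse grid. This is only one of several possible operationalizations, and it is an assumption of this sketch rather than the specific measure used in the cited studies; the grid size and function name are also hypothetical.

```python
import math
from collections import Counter

def fixation_entropy(fixations, width, height, grid=4):
    """Shannon entropy (in bits) of fixation locations binned on a grid x grid lattice.

    fixations: list of (x, y) screen coordinates in pixels.
    width, height: screen (or image) size in pixels.
    """
    counts = Counter()
    for x, y in fixations:
        # Clamp so that points on the right or bottom edge fall in the last cell.
        col = min(int(x / width * grid), grid - 1)
        row = min(int(y / height * grid), grid - 1)
        counts[(row, col)] += 1
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Made-up examples on an 800 x 600 display.
focused = [(100, 100), (110, 105), (120, 95), (105, 110)]      # one grid cell
scattered = [(100, 100), (700, 100), (100, 500), (700, 500)]   # four grid cells

low = fixation_entropy(focused, 800, 600)
high = fixation_entropy(scattered, 800, 600)
print(low, high)
```

Fixations confined to one cell give zero entropy, while fixations spread evenly over many cells give higher values. Note that this version ignores the order of fixations; a transition-based entropy would be needed to capture how systematic the scan sequence is.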

Implications for medical education

Eye tracking may provide innovative opportunities for medical education, training, and competency assessment (Ashraf et al., 2018). Most existing research in this regard leverages the well-established finding that experts move their eyes differently from novices (Brunyé et al., 2014; Gegenfurtner, Lehtinen, & Säljö, 2011; Krupinski, 2005; Krupinski et al., 2006; Kundel et al., 2008; Lesgold et al., 1988). Thus, the premise is that educators can use eye tracking to demonstrate, train, and assess gaze patterns during medical education, possibly accelerating the transition from novice to expert.

Competency-based medical education (CBME) is intended to produce health professionals who consistently demonstrate expertise in both practice and certification (Aggarwal & Darzi, 2006). Though the concept of CBME has been around for several decades, formal frameworks for competency training and assessment have been more recently developed by CanMEDS, the Outcome Project of the US Accreditation Council for Graduate Medical Education (ACGME), and the Scottish Doctor (Frank & Danoff, 2007; Nasca, Philibert, Brigham, & Flynn, 2012; Simpson et al., 2002; Swing, 2007). In each of these cases, methods were evaluated and implemented for integrating CBME, including new standards for curriculum, teaching, and assessment. Many programs, however, have struggled to create meaningful, relevant, and repeatable outcome-based assessments for use in graduate medical education, residency, and fellowships (Holmboe, Edgar, & Hamstra, 2016).

Eye tracking in medical education

As students develop proficiency in interpreting visual images, they demonstrate refined eye movements that move more quickly and consistently toward diagnostic regions of interest (Richstone et al., 2010). In other words, their eye movements increasingly resemble those of experts as they progress through training. One possible method for facilitating this progression is to show students video-based playbacks of expert eye movements, a method called eye-movement modeling examples (EMMEs; Jarodzka et al., 2012). Eye-movement modeling examples typically involve showing not only a video of expert eye movements, but also the expert’s audio narrative of the interpretive process (Jarodzka, Van Gog, Dorr, Scheiter, & Gerjets, 2013; van Gog, Jarodzka, Scheiter, Gerjets, & Paas, 2009). The idea that EMMEs can assist education leverages a finding from cognitive neuroscience demonstrating that observing another’s actions causes the brain to simulate making that same action (i.e., the brain’s “mirror system”), and helps students integrate the new action into their own repertoire (Calvo-Merino, Glaser, Grèzes, Passingham, & Haggard, 2005; Calvo-Merino, Grèzes, Glaser, Passingham, & Haggard, 2006). EMMEs also ground a student’s education in concrete examples, provide students with unique expert insights that might otherwise be inaccessible, and help students learn explicit strategies for processing the visual image (Jarodzka et al., 2012).

Outside of the medical domain, EMMEs have been demonstrated to help novice aircraft inspectors detect more faults during search (Sadasivan, Greenstein, Gramopadhye, & Duchowski, 2005), circuitry board inspectors detect more faults during search (Nalanagula, Greenstein, & Gramopadhye, 2006), programmers debug software faster (Stein & Brennan, 2004), students become better readers (Mason, Pluchino, & Tornatora, 2015), and novices solve puzzles faster (Velichkovsky, 1995). In medical domains involving visual image inspection, the viewed action is the sequence of an expert clinician’s fixations and saccades over the medical image, along with their verbal narration. Few studies have examined the impact of EMMEs in medical learning; note that we differentiate education from training in this context, with education involving the passive viewing of expert eye movements outside of an immediate training context (i.e., not during active practice). In the first study of this kind, novice radiographers viewed either novice or expert eye movements prior to making a diagnostic interpretation of a chest x-ray (Litchfield, Ball, Donovan, Manning, & Crawford, 2010). Viewing expert or novice eye movements improved a novice’s ability to locate pulmonary nodules relative to a free search, as long as the depicted eye movements showed a successful nodule search. This result suggests that novices can indeed leverage another’s eye movements to more effectively guide their own search behavior. More recently, medical students were shown case videos of infant epilepsy, in one of three conditions (Jarodzka et al., 2012). In the control condition, there was expert narration during video playback. Two experimental conditions displayed the narrated video with overlaid expert eye movements; in one condition, the eye movements were indicated by a small circle, and in the other condition, there was a “spotlight” around the circle that blurred image regions that were outside of the expert’s focus. 
Results demonstrated increased diagnostic performance of students after viewing the spotlight condition, suggesting that this specific condition was most effective at conveying expert visual search patterns. Thus, some research suggests that passively viewing an expert’s eye gaze can be advantageous to medical education.

While previewing an expert’s eye movements can facilitate interpretive performance on the same or very similar cases, it is unclear whether EMMEs are supporting strategy development that will transfer to dissimilar cases. Transfer describes the ability to apply knowledge, skills and abilities to novel contexts and tasks that have not been previously experienced (Bransford, Brown, & Cocking, 2000). Transfer can be relatively near-transfer versus far-transfer (Barnett & Ceci, 2002), and is considered a critical trademark of successful learning (Simon, 1983). An example of near-transfer might be a pathologist learning the features and rules for diagnosing DCIS on one case or from text-book examples, and transferring that knowledge and skill to a biopsy with similar features that clearly indicate DCIS (Roads, Xu, Robinson, & Tanaka, 2018). An example of relatively far-transfer would be successfully applying knowledge and skill to a novel biopsy with a unique cellular architecture and challenging features that are less clearly indicative of DCIS and are perhaps borderline between atypical ductal hyperplasia (ADH) and DCIS. More research is needed to understand whether EMMEs promote only near-transfer, or whether multiple EMME experiences can promote relatively far-transfer by promoting perceptual differentiation of features, accurate feature recognition, and more accurate and efficient mapping of features to candidate diagnoses. In other words, can EMMEs move beyond providing explicit hints and cues that enable interpretation and diagnosis in highly similar contexts and cases, to accelerating rule and strategy learning that enhances performance on highly dissimilar contexts and cases (Ball & Litchfield, 2017)? 
It is also worth pointing out that some research has suggested that people may intentionally alter their patterns of eye movements if they know that their eye movements are being monitored or that videos of their eye movements will be replayed to others (Neider, Chen, Dickinson, Brennan, & Zelinsky, 2010; Velichkovsky, 1995). While any such effects appear to be both rare and subtle, they do present a challenge to interpreting whether the effects of EMMEs are at least partially due to the intent of the expert viewer as opposed to being a natural representation of their viewing patterns in normal clinical practice (Ball & Litchfield, 2017).

Eye tracking in medical training

As opposed to a novice passively viewing expert eye-gaze behavior, some studies have examined eye gaze as a training tool. As noted previously, we distinguish education from training by noting that training involves active practice of knowledge and skills, with or without feedback (Kern, Thomas, & Hughes, 1998). In most research to date, eye gaze has been used to provide immediate feedback and guidance for a novice during the active exploration of a visual stimulus. This research leverages several phenomena from the cognitive and instructional sciences. First, cueing attention toward relevant features during a training activity can promote more selective attention to cued areas and help observers remember the cued information and allocate less mental energy to the non-cued areas (De Koning, Tabbers, Rikers, & Paas, 2009). For instance, subtle visual cues, such as a momentary flash of light in a specific scene region, can selectively orient attention to that region for further inspection (Danziger, Kingstone, & Snyder, 1998). Second, watching expert eye movements can help observers recognize and learn organizational strategies for viewing and interpreting visual images, understand the expert’s intent, identify the organizational structure of the images, and better organize perceived information into mental schemas (Becchio, Sartori, Bulgheroni, & Castiello, 2008; Jarodzka et al., 2013; Lobmaier, Fischer, & Schwaninger, 2006). For instance, because experts tend to move their eyes and navigate visual images differently than novices, viewing expert eye movements and patterns of navigation behavior may help observers develop more efficient search strategies. Third, well-organized expert eye movements can help an observer recognize relations within and between images, helping them discriminate similar features and possibly promote transfer to novel cases (Kieras & Bovair, 1984). 
For instance, an expert may saccade intentionally between features that help the observer effectively discriminate them, possibly helping them form a more thorough understanding of how to distinguish features and associated diagnoses. It is unknown whether this refined knowledge would subsequently enable successful transfer to cases with structures and features at least partially overlapping with the learned case, suggesting an avenue for future research.

One popular way to conceptualize the utility of cueing attention toward relevant scene regions is the Theory of Hints (Kirsh, 2009). In this theory, when people attempt to solve problems in the real world, they rely not only upon existing knowledge (including heuristics and biases) but also the effective use of any available mental aids offered by the context. In addition to explicit verbal guidance from an instructor, or explicit feedback on worked examples, hints can also come in the form of another’s eye movements (Ball & Litchfield, 2017), which can implicitly (i.e., subconsciously) or explicitly orient attention and provide information to an observer (Thomas & Lleras, 2009a, b). As evidence for relatively implicit attention guidance, novices’ interpretation of lung x-rays can improve when they receive implicit cueing based on an expert’s eye movements (Ball & Litchfield, 2017). In accordance with the Theory of Hints, this guidance likely provided not only a cue to orient attention toward a particular scene region, but also increased the likelihood that the area would be considered in their diagnostic interpretation. Specifically, expert cueing can help a novice calibrate the relevance and importance of a region (Litchfield et al., 2010), which can be complemented by an expert’s verbal narration. Thus, it seems that cueing an observer with expert eye movements and narration not only guides attention but can also help the student assess the expert’s intentionality and incorporate that information into their emergent interpretation. As additional evidence of this phenomenon, when expert eye gaze is superimposed during a simulated laparoscopic surgery task, novices are not only faster to locate critical diagnostic regions, but also more likely to incorporate that region into their diagnosis and ultimately reduce errors (Chetwood et al., 2012). 
Similarly, when novice trainees are shown expert eye gaze during a simulated robotic surgical task, they tend to be faster and more productive in identifying suspicious nodules (Leff et al., 2015). In both cases, cueing a trainee with expert eye movements not only gets them to fixate in a desired region, but also seems to help them understand expert intent, behave more like an expert, and develop a more accurate diagnostic interpretation.

Eye tracking in competency assessment

In addition to cueing attention during image interpretation, eye tracking can also be used as a feedback mechanism following case interpretation. As we noted above, medical training frequently involves explicit feedback by instructors on exams and worked examples. But there are few methods for providing feedback regarding the dynamic interpretive process; for instance, how a microscope was panned and zoomed, which features were inspected, and precisely where in the process difficulties may have arisen (Bok et al., 2013; 2016; Kogan, Conforti, Bernabeo, Iobst, & Holmboe, 2011; Wald, Davis, Reis, Monroe, & Borkan, 2009). Identifying concrete metrics for use in competency assessment is critical for understanding and guiding professional development from novices to experts (Dreyfus & Dreyfus, 1986; Green et al., 2009). Indeed, a “lack of effective assessment methods and tools” is noted as a primary challenge for implementing the Milestones initiative in internal medicine education (Holmboe, Call, & Ficalora, 2016; Holmboe, Edgar, & Hamstra, 2016). The Milestones initiative is intended to provide concrete educational milestones for use in assessment of medical competencies during graduate and post-graduate medical education (Swing et al., 2013). The earliest research examining eye tracking for feedback in medicine leveraged the concept of perceptual feedback, which involves showing an observer the regions they tended to focus on during an image interpretation (Kundel, Nodine, & Krupinski, 1990). This procedure was shown to improve decision-making by providing a clinician with a second opportunity to review suspicious image regions and revise their diagnosis; this procedure might be especially advantageous given that most people do not remember where they looked during a search (Võ, Aizenman, & Wolfe, 2016).

Leveraging the concept of using one’s own eye movements as a feedback tool, one recent study suggests that eye tracking may be especially valuable for clinical feedback with emergency medicine residents (Szulewski et al., 2018). In that study, eye movements were tracked in emergency medicine residents during objective structured clinical examinations in a simulation environment. During a subsequent faculty debriefing, residents were led through an individualized debrief that included a review of their eye movements during the clinical examination, with reference to the scene features they focused on and their associated decision-making processes. Results demonstrated that all residents deemed the inclusion of eye tracking in the debriefing to be a valuable feedback tool for learning, making them more likely to actively reflect on their learning experience, constructively critique themselves and compare themselves to experts, and plan responses for future clinical scenarios (Szulewski et al., 2018). Thus, eye tracking appears to be a valuable tool for augmenting qualitative feedback of trainee performance with concrete examples and guidance to help them attend to appropriate features and incorporate them into diagnoses.

Future research directions

As eye trackers become increasingly available to consumers, lower cost, portable, and easier to use, research on principled methods for using eye tracking for competency assessment is expected to increase (Al-Moteri et al., 2017). It is worth noting that eye trackers with high temporal and spatial resolution and coverage range (e.g., across large or multiple displays) can still be quite cost prohibitive. As eye trackers develop more widespread use, however, one can readily envision both automated and instructor-guided feedback techniques to help quantify competency and provide grounded examples for individualized feedback. In mammography, recent research demonstrates that tracking eye movements and using machine-learning techniques can predict most diagnostic errors prior to their occurrence, making it possible to automatically provide cueing or feedback to trainees during image inspection (Voisin et al., 2013). In diagnostic pathology, automated feedback may be possible by parsing medical images into diagnostically relevant versus irrelevant regions of interest (ROIs) using expert annotations and/or automated machine-vision techniques (Brunyé et al., 2014; Mercan et al., 2016; Nagarkar et al., 2016). Once these ROIs are established and known to the eye-tracking system, fixations can be parsed as falling within or outside of ROIs. This method could be used to understand the spatial allocation of attention over a digital image (e.g., a radiograph, histology slide, angiography), and the time-course of that allocation.
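
The ROI-based parsing described above can be sketched in a few lines. The example assumes rectangular ROIs and a simple fixation record; the class names, field names, and coordinates are hypothetical and chosen only for illustration (real expert annotations are often irregular polygons).

```python
from dataclasses import dataclass

@dataclass
class Fixation:
    x: float            # horizontal position in image pixels
    y: float            # vertical position in image pixels
    start_ms: float     # fixation onset relative to trial start
    duration_ms: float

@dataclass
class ROI:
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fx):
        return self.left <= fx.x <= self.right and self.top <= fx.y <= self.bottom

def dwell_by_roi(fixations, rois):
    """Total fixation duration per ROI; fixations in no ROI are counted as 'outside'."""
    dwell = {roi.name: 0.0 for roi in rois}
    dwell["outside"] = 0.0
    for fx in fixations:
        # Assign each fixation to the first ROI that contains it.
        hit = next((roi.name for roi in rois if roi.contains(fx)), "outside")
        dwell[hit] += fx.duration_ms
    return dwell

def time_to_first_fixation(fixations, roi):
    """Onset time of the earliest fixation inside the ROI, or None if never fixated."""
    hits = [fx.start_ms for fx in fixations if roi.contains(fx)]
    return min(hits) if hits else None

# Made-up data: one diagnostically relevant region and three fixations.
rois = [ROI("lesion", 100, 100, 200, 200)]
fixations = [Fixation(50, 50, 0, 200),
             Fixation(150, 150, 250, 400),
             Fixation(160, 140, 700, 300)]

dwell = dwell_by_roi(fixations, rois)
first = time_to_first_fixation(fixations, rois[0])
print(dwell, first)
```

Dwell time per ROI captures the spatial allocation of attention, and time to first fixation is one simple index of its time-course.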

While eye tracking provides valuable insights into the distribution of visual attention over a scene, it is important to realize that eye trackers are restricted to monitoring foveal vision. The fovea is a small region in the center of the retina that processes light from the center of the visual field, with a dense concentration of cone receptors that provide high visual acuity (Holmqvist et al., 2011). One popular theoretical assumption is that eye and head movements strategically position the retina to a more advantageous state for gathering information, such as moving your head and eyes toward the source of a sound to reveal its nature and relevance (Xu-Wilson, Zee, & Shadmehr, 2009). Thus, some of what we consider overt visual attention should theoretically be captured by tracking eye movements. On the other hand, it is also well-established that visual attention can be shifted and sustained covertly, allowing one to fixate the eyes on an ostensibly uninteresting or irrelevant feature while covertly attending to another (Liversedge & Findlay, 2000; Treisman & Gelade, 1980). Thus, it remains possible that some of a diagnostician’s interpretive process may occur through peripheral vision (parafoveal vision), limiting our interpretation of eye-tracking patterns made during medical image inspection.

Eye trackers are designed to track eye gaze as a series of fixations and saccades; in other words, they are designed to track foveal attention. This means that they are quite good at tracking overt central visual attention, but they are not intended for tracking covert peripheral visual attention (Holmqvist et al., 2011). However, we also know that visual attention can be covertly shifted to other areas of a visual scene without a subsequent overt fixation on that region (Liversedge & Findlay, 2000; Treisman & Gelade, 1980). This is typically considered a major limitation of eye tracking: many real-world visual tasks likely involve both covert and overt visual attention, but eye tracking can only measure the latter. However, more recent research has demonstrated that microsaccades reflect shifts in covert attention (Meyberg, Werkle-Bergner, Sommer, & Dimigen, 2015; Yuval-Greenberg, Merriam, & Heeger, 2014). Microsaccades are very small saccades that are less than 1° of visual angle and occur very frequently during fixations (about two to three times per second) (Martinez-Conde, Otero-Millan, & Macknik, 2013). These microsaccades tend to be directional, for instance moving slightly to the left or right of a current fixation point; research has recently demonstrated that these slight directional movements of the eye indicate the orientation of covert attention (Yuval-Greenberg et al., 2014). For example, if you are staring at a point on a screen but monitoring an upper-right area of the periphery for a change, then microsaccades are likely to show a directional shift toward the upper right. Microsaccades are likely to serve many purposes, such as preparing the eye for a subsequent saccade to a peripheral region (Juan, Shorter-Jacobi, & Schall, 2004), but can also provide meaningful metrics of covert attention. 
With a clinician, it is possible that while they fixated on a given number of regions they also considered additional image regions for fixation (but never visited them). In other words, microsaccades may provide more fine-grained understanding of the strategic search process within individual fixations and allow a more nuanced understanding of which regions might have been ruled-out or ruled-in for subsequent inspection.
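
As a rough illustration of how the directional bias of microsaccades might be quantified, the following Python sketch filters eye displacements recorded within a fixation by the amplitude criterion given above (less than 1° of visual angle) and summarizes their direction with a circular mean. The function name, the input format (horizontal and vertical displacements in degrees), and the use of a circular mean are assumptions made for illustration; they are not the detection or analysis methods used in the cited studies.

```python
import math

# Amplitude criterion from the text: microsaccades span less than 1 degree.
MICROSACCADE_MAX_DEG = 1.0


def microsaccade_bias(displacements, max_amp_deg=MICROSACCADE_MAX_DEG):
    """Summarize the direction of small saccades within a fixation.

    displacements: iterable of (dx, dy) in degrees of visual angle,
                   with +x to the right and +y upward (hypothetical format).
    Returns (count, mean_direction_deg, resultant_length), where
    mean_direction_deg is in [0, 360) and resultant_length is in [0, 1].
    A resultant length near 0 means the mean direction is not meaningful.
    """
    # Keep only non-zero movements below the amplitude threshold.
    micro = [(dx, dy) for dx, dy in displacements
             if 0.0 < math.hypot(dx, dy) < max_amp_deg]
    if not micro:
        return 0, None, 0.0

    # Circular mean: average the unit vectors of each movement.
    sum_x = sum(dx / math.hypot(dx, dy) for dx, dy in micro)
    sum_y = sum(dy / math.hypot(dx, dy) for dx, dy in micro)
    count = len(micro)
    mean_direction = math.degrees(math.atan2(sum_y, sum_x)) % 360.0
    resultant_length = math.hypot(sum_x, sum_y) / count
    return count, mean_direction, resultant_length
```

Under these assumptions, a mean direction near 45° with a resultant length close to 1 would correspond to the example in the text, in which an observer holds fixation while covertly monitoring the upper-right periphery.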

Eye tracking also carries value for understanding longitudinal aspects of competency progression in medical education. While diagnostic performance is routinely evaluated through credentialing and certification, we have very little insight into the underlying interpretive process or the process of skills development over time. For instance, within the domain of diagnostic pathology, we know of only one study that examined longitudinal changes in pathology residents’ visual expertise (Krupinski et al., 2013). Unfortunately, this prior study is limited by its size and breadth (four residents at a single training location), the restriction of observers’ ability to zoom or pan the medical image, and a reliance on the same experimental images each year. Thus, most of our understanding of how image interpretation and diagnostic accuracy and efficiency emerge during professional development is restricted to insights from cross-sectional designs. But we also know that expertise development of medical students and post-graduate resident trainees is a long-term, continuous, and non-linear process. Eye tracking provides an innovative opportunity to enable a large-scale examination of how interpretive and diagnostic skills develop through multi-year residencies and into professional practice. Our current research is examining this exciting possibility.

We have focused primarily on competency development through education and training, and performance differences between novices and experts. However, it is worth pointing out that each individual student and clinician brings a unique set of individual differences to clinical diagnostics that undoubtedly influences the processes of visual search and decision-making. Individual differences include variables such as personality traits and cognitive abilities, and a substantial body of research demonstrates that these variables constantly influence real-world behavior (Motowidlo, Borman, & Schmit, 1997). For instance, recent research has demonstrated that experienced radiologists show perceptual abilities superior to those of novices, as measured with the Vanderbilt Chest Radiograph Test (Sunday, Donnelly, & Gauthier, 2017). Here we consider one individual difference that warrants more consideration in the domains of medical image interpretation and decision-making: working-memory capacity. Generally, working memory refers to the cognitive system involved in maintaining and manipulating task-relevant information while a task is performed (Miyake & Shah, 1999). Working-memory capacity describes the notion that working memory is a limited-capacity system: it has finite resources for processing and storage, and each person has a different resource pool that can be drawn from to successfully perform a task (Kane & Engle, 2002, 2003). To measure working-memory capacity, one popular task (the operation span task) involves participants solving arithmetic problems while also trying to memorize words (Turner & Engle, 1989). In this manner, the task demands working-memory storage (to memorize the words) while also requiring the processing of distracting arithmetic problems. The ability to maintain performance on a task in the face of distraction is a hallmark characteristic of individuals with high working-memory capacity.

In our discussion of search errors, we noted that working memory may be critical for helping an observer maintain previously viewed features in memory while exploring the remainder of an image and associating subsequently identified features with features stored in working memory (Cain et al., 2013; Cain & Mitroff, 2013). In this case, higher working-memory capacity may be particularly important when there are multiple targets (rather than a single target) to be identified in an image. Furthermore, in our discussion of decision errors, we noted that some theories suggest that candidate hypotheses must be maintained in memory while evidence is accumulated during image inspection (Patel et al., 2005; Patel & Groen, 1986; Patel, Kaufman, & Arocha, 2002). Other theories suggest that hypotheses are formed early on and then tested during image inspection (Ledley & Lusted, 1959); it is important to point out that novices and experts may reason very differently during case interpretation, and one or both of these approaches may prove appropriate for different observers. Some research demonstrates that individual differences in working-memory capacity predict hypothesis generation and verification processes in a task involving customer order predictions (Dougherty & Hunter, 2003). Thus, in both search and decision-making there appear to be critical roles for working-memory capacity in predicting clinician performance. This possibility has not yet been examined in the context of medical image interpretation and diagnosis, and it is unclear how working-memory capacity might influence clinician eye movements, though it is an exciting direction for future research.

In our review of the literature, we also noted that most studies using eye tracking during medical image interpretation use static images. These include lung x-rays, histology slides, and skin lesions. This is not entirely surprising, as many medical images are indeed static, and interpreting eye movements over dynamic scenes can be very complex and time-consuming (Jacob & Karn, 2003; Jarodzka, Scheiter, Gerjets, & van Gog, 2010). There are also cases where images that are usually navigated (panned, zoomed) are artificially restricted, increasing the risk that results are no longer relevant to routine clinical practice. As modern technologies emerge in diagnostic medicine, this disconnect becomes increasingly disadvantageous. Indeed, many medical images are becoming more complex and dynamic: clinicians interpret live and replayed coronary angiograms, work with simulated dynamic patients during training, and navigate multiple layers of volumetric chest x-rays (Drew, Võ, & Wolfe, 2013; Rubin, 2015). Continued innovations in software for integrating dynamic visual scenes and eye movements will enable this type of research; examples include techniques that parse dynamic video stimuli based on navigation behavior (pause, rewind, play) to identify critical video frames (Yu, Ma, Nahrstedt, & Zhang, 2003). Some other techniques are being developed to provide rudimentary tagging and tracking of identifiable objects in a scene (Steciuk & Zwierno, 2015); such a technique might prove valuable for tracking a region of diagnostic interest that moves across a scene during playback (e.g., during coronary angiogram review).
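
To make the idea of parsing navigation behavior more concrete, the sketch below ranks video frames by how often viewers paused on them or rewound to them. It is only a simplified illustration: the event log format, the action labels, and the counting heuristic are all hypothetical and are not the algorithm described by Yu et al. (2003).

```python
from collections import Counter


def rank_candidate_frames(events, top_k=3):
    """Rank frames by how often playback was paused on or rewound to them.

    events: iterable of (action, frame_index) tuples (hypothetical log format).
            For "pause", frame_index is the frame where playback stopped;
            for "rewind", it is the frame the viewer jumped back to.
            Any other action (e.g., "play") is ignored.
    Returns up to top_k frame indices, most frequently revisited first.
    """
    scores = Counter()
    for action, frame in events:
        if action in ("pause", "rewind"):
            scores[frame] += 1
    return [frame for frame, _ in scores.most_common(top_k)]
```

Frames that rank highly under such a heuristic could then be treated as candidate regions of the recording in which to examine eye movements more closely, though whether navigation counts actually identify diagnostically critical frames would need to be validated.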

It is also worth pointing out that many hospitals are introducing mandatory consultative expert second opinions for quality assurance purposes. For instance, Johns Hopkins Hospital and the University of Iowa Hospitals and Clinics introduced mandatory second opinions for surgical pathology (Kronz, Westra, & Epstein, 1999; Manion, Cohen, & Weydert, 2008). Not only are these mandates seen as valuable for the institutions involved (e.g., for reducing malpractice suits), but clinicians also perceive them as important for improving diagnostic accuracy (Geller et al., 2014). However, having an earlier physician’s interpretation available during diagnosis may unintentionally bias the second physician’s diagnostic process. Indeed, even a subtle probabilistic cue (e.g., a red dot that suggests an upcoming image contains a blast cell) can produce response bias in experienced diagnosticians (Trueblood et al., 2018). Thus, while viewing an expert’s behavior may prove advantageous in certain conditions, future research must isolate the parameters that may dictate its success and balance the potential trade-off between guiding eye movements and biasing interpretation. Furthermore, second opinions can also induce diagnostic disagreements among expert clinicians and necessitate time and expense for resolving disagreement and reaching a consensus diagnosis. Eye tracking may prove to be an invaluable arbiter for these sorts of disputes, allowing consultative physicians to view the eye movements of the physician who rendered the primary diagnosis. This practice may help the consultative physician understand which features were focused on, which features were missed, and how the original physician arrived at their interpretation.

Eye tracking could thus augment traditional text annotations to allow consultative physicians to see the case “through the eyes” of the other physician, possibly reducing disagreement or facilitating consensus through shared understanding. Similar strategies might be applied to peer cohorts or medical students and residents, allowing them to learn from each other’s search patterns and successes and failures. On the other hand, this approach could introduce bias in the second physician and unintentionally increase agreement; if the first physician arrived at an incorrect interpretation, such agreement could be detrimental, demonstrating the importance of continuing research in this regard (Gandomkar, Tay, Brennan, Kozuch, & Mello-Thoms, 2018).

Conclusion

Medical image interpretation is a highly complex skill that influences not only diagnostic interpretations but also patient quality of life and survivability. Eye tracking is an innovative tool that is becoming increasingly commonplace in medical research and holds the potential to revolutionize trainee and clinician experiences.

Abbreviations

ADH: Atypical ductal hyperplasia
CBME: Competency-based medical education
DCIS: Ductal carcinoma in situ
EMME: Eye-movement modeling examples
LC-NE: Locus coeruleus-norepinephrine
ROI: Region of interest
SMI REDm: SensoMotoric Instruments’ Remote Eye-tracking Device – mobile
VNPI: Van Nuys Prognostic Indicator

References
