4.6 Article

A systematic evaluation of human expert agreement on optical coherence tomography biomarkers using multiple devices

Journal

EYE
Volume 37, Issue 12, Pages 2573-2579

Publisher

SPRINGERNATURE
DOI: 10.1038/s41433-022-02376-w


The objective of this study was to assess agreement in evaluating OCT variables in leading macular diseases among OCT-certified graders. Even in optimized conditions, there is disease-dependent variability in biomarker evaluation, particularly for IRF in nAMD and DMO. Our findings highlight the variability in human expert OCT grading performance and the need for AI-based automated feature analyses.
Objectives: To assess the agreement in evaluating optical coherence tomography (OCT) variables in the leading macular diseases, such as neovascular age-related macular degeneration (nAMD), diabetic macular oedema (DMO) and retinal vein occlusion (RVO), among OCT-certified graders.

Methods: SD-OCT volume scans of 356 eyes were graded by seven graders. The grading included presence of intra- and subretinal fluid (IRF, SRF), pigment epithelial detachment (PED), epiretinal membrane (ERM), conditions of the vitreomacular interface (VMI), central retinal thickness (CRT) at the foveal centre-point (CP) and central millimetre (CMM), as well as height and location of IRF/SRF/PED. Kappa statistics (kappa) and the intraclass correlation coefficient (ICC) were used to report categorical grading agreement and measurement agreement, respectively.

Results: The overall agreement on the presence of IRF/SRF/PED was kappa = 0.82/0.85/0.81; kappa for VMI condition was 0.77, and that for ERM presence was 0.37. Agreement on CRT measurements at CP and CMM was excellent, with an ICC of 1.00. Height measurements of IRF/SRF/PED showed robust consistency, with ICC = 0.85-0.93. There was substantial to almost perfect agreement in locating IRF/SRF/PED, with kappa = 0.67-0.86. Between diseases, kappa for IRF/SRF presence was 0.69/0.80 for nAMD, 0.64/0.83 for DMO and 0.86/0.89 for RVO.

Conclusion: Even in the optimized setting, featuring certified graders, standardized image acquisition and the use of a professional reading platform, there is a disease-dependent variability in biomarker evaluation that is most pronounced for IRF in nAMD and DMO. Our findings highlight the variability in the performance of human expert OCT grading and the need for AI-based automated feature analyses.
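
For readers unfamiliar with multi-rater kappa, the following is a minimal sketch of how chance-corrected agreement on a categorical feature (for example, IRF present or absent) might be computed for several graders. The abstract does not state which kappa variant was used; Fleiss' kappa is assumed here purely for illustration, and the function name `fleiss_kappa` and the toy counts are hypothetical, not data from the study.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of per-subject category counts.

    counts[i][j] is the number of graders who assigned subject i
    (e.g. one eye) to category j (e.g. 0 = absent, 1 = present).
    Every subject must be rated by the same number of graders.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number of ratings")
    n_categories = len(counts[0])

    # Overall proportion of ratings falling into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per subject: fraction of grader pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects      # mean observed agreement
    p_e = sum(p * p for p in p_j)      # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical toy data: 4 eyes, 7 graders, columns = [absent, present].
toy_counts = [
    [7, 0],  # all graders: absent
    [0, 7],  # all graders: present
    [4, 3],  # split decision
    [3, 4],  # split decision
]
toy_kappa = fleiss_kappa(toy_counts)
print(f"Fleiss' kappa on toy data: {toy_kappa:.2f}")
```

In this sketch, split decisions on even a few eyes pull kappa down sharply, which is the kind of effect the lower per-disease IRF values in the Results would reflect under this assumed statistic.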

