Article

On evaluating brain tissue classifiers without a ground truth

Journal

NEUROIMAGE
Volume 36, Issue 4, Pages 1207-1224

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.neuroimage.2007.04.031

Keywords

evaluation; validation; image segmentation; agreement; gold standard

Funding

  1. NCRR NIH HHS [P41 RR13218, P41 RR013218-010008, P41 RR013218] Funding Source: Medline
  2. NIBIB NIH HHS [U54 EB005149-010012, U54 EB005149] Funding Source: Medline
  3. NIMH NIH HHS [K02 MH001110-07, R01 MH50747, R01 MH040799, R01 MH050740-07, R01 MH040799-13, R01 MH40799, K02 MH01110, R01 MH050740] Funding Source: Medline

In this paper, we present a set of techniques for the evaluation of brain tissue classifiers on a large data set of MR images of the head. Due to the difficulty of establishing a gold standard for this type of data, we focus our attention on methods which do not require a ground truth, but instead rely on a common agreement principle. Three different techniques are presented: the Williams' index, a measure of common agreement; STAPLE, an Expectation Maximization algorithm which simultaneously estimates performance parameters and constructs an estimated reference standard; and Multidimensional Scaling, a visualization technique to explore similarity data. We apply these different evaluation methodologies to a set of eleven different segmentation algorithms on forty MR images. We then validate our evaluation pipeline by building a ground truth based on human expert tracings. The evaluations with and without a ground truth are compared. Our findings show that comparing classifiers without a gold standard can provide a lot of interesting information. In particular, outliers can be easily detected, strongly consistent or highly variable techniques can be readily discriminated, and the overall similarity between different techniques can be assessed. On the other hand, we also find that some information present in the expert segmentations is not captured by the automatic classifiers, suggesting that common agreement alone may not be sufficient for a precise performance evaluation of brain tissue classifiers. (C) 2007 Elsevier Inc. All rights reserved.
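
To make the common agreement idea concrete, the following is a minimal sketch of a Williams' index computation. It is an illustration under stated assumptions, not the authors' implementation: the pairwise agreement measure is assumed here to be the Jaccard coefficient on binary masks, the function names `jaccard` and `williams_index` are hypothetical, and the formula used is the commonly cited form in which the mean agreement of one classifier with the others is divided by the mean agreement among the others.

```python
import numpy as np


def jaccard(a, b):
    """Jaccard overlap of two binary masks (assumed agreement measure)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # two empty masks are treated as fully agreeing
    return float(np.logical_and(a, b).sum()) / float(union)


def williams_index(segmentations, j, agreement=jaccard):
    """Williams' index of classifier j within a group of r classifiers.

    Computes (r - 2) * sum_k a(j, k) / (2 * sum_{p<q} a(p, q)),
    where k, p, q range over all classifiers other than j.
    Values near or above 1 suggest j agrees with the group at least
    as well as the group members agree with each other.
    """
    r = len(segmentations)
    if r < 3:
        raise ValueError("need at least three segmentations")
    others = [k for k in range(r) if k != j]
    # Total agreement between classifier j and every other classifier.
    with_j = sum(agreement(segmentations[j], segmentations[k]) for k in others)
    # Total agreement over all unordered pairs that exclude classifier j.
    among = sum(
        agreement(segmentations[p], segmentations[q])
        for i, p in enumerate(others)
        for q in others[:i]
    )
    if among == 0:
        return float("nan")  # the others share no agreement; index undefined
    return (r - 2) * with_j / (2.0 * among)


# Toy example: three identical masks and one disjoint outlier.
mask = np.array([1, 1, 0, 0])
outlier = np.array([0, 0, 1, 1])
group = [mask, mask, mask, outlier]
scores = [williams_index(group, j) for j in range(len(group))]
```

In this toy group the disjoint mask receives an index of 0 while the three identical masks score above 1, which mirrors the abstract's point that outliers are easy to detect from agreement alone.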
