Do In-Training Evaluation Reports Deserve Their Bad Reputations? A Study of the Reliability and Predictive Ability of ITER Scores and Narrative Comments

ACADEMIC MEDICINE (2013)

Journal

ACADEMIC MEDICINE

Volume 88, Issue 10, Pages 1539-1544

Publisher

LIPPINCOTT WILLIAMS & WILKINS

DOI: 10.1097/ACM.0b013e3182a36c3d

Categories

Education, Scientific Disciplines; Health Care Sciences & Services

Funding

Edward J. Stemmler, MD, Medical Education Research Fund of the National Board of Medical Examiners

Abstract

Purpose: Although scores on in-training evaluation reports (ITERs) are often criticized for poor reliability and validity, ITER comments may yield valuable information. The authors assessed across-rotation reliability of ITER scores in one internal medicine program, ability of ITER scores and comments to predict postgraduate year three (PGY3) performance, and reliability and incremental predictive validity of attendings' analysis of written comments.

Method: Numeric and narrative data from the first two years of ITERs for one cohort of residents at the University of Toronto Faculty of Medicine (2009-2011) were assessed for reliability and predictive validity of third-year performance. Twenty-four faculty attendings rank-ordered comments (without scores) such that each resident was ranked by three faculty. Mean ITER scores and comment rankings were submitted to regression analyses; dependent variables were PGY3 ITER scores and program directors' rankings.

Results: Reliabilities of ITER scores across nine rotations for 63 residents were 0.53 for both postgraduate year one (PGY1) and postgraduate year two (PGY2). Interrater reliabilities across three attendings' rankings were 0.83 for PGY1 and 0.79 for PGY2. There were strong correlations between ITER scores and comments within each year (0.72 and 0.70). Regressions revealed that PGY1 and PGY2 ITER scores collectively explained 25% of variance in PGY3 scores and 46% of variance in PGY3 rankings. Comment rankings did not improve predictions.

Conclusions: ITER scores across multiple rotations showed decent reliability and predictive validity. Comment ranks did not add to the predictive ability, but correlation analyses suggest that trainee performance can be measured through these comments.
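The reliability figures above depend on how many ratings are averaged. As a rough illustration only, the Spearman-Brown formula relates the reliability of a single rating to that of a mean of k ratings. The abstract does not say which coefficient the authors computed, so the function names and the reading of 0.53 as the reliability of a nine-rotation mean are assumptions made for this sketch, not the authors' method.

```python
def composite_reliability(single: float, k: int) -> float:
    """Spearman-Brown: reliability of the mean of k parallel ratings."""
    return k * single / (1 + (k - 1) * single)


def single_reliability(composite: float, k: int) -> float:
    """Inverse Spearman-Brown: single-rating reliability implied by a k-rating mean."""
    return composite / (k - (k - 1) * composite)


if __name__ == "__main__":
    # ASSUMPTION: 0.53 is treated as the reliability of a mean over nine rotations.
    # The abstract does not confirm this reading; the inputs are illustrative.
    per_rotation = single_reliability(0.53, 9)
    print(f"implied single-rotation reliability: {per_rotation:.3f}")

    # Stepping back up to nine rotations recovers the starting value.
    print(f"nine-rotation reliability: {composite_reliability(per_rotation, 9):.2f}")
```

Under that assumption, the sketch shows why averaging across rotations matters: a modest per-rotation reliability can still produce a usable composite once several rotations are combined.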
