4.6 Article

Do In-Training Evaluation Reports Deserve Their Bad Reputations? A Study of the Reliability and Predictive Ability of ITER Scores and Narrative Comments

Journal

ACADEMIC MEDICINE
Volume 88, Issue 10, Pages 1539-1544

Publisher

LIPPINCOTT WILLIAMS & WILKINS
DOI: 10.1097/ACM.0b013e3182a36c3d

Keywords

-

Funding

  1. Edward J. Stemmler, MD, Medical Education Research Fund of the National Board of Medical Examiners

Ask authors/readers for more resources

Purpose Although scores on in-training evaluation reports (ITERs) are often criticized for poor reliability and validity, ITER comments may yield valuable information. The authors assessed across-rotation reliability of ITER scores in one internal medicine program, ability of ITER scores and comments to predict postgraduate year three (PGY3) performance, and reliability and incremental predictive validity of attendings' analysis of written comments. Method Numeric and narrative data from the first two years of ITERs for one cohort of residents at the University of Toronto Faculty of Medicine (2009-2011) were assessed for reliability and predictive validity of third-year performance. Twenty-four faculty attendings rank-ordered comments (without scores) such that each resident was ranked by three faculty. Mean ITER scores and comment rankings were submitted to regression analyses; dependent variables were PGY3 ITER scores and program directors' rankings. Results Reliabilities of ITER scores across nine rotations for 63 residents were 0.53 for both postgraduate year one (PGY1) and postgraduate year two (PGY2). Interrater reliabilities across three attendings' rankings were 0.83 for PGY1 and 0.79 for PGY2. There were strong correlations between ITER scores and comments within each year (0.72 and 0.70). Regressions revealed that PGY1 and PGY2 ITER scores collectively explained 25% of variance in PGY3 scores and 46% of variance in PGY3 rankings. Comment rankings did not improve predictions. Conclusions ITER scores across multiple rotations showed decent reliability and predictive validity. Comment ranks did not add to the predictive ability, but correlation analyses suggest that trainee performance can be measured through these comments.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.6
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available