A systematic review of methods for evaluating rating quality in language assessment

Journal

LANGUAGE TESTING
Volume 35, Issue 2, Pages 161-192

Publisher

SAGE PUBLICATIONS LTD
DOI: 10.1177/0265532216686999

Keywords

Language assessment; rater effects; rater-mediated assessment; rating quality; raters

The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. Using a systematic literature review, the purpose of this study is to identify and explore the dominant methods for evaluating rating quality within the context of research on large-scale rater-mediated language assessments. Results from the review of 259 methodological and applied studies reveal an emphasis on inter-rater reliability as evidence of rating quality that persists across methodological and applied studies, studies primarily focused on rating quality and studies not primarily focused on rating quality, and across multiple language constructs. Additional findings suggest discrepancies in rating designs used in empirical research and practical concerns in performance assessment systems. Taken together, the findings from this study highlight the reliance upon aggregate-level information that is not specific to individual raters or specific facets of an assessment context as evidence of rating quality in rater-mediated assessments. In order to inform the interpretation and use of ratings, as well as the improvement of rater-mediated assessment systems, rating quality indices are needed that go beyond group-level indicators of inter-rater reliability, and provide diagnostic evidence of rating quality specific to individual raters, students, and other facets of the assessment system. These indicators are available based on modern measurement techniques, such as Rasch measurement theory and other item response theory approaches. Implications are discussed as they relate to validity, reliability/precision, and fairness for rater-mediated assessments.
