Article

Differential Item Functioning Analysis of United States Medical Licensing Examination Step 1 Items

Journal

ACADEMIC MEDICINE
Volume 97, Issue 5, Pages 718-722

Publisher

LIPPINCOTT WILLIAMS & WILKINS
DOI: 10.1097/ACM.0000000000004567


This study statistically identified and qualitatively reviewed USMLE Step 1 exam questions using differential item functioning (DIF) methodology. The results showed that item-level bias did not contribute to the group score differences beyond what can be explained by prior academic performance variables.
Purpose: Previous studies have examined and identified demographic group score differences on United States Medical Licensing Examination (USMLE) Step examinations. Exploring the potential etiologies of such differences is necessary to ensure fair use of the examinations. Although score differences are largely explained by preceding academic variables, one potential concern is that item-level bias may be associated with the remaining group score differences. The purpose of this 2019-2020 study was to statistically identify and qualitatively review USMLE Step 1 exam questions (items) using differential item functioning (DIF) methodology.

Method: Logistic regression DIF was used to identify and classify the effect size of DIF on Step 1 items meeting minimum sample size criteria. After items were flagged statistically, subject matter expert (SME) review was used to identify potential reasons why items may have performed differently between racial and gender groups, including characteristics such as content, format, wording, context, or stimulus materials. USMLE SMEs reviewed flagged items to identify the group difference they believed was present, if any; articulate a rationale for that difference; and determine whether the rationale would be considered construct relevant or construct irrelevant.

Results: All identified DIF rationales were relevant to the constructs being assessed and therefore did not reflect item bias. Where SME-generated rationales aligned with statistical flags, they favored self-identified women on items tagged to women's health content categories and were judged to be construct relevant.

Conclusions: This study found no evidence that group-level performance differences beyond those explained by prior academic performance variables are driven by item-level bias. Health professions examination programs have an obligation to assess for group differences and, when differences are present, to investigate what role, if any, measurement bias plays.
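The logistic regression DIF procedure named in the Method section is commonly implemented as a likelihood-ratio test comparing an ability-only model against a model that adds group and ability-by-group terms (uniform and nonuniform DIF). The sketch below illustrates that general technique on simulated data; the study's actual analysis code, sample sizes, and flagging thresholds are not published here, so all names, parameters, and data in this example are assumptions for illustration only.

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    """Fit a logistic regression by Newton-Raphson; return the log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        H = (X * W[:, None]).T @ X                  # observed information matrix
        beta += np.linalg.solve(H, X.T @ (y - p))   # Newton step
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-10, 1 - 1e-10)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def lr_dif_statistic(ability, group, correct):
    """Likelihood-ratio DIF test: ability-only model vs. a model adding
    group and ability*group terms (2 degrees of freedom)."""
    ones = np.ones_like(ability)
    reduced = np.column_stack([ones, ability])
    full = np.column_stack([ones, ability, group, ability * group])
    return 2.0 * (fit_logistic(full, correct) - fit_logistic(reduced, correct))

rng = np.random.default_rng(0)
n = 2000
ability = rng.normal(size=n)            # proxy for overall proficiency
group = rng.integers(0, 2, size=n)      # simulated demographic indicator

# Item without DIF: the response depends on ability only.
p0 = 1.0 / (1.0 + np.exp(-(0.3 + 1.2 * ability)))
item_no_dif = (rng.random(n) < p0).astype(float)

# Item with uniform DIF: one group gets a +0.6 logit advantage at equal ability.
p1 = 1.0 / (1.0 + np.exp(-(0.3 + 1.2 * ability + 0.6 * group)))
item_dif = (rng.random(n) < p1).astype(float)

lr_no_dif = lr_dif_statistic(ability, group, item_no_dif)
lr_with_dif = lr_dif_statistic(ability, group, item_dif)

# Flag items whose statistic exceeds the chi-square critical value (df=2, alpha=.05).
CHI2_CRIT = 5.991
print(f"no-DIF item: LR = {lr_no_dif:.2f}, flagged = {lr_no_dif > CHI2_CRIT}")
print(f"DIF item:    LR = {lr_with_dif:.2f}, flagged = {lr_with_dif > CHI2_CRIT}")
```

In practice, as in the study, a statistical flag like this is only the first step: flagged items then go to SME review to judge whether the difference reflects construct-relevant content or item bias.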

