Article

Consistency checks to improve measurement with the Hamilton Rating Scale for Depression (HAM-D)

Journal

Journal of Affective Disorders
Volume 302, Pages 273-279

Publisher

Elsevier
DOI: 10.1016/j.jad.2022.01.105

Keywords

HAM-D17; Hamilton Rating Scale for Depression; Consistency of measurement; NEWMEDS; Careless ratings; Inconsistent ratings

Funding

  1. Innovative Medicine Initiative Joint Undertaking [115008]
  2. European Union
  3. Elie Wiesel Chair at Bar Ilan University


This study examines the impact of measurement imprecision on outcome assessment in depression treatment trials and proposes flags for logical and statistical consistency checks. Nearly 30% of HAM-D administrations showed inconsistent scoring, and statistical outliers were also common; reviewing and addressing flagged ratings should improve the reliability and validity of clinical trial data.
Background: Symptom manifestations in mood disorders can be subtle. Cumulatively, small imprecisions in measurement can limit our ability to measure treatment response accurately. Logical and statistical consistency checks between item responses (i.e., cross-sectionally) and across administrations (i.e., longitudinally) can contribute to improving measurement fidelity.

Methods: The International Society for CNS Clinical Trials and Methodology convened an expert Working Group that assembled flags indicating consistent/inconsistent ratings for the Hamilton Rating Scale for Depression (HAM-D17), a widely used rating scale in studies of depression. Proposed flags were applied to assessments derived from the NEWMEDS data repository of 95,468 HAM-D administrations from 32 registration trials of antidepressant medications, and to Monte Carlo-simulated data as a proxy for applying flags under conditions of known inconsistency.

Results: Two types of flags were derived: logical consistency checks and statistical outlier-response pattern checks. Almost 30% of the HAM-D administrations had at least one logical scoring inconsistency flag; 7% had flags judged to suggest that a thorough review of the rating is warranted. Almost 22% of the administrations had at least one statistical outlier flag, and 7.9% had more than one. Most of the administrations in the Monte Carlo-simulated data raised multiple flags.

Limitations: Flagged ratings may represent less-common presentations of administrations done correctly.

Conclusions: Applying flags to clinical ratings may aid in detecting imprecise measurement. Reviewing and addressing these flags may improve the reliability and validity of clinical trial data.
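The two flag families described above can be illustrated with a minimal Python sketch. The item pairings and thresholds below are hypothetical assumptions for illustration only; the paper's actual flag definitions are not reproduced here.

```python
# Hypothetical sketch of the two flag types the abstract describes:
# logical (cross-sectional) checks within one HAM-D17 administration,
# and a statistical (longitudinal) check across administrations.
# Item names, pairings, and thresholds are illustrative assumptions,
# not the Working Group's actual flags.

def logical_flags(items):
    """Return the names of logical inconsistency flags raised by a
    single administration; `items` maps HAM-D17 item names to scores."""
    flags = []
    # Severe suicidal ideation reported alongside no depressed mood at
    # all is an unusual combination worth reviewer attention.
    if items.get("suicide", 0) >= 3 and items.get("depressed_mood", 0) == 0:
        flags.append("suicide_without_depressed_mood")
    # Maximal psychomotor retardation and maximal agitation at the same
    # visit are hard to reconcile clinically (hypothetical rule).
    if items.get("retardation", 0) >= 3 and items.get("agitation", 0) >= 3:
        flags.append("retardation_with_agitation")
    return flags


def outlier_flag(total_now, total_prev, threshold=12):
    """Longitudinal check: flag an implausibly large jump in HAM-D17
    total score between consecutive administrations (arbitrary threshold)."""
    return abs(total_now - total_prev) >= threshold
```

In a trial-monitoring workflow, administrations raising one or more flags would be routed to a human reviewer rather than automatically rejected, consistent with the paper's caveat that flagged ratings may reflect less-common but correct presentations.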
