4.5 Article

Scaling up the evaluation of psychotherapy: evaluating motivational interviewing fidelity via statistical text classification

Journal

IMPLEMENTATION SCIENCE
Volume 9, Issue -, Pages -

Publisher

BMC
DOI: 10.1186/1748-5908-9-49

Keywords

Motivational interviewing; Provider fidelity; Statistical text classification

Funding

  1. National Institute on Alcoholism and Alcohol Abuse (NIAAA) [R01/AA018673]
  2. NIAAA [R01/AA016979, R01/AA014741]
  3. National Institute on Drug Abuse [R01/DA025833, R01/DA026014]

Ask authors/readers for more resources

Background: Behavioral interventions such as psychotherapy are leading, evidence-based practices for a variety of problems (e.g., substance abuse), but the evaluation of provider fidelity to behavioral interventions is limited by the need for human judgment. The current study evaluated the accuracy of statistical text classification in replicating human-based judgments of provider fidelity in one specific psychotherapy-motivational interviewing (MI). Method: Participants (n = 148) came from five previously conducted randomized trials and were either primary care patients at a safety-net hospital or university students. To be eligible for the original studies, participants met criteria for either problematic drug or alcohol use. All participants received a type of brief motivational interview, an evidence-based intervention for alcohol and substance use disorders. The Motivational Interviewing Skills Code is a standard measure of MI provider fidelity based on human ratings that was used to evaluate all therapy sessions. A text classification approach called a labeled topic model was used to learn associations between human-based fidelity ratings and MI session transcripts. It was then used to generate codes for new sessions. The primary comparison was the accuracy of model-based codes with human-based codes. Results: Receiver operating characteristic (ROC) analyses of model-based codes showed reasonably strong sensitivity and specificity with those from human raters (range of area under ROC curve (AUC) scores: 0.62 - 0.81; average AUC: 0.72). Agreement with human raters was evaluated based on talk turns as well as code tallies for an entire session. Generated codes had higher reliability with human codes for session tallies and also varied strongly by individual code. Conclusion: To scale up the evaluation of behavioral interventions, technological solutions will be required. The current study demonstrated preliminary, encouraging findings regarding the utility of statistical text classification in bridging this methodological gap.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.5
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available