4.6 Article

Supplementing claims data analysis using self-reported data to develop a probabilistic phenotype model for current smoking status

Journal

JOURNAL OF BIOMEDICAL INFORMATICS
Volume 97

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jbi.2019.103264

Keywords

Probabilistic phenotype; Patient-level prediction; Risk; Smoking; Imputation; Claims data

Objectives: Smoking status is poorly recorded in US claims data. IBM MarketScan Commercial is a claims database that can be linked to an additional health risk assessment with self-reported smoking status for a subset of 1,966,174 patients. We investigate whether this subset could be used to learn a smoking status phenotype model, generalizable to all US claims data, that calculates the probability of being a current smoker.

Methods: Of the patients in the subset, 251,643 (12.8%) self-reported their smoking status as 'current smoker'. A regularized logistic regression model, the Current Risk of Smoking Status (CROSS), was trained using the subset of patients with self-reported smoking status. CROSS considered 53,027 candidate covariates, including demographics and the conditions, drugs, measurements, procedures, and observations recorded in the prior 365 days. The CROSS phenotype model was validated across multiple other claims databases.

Results: In internal validation, the CROSS model achieved an area under the receiver operating characteristic curve (AUC) of 0.76, and the calibration plots indicated that it was well calibrated. External validation across three US claims databases obtained AUCs ranging between 0.82 and 0.87, suggesting that the model is transportable across claims data.

Conclusion: CROSS predicts current smoking status based on the claims records in the prior year. CROSS can be readily implemented on any US insurance claims data mapped to the OMOP common data model, and will be a useful way to impute smoking status when conducting epidemiology studies where smoking is a known confounder but smoking status is not recorded. CROSS is available from https://github.com/OHDSI/StudyProtocolSandbox/tree/master/SmokingModel.
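The abstract describes a logistic regression phenotype model that outputs a probability of being a current smoker and is evaluated by AUC. The following is a minimal sketch of those two ideas only, not the CROSS implementation. The intercept, coefficients, covariate names, and patient records are all hypothetical values chosen for illustration; the published model selects from 53,027 candidate covariates and its fitted coefficients are not reproduced here.

```python
import math


def predict_probability(intercept, coefficients, covariates):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_j b_j * x_j))).

    Covariates without a coefficient contribute nothing, which mimics a
    regularized model that has shrunk most candidate covariates to zero.
    """
    linear = intercept + sum(
        coefficients.get(name, 0.0) * value for name, value in covariates.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive outranks a
    randomly chosen negative, counting ties as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model: these numbers are NOT the CROSS coefficients.
INTERCEPT = -2.0
COEFFICIENTS = {
    "copd_diagnosis": 1.2,
    "nicotine_replacement_rx": 2.0,
    "age_over_65": -0.5,
}

# Hypothetical patients: (binary covariates from the prior 365 days,
# self-reported current smoker: 1 = yes, 0 = no).
PATIENTS = [
    ({"copd_diagnosis": 1, "nicotine_replacement_rx": 1}, 1),
    ({"copd_diagnosis": 1}, 1),
    ({}, 1),
    ({"age_over_65": 1}, 0),
    ({}, 0),
    ({"copd_diagnosis": 1, "age_over_65": 1}, 0),
]

labels = [label for _, label in PATIENTS]
scores = [predict_probability(INTERCEPT, COEFFICIENTS, x) for x, _ in PATIENTS]

if __name__ == "__main__":
    for (covariates, label), score in zip(PATIENTS, scores):
        print(f"label={label} p(current smoker)={score:.3f} covariates={covariates}")
    print(f"AUC = {auc(labels, scores):.3f}")
```

In a confounder-adjustment setting, the predicted probability (rather than a thresholded yes/no label) would be the natural quantity to carry forward as an imputed smoking covariate, which is one reason a well-calibrated model matters here.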
