Article

Toward automated e-cigarette surveillance: Spotting e-cigarette proponents on Twitter

Journal

JOURNAL OF BIOMEDICAL INFORMATICS
Volume 61, Pages 19-26

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jbi.2016.03.006

Keywords

Electronic cigarettes; Text mining; Text classification

Funding

  1. National Center for Research Resources
  2. National Center for Advancing Translational Sciences, US National Institutes of Health (NIH) [UL1TR000117]
  3. Kentucky Lung Cancer Research Program [PO2-415-1400004000-1]

Background: Electronic cigarettes (e-cigarettes or e-cigs) are a popular emerging tobacco product. Because e-cigs do not generate the toxic tobacco combustion products that result from smoking regular cigarettes, they are sometimes perceived and promoted as a less harmful alternative to smoking and as a means to quit smoking. However, the safety of e-cigs and their efficacy in supporting smoking cessation are yet to be determined. Importantly, the Food and Drug Administration (FDA) currently does not regulate e-cigs, so their manufacturing, marketing, and sale are not subject to the rules that apply to traditional cigarettes. A number of manufacturers, advocates, and e-cig users are actively promoting e-cigs on Twitter.

Objective: We develop a high-accuracy supervised predictive model to automatically identify e-cig proponents on Twitter and analyze how their tweeting behavior varies quantitatively along popular themes when compared with other Twitter users (or tweeters).

Methods: Using a dataset of 1000 Twitter profiles, each independently annotated by two different annotators, we employed a variety of textual features from the latest tweet content and the tweeter's profile biography to build predictive models that automatically identify proponent tweeters. We used a set of manually curated key phrases to analyze e-cig proponent tweets from a corpus of over one million e-cig tweets along well-known e-cig themes and compared the results with those generated by regular tweeters.

Results: Our model identifies e-cig proponents with 97% precision, 86% recall, 91% F-score, and 96% overall accuracy, with tight 95% confidence intervals. We find that regular tweeters form over 90% of the dataset, whereas e-cig proponents are a much smaller subset but tweet two to five times more than regular tweeters. Proponents also disproportionately (one to two orders of magnitude more) highlight e-cig flavors, their smoke-free and potential harm reduction aspects, and their claimed use in smoking cessation.

Conclusions: Given that the FDA is currently in the process of proposing meaningful regulation, we believe our work demonstrates the strong potential of informatics approaches, specifically machine learning, for automated e-cig surveillance on Twitter. (C) 2016 Elsevier Inc. All rights reserved.
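
As a quick consistency check on the reported results, the short sketch below recomputes the F-score from the stated precision and recall. It assumes the abstract's F-score is the balanced F1 (the harmonic mean of precision and recall) and that the reported percentages are rounded; the function name `f_score` is illustrative and not taken from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the abstract (assumed to describe F1).
reported_precision = 0.97
reported_recall = 0.86

f1 = f_score(reported_precision, reported_recall)
print(f"F-score: {f1:.2f}")  # prints "F-score: 0.91"
```

Under these assumptions the recomputed value rounds to the 91% F-score quoted in the abstract.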