Journal
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS
Volume 10, Issue 5, Pages 2325-2334
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCSS.2022.3186883
Keywords
Social networking (online); Diseases; Blogs; Frequency modulation; Annotations; Task analysis; Labeling; Health mention (HM) classification; public health surveillance; social media
People on social media who use disease and symptom words for purposes other than discussing their own health can introduce biases into data-driven public health applications. This study presents RHMD, a new dataset of 10,015 manually annotated Reddit posts labeled with four categories, together with a comprehensive performance analysis of baseline methods. The release of this dataset is expected to facilitate the development of new methods for detecting health mentions in user-generated text.
People on social media often use disease and symptom words to share thoughts and experiences that are not about their own health, which can introduce biases into data-driven public health applications. To advance health mention classification (HMC) research, in this study we present the Reddit health mention dataset (RHMD), a new multi-domain Reddit dataset for HMC. RHMD consists of 10,015 manually annotated Reddit posts that contain 15 common disease or symptom terms and are labeled with four classes: personal health mentions (HMs), nonpersonal HMs, figurative HMs, and hyperbolic HMs. Empirical evaluation with recently proposed methods shows that automatically labeling user-generated text across these four types remains challenging. The contributions of this work are the public release of a robustly annotated Reddit dataset (RHMD) for HM tasks and a comprehensive performance analysis of baseline methods. We expect that the release of the dataset and the evaluations will help facilitate the development of new methods for detecting HMs in user-generated text. The dataset is available at.