Automatic Extraction of Medication Mentions from Tweets-Overview of the BioCreative VII Shared Task 3 Competition

DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION (2023)

Journal

DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION

Volume 2023, Issue -, Pages -

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/database/baac108

Category

Mathematical & Computational Biology

Summary

This study presents the outcomes of the BioCreative VII (Task 3) competition, which focused on extracting medication names from Twitter users' publicly available tweets. Detecting health-related tweets is challenging because the language is informal and the vast majority of tweets are unrelated to health. The task required addressing extreme class imbalance to find tweets mentioning medications. A total of 65 teams registered and 16 teams submitted systems. The study analyzes the corpus, methods, and results, with a focus on learning from class-imbalanced data.

Abstract

This study presents the outcomes of the shared task competition BioCreative VII (Task 3) focusing on the extraction of medication names from a Twitter user's publicly available tweets (the user's 'timeline'). In general, detecting health-related tweets is notoriously challenging for natural language processing tools. The main challenge, aside from the informality of the language used, is that people tweet about any and all topics, and most of their tweets are not related to health. Thus, finding those tweets in a user's timeline that mention specific health-related concepts such as medications requires addressing extreme class imbalance. Task 3 called for detecting tweets in a user's timeline that mention a medication name and, for each detected mention, extracting its span. The organizers made available a corpus consisting of 182 049 tweets publicly posted by 212 Twitter users with all medication mentions manually annotated. The corpus exhibits the natural distribution of positive tweets, with only 442 tweets (0.2%) mentioning a medication. This task was an opportunity for participants to evaluate methods that are robust to class imbalance beyond the simple lexical match. A total of 65 teams registered, and 16 teams submitted a system run. This study summarizes the corpus created by the organizers and the approaches taken by the participating teams for this challenge. The corpus is freely available at . The methods and the results of the competing systems are analyzed with a focus on the approaches taken for learning from class-imbalanced data.
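To make the task definition concrete, the following is a minimal Python sketch of the "simple lexical match" baseline that the abstract contrasts with more robust methods, together with the positive-class rate implied by the reported corpus counts. The lexicon, the example tweets, and the function names `extract_mentions` and `positive_rate` are illustrative assumptions; they are not taken from the shared task, its corpus, or any participating system.

```python
import re
from typing import List, Tuple

# Hypothetical mini-lexicon, for illustration only (not from the task corpus).
LEXICON = ["ibuprofen", "tylenol", "metformin"]
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, LEXICON)) + r")\b", re.IGNORECASE
)


def extract_mentions(tweet: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, text) per lexicon match; end offset is exclusive."""
    return [(m.start(), m.end(), m.group(0)) for m in PATTERN.finditer(tweet)]


def positive_rate(n_positive: int, n_total: int) -> float:
    """Fraction of tweets that contain at least one medication mention."""
    return n_positive / n_total


# Made-up timeline: most tweets are unrelated to health.
timeline = [
    "Great game last night!",
    "Took some Ibuprofen for my headache",
    "coffee first, then work",
]

# Keep only the tweets with at least one match, keyed by timeline position.
detected = {}
for index, tweet in enumerate(timeline):
    spans = extract_mentions(tweet)
    if spans:
        detected[index] = spans

# Counts reported in the abstract: 442 positive tweets out of 182 049.
rate = positive_rate(442, 182049)
print(detected)
print(f"positive rate: {rate:.4%}")
```

A dictionary lookup of this kind misses misspellings, brand variants, and ambiguous words, which is one reason the task encouraged methods that go beyond lexical matching while remaining robust to the roughly 0.2% positive rate.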
