4.7 Article

Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion

Journal

BIOINFORMATICS
Volume 25, Issue 23, Pages 3174-3180

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btp548

Funding

  1. National Institutes of Health [1R21RR024933-01A1, 5R01LM009836-02]
  2. University of Wisconsin-Milwaukee's RGI

Biomedical texts can typically be represented by four rhetorical categories: Introduction, Methods, Results and Discussion (IMRAD). Classifying sentences into these categories can benefit many other text-mining tasks. Although many studies have applied different approaches to automatically classifying sentences in MEDLINE abstracts into the IMRAD categories, few have explored the classification of sentences that appear in full-text biomedical articles. We first evaluated whether sentences in full-text biomedical articles could be reliably annotated with the IMRAD categories and then explored different approaches for automatically classifying these sentences into those categories. Our results show an overall annotation agreement of 82.14% with a Kappa score of 0.756. The best classification system is a multinomial naive Bayes classifier trained on manually annotated data; it achieved 91.95% accuracy and an average F-score of 91.55%, both significantly higher than those of the baseline systems. A web version of this system is available online at http://wood.ims.uwm.edu/full_text_classifier/.
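To make the classification approach concrete, the following is a minimal sketch of a multinomial naive Bayes sentence classifier. It is not the authors' system: the abstract does not describe their features or training data, so the whitespace tokenizer, the add-one smoothing constant `alpha`, the class name `MultinomialNB` and the four toy training sentences are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(sentence):
    # Assumption: lowercase whitespace tokens (bag of words) as features.
    return sentence.lower().split()


class MultinomialNB:
    """Illustrative multinomial naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (hypothetical default)

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)      # sentences per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            tokens = tokenize(sentence)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total = len(labels)
        return self

    def predict(self, sentence):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(sentence) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # log P(class) + sum over tokens of log P(token | class)
            score = math.log(n / self.total)
            denom = (sum(self.word_counts[label].values())
                     + self.alpha * len(self.vocab))
            for t in tokens:
                score += math.log(
                    (self.word_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, invented training data: one sentence per IMRAD category.
TRAIN = [
    ("previous studies have shown that this gene is poorly understood",
     "Introduction"),
    ("cells were cultured and incubated at 37 degrees for 24 hours",
     "Methods"),
    ("expression increased significantly in treated cells", "Results"),
    ("our findings suggest that this pathway may play a role",
     "Discussion"),
]

model = MultinomialNB().fit([s for s, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict("cells were incubated for 48 hours"))  # Methods
```

With only four training sentences the sketch says nothing about the accuracy figures reported above; it only shows how class priors and smoothed per-class token frequencies combine into a prediction.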
