Article; Proceedings Paper

Improving rare disease classification using imperfect knowledge graph

Journal

Publisher

BMC
DOI: 10.1186/s12911-019-0938-1

Keywords

Rare disease diagnosis; Knowledge graph; Machine learning; Text classification; Extremely imbalanced data

Funding

  1. National Science Foundation [1633370]
  2. National Library of Medicine [2R01LM010681-05]
  3. China Scholarship Council
  4. Kilgour Research Grant Award by UNC SILS

Background: Accurately recognizing rare diseases from symptom descriptions is an important task in patient triage, early risk stratification, and targeted therapy. However, because such diseases are by nature rare, historical data are scarce, which poses a major challenge to machine learning-based approaches. Medical knowledge in automatically constructed knowledge graphs (KGs), on the other hand, has the potential to compensate for the lack of labeled training examples. This work aims to develop a rare disease classification algorithm that makes effective use of a knowledge graph, even when the graph is imperfect.

Method: We develop a text classification algorithm that represents a document as a combination of a bag of words and a bag of knowledge terms, where a knowledge term is a term shared between the document and the subgraph of the KG relevant to the disease classification task. We evaluate the algorithm on two Chinese disease diagnosis corpora. The first, HaoDaiFu, contains 51,374 chief complaints categorized into 805 diseases. The second, ChinaRe, contains 86,663 patient descriptions categorized into 44 disease categories.

Results: On both evaluation data sets, the proposed algorithm delivers robust performance and outperforms a wide range of baselines, including resampling, deep learning, and feature selection approaches. Both a classification-based metric (macro-averaged F1 score) and a ranking-based metric (mean reciprocal rank) are used in the evaluation.

Conclusion: Medical knowledge in large-scale knowledge graphs can be effectively leveraged to improve rare disease classification models, even when the knowledge graph is incomplete.
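The document representation described in the Method part of the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the document is already tokenized (Chinese text would first need word segmentation) and that the terms of the task-relevant KG subgraph are available as a plain set. The function name `represent`, the `w:` and `k:` feature prefixes, and the toy English tokens are all hypothetical, and combining the two bags by merging their counts is only one possible reading of "combination".

```python
from collections import Counter


def represent(doc_tokens, kg_terms):
    """Build a combined feature bag for one document.

    doc_tokens: list of tokens from the symptom description (assumed pre-tokenized).
    kg_terms:   set of terms found in the task-relevant KG subgraph (assumed given).
    """
    # Bag of words: every token in the document, counted.
    bag_of_words = Counter(f"w:{tok}" for tok in doc_tokens)
    # Bag of knowledge terms: only tokens shared between the document and the KG subgraph.
    bag_of_knowledge = Counter(f"k:{tok}" for tok in doc_tokens if tok in kg_terms)
    # The prefixes keep the two feature spaces separate, so merging never mixes counts.
    return bag_of_words + bag_of_knowledge


# Toy example with made-up tokens and KG terms.
doc = ["fever", "joint", "pain", "fever"]
kg = {"fever", "rash"}
features = represent(doc, kg)
print(dict(features))
```

The resulting dictionary could be passed to any sparse linear classifier. Keeping knowledge terms as a separate feature group lets a model weight a term differently when it is backed by the KG, which is the intuition the abstract suggests for compensating scarce labeled examples.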
