4.7 Article

Developing Embedded Taxonomy and Mining Patients' Interests From Web-Based Physician Reviews: Mixed-Methods Approach

Journal

Publisher

JMIR PUBLICATIONS, INC
DOI: 10.2196/jmir.8868

Keywords

labeled-LDA; physicians; topic modeling; topic taxonomy; Web-based review

Funding

  1. National Natural Science Foundation of China [71371005, 71471064, 91646205]


Background: Web-based physician reviews are invaluable gold mines that merit further investigation. Although many studies have explored the text of physician reviews, very few have focused on developing a systematic topic taxonomy embedded in them. The first step toward mining physician reviews is to determine what natural structure or dimensions are embedded in the reviews. Therefore, it is relevant to develop the topic taxonomy rigorously and systematically.

Objective: This study aims to develop a hierarchical topic taxonomy that uncovers the latent structure of physician reviews and to illustrate its application for mining patients' interests based on the proposed taxonomy and algorithm.

Methods: The data comprised 122,716 physician reviews covering 8501 doctors, collected between 2007 and 2015 from a leading physician review website in China (haodf.com). Mixed methods, including a literature review, data-driven topic discovery, and human annotation, were used to develop the physician review topic taxonomy.

Results: The identified taxonomy included 3 domains (high-level categories) and 9 subtopics (low-level categories). The physician-related domain included the categories of medical ethics, medical competence, communication skills, and medical advice and prescription. The patient-related domain included the categories of patient profile, symptoms, and diagnosis and pathogenesis. The system-related domain included the categories of financing and operation process. The F-measure of the proposed classification algorithm reached 0.816 on average.

Symptoms (Cohen d=1.58, Delta u=0.216, t=229.75, and P<.001) are more often mentioned by patients with acute diseases, whereas communication skills (Cohen d=-0.29, Delta u=-0.038, t=-42.01, and P<.001), financing (Cohen d=-0.68, Delta u=-0.098, t=-99.26, and P<.001), and diagnosis and pathogenesis (Cohen d=-0.55, Delta u=-0.078, t=-80.09, and P<.001) are more often mentioned by patients with chronic diseases. Patients with mild diseases were more interested in medical ethics (Cohen d=0.25, Delta u=0.039, t=8.33, and P<.001), operation process (Cohen d=0.57, Delta u=0.060, t=18.75, and P<.001), patient profile (Cohen d=1.19, Delta u=0.132, t=39.33, and P<.001), and symptoms (Cohen d=1.91, Delta u=0.274, t=62.82, and P<.001). Meanwhile, patients with serious diseases were more interested in medical competence (Cohen d=-0.99, Delta u=-0.165, t=-32.58, and P<.001), medical advice and prescription (Cohen d=-0.65, Delta u=-0.082, t=-21.45, and P<.001), financing (Cohen d=-0.26, Delta u=-0.018, t=-8.45, and P<.001), and diagnosis and pathogenesis (Cohen d=-1.55, Delta u=-0.229, t=-50.93, and P<.001).

Conclusions: This mixed-methods approach, integrating a literature review, data-driven topic discovery, and human annotation, is an effective and rigorous way to develop a physician review topic taxonomy. The proposed algorithm, based on Labeled-Latent Dirichlet Allocation, achieves strong classification results for mining patients' interests. Furthermore, the mining results reveal marked differences in patients' interests across disease types, socioeconomic development levels, and hospital levels.
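For readers unfamiliar with the effect sizes reported in the Results, the following is a minimal sketch of how a mean difference and Cohen d between two patient groups could be computed. It assumes each review has a per-topic proportion, that d uses the pooled sample standard deviation, and that the sign follows first group minus second group; the abstract does not state these details, and the function name and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohen_d(group_a, group_b):
    """Cohen d using the pooled sample standard deviation (an assumed convention)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical per-review proportions of one topic; not data from the study.
acute = [0.40, 0.35, 0.50, 0.45]
chronic = [0.20, 0.25, 0.15, 0.30]

delta = mean(acute) - mean(chronic)  # difference in group means
d = cohen_d(acute, chronic)          # standardized effect size

print(f"mean difference = {delta:.3f}, Cohen d = {d:.2f}")
```

Under this convention a positive value means the topic takes a larger share in the first group, which is consistent with how the positive and negative values in the abstract are attributed to the two groups in each comparison.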


