Article

A framework to extract biomedical knowledge from gluten-related tweets: The case of dietary concerns in digital era

Journal

ARTIFICIAL INTELLIGENCE IN MEDICINE
Volume 118

Publisher

ELSEVIER
DOI: 10.1016/j.artmed.2021.102131

Keywords

Social media; Sociome profiling; Text mining; Graph mining; Machine learning; Health for informatics

Funding

  1. Associate Laboratory for Green Chemistry-LAQV
  2. Portuguese Foundation for Science and Technology (FCT) [UIDB/50006/2020, UIDB/04469/2020]
  3. BioTecNorte operation [NORTE010145FEDER000004]
  4. European Regional Development Fund under the scope of Norte2020 - Programa Operacional Regional do Norte
  5. Xunta de Galicia (Centro singular de investigacion de Galicia accreditation 2019-2022)
  6. European Union (European Regional Development Fund - ERDF) - Ref [ED431G2019/06]
  7. Conselleria de Educacion, Universidades e Formacion Profesional (Xunta de Galicia) [ED431C2018/55GRC]
  8. Competitive Reference Group
  9. post-doctoral fellowship [ED481B2019032]
  10. Xunta de Galicia
  11. Universidade de Vigo/CISUG

The importance of big data and its potential growth have been emphasized by the explosion of information on the Internet in recent years. This paper presents a new methodology for processing and analyzing big data knowledge produced by social media platforms, focusing on health issues and social support. By combining various techniques, the proposed methodology aims to reduce irrelevant messages, eliminate lexical noise, infer demographic data, detect communities, and gain insights into shared resources. The practical relevance of this methodology has been demonstrated in a study focusing on dietary trends and public health discussions on Twitter.
The importance and potential of big data are growing, driven by the explosive increase in the volume of information generated on the Internet in recent years. Many experts agree that social media networks are among the fastest-growing areas of the Internet and are expected to keep expanding significantly in the coming years. Social media sites are also quickly becoming one of the most popular platforms for discussing health issues and exchanging social support. In this context, this work presents a new methodology to process, classify, visualise and analyse the big data knowledge produced by the sociome on social media platforms. The methodology combines natural language processing techniques, ontology-based named entity recognition methods, machine learning algorithms and graph mining techniques to: (i) reduce irrelevant messages by identifying individuals and patient experiences in the public discussion and focusing the analysis on them; (ii) reduce, through the use of domain ontologies, the lexical noise produced by the different ways in which users express themselves; (iii) infer the demographic data of individuals through the combined analysis of textual, geographical and visual profile information; (iv) perform community detection and evaluate the health topic under study by combining semantic processing of the public discourse with knowledge graph representation techniques; and (v) gain information about shared resources by combining social media statistics with semantic analysis of web contents. The practical relevance of the proposed methodology has been demonstrated in a study of 1.1 million unique messages from >400,000 distinct users related to one of the most popular dietary fads, which has evolved into a multibillion-dollar industry, i.e., gluten-free food.
In addition, this work analysed one of the least studied public health research fields on Twitter (i.e., allergies and immunological diseases such as celiac disease), reaching a wide range of health-related conclusions.
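
As a rough illustration of steps (ii) and (iv), the sketch below maps lexical variants to canonical concepts and counts concept co-occurrences as weighted graph edges. It is not the authors' implementation: the `LEXICON` entries, the concept labels and the function names `extract_concepts` and `cooccurrence_edges` are all hypothetical, and a real system would presumably load its terms from a domain ontology rather than a hand-written dictionary.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical mini-lexicon (assumption): surface form -> canonical concept.
# It stands in for the domain ontologies mentioned in the abstract.
LEXICON = {
    "celiac disease": "celiac_disease",
    "coeliac disease": "celiac_disease",
    "celiac": "celiac_disease",
    "coeliac": "celiac_disease",
    "gluten free": "gluten_free_diet",
    "gluten-free": "gluten_free_diet",
    "glutenfree": "gluten_free_diet",
    "bloating": "bloating",
    "bloated": "bloating",
}

# Longest surface forms first, so "celiac disease" wins over "celiac".
_TERMS = sorted(LEXICON, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in _TERMS) + r")\b",
    re.IGNORECASE,
)


def extract_concepts(text):
    """Return the set of canonical concepts mentioned in one message."""
    return {LEXICON[match.group(1).lower()] for match in _PATTERN.finditer(text)}


def cooccurrence_edges(messages):
    """Count how often two concepts appear together in the same message."""
    edges = Counter()
    for text in messages:
        concepts = sorted(extract_concepts(text))
        edges.update(combinations(concepts, 2))
    return edges


messages = [
    "Diagnosed with Coeliac disease, going gluten-free",
    "Feeling bloated after pasta, maybe celiac? trying #glutenfree",
]
edges = cooccurrence_edges(messages)
```

The resulting `edges` counter can be read as a weighted edge list for a concept graph, on which a community detection algorithm of choice could then be run.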
