4.2 Review

Challenges and opportunities beyond structured data in analysis of electronic health records

Publisher

WILEY
DOI: 10.1002/wics.1549

Keywords

electronic health records; machine learning; statistical methods; unstructured data

Funding

  1. Helse Nord RHF [HNF1395-18]
  2. Tromso Forskningsstiftelse (Tromso Research Foundation) [A33027]


Electronic health records contain valuable information, but analyzing unstructured data from clinical text and images is complex. Challenges include data quality, privacy issues, and explaining machine learning results. Potential solutions include developing synthetic data generation methods and advancing privacy-preserving techniques.
Electronic health records (EHRs) contain a wealth of valuable information about individual patients and the whole population. Besides structured data, unstructured data in EHRs can provide additional valuable information, but the analytics processes are complex, time-consuming, and often require excessive manual effort. Among unstructured data, clinical text and images are the two most popular and important sources of information. Advanced statistical algorithms in natural language processing, machine learning, deep learning, and radiomics have increasingly been used for analyzing clinical text and images. Although many challenges have not been fully addressed and can hinder the use of unstructured data, there are clear opportunities for well-designed diagnosis and decision support tools that efficiently incorporate both structured and unstructured data to extract useful information and provide better outcomes. However, access to clinical data is still very restricted due to data sensitivity and ethical issues. Data quality is another important challenge: methods for improving data completeness, conformity, and plausibility are needed. Further, generalizing and explaining the results of machine learning models are important problems for healthcare, and they remain open challenges. A possible way to improve the quality and accessibility of unstructured data is to develop machine learning methods that can generate clinically relevant synthetic data, and to accelerate further research on privacy-preserving techniques such as deidentification and pseudonymization of clinical text. This article is categorized under: Applications of Computational Statistics > Health and Medical Data/Informatics

