Prediction of SARS-CoV-2-positivity from million-scale complete blood counts using machine learning

Journal

COMMUNICATIONS MEDICINE
Volume 2, Issue 1

Publisher

SPRINGER NATURE
DOI: 10.1038/s43856-022-00129-0

Using a large dataset, Zuin et al. developed machine learning models for COVID-19 diagnosis based on complete blood counts (CBCs), achieving an AUROC above 90% in validation. The study shows that incorporating information about other respiratory diseases is essential for robust results.
Background

The Complete Blood Count (CBC) is a commonly used low-cost test that measures white blood cells, red blood cells, and platelets in a person's blood. It is a useful tool to support medical decisions, as intrinsic variations of each analyte bring relevant insights regarding potential diseases. In this study, we aimed to develop machine learning models for COVID-19 diagnosis through CBCs, unlocking the predictive power of non-linear relationships between multiple blood analytes.

Methods

We collected 809,254 CBCs and 1,088,385 RT-PCR tests for SARS-CoV-2, of which 21% (234,466) were positive, from 900,220 unique individuals. To properly screen COVID-19, we also collected 120,807 CBCs of 16,940 individuals who tested positive for other respiratory viruses. We proposed an ensemble procedure that combines machine learning models for different respiratory infections and analyzed the results in both the first and second waves of COVID-19 cases in Brazil.

Results

We obtain a high-performance AUROC of 90+% for validations in both scenarios. We show that models built solely on SARS-CoV-2 data are biased, performing poorly in the presence of infections due to other RNA respiratory viruses.

Conclusions

We demonstrate the potential of a novel machine learning approach for COVID-19 diagnosis based on a CBC and show that aggregating information about other respiratory diseases was essential to guarantee robustness in the results. Given its versatile nature, low cost, and speed, we believe that our tool can be particularly useful in a variety of scenarios, both during the pandemic and after.

Plain Language Summary

The complete blood count (CBC) is a medical laboratory test that provides information about cells in a person's blood and is extensively used to support medical decisions. This study explored the ability of a computer-based approach to automatically identify active COVID-19 infections by using CBC exams. We collected a large dataset with over one million CBC exams and the matching tests currently used to detect SARS-CoV-2 or other respiratory viruses. Our results demonstrate both the potential of this approach for diagnosing SARS-CoV-2 infection by using only CBC data, and also that considering information about other respiratory diseases in the methodology is essential to guarantee that results can be trusted. This automated computational approach can be useful in a variety of contexts during the COVID-19 pandemic and after, since it is fast, low-cost, and versatile.

Zuin et al. use a large dataset of blood count exams to predict SARS-CoV-2 PCR results with machine learning. The model performs well and is superior to those that do not take into account infection with other RNA respiratory viruses.
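
The abstract makes two points that a small example can make concrete: performance is reported as AUROC, and a model trained only on SARS-CoV-2 data can be misled by other respiratory viruses. The Python sketch below is a minimal illustration under stated assumptions. The `combine` rule, the function names, and the toy scores are all hypothetical; the abstract does not describe the authors' ensemble procedure, and this is not a reproduction of it.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    This is the Mann-Whitney formulation; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def combine(p_covid: float, p_other: float) -> float:
    """Hypothetical combination rule (an assumption, not the paper's method).

    Discounts the SARS-CoV-2 score by the probability that another
    respiratory virus explains the blood count.
    """
    return p_covid * (1.0 - p_other)


# Toy data (invented for illustration): 1 = SARS-CoV-2 positive by RT-PCR.
# The third patient has another respiratory virus whose CBC looks COVID-like.
labels = [1, 1, 0, 0, 0]
p_covid = [0.90, 0.80, 0.85, 0.30, 0.20]  # scores from a SARS-CoV-2-only model
p_other = [0.10, 0.20, 0.90, 0.10, 0.20]  # scores from an other-virus model

combined = [combine(c, o) for c, o in zip(p_covid, p_other)]

auroc_covid_only = auroc(labels, p_covid)
auroc_combined = auroc(labels, combined)
```

In this toy data the SARS-CoV-2-only scores rank the other-virus patient above one true positive, so `auroc_covid_only` falls below 1, while the combined scores rank every positive above every negative. Real scores would come from trained classifiers, and the size of any improvement would depend on the data.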
