Article

Leveraging Structured Biological Knowledge for Counterfactual Inference: A Case Study of Viral Pathogenesis

Journal

IEEE TRANSACTIONS ON BIG DATA
Volume 7, Issue 1, Pages 25-37

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TBDATA.2021.3050680

Keywords

Biological system modeling; Proteins; Mathematical model; Stochastic processes; Biological processes; Data models; COVID-19; Biological expression language; structural causal model; counterfactual inference; causal biological knowledge graph; systems biology; SARS-CoV-2

Funding

  1. PNNL Mathematics and Artificial Reasoning Systems Laboratory Directed Research and Development Initiative


Counterfactual inference is a useful tool for comparing outcomes of interventions on complex systems, but specifying structural causal models can be difficult in practice, requiring substantial domain expertise. Some application domains have qualitative structured causal knowledge, and this article proposes a general approach for querying a causal biological knowledge graph and converting the qualitative result into a quantitative structural causal model.
Counterfactual inference is a useful tool for comparing outcomes of interventions on complex systems. It requires us to represent the system in the form of a structural causal model, complete with a causal diagram, probabilistic assumptions on exogenous variables, and functional assignments. Specifying such models can be extremely difficult in practice. The process requires substantial domain expertise and does not scale easily to large systems, multiple systems, or novel system modifications. At the same time, many application domains, such as molecular biology, are rich in structured causal knowledge that is qualitative in nature. This article proposes a general approach for querying a causal biological knowledge graph and converting the qualitative result into a quantitative structural causal model that can learn from data to answer the question. We demonstrate the feasibility, accuracy, and versatility of this approach using two case studies in systems biology. The first demonstrates the appropriateness of the underlying assumptions and the accuracy of the results. The second demonstrates the versatility of the approach by querying a knowledge base for the molecular determinants of a severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-induced cytokine storm, and performing counterfactual inference to estimate the causal effect of medical countermeasures for severely ill patients.
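
To make the three ingredients of a structural causal model concrete (causal diagram, exogenous variables, functional assignments), the following minimal sketch runs the standard abduction, action, prediction steps of counterfactual inference on a toy two-variable model. It is not the model from the article: the class name `LinearSCM`, the diagram X -> Y, the linear assignment with additive noise, and the coefficient `beta` are all assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LinearSCM:
    """Hypothetical toy SCM with diagram X -> Y.

    Functional assignments (assumed for illustration only):
        X := N_x
        Y := beta * X + N_y
    where N_x and N_y are the exogenous noise variables.
    """

    beta: float

    def abduct(self, x: float, y: float) -> Tuple[float, float]:
        # Abduction: recover the exogenous values consistent with the observation.
        n_x = x
        n_y = y - self.beta * x
        return n_x, n_y

    def counterfactual_y(self, x_obs: float, y_obs: float, x_new: float) -> float:
        # Abduction: keep the unit-specific noise on Y.
        _, n_y = self.abduct(x_obs, y_obs)
        # Action: replace the assignment of X with the intervention X := x_new.
        # Prediction: push the same noise through the modified model.
        return self.beta * x_new + n_y


scm = LinearSCM(beta=2.0)
# "Had X been 0.0 instead of 1.0 for the unit observed at (x=1.0, y=2.5),
# what would Y have been?"
y_cf = scm.counterfactual_y(x_obs=1.0, y_obs=2.5, x_new=0.0)
print(y_cf)
```

Because the noise term is recovered from the observed unit and reused after the intervention, the answer is specific to that unit rather than a population average, which is what distinguishes a counterfactual query from an ordinary interventional one.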
