Article

Multi-way association extraction and visualization from biological text documents using hyper-graphs: Applications to genetic association studies for diseases

Journal

ARTIFICIAL INTELLIGENCE IN MEDICINE
Volume 49, Issue 3, Pages 145-154

Publisher

ELSEVIER
DOI: 10.1016/j.artmed.2010.03.002

Keywords

Hyper-graphs; Representative graphs; A Priori algorithm; Vector space model; Genetic associations; Lung cancer; Colorectal cancer

Funding

  1. NIH BISTI [NIH-NIGMS P20 GM66402]
  2. NSF/NSDL [NSF-DUE-0333623]

Objectives: As in many other domains of human endeavor, the biological research literature represents a rich, ever-growing source of knowledge. An important form of such knowledge is the association among biological entities such as genes, proteins, diseases, drugs and chemicals. There has been considerable recent research on extracting various kinds of binary associations (e.g., gene-gene, gene-protein, protein-protein) using different text mining approaches. However, an important aspect of such associations (e.g., gene A activates protein B) is the context in which they occur (e.g., gene A activates protein B in the context of disease C in organ D under the influence of chemical E). Such contexts are more appropriately represented by a multi-way relationship involving more than two objects (e.g., objects A, B, C, D and E) than by the usual binary relationship (objects A and B).

Methods: Such multi-way relations naturally lead to a hyper-graph representation of the knowledge rather than a binary graph. Hyper-graph based multi-way knowledge extraction from the biological literature is a computationally difficult problem (due to its combinatorial nature) that has received little attention from the bioinformatics research community. In this paper, we describe and compare two approaches to multi-way hyper-graph extraction: one based on exhaustive enumeration of all multi-way hyper-edges, and the other based on an extension of the well-known A Priori algorithm from structured data to unstructured textual data. We also present a representative graph based approach to visualizing these genetic association hyper-graphs.
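The contrast between the two extraction strategies can be illustrated with a minimal sketch. It assumes that each document has already been reduced to the set of biological entities it mentions (the entity-recognition step is not shown). It also assumes that a hyper-edge is any set of two or more entities co-occurring in at least `min_support` documents. The support measure (a plain document count), the function names (`support`, `exhaustive_hyperedges`, `apriori_hyperedges`) and the `GENE_A`-style entities are hypothetical illustrations, not the authors' implementation. The paper's keywords mention a vector space model, so its actual association measure may differ.

```python
from itertools import combinations


def support(itemset, docs):
    """Count the documents whose entity set contains every entity in itemset."""
    return sum(1 for d in docs if itemset <= d)


def exhaustive_hyperedges(docs, min_support, max_size):
    """Score every entity subset of size 2..max_size and keep the frequent ones."""
    entities = sorted(set().union(*docs))
    found, scored = set(), 0
    for k in range(2, max_size + 1):
        for combo in combinations(entities, k):
            scored += 1
            if support(frozenset(combo), docs) >= min_support:
                found.add(frozenset(combo))
    return found, scored


def apriori_hyperedges(docs, min_support, max_size):
    """Level-wise (A Priori-style) search over entity sets.

    A k-entity set is scored only if all of its (k-1)-entity subsets are
    frequent, because a set cannot co-occur in more documents than its subsets.
    """
    entities = sorted(set().union(*docs))
    scored = len(entities)  # level 1: every single entity is scored once
    frequent = {frozenset([e]) for e in entities
                if support(frozenset([e]), docs) >= min_support}
    found = set()
    for k in range(2, max_size + 1):
        # Join step: merge frequent (k-1)-sets whose union has exactly k entities.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: drop candidates that have any infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        scored += len(candidates)
        frequent = {c for c in candidates if support(c, docs) >= min_support}
        if not frequent:
            break
        found |= frequent
    return found, scored


# Hypothetical toy corpus: each document reduced to the entities it mentions.
SAMPLE_DOCS = [
    {"GENE_A", "PROTEIN_B", "DISEASE_C", "ORGAN_D"},
    {"GENE_A", "PROTEIN_B", "DISEASE_C", "CHEM_E"},
    {"GENE_A", "PROTEIN_B", "DISEASE_C"},
    {"GENE_A", "ORGAN_D"},
    {"PROTEIN_B", "CHEM_E"},
    {"GENE_F"},
]

if __name__ == "__main__":
    ex, n_ex = exhaustive_hyperedges(SAMPLE_DOCS, min_support=2, max_size=4)
    ap, n_ap = apriori_hyperedges(SAMPLE_DOCS, min_support=2, max_size=4)
    assert ex == ap  # both methods yield the same hyper-edges
    for edge in sorted(sorted(e) for e in ap):
        print(edge)
    print(f"sets scored: exhaustive={n_ex}, apriori={n_ap}")
```

The saving comes from the prune step. Support can only stay the same or drop as entities are added, so a candidate with an infrequent subset can be discarded without scanning the corpus. On this toy corpus both functions return the same six hyper-edges (five pairs plus {GENE_A, PROTEIN_B, DISEASE_C}). However, the level-wise search scores 17 entity sets, against 50 for exhaustive enumeration.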
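The visualization step can be sketched under an assumption: that "representative graph" carries its standard hypergraph-theory meaning (Berge's line graph), with one node per hyper-edge and two nodes joined whenever their hyper-edges share at least one entity. The abstract does not spell out the authors' exact construction. The function names `representative_graph` and `to_dot` and the sample hyper-edges are therefore illustrative assumptions. The output is Graphviz DOT text, which a renderer such as Graphviz `dot` can draw.

```python
from itertools import combinations


def representative_graph(hyperedges):
    """Build the representative (line) graph of a hypergraph.

    Returns (nodes, edges): nodes[i] is the i-th hyper-edge as a frozenset;
    each edge (i, j, shared) links two hyper-edges that share at least one
    entity, labeled with the sorted list of shared entities.
    """
    nodes = [frozenset(h) for h in hyperedges]
    edges = []
    for i, j in combinations(range(len(nodes)), 2):
        shared = nodes[i] & nodes[j]
        if shared:  # intersecting hyper-edges are adjacent in the representative graph
            edges.append((i, j, sorted(shared)))
    return nodes, edges


def to_dot(nodes, edges):
    """Serialize the representative graph as undirected Graphviz DOT text."""
    lines = ["graph representative {"]
    for i, node in enumerate(nodes):
        # Each node is labeled with the entities of its hyper-edge.
        lines.append(f'  n{i} [label="{", ".join(sorted(node))}"];')
    for i, j, shared in edges:
        # Each link is labeled with the entities the two hyper-edges share.
        lines.append(f'  n{i} -- n{j} [label="{", ".join(shared)}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical hyper-edges, not results reported in the paper.
    sample = [
        {"GENE_A", "PROTEIN_B", "DISEASE_C"},
        {"GENE_A", "ORGAN_D"},
        {"PROTEIN_B", "CHEM_E"},
    ]
    print(to_dot(*representative_graph(sample)))
```

Drawing hyper-edges as nodes keeps each multi-way association visible as a single unit. The labeled links then show which contexts (diseases, organs, chemicals) connect otherwise separate associations.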
Results: Two case studies were conducted on two biomedical problems (related to lung cancer and colorectal cancer, respectively). They show that the latter approach (the text-based A Priori method) identifies the same hyper-edges as the former approach (the exhaustive method), but at a much lower computational cost. The extracted hyper-relations are presented in the paper as cognition-rich representative graphs of the corresponding hyper-graphs.

Conclusions: The text-based A Priori algorithm is a practical, useful method for extracting hyper-graphs that represent multi-way associations among biological objects. These hyper-graphs, and their visualization as representative graphs, can provide important contextual information for understanding gene-gene associations relevant to specific diseases.

(C) 2010 Elsevier B.V. All rights reserved.
