Article

HEMDAG: a family of modular and scalable hierarchical ensemble methods to improve Gene Ontology term prediction

Journal

BIOINFORMATICS
Volume 37, Issue 23, Pages 4526-4533

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btab485

Funding

  1. Transition grant 'UNIMI Partneriat H2020' [PSR2015-1720GVALE_01]
  2. PSR 2019 project 'Machine Learning and Big Data Analysis for Bioinformatics' - University of Milano [PSR2019_DIP_010_GVALE]

The study introduces a hierarchical ensemble method for automated protein function prediction, improving predictions of flat classifiers and competing with the latest hierarchy-aware learning methods.
Motivation: Automated protein function prediction is a complex multi-class, multi-label, structured classification problem in which protein functions are organized in a controlled vocabulary, according to the Gene Ontology (GO). 'Hierarchy-unaware' classifiers, also known as 'flat' methods, predict GO terms without exploiting the inherent structure of the ontology, potentially violating the True-Path-Rule (TPR) that governs the GO. 'Hierarchy-aware' approaches, even if they obey the TPR, do not always show clear improvements with respect to flat methods, or do not scale well when applied to the full GO.

Results: To overcome these limitations, we propose Hierarchical Ensemble Methods for Directed Acyclic Graphs (HEMDAG), a family of highly modular hierarchical ensembles of classifiers, able to build upon any flat method and to provide 'TPR-safe' predictions by leveraging a combination of isotonic regression and TPR learning strategies. Extensive experiments on synthetic and real data across several organisms show, first, that HEMDAG can be used as a general tool to improve the predictions of flat classifiers and, second, that HEMDAG is competitive with state-of-the-art hierarchy-aware learning methods proposed in the most recent CAFA international challenges.
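
To make the True-Path-Rule constraint concrete, the following minimal Python sketch checks flat scores for TPR violations on a toy DAG and repairs them with a simple bottom-up "maximum over descendants" heuristic. It is not the HEMDAG algorithm and it does not implement isotonic regression or the TPR ensemble strategies named in the abstract. The toy ontology, the scores, and the function names `tpr_violations` and `max_descendant_correction` are assumptions made purely for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def tpr_violations(scores, parents):
    """List (child, parent) pairs where the child scores higher than the parent.

    Under the True-Path-Rule, annotation to a term implies annotation to all
    of its ancestors, so a child should never score above any of its parents.
    """
    return [(c, p) for c, ps in parents.items() for p in ps
            if scores[c] > scores[p]]


def max_descendant_correction(scores, parents):
    """Illustrative heuristic (an assumption, not HEMDAG's method):
    raise each term's score to the maximum score among its descendants."""
    # Invert the child -> parents map into parent -> children.
    children = {t: set() for t in scores}
    for c, ps in parents.items():
        for p in ps:
            children[p].add(c)
    corrected = dict(scores)
    # Passing children as "predecessors" makes static_order() yield every
    # child before its parents, so corrections propagate from leaves to root.
    for term in TopologicalSorter(children).static_order():
        for child in children[term]:
            corrected[term] = max(corrected[term], corrected[child])
    return corrected


# Hypothetical toy ontology: term C has two parents, so this is a DAG, not a tree.
parents = {"root": [], "A": ["root"], "B": ["root"], "C": ["A", "B"], "D": ["B"]}
# Hypothetical flat-classifier scores for one protein.
flat_scores = {"root": 0.6, "A": 0.3, "B": 0.7, "C": 0.9, "D": 0.2}

safe_scores = max_descendant_correction(flat_scores, parents)
```

On this toy input the flat scores violate the rule on three edges (for example, C scores above both of its parents), while the corrected scores contain no violations. A heuristic of this kind only raises ancestor scores; the abstract states that HEMDAG instead combines isotonic regression with TPR learning strategies.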
