GOThresher: a program to remove annotation biases from protein function annotation datasets

BIOINFORMATICS (2023)

Journal

BIOINFORMATICS

Volume 39, Issue 1

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btad048

Abstract

Although sequencing technologies have produced a surge of genomic data, the functions of many gene products remain unknown. High-throughput experiments are being conducted to address this gap, but the resulting annotations are biased towards a small subset of less informative Gene Ontology (GO) terms. GOThresher, a Python tool, is introduced to identify and remove such biases from protein function annotation databases. Removing them matters both for an accurate understanding of protein function and for training machine learning methods on unbiased datasets.

Motivation: Advances in sequencing technologies have led to a surge in genomic data, although the functions of many gene products coded by these genes remain unknown. While in-depth, targeted experiments that determine the functions of these gene products are crucial and routinely performed, they fail to keep up with the inflow of novel genomic data. In an attempt to address this gap, high-throughput experiments are being conducted in which a large number of genes are investigated in a single study. The annotations generated as a result of these experiments are generally biased towards a small subset of less informative Gene Ontology (GO) terms. Identifying and removing biases from protein function annotation databases is important since biases impact our understanding of protein function by providing a poor picture of the annotation landscape. Additionally, as machine learning methods for predicting protein function are becoming increasingly prevalent, it is essential that they are trained on unbiased datasets. Therefore, it is not only crucial to be aware of biases, but also to judiciously remove them from annotation datasets.

Results: We introduce GOThresher, a Python tool that identifies and removes biases in function annotations from protein function annotation databases.
