4.7 Article

A Deep Learning Framework for Gene Ontology Annotations With Sequence- and Network-Based Information

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TCBB.2020.2968882

Keywords

Deep learning; Servers; Markov processes; Analytical models; Queueing analysis; Mathematical model; Computer architecture; Proteins; protein function; protein-protein interaction; protein sequence; protein domain

Funding

  1. National Natural Science Foundation of China [61832019, 61622213, 61728211, G20190018001]
  2. Hunan Provincial Science and Technology Program [2018WK4001]
  3. Hunan Graduate Research and Innovation Project [CX20190082]

Understanding protein functions is crucial in biology and medicine, but many proteins lack functional annotations. DeepGOA predicts protein functions from protein sequences and protein-protein interaction (PPI) networks, combining word embeddings with deep neural networks. Experimental results show that it outperforms DeepGO and BLAST.

Knowledge of protein functions plays an important role in biology and medicine. With the rapid development of high-throughput technologies, a huge number of proteins have been discovered, yet many of them still lack functional annotations. A protein usually has multiple functions, and some functions or biological processes require interactions among multiple proteins. The Gene Ontology (GO) provides a useful classification of protein functions and contains more than 40,000 terms. We propose a deep learning framework called DeepGOA to predict protein functions from protein sequences and protein-protein interaction (PPI) networks. For protein sequences, we extract two types of information: sequence semantic information and subsequence-based features. We use the word2vec technique to represent protein sequences numerically, and use a bidirectional long short-term memory network (Bi-LSTM) and a multi-scale convolutional neural network (multi-scale CNN) to obtain the global and local semantic features of protein sequences, respectively. Additionally, we use the InterPro tool to scan protein sequences and extract subsequence-based information, such as domains and motifs. This information is then fed into a neural network to generate high-quality features. For the PPI network, the DeepWalk algorithm is applied to generate network embeddings. The sequence-based and network-based features are then concatenated to predict protein functions. To evaluate the performance of DeepGOA, several different evaluation methods and metrics are used. The experimental results show that DeepGOA outperforms DeepGO and BLAST.
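
As an illustration of the two embedding inputs described in the abstract, the following minimal sketch shows how a protein sequence and a PPI network can each be turned into token "sentences" of the kind a skip-gram model consumes. The abstract does not say how sequences are tokenized for word2vec, so the overlapping 3-mers here are an assumption. DeepWalk is generally described as truncated random walks followed by skip-gram training; only the walk generation is sketched. The function names, parameters, toy sequence, and toy network are all hypothetical and are not taken from the paper.

```python
import random


def kmer_tokens(sequence, k=3):
    """Split a protein sequence into overlapping k-mers ("words").

    Assumption: k=3 is an illustrative choice, not the paper's setting.
    """
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def random_walks(adjacency, walk_length=5, walks_per_node=2, seed=0):
    """Generate truncated random walks over a graph given as an adjacency dict."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adjacency)
        rng.shuffle(nodes)  # visit start nodes in a random order on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:  # isolated node: stop the walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy inputs (hypothetical, for illustration only).
sequence = "MKTAYIAK"
ppi = {
    "P1": ["P2", "P3"],
    "P2": ["P1", "P3"],
    "P3": ["P1", "P2", "P4"],
    "P4": ["P3"],
}

sequence_sentence = kmer_tokens(sequence)   # one "sentence" of k-mer words
network_sentences = random_walks(ppi)       # one "sentence" of protein IDs per walk

print(sequence_sentence)
print(len(network_sentences), "walks generated")
```

Both outputs are lists of tokens, so they could be passed to any skip-gram implementation to obtain per-k-mer and per-protein vectors. The downstream Bi-LSTM, multi-scale CNN, and feature concatenation are not shown.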
