Article

iMPT-FDNPL: Identification of Membrane Protein Types with Functional Domains and a Natural Language Processing Approach

Publisher

HINDAWI LTD
DOI: 10.1155/2021/7681497

Funding

  1. National Natural Science Foundation of China [61772028]
  2. Key Research and Development Plan of Zhejiang Province [2021C02039]
  3. Natural Science Foundation of Shanghai [17ZR1412500]

This study proposed a novel feature extraction scheme that represents functional domains as words and proteins as sentences, using natural language processing to derive protein features. A multilabel classifier built with random forest performed well in tenfold cross-validation and outperformed other methods, confirming the effectiveness of the protein features generated by the proposed scheme.
Membrane proteins are an important class of proteins that play essential roles in several cellular processes. Based on their intramolecular arrangements and positions in a cell, membrane proteins can be divided into several types, and the type of a membrane protein is reported to be highly related to its functions. Determining membrane protein types has been an active research topic in recent years, and many computational methods have been proposed. Some of them use functional domain information to encode proteins, but the encoding procedure remains crude. In this study, we designed a novel feature extraction scheme to obtain informative protein features from functional domain information. The scheme treats domains as words and proteins, each represented by its domains, as sentences. A natural language processing approach, word2vec, was applied to obtain domain features, which were further refined into protein features. Based on these features, RAndom k-labELsets (RAkEL) with random forest as the base classifier was employed to build a multilabel classifier, named iMPT-FDNPL. Tenfold cross-validation indicated that the classifier performs well. Furthermore, it was superior to classifiers based on features derived from functional domains via a one-hot scheme or from other protein properties, suggesting that the protein features generated by the proposed scheme are effective.
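
The encoding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the protein and domain identifiers, the two-dimensional domain vectors (stand-ins for trained word2vec embeddings), the mean-pooling rule used to turn domain vectors into a protein vector, and the helper names are all assumptions made for illustration. The last helper shows only the label-partitioning step of a random k-labelsets scheme (disjoint variant); training a base classifier such as random forest on each label group is left out.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical toy data: each protein is described by its functional domains.
PROTEIN_DOMAINS: Dict[str, List[str]] = {
    "P1": ["DOM_A", "DOM_B"],
    "P2": ["DOM_B", "DOM_C", "DOM_D"],
}

# Placeholder domain embeddings; in practice these would come from a
# word2vec model trained on the corpus built below (assumption).
DOMAIN_VECTORS: Dict[str, List[float]] = {
    "DOM_A": [1.0, 0.0],
    "DOM_B": [0.0, 1.0],
    "DOM_C": [1.0, 1.0],
}


def build_corpus(protein_domains: Dict[str, List[str]]) -> List[List[str]]:
    """Treat every protein as a 'sentence' whose 'words' are its domain IDs."""
    return [list(domains) for domains in protein_domains.values()]


def pool_protein_vector(
    domains: Sequence[str],
    domain_vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Average the vectors of the domains that have an embedding.

    Mean pooling is an assumed refinement rule; domains without an
    embedding are skipped, and a zero vector is returned if none remain.
    """
    known = [domain_vectors[d] for d in domains if d in domain_vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def random_k_labelsets(labels: Sequence[str], k: int, seed: int = 0) -> List[List[str]]:
    """Shuffle the labels and cut them into disjoint groups of at most k labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return [shuffled[i:i + k] for i in range(0, len(shuffled), k)]


if __name__ == "__main__":
    print(build_corpus(PROTEIN_DOMAINS))
    for protein, domains in PROTEIN_DOMAINS.items():
        print(protein, pool_protein_vector(domains, DOMAIN_VECTORS, dim=2))
    # Hypothetical label names standing in for membrane protein types.
    print(random_k_labelsets([f"type_{i}" for i in range(8)], k=3))
```

In a fuller pipeline, each label group would get its own multiclass classifier over the label combinations observed in that group, and the per-label votes would be combined into the final multilabel prediction.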
