Article

iACVP: markedly enhanced identification of anti-coronavirus peptides using a dataset-specific word2vec model

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 23, Issue 4

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbac265

Keywords

anti-coronavirus peptide; word2vec; bioinformatics; deep learning; transformer; random forest

Funding

  1. Japan Society for the Promotion of Science [22H03688]
  2. Korean government (MSIT) [2021R1A2C1014338]


The COVID-19 pandemic has resulted in millions of deaths worldwide, underscoring the urgent need for anti-coronavirus drugs. In this study, the authors aimed to predict anti-coronavirus peptides (ACVPs) using a dataset of known antiviral peptides (AVPs) and a small collection of ACVPs. Through exhaustive searches, they found that a random forest classifier with word2vec (W2V) features consistently outperformed the other machine learning methods they tested. The proposed method, named iACVP, consistently provides better prediction performance than existing state-of-the-art methods.
The COVID-19 pandemic caused several million deaths worldwide. Development of anti-coronavirus drugs is thus urgent. Unlike conventional non-peptide drugs, antiviral peptide drugs are highly specific, easy to synthesize and modify, and not highly susceptible to drug resistance. To reduce the time and expense involved in screening thousands of peptides and assaying their antiviral activity, computational predictors for identifying anti-coronavirus peptides (ACVPs) are needed. However, few experimentally verified ACVP samples are available, even though a relatively large number of antiviral peptides (AVPs) have been discovered. In this study, we attempted to predict ACVPs using an AVP dataset and a small collection of ACVPs. Using conventional features, a binary profile and a word-embedding word2vec (W2V), we systematically explored five different machine learning methods: Transformer, Convolutional Neural Network, bidirectional Long Short-Term Memory, Random Forest (RF) and Support Vector Machine. Via exhaustive searches, we found that the RF classifier with W2V consistently achieved better performance on different datasets. Two factors were decisive: (i) the dataset-specific W2V dictionary was generated from the training and independent test datasets instead of the widely used general UniProt proteome and (ii) a systematic search determined the optimal k-mer value in W2V, which provides greater discrimination between positive and negative samples. As a result, our proposed method, named iACVP, consistently provides better prediction performance than existing state-of-the-art methods. To assist experimentalists in identifying putative ACVPs, we implemented our model as a web server accessible via the following link: http://kurata35.bio.kyutech.ac.jp/iACVP.
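
To make the two factors in the abstract concrete, the following is a minimal sketch of how a dataset-specific k-mer vocabulary could be built from the training and independent test peptides for several candidate values of k. It is not the authors' implementation: the function names `kmer_tokens` and `build_vocabulary`, the overlapping stride-1 tokenization, and the toy peptide strings are all assumptions made for illustration. In a full pipeline the resulting token lists would be passed to a word2vec trainer and the embeddings to a random forest, which is omitted here.

```python
from collections import Counter


def kmer_tokens(sequence, k):
    """Split a peptide into overlapping k-mers (stride 1 is an assumption)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocabulary(sequences, k):
    """Count every k-mer that occurs in the given peptides.

    The vocabulary is dataset-specific: it contains only k-mers seen in
    these sequences, not k-mers drawn from a general proteome.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(kmer_tokens(seq, k))
    return counts


if __name__ == "__main__":
    # Hypothetical toy peptides, not taken from the paper's datasets.
    train_peptides = ["GLFDIVKKV", "KKVVGALGS"]
    test_peptides = ["FDIVKKLLG"]

    # Scan candidate k values; a real search would train W2V and a
    # classifier for each k and compare validation performance.
    for k in (1, 2, 3, 4):
        vocab = build_vocabulary(train_peptides + test_peptides, k)
        print(f"k={k}: {len(vocab)} distinct k-mers")
```

Larger k yields more distinct, sparser tokens, which is one reason the choice of k can affect how well positive and negative samples separate.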
