Article

iACVP: markedly enhanced identification of anti-coronavirus peptides using a dataset-specific word2vec model

BRIEFINGS IN BIOINFORMATICS (2022)

Journal

BRIEFINGS IN BIOINFORMATICS

Volume 23, Issue 4, Pages -

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bib/bbac265

Keywords

anti-coronavirus peptide; word2vec; bioinformatics; deep learning; transformer; random forest

Categories

Biochemical Research Methods; Mathematical & Computational Biology

Funding

Japan Society for the Promotion of Science [22H03688]
Korean government (MSIT) [2021R1A2C1014338]

Smart Summary

The COVID-19 pandemic has resulted in millions of deaths worldwide, emphasizing the urgency of developing anti-coronavirus drugs. In this study, the authors aimed to predict anti-coronavirus peptides (ACVPs) using a dataset of known antiviral peptides (AVPs) and a small collection of ACVPs. Through exhaustive searches, they found that the random forest classifier with word2vec (W2V) consistently outperformed the other machine learning methods tested. The proposed method, named iACVP, consistently provides better prediction performance than existing state-of-the-art methods.

Abstract

The COVID-19 pandemic caused several million deaths worldwide, so the development of anti-coronavirus drugs is urgent. Unlike conventional non-peptide drugs, antiviral peptide drugs are highly specific, easy to synthesize and modify, and not highly susceptible to drug resistance. To reduce the time and expense of screening thousands of peptides and assaying their antiviral activity, computational predictors for identifying anti-coronavirus peptides (ACVPs) are needed. However, few experimentally verified ACVP samples are available, even though a relatively large number of antiviral peptides (AVPs) have been discovered. In this study, we attempted to predict ACVPs using an AVP dataset and a small collection of ACVPs. Using conventional features, a binary profile, and word2vec (W2V) word embeddings, we systematically explored five machine learning methods: Transformer, Convolutional Neural Network, bidirectional Long Short-Term Memory, Random Forest (RF) and Support Vector Machine. Through exhaustive searches, we found that the RF classifier with W2V consistently achieved better performance across different datasets. Two factors were mainly responsible: (i) the dataset-specific W2V dictionary was generated from the training and independent test datasets instead of the widely used general UniProt proteome, and (ii) a systematic search determined the optimal k-mer value for W2V, which provides greater discrimination between positive and negative samples. Consequently, our proposed method, named iACVP, consistently provides better prediction performance than existing state-of-the-art methods. To assist experimentalists in identifying putative ACVPs, we implemented our model as a web server at http://kurata35.bio.kyutech.ac.jp/iACVP.
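The feature side of this pipeline (a binary profile, plus a W2V corpus built only from the dataset's own peptides with a tunable k-mer size) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not iACVP's actual code: all function names are hypothetical, peptides are assumed to be split into overlapping k-mers with stride 1, and each peptide is assumed to be represented by the average of its k-mer vectors. In a real pipeline the vectors would come from training a word2vec implementation (for example gensim's `Word2Vec`) on the corpus, and the pooled features would feed a random forest; the toy vectors in the usage note stand in for that trained dictionary.

```python
from typing import Dict, List, Sequence

# The 20 standard amino acids; this order fixes the one-hot column layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_profile(peptide: str, max_len: int) -> List[List[int]]:
    """One-hot encode each residue, padding with all-zero rows up to max_len.

    Assumptions (hypothetical helper): sequences longer than max_len are
    truncated, and non-standard residues become an all-zero row.
    """
    rows = [[1 if residue == aa else 0 for aa in AMINO_ACIDS]
            for residue in peptide[:max_len]]
    # Build each padding row separately so rows are not aliased.
    rows.extend([0] * len(AMINO_ACIDS) for _ in range(max_len - len(rows)))
    return rows


def kmer_sentence(peptide: str, k: int) -> List[str]:
    """Split a peptide into overlapping k-mers (stride 1): one W2V 'sentence'."""
    return [peptide[i:i + k] for i in range(len(peptide) - k + 1)]


def build_corpus(train: Sequence[str], test: Sequence[str], k: int) -> List[List[str]]:
    """Dataset-specific corpus: k-mer sentences from the training and
    independent test peptides only, rather than a general proteome."""
    return [kmer_sentence(p, k) for p in list(train) + list(test)]


def mean_embedding(peptide: str, k: int,
                   vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of a peptide's k-mers (assumed pooling step).

    k-mers missing from the dictionary are skipped; if none are found,
    a zero vector of length dim is returned.
    """
    hits = [vectors[m] for m in kmer_sentence(peptide, k) if m in vectors]
    if not hits:
        return [0.0] * dim
    return [sum(column) / len(hits) for column in zip(*hits)]
```

For example, `build_corpus(["ACDKLW", "GGHKL"], ["ACDW"], k=3)` yields the sentences `["ACD", "CDK", "DKL", "KLW"]`, `["GGH", "GHK", "HKL"]` and `["ACD", "CDW"]`. With toy vectors `{"ACD": [1.0, 0.0], "CDW": [0.0, 1.0]}`, `mean_embedding("ACDW", 3, ...)` gives `[0.5, 0.5]`. Because `k` is a plain parameter, the systematic k-mer search described above amounts to rebuilding the corpus and re-evaluating the classifier for each candidate value.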
