Article

Effectively predicting HIV-1 protease cleavage sites by using an ensemble learning approach

Journal

BMC BIOINFORMATICS
Volume 23, Issue 1

Publisher

BMC
DOI: 10.1186/s12859-022-04999-y

Keywords

HIV-1 protease; Cleavage sites prediction; Asymmetric bagging; Biased SVM; Ensemble learning

Funding

  1. Natural Science Foundation of Xinjiang Uygur Autonomous Region [2021D01D05]
  2. Tianshan Youth Project-Outstanding Youth Science and Technology Talents of Xinjiang [2020Q005]
  3. Chinese Academy of Sciences

In this study, an ensemble learning algorithm called EM-HIV is proposed for predicting HIV-1 PR cleavage sites. By training a set of weak learners with the asymmetric bagging strategy, EM-HIV can alleviate the impact of data imbalance and noisy data. The algorithm utilizes multiple features from substrate sequences and outperforms state-of-the-art prediction algorithms.
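The asymmetric bagging scheme summarized above can be sketched as follows. This is a minimal, dependency-free sketch under stated assumptions, not EM-HIV's implementation: each weak learner is a linear SVM trained by stochastic subgradient descent on a class-weighted hinge loss (the cost parameters `c_pos` and `c_neg` are assumed stand-ins for the biased SVM), each bag keeps every positive (cleaved) sample plus an equally sized bootstrap sample of negatives, and the ensemble combines learners by majority vote. All function names and default values are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias) of one linear weak learner


def train_biased_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     c_pos: float = 2.0, c_neg: float = 1.0, lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 200, seed: int = 0) -> Model:
    """Hypothetical biased linear SVM: hinge loss with a larger cost on
    positive-class errors, minimized by stochastic subgradient descent.
    Labels must be +1 (cleaved) or -1 (not cleaved)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi, yi = X[i], y[i]
            cost = c_pos if yi > 0 else c_neg  # asymmetric misclassification cost
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]  # L2 regularization step
            if margin < 1.0:  # hinge loss active: move toward the correct side
                w = [wj + lr * cost * yi * xj for wj, xj in zip(w, xi)]
                b += lr * cost * yi
    return w, b


def predict_one(model: Model, x: Sequence[float]) -> int:
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def asymmetric_bagging(X: Sequence[Sequence[float]], y: Sequence[int],
                       n_learners: int = 5, seed: int = 0) -> List[Model]:
    """Train n_learners weak learners; each sees ALL positives plus a
    bootstrap sample of negatives as large as the positive set."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label > 0]
    neg = [i for i, label in enumerate(y) if label < 0]
    models = []
    for k in range(n_learners):
        sampled_neg = [rng.choice(neg) for _ in range(len(pos))]  # with replacement
        bag = pos + sampled_neg
        models.append(train_biased_svm([X[i] for i in bag],
                                       [y[i] for i in bag], seed=seed + k))
    return models


def ensemble_predict(models: List[Model], x: Sequence[float]) -> int:
    """Majority vote; use an odd number of learners to avoid ties."""
    votes = sum(predict_one(m, x) for m in models)
    return 1 if votes > 0 else -1
```

Under these assumptions, every learner sees all of the rare cleaved samples, while bootstrap resampling of the abundant non-cleaved samples means a mislabeled negative affects only some learners, and the vote dampens its influence.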
Background: Information on which substrate sites can be cleaved by human immunodeficiency virus 1 proteases (HIV-1 PRs) is of great significance for designing effective inhibitors against HIV-1. A variety of machine learning-based algorithms have been developed to predict HIV-1 PR cleavage sites by extracting relevant features from substrate sequences. However, relying on sequence information alone is not sufficient to guarantee good performance, because results vary with how the data are split into training and test sets. Moreover, noisy data, i.e., false positive and false negative cleavage sites, can degrade prediction accuracy.

Results: In this work, an ensemble learning algorithm for predicting HIV-1 PR cleavage sites, namely EM-HIV, is proposed. EM-HIV trains a set of weak learners, i.e., biased support vector machine classifiers, with the asymmetric bagging strategy, thereby alleviating the impact of data imbalance and noisy data. In addition, to make full use of substrate sequences, EM-HIV collects features from three different coding schemes, namely amino acid identities, chemical properties and variable-length coevolutionary patterns, to construct more relevant feature vectors of octamers. Experimental results on three independent benchmark datasets demonstrate that EM-HIV outperforms state-of-the-art prediction algorithms in terms of several evaluation metrics. Hence, EM-HIV can be regarded as a useful tool for accurately predicting HIV-1 PR cleavage sites.
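Of the three coding schemes, the amino acid identity encoding is the simplest to illustrate. The sketch below assumes the common orthogonal (one-hot) encoding, in which each of the eight octamer positions (P4 to P4') is mapped to a 20-dimensional indicator vector, giving 160 features. The function name `one_hot_octamer` is hypothetical, and the chemical-property and coevolutionary features used by EM-HIV are not reproduced here.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_octamer(octamer: str) -> List[int]:
    """Hypothetical orthogonal encoding: 8 positions x 20 residues = 160 bits,
    with exactly one bit set per position."""
    if len(octamer) != 8:
        raise ValueError(f"expected 8 residues, got {len(octamer)}")
    vec = [0] * (8 * len(AMINO_ACIDS))
    for pos, aa in enumerate(octamer.upper()):
        if aa not in AA_INDEX:
            raise ValueError(f"unknown residue {aa!r} at position {pos}")
        vec[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1
    return vec


features = one_hot_octamer("SQNYPIVQ")  # 160-dimensional 0/1 list
```

In a multi-scheme setup of the kind the abstract describes, blocks like this one could be concatenated with the other feature types before the weak learners are trained.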
