☆ 4.6 Article

Predicting HIV-1 Protease Cleavage Sites With Positive-Unlabeled Learning

FRONTIERS IN GENETICS (2021)

Journal

FRONTIERS IN GENETICS

Volume 12, Issue -, Pages -

Publisher

FRONTIERS MEDIA SA

DOI: 10.3389/fgene.2021.658078

Keywords

HIV-1 protease; cleavage site prediction; positive-unlabeled learning; biased SVM; substrate specificity

Category

Genetics & Heredity

Funding

National Natural Science Foundation of China [61602352]
Pioneer Hundred Talents Program of Chinese Academy of Sciences

Summary

Understanding HIV-1 protease substrate specificity is crucial for HIV infection prevention. A novel positive-unlabeled learning algorithm, PU-HIV, has been proposed for effective prediction of HIV-1 protease cleavage sites, demonstrating superior performance compared to existing prediction models in terms of AUC, PR-AUC, and F-measure. With PU-HIV, previously unknown substrate sites can be identified, offering valuable insights for designing novel HIV-1 protease inhibitors for HIV treatment.

Abstract

Understanding the substrate specificity of HIV-1 protease plays an essential role in the prevention of HIV infection. A variety of computational models have therefore been developed to predict substrate sites that are cleaved by HIV-1 protease. Most of them follow a supervised learning scheme: they build classifiers that treat experimentally verified cleavable sites as positive samples and unknown sites as negative samples. However, the negative set may contain noise, because some of the unknown sites may in fact be false negatives. As a result, these classifiers produce biased predictions and are less accurate than they could be. In this work, unknown substrate sites are regarded as unlabeled samples instead of negative ones. We propose a novel positive-unlabeled learning algorithm, namely PU-HIV, for effective prediction of HIV-1 protease cleavage sites. The features used by PU-HIV are encoded from different perspectives of substrate sequences, including amino acid identities, coevolutionary patterns and chemical properties. A biased support vector machine classifier is built to complete the prediction task by adjusting the weights of the errors generated by positive and unlabeled samples. Benchmarking experiments using cross-validation and independent tests compared PU-HIV with state-of-the-art prediction models and demonstrated its superior performance in terms of AUC, PR-AUC, and F-measure. PU-HIV thus makes it possible to identify previously unknown but physiologically existing substrate sites that can be cleaved by HIV-1 protease. This provides valuable insights into the design of novel HIV-1 protease inhibitors for HIV treatment.
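The biased SVM described above can be illustrated with a minimal sketch in pure Python, so that it runs without extra dependencies. This is an illustration under stated assumptions, not the authors' implementation:

- It assumes a linear SVM trained by stochastic subgradient descent, which is not necessarily the solver PU-HIV uses.
- It encodes only amino acid identities (one-hot) and omits the coevolutionary and chemical-property features.
- The 8-residue window, the synthetic F-P motif at P1-P1', the costs `c_pos=10` and `c_unl=1`, and all function names (`one_hot`, `train_biased_svm`, `make_toy_data`) are hypothetical choices. They are not values or code from the paper.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(peptide):
    """Encode amino acid identities of a peptide window as a flat 0/1 vector (20 slots per position)."""
    vec = [0.0] * (len(AMINO_ACIDS) * len(peptide))
    for pos, residue in enumerate(peptide):
        vec[pos * len(AMINO_ACIDS) + INDEX[residue]] = 1.0
    return vec


def decision(w, b, x):
    """Linear SVM score w . x + b; positive means 'predicted cleavable'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_biased_svm(X, y, c_pos, c_unl, lam=0.01, lr=0.01, epochs=30, seed=0):
    """Biased linear SVM via stochastic subgradient descent (an assumed solver).

    Minimises lam/2 * ||w||^2 + mean_i(cost_i * max(0, 1 - y_i * (w . x_i + b))),
    where y_i = +1 for labeled positives (cost c_pos) and -1 for unlabeled samples (cost c_unl).
    The bias b is not regularized.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            cost = c_pos if y[i] > 0 else c_unl
            if y[i] * decision(w, b, X[i]) < 1:
                # Margin violated: L2 shrinkage plus the cost-weighted hinge subgradient.
                w = [wj - lr * (lam * wj - cost * y[i] * xj) for wj, xj in zip(w, X[i])]
                b += lr * cost * y[i]
            else:
                # Margin satisfied: only the L2 term contributes.
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b


def make_toy_data(n_pos=40, n_unl=200, hidden_frac=0.1, seed=1):
    """Hypothetical synthetic octamers: 'cleavable' ones carry F-P at indices 3-4 (P1-P1').

    A fraction of the unlabeled set is generated the same way, standing in for real
    but experimentally unverified cleavage sites hidden among the unknowns.
    """
    rng = random.Random(seed)

    def octamer(cleavable):
        residues = [rng.choice(AMINO_ACIDS) for _ in range(8)]
        if cleavable:
            residues[3], residues[4] = "F", "P"
        return "".join(residues)

    peptides = [octamer(True) for _ in range(n_pos)]
    peptides += [octamer(rng.random() < hidden_frac) for _ in range(n_unl)]
    labels = [1] * n_pos + [-1] * n_unl
    return [one_hot(p) for p in peptides], labels


if __name__ == "__main__":
    X, y = make_toy_data()
    # Errors on verified positives cost 10x more than errors on unlabeled samples.
    w, b = train_biased_svm(X, y, c_pos=10.0, c_unl=1.0)
    print("GGGFPGGG:", decision(w, b, one_hot("GGGFPGGG")))
    print("GGGGGGGG:", decision(w, b, one_hot("GGGGGGGG")))
```

Setting `c_pos` well above `c_unl` makes misclassifying a verified cleavage site expensive. At the same time, it lets some unlabeled sites (here, the hidden F-P octamers) fall on the positive side of the boundary. This is the sense in which unknown sites are treated as unlabeled rather than negative.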
