4.6 Article

HostNet: improved sequence representation in deep neural networks for virus-host prediction

Journal

BMC BIOINFORMATICS
Volume 24, Issue 1, Pages -

Publisher

BMC
DOI: 10.1186/s12859-023-05582-9

Keywords

Virus-Host Prediction; Sequence Representation; Vectorization; Deep Learning-based Sequence Modeling

The article presents HostNet, a deep learning framework for predicting virus hosts from genomic sequences. HostNet utilizes a Transformer-CNN-BiGRU architecture and two enhanced sequence representation modules to overcome the challenges of data deficiency and imbalance. The results show that HostNet outperforms the state-of-the-art deep learning-based method in host prediction accuracy and F1 score. The improved sequence representation modules significantly enhance HostNet's training generalization, performance in challenging classes, and stability.
Background: The escalation of viruses over the past decade has highlighted the need to determine their hosts, particularly for emerging viruses that threaten both human and animal health. Yet the traditional means of determining the host range of a virus, field surveillance and laboratory experiments, are laborious and demanding. A computational tool that can reliably predict host ranges for novel viruses would enable timely responses in the prevention and control of emerging infectious diseases. Virus-host prediction is complicated by data imbalance and data deficiency, so developing highly accurate computational tools for predicting virus-host associations is both challenging and pressing.

Results: To overcome these challenges, we present HostNet, a deep learning framework that uses a Transformer-CNN-BiGRU architecture and two enhanced sequence representation modules. The first module, k-mer to vector, pre-trains a background vector representation of k-mers from a broad range of virus sequences to address data deficiency. The second module, an adaptive sliding window, truncates virus sequences of various lengths to create a uniform number of informative and distinct samples for each sequence to address data imbalance. We assess HostNet's performance on a benchmark dataset of Rabies lyssavirus and an in-house dataset of Flavivirus. Our results show that HostNet surpasses the state-of-the-art deep learning-based method in host-prediction accuracy and F1 score. The enhanced sequence representation modules significantly improve HostNet's training generalization, performance in challenging classes, and stability.

Conclusion: HostNet is a promising framework for predicting virus hosts from genomic sequences, addressing the challenges posed by sparse and varying-length virus sequence data. Our results demonstrate its potential as a valuable tool for virus-host prediction in various biological contexts. Virus-host prediction from genomic sequences using deep neural networks is a promising approach to identifying potential hosts accurately and efficiently, with significant impacts on public health, disease prevention, and vaccine development.
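
The abstract describes the two sequence representation modules only in outline. As a rough illustration, the sketch below shows one plausible reading in plain Python: splitting a sequence into overlapping k-mers (the tokens a k-mer-to-vector model would be pre-trained on), and cutting sequences of different lengths into the same number of fixed-length windows by adapting the stride. The function names `kmer_tokens` and `adaptive_windows`, the evenly spaced window starts, and all parameter values are assumptions made for this example. They are not taken from the HostNet implementation, and the embedding pre-training step itself is not shown.

```python
from typing import List


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a nucleotide sequence into overlapping k-mers (stride 1).

    Hypothetical helper: these tokens are what a k-mer embedding model
    would be trained on. The pre-training itself is out of scope here.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def adaptive_windows(seq: str, window: int, n_windows: int) -> List[str]:
    """Cut `seq` into exactly `n_windows` subsequences of length `window`.

    Assumption: window starts are spaced evenly from the first position to
    the last valid one, so the stride grows with sequence length and every
    sequence yields the same number of samples.
    """
    if n_windows < 1:
        raise ValueError("n_windows must be at least 1")
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window length")
    if n_windows == 1:
        return [seq[:window]]
    span = len(seq) - window  # largest valid start index
    # Integer arithmetic keeps the starts exact; the last start equals `span`.
    # If span < n_windows - 1, some starts repeat and windows are not distinct.
    starts = [(i * span) // (n_windows - 1) for i in range(n_windows)]
    return [seq[s:s + window] for s in starts]


if __name__ == "__main__":
    short_seq = "ACGTACGTAC"
    long_seq = "ACGT" * 50
    # Both sequences yield three windows despite their different lengths.
    print(adaptive_windows(short_seq, window=4, n_windows=3))
    print(len(adaptive_windows(long_seq, window=4, n_windows=3)))
    print(kmer_tokens("ACGTA", k=3))
```

Under these assumptions, a short and a long genome contribute equally many training samples, which is the balancing effect the abstract attributes to the adaptive sliding window. Each window could then be tokenized with `kmer_tokens` before the embedding lookup.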
