Proceedings Paper

Quantifying Hierarchical Conflicts in Homology Statements

Journal

COMPARATIVE GENOMICS (RECOMB-CG 2022)
Volume 13234, Issue -, Pages 146-167

Publisher

SPRINGER INTERNATIONAL PUBLISHING AG
DOI: 10.1007/978-3-031-06220-9_9

Keywords

Homology; Syntenic block; T-Star Packing; Assignment problem

Funding

  1. NIAID [R01AI105185]
  2. [ANR-20-CE48-0001]

This article introduces a test that measures how hierarchically related two sets of homology relationships, produced by different software, are to one another. The test can serve as a sanity check for agglomerative syntenic block software and provides a mapping between blocks for downstream analysis. The authors find that two collections of homology relationships are rarely perfectly hierarchically related, so they pose an optimization problem that measures how far the collections are from being so, and give a heuristic solution.
A fundamental step in any comparative whole genome analysis is the annotation of homology relationships between segments of the genomes. Traditionally, this annotation has been based on coding segments, where orthologous genes are inferred and then syntenic blocks are computed by agglomerating sets of homologous genes into homologous regions. More recently, whole genomes, including intergenic regions, are being aligned de novo as whole genome alignments (WGA). In this article we develop a test to measure to what extent sets of homology relationships given by two different software are hierarchically related to one another, where matched segments from one software may contain matched segments from the other and vice versa. Such a test should be used as a sanity check for an agglomerative syntenic block software, and provides a mapping between the blocks that can be used for further downstream analyses. We show that, in practice, it is rare that two collections of homology relationships are perfectly hierarchically related. Therefore we present an optimization problem to measure how far they are from being so. We show that this problem, which is a generalization of the assignment problem, is NP-Hard and give a heuristic solution and implementation. We apply our distance measure to data from the Alignathon competition, as well as to Mycobacterium tuberculosis, showing that many factors affect how hierarchically related two collections are, including sensitivities to guide trees and the use or omission of an outgroup. These findings inform practitioners on the pitfalls of homology relationship inference, and can inform development of more robust inference tools.
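
The idea of segments from one software containing segments from the other can be illustrated with a minimal sketch. This is not the paper's method or its distance measure: it assumes a heavily simplified setting in which each tool reports blocks as half-open intervals on a single sequence, and it only counts pairs that partially overlap (neither disjoint nor nested). The names `Interval`, `relation`, and `count_conflicts` are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

# Assumption: a block is a half-open interval [start, end) on one sequence.
# Real homology statements relate segments across several genomes.
Interval = Tuple[int, int]


def relation(a: Interval, b: Interval) -> str:
    """Classify two intervals as 'disjoint', 'nested', or 'conflict'."""
    a_start, a_end = a
    b_start, b_end = b
    # No shared position at all.
    if a_end <= b_start or b_end <= a_start:
        return "disjoint"
    # One interval fully contains the other (equal intervals count as nested).
    a_contains_b = a_start <= b_start and b_end <= a_end
    b_contains_a = b_start <= a_start and a_end <= b_end
    if a_contains_b or b_contains_a:
        return "nested"
    # Partial overlap: the two blocks cannot be arranged hierarchically.
    return "conflict"


def count_conflicts(blocks_a: List[Interval], blocks_b: List[Interval]) -> int:
    """Count cross-collection pairs that partially overlap."""
    return sum(
        1
        for a in blocks_a
        for b in blocks_b
        if relation(a, b) == "conflict"
    )


if __name__ == "__main__":
    tool_a = [(0, 100), (200, 300)]
    tool_b = [(10, 40), (50, 90), (250, 350)]
    # (200, 300) and (250, 350) overlap without nesting.
    print(count_conflicts(tool_a, tool_b))
```

In this toy setting a count of zero means the two collections are perfectly hierarchically related; the optimization problem described in the abstract addresses the harder question of how far two real collections are from that state.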
