4.4 Article

Differential Consistency Analysis: Which Similarity Measures can be Applied in Drug Discovery?

Journal

MOLECULAR INFORMATICS
Volume 40, Issue 7

Publisher

WILEY-V C H VERLAG GMBH
DOI: 10.1002/minf.202060017

Keywords

chemoinformatics; drug design; ranking; similarity indices; differential consistency analysis; Tanimoto index; molecular fingerprints

Funding

  1. University of Florida
  2. National Research, Development and Innovation Office of Hungary (NKFIH) [K 119269]
  3. Janos Bolyai Research Scholarship of the Hungarian Academy of Sciences

The study investigates the conditions under which two comparative measures provide equivalent results on a given set of molecules. A novel method (Differential Consistency Analysis) is introduced to study the consistency between comparative measures, revealing that using a reference with less variation in similarity or representing molecules in a size-independent way can improve consistency. The presented derivations are applicable to all binary similarity coefficients introduced so far, regardless of molecular representations.
Similarity measures are widely used in areas ranging from taxonomy to cheminformatics. A large number of similarity and distance measures (collectively, comparative measures) have been introduced, but only a few studies have examined how they relate to one another. We present a thorough analytical study of the conditions under which two comparative measures provide equivalent results over a given set of molecules. A key part of this work is the introduction of a novel way to study the consistency between comparative measures: the differential consistency analysis (DCA). This tool reveals how consistency can be established analytically with minimal (or no) assumptions. We found that the consensus between the Tanimoto and Cosine coefficients improves when choosing a reference whose similarity to the rest of the molecules varies less, or when representing the molecules in a way that does not depend strongly on their size (i.e., bit frequency in the chosen fingerprint representation). The presented derivations are only generic examples; DCA can be applied widely to all binary similarity coefficients introduced so far, independently of the molecular representation.
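
The size effect described in the abstract can be illustrated numerically. The sketch below is not the analytical DCA derivation of the paper; it is a minimal check under stated assumptions: fingerprints are modeled as sets of on-bit positions, and two measures are called consistent on a pair of molecules when they order that pair the same way relative to a common reference. The function names (`tanimoto`, `cosine`, `consistency_fraction`) and the example fingerprints are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared on-bits over the union of on-bits."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 1.0


def cosine(a: frozenset, b: frozenset) -> float:
    """Cosine coefficient for binary fingerprints."""
    common = len(a & b)
    denom = sqrt(len(a) * len(b))
    return common / denom if denom else 1.0


def _sign(x: float) -> int:
    return (x > 0) - (x < 0)


def consistency_fraction(reference: frozenset, molecules: list) -> float:
    """Fraction of molecule pairs that both measures order identically.

    Assumed definition for illustration: a pair (i, j) counts as
    consistent when sign(T_i - T_j) == sign(C_i - C_j), where T and C
    are the similarities of each molecule to the reference.
    """
    t = [tanimoto(reference, m) for m in molecules]
    c = [cosine(reference, m) for m in molecules]
    pairs = list(combinations(range(len(molecules)), 2))
    agree = sum(_sign(t[i] - t[j]) == _sign(c[i] - c[j]) for i, j in pairs)
    return agree / len(pairs)


reference = frozenset({0, 1, 2, 3})

# Fingerprints with very different numbers of on-bits (1 vs. 10).
mixed_sizes = [
    frozenset({0}),
    frozenset({0, 1, 2, 4, 5, 6, 7, 8, 9, 10}),
]

# Fingerprints that all have four on-bits.
equal_sizes = [
    frozenset({0, 1, 9, 10}),
    frozenset({0, 1, 2, 9}),
    frozenset({0, 8, 9, 10}),
]

print(consistency_fraction(reference, mixed_sizes))
print(consistency_fraction(reference, equal_sizes))
```

In the mixed-size example the two coefficients disagree on the ordering of the pair, whereas with equal on-bit counts both coefficients increase with the number of shared bits and therefore agree on every pair. This matches the qualitative statement that size-independent representations favor consistency, but it is only a toy example and not evidence for the general result.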
