Article

Extended continuous similarity indices: theory and application for QSAR descriptor selection

Journal

JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN
Volume 36, Issue 3, Pages 157-173

Publisher

SPRINGER
DOI: 10.1007/s10822-022-00444-7

Keywords

Similarity; QSAR; Extended similarity; Descriptors

Funding

  1. National Research, Development and Innovation Office of Hungary (OTKA) [K_20 134260, PD_20 134416]
  2. University of Florida
  3. Hungarian Academy of Sciences: Janos Bolyai Research Scholarship
  4. Ministry for Innovation and Technology of Hungary [UNKP-21-5]

Extended similarity indices improve the efficiency of binary string comparison and have applications in various fields. However, the current indices are limited to binary or categorical inputs. We propose a further generalization to apply the indices to numerical data with continuous components and discuss different implementation methods and their analytical properties.
Extended (or n-ary) similarity indices have been recently proposed to extend the comparative analysis of binary strings. Going beyond the traditional notion of pairwise comparisons, these novel indices allow comparing any number of objects at the same time. This results in a remarkable efficiency gain with respect to other approaches, since we can now compare N molecules in O(N) time instead of the usual quadratic O(N^2) time. This favorable scaling has motivated the application of these indices to diversity selection, clustering, phylogenetic analysis, chemical space visualization, and post-processing of molecular dynamics simulations. However, the current formulation of the n-ary indices is limited to vectors with binary or categorical inputs. Here, we present a further generalization of this formalism so that it can be applied to numerical data, i.e., to vectors with continuous components. We discuss several ways to achieve this extension and present their analytical properties. As a practical example, we apply this formalism to the problem of feature selection in QSAR and prove that the extended continuous similarity indices provide a convenient way to discern between several sets of descriptors.
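The abstract gives no formulas, so the following is only a rough sketch of how an n-ary comparison can run in a single pass over N objects. It assumes a simplified, unweighted Jaccard-Tanimoto-style index built from column sums, with inputs that are either binary or already scaled to [0, 1]. The function name `extended_jaccard_tanimoto`, the `threshold` parameter, and the scaling assumption are illustrative choices and may differ from the definitions used in the paper.

```python
def extended_jaccard_tanimoto(rows, threshold=None):
    """Unweighted n-ary Jaccard-Tanimoto-style index (illustrative sketch).

    rows: list of equal-length vectors with entries in [0, 1]
          (binary fingerprints, or continuous values assumed pre-scaled).
    threshold: hypothetical coincidence threshold; defaults to N mod 2.
    """
    n = len(rows)
    if n == 0:
        raise ValueError("need at least one vector")
    m = len(rows[0])
    if threshold is None:
        threshold = n % 2

    # One pass over all N objects: accumulate per-column sums, O(N * M).
    col_sums = [0.0] * m
    for row in rows:
        for j, value in enumerate(row):
            col_sums[j] += value

    one_similar = 0   # columns where "on" values clearly dominate
    dissimilar = 0    # columns with no clear majority
    for k in col_sums:
        delta = 2 * k - n
        if delta > threshold:
            one_similar += 1
        elif abs(delta) <= threshold:
            dissimilar += 1
        # otherwise the column is 0-similar and does not enter this index

    denom = one_similar + dissimilar
    return one_similar / denom if denom else 0.0


fingerprints = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
]
score = extended_jaccard_tanimoto(fingerprints)
```

The cost grows linearly with the number of vectors because each object contributes only to the column sums. Averaging a pairwise index over the same set would instead require N(N-1)/2 comparisons.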
