Publisher
IEEE
DOI: 10.1109/CSCI54926.2021.00127
Keywords
multidimensional scaling; principal component analysis; t-distributed stochastic neighbor embedding; single nucleotide polymorphisms; population structure analysis
This study compares two visualization techniques, MDS and t-SNE, for detecting population substructures in genetic data. While both methods successfully reveal substructures in 2D, the MDS-based method is better at preserving relative similarity between populations.
Single Nucleotide Polymorphisms (SNPs) are an important component of a genome's information and have been used extensively in genetics for population structure analysis. Visualizing SNP data helps detect population substructures, but SNP sequences contain thousands or millions of data points, so one way to visualize them is through dimensionality reduction. Principal Component Analysis (PCA) has traditionally been used to reduce the data to 2D or 3D, with acceptable results. Visualizing complex population structures, however, requires more advanced techniques. Recently, t-Distributed Stochastic Neighbor Embedding (t-SNE) has been used for SNP visualization. In this work, a Multidimensional Scaling (MDS)-based method is presented and compared with t-SNE. Although both techniques successfully reveal population substructures in 2D, the MDS-based method better preserves the relative similarity between populations.
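To make the idea of an MDS-based 2D projection concrete, the following is a minimal sketch, not the authors' method. It assumes classical (Torgerson) MDS, a simple allele-count distance, and simulated genotypes coded as 0, 1, or 2 copies of an allele; the abstract specifies none of these, and all function names are hypothetical. It uses only the Python standard library, with power iteration standing in for a proper eigensolver.

```python
import math
import random


def allele_distance(a, b):
    """Assumed distance: mean absolute difference in allele counts, scaled to [0, 1]."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (2 * len(a))


def distance_matrix(genotypes):
    """Pairwise distances between all samples."""
    n = len(genotypes)
    return [[allele_distance(genotypes[i], genotypes[j]) for j in range(n)]
            for i in range(n)]


def double_center(dist):
    """B = -1/2 * J * D^2 * J, where J is the centering matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]  # equals column means (symmetric matrix)
    total_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpair(mat, iters=500, seed=0):
    """Dominant eigenpair of a symmetric matrix via power iteration."""
    rng = random.Random(seed)
    n = len(mat)
    v = [rng.uniform(-1, 1) for _ in range(n)]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return 0.0, [0.0] * n
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v
    eigval = sum(v[i] * sum(mat[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigval, v


def classical_mds(dist, dims=2):
    """Embed samples in `dims` dimensions from a distance matrix."""
    b = double_center(dist)
    n = len(b)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        eigval, vec = top_eigenpair(b, seed=k)
        scale = math.sqrt(max(eigval, 0.0))  # ignore negative eigenvalues
        for i in range(n):
            coords[i][k] = scale * vec[i]
        # Deflate so the next pass finds the next component
        b = [[b[i][j] - eigval * vec[i] * vec[j] for j in range(n)]
             for i in range(n)]
    return coords


def simulate_population(rng, n_samples, freqs):
    """Draw diploid genotypes (0, 1, or 2) from per-SNP allele frequencies."""
    return [[(rng.random() < f) + (rng.random() < f) for f in freqs]
            for _ in range(n_samples)]


# Two simulated populations: they differ on the first 100 SNPs only.
rng = random.Random(42)
freqs_a = [0.2] * 100 + [0.5] * 100
freqs_b = [0.8] * 100 + [0.5] * 100
genotypes = (simulate_population(rng, 10, freqs_a)
             + simulate_population(rng, 10, freqs_b))

coords = classical_mds(distance_matrix(genotypes), dims=2)
for label, (x, y) in zip("A" * 10 + "B" * 10, coords):
    print(f"{label}\t{x:+.3f}\t{y:+.3f}")
```

Because classical MDS works from the full distance matrix, distances between clusters in the 2D output reflect the input distances between populations, which is the property the abstract credits to the MDS-based method. For real SNP panels, a library implementation of MDS with a proper eigensolver would be the more practical choice.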