Article

Visualizing Population Structures by Multidimensional Scaling of Smoothed PCA-Transformed Data

Journal

IEEE ACCESS
Volume 11, Pages 13594-13604

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2023.3243573

Keywords

Data visualization; Principal component analysis; Sociology; Dimensionality reduction; Smoothing methods; Topology; Periodic structures; multidimensional scaling; PCA; population structure; single nucleotide polymorphisms


Population structure can be revealed using Single Nucleotide Polymorphisms (SNPs). Principal Component Analysis (PCA) has been widely used for visualizing SNP data, although other dimensionality reduction methods can be more successful at revealing complex population structures. Those methods, however, often struggle to preserve the global structure of SNP data or have a high computational cost. This study proposes Multidimensional Scaling (MDS) of smoothed PCA-transformed data (MSSPD), which reveals population structures in 2D maps, preserves global structure more effectively than other techniques, and is comparable in computational efficiency to the fastest SNP visualization methods.
Population structure can be revealed using Single Nucleotide Polymorphisms (SNPs), which are genetic variations found in the DNA sequences of individuals. Because the number of SNPs is large, SNP data is usually visualized after dimensionality reduction. Although Principal Component Analysis (PCA) has been used extensively for SNP data visualization, some other dimensionality reduction methods have been shown to be more successful in revealing complex population structures. Nevertheless, these techniques often suffer either from a reduced ability to preserve the global structure in the SNP data, namely the relative genetic distance between subpopulations, or from high computational cost. In this work, a method that uses Multidimensional Scaling (MDS) of smoothed PCA-transformed data (MSSPD) is proposed. MSSPD successfully reveals population structures in 2D maps while being more effective than other techniques in preserving the global structure. In terms of computational efficiency, MSSPD is comparable to the fastest SNP visualization methods.
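The abstract names three stages (a PCA transform, a smoothing step, and MDS down to 2D) but does not spell out their details. The following is only a rough sketch of that pipeline shape, not the authors' implementation. Several choices are assumptions made for illustration: the smoothing is a plain k-nearest-neighbour average in PC space, the MDS is the classical (Torgerson) variant, and the function names, `n_components`, `k`, and the simulated genotype matrix are all hypothetical.

```python
import numpy as np

def pca_transform(X, n_components):
    """Project centered data onto its leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def pairwise_sq_dists(Z):
    """Squared Euclidean distances between all rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    return np.maximum(D2, 0.0)  # clip tiny negative rounding errors

def knn_smooth(Z, k):
    """ASSUMED smoothing: replace each point by the mean of itself and its
    k nearest neighbours. The paper's actual smoothing may differ."""
    D2 = pairwise_sq_dists(Z)
    idx = np.argsort(D2, axis=1)[:, :k + 1]  # k neighbours plus the point itself
    return Z[idx].mean(axis=1)

def classical_mds(Z, n_dims=2):
    """Classical (Torgerson) MDS on Euclidean distances between rows of Z."""
    D2 = pairwise_sq_dists(Z)
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D2 @ J                    # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:n_dims]     # keep the largest n_dims
    return V[:, order] * np.sqrt(np.maximum(w[order], 0.0))

def mds_of_smoothed_pca(X, n_components=10, k=5):
    """PCA transform -> (assumed) kNN smoothing -> classical MDS to 2D."""
    Z = pca_transform(X, n_components)
    return classical_mds(knn_smooth(Z, k), n_dims=2)

# Hypothetical toy data: 3 subpopulations, 30 individuals each, 200 SNPs
# coded as 0/1/2 minor-allele counts with different allele frequencies.
rng = np.random.default_rng(0)
freqs = [0.1, 0.5, 0.9]
X = np.vstack([rng.binomial(2, f, size=(30, 200)) for f in freqs]).astype(float)

embedding = mds_of_smoothed_pca(X)
print(embedding.shape)
```

In this toy setup the first and third groups differ most in allele frequency, so a map that preserves global structure should place their centroids farther apart than those of the first and second groups. That is the property the abstract refers to as preserving the relative genetic distance between subpopulations.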
