4.6 Article

Insights into the Effects of Violating Statistical Assumptions for Dimensionality Reduction for Chemical -omics Data with Multiple Explanatory Variables

Journal

ACS OMEGA
Volume 8, Issue 24, Pages 22042-22054

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acsomega.3c01613


Biological volatilome analysis is complex because datasets contain a large number of compounds whose peak areas differ widely. Traditional volatilome analysis relies on dimensionality reduction techniques, but these methods rest on statistical assumptions that biological data often violate. This study explores how the choice of statistical model and the use of log transformation affect volatilome dimensionality reduction. The analysis of Shingleback lizard volatilomes demonstrates the importance of considering multiple explanatory variables and the effects of log transformation on downstream analyses.
Biological volatilome analysis is inherently complex due to the considerable number of compounds (i.e., dimensions) and differences in peak areas by orders of magnitude, between and within compounds found within datasets. Traditional volatilome analysis relies on dimensionality reduction techniques which aid in the selection of compounds that are considered relevant to respective research questions prior to further analysis. Currently, compounds of interest are identified using either supervised or unsupervised statistical methods which assume the data residuals are normally distributed and exhibit linearity. However, biological data often violate the statistical assumptions of these models related to normality and the presence of multiple explanatory variables which are innate to biological samples. In an attempt to address deviations from normality, volatilome data can be log transformed. However, whether the effects of each assessed variable are additive or multiplicative should be considered prior to transformation, as this will impact the effect of each variable on the data. If assumptions of normality and variable effects are not investigated prior to dimensionality reduction, ineffective or erroneous compound dimensionality reduction can impact downstream analyses. It is the aim of this manuscript to assess the impact of single and multivariable statistical models with and without the log transformation to volatilome dimensionality reduction prior to any supervised or unsupervised classification analysis. As a proof of concept, Shingleback lizard (Tiliqua rugosa) volatilomes were collected across their species distribution and from captivity and were assessed. Shingleback volatilomes are suspected to be influenced by multiple explanatory variables related to habitat (Bioregion), sex, parasite presence, total body volume, and captive status.
This work determined that the exclusion of relevant multiple explanatory variables from analysis overestimates the effect of Bioregion and the identification of significant compounds. The log transformation increased the number of compounds that were identified as significant, as did analyses that assumed that residuals were normally distributed. Among the methods considered in this work, the most conservative form of dimensionality reduction was achieved through analyzing untransformed data using Monte Carlo tests with multiple explanatory variables.
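
The comparison described in the abstract can be illustrated with a small simulation. The Python sketch below is not the authors' code or their exact procedure: it uses a Freedman-Lane style residual permutation as one possible Monte Carlo test of a single factor while adjusting for a second explanatory variable. The function names (`partial_f`, `monte_carlo_p`), the simulated peak areas, the effect sizes, and the two-level `region` factor standing in for Bioregion are all assumptions made for illustration.

```python
import numpy as np


def partial_f(y, X_full, X_reduced):
    """Partial F statistic for a full linear model versus a nested reduced model.

    Assumes both design matrices have full column rank.
    """
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)

    rss_full, rss_red = rss(X_full), rss(X_reduced)
    df_num = X_full.shape[1] - X_reduced.shape[1]
    df_den = len(y) - X_full.shape[1]
    return ((rss_red - rss_full) / df_num) / (rss_full / df_den)


def monte_carlo_p(y, X_full, X_reduced, n_perm=999, seed=0):
    """Permutation p-value for the terms in X_full that are absent from X_reduced.

    Residuals of the reduced model are shuffled and added back to its fitted
    values (Freedman-Lane scheme), so no normality of residuals is assumed.
    """
    rng = np.random.default_rng(seed)
    beta, *_ = np.linalg.lstsq(X_reduced, y, rcond=None)
    fitted = X_reduced @ beta
    resid = y - fitted
    observed = partial_f(y, X_full, X_reduced)
    exceed = 0
    for _ in range(n_perm):
        y_star = fitted + rng.permutation(resid)
        if partial_f(y_star, X_full, X_reduced) >= observed:
            exceed += 1
    # Count the observed statistic itself so the p-value is never zero.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical data for one compound: effects are multiplicative on the raw
# peak-area scale, hence additive on the log scale.
rng = np.random.default_rng(42)
n = 60
region = np.repeat([0.0, 1.0], n // 2)   # stand-in for Bioregion (two levels)
body_volume = rng.normal(size=n)         # stand-in for a second explanatory variable
log_area = 10.0 + 0.4 * region + 0.6 * body_volume + rng.normal(scale=0.8, size=n)
peak_area = np.exp(log_area)

intercept = np.ones(n)
X_multi_full = np.column_stack([intercept, region, body_volume])
X_multi_reduced = np.column_stack([intercept, body_volume])
X_single_full = np.column_stack([intercept, region])
X_single_reduced = intercept[:, None]

for label, y in [("untransformed", peak_area), ("log-transformed", np.log(peak_area))]:
    p_single = monte_carlo_p(y, X_single_full, X_single_reduced)
    p_multi = monte_carlo_p(y, X_multi_full, X_multi_reduced)
    print(f"{label}: region only p={p_single:.3f}, region + body volume p={p_multi:.3f}")
```

In a full analysis this test would be repeated for every compound, and compounds would be retained or dropped according to the resulting p-values. Because the simulated effects are multiplicative by construction, the log scale is where the linear model is correctly specified here; for real volatilome data, whether effects are additive or multiplicative would need to be examined first, as the abstract advises.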

