4.6 Article

Insights into the Effects of Violating Statistical Assumptions for Dimensionality Reduction for Chemical -omics Data with Multiple Explanatory Variables

Journal

ACS OMEGA
Volume 8, Issue 24, Pages 22042-22054

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acsomega.3c01613

Biological volatilome analysis is complex due to the large number of compounds and the differences in peak areas within datasets. Traditional volatilome analysis relies on dimensionality reduction techniques, but these methods rest on statistical assumptions that biological data often violate. This study explores the impact of different statistical models and of the log transformation on volatilome dimensionality reduction. The analysis of Shingleback lizard volatilomes demonstrates the importance of considering multiple explanatory variables and the effects of the log transformation on downstream analyses.
Biological volatilome analysis is inherently complex due to the considerable number of compounds (i.e., dimensions) and differences in peak areas by orders of magnitude, between and within compounds found within datasets. Traditional volatilome analysis relies on dimensionality reduction techniques which aid in the selection of compounds that are considered relevant to respective research questions prior to further analysis. Currently, compounds of interest are identified using either supervised or unsupervised statistical methods which assume the data residuals are normally distributed and exhibit linearity. However, biological data often violate the statistical assumptions of these models related to normality and the presence of multiple explanatory variables which are innate to biological samples. In an attempt to address deviations from normality, volatilome data can be log transformed. However, whether the effects of each assessed variable are additive or multiplicative should be considered prior to transformation, as this will impact the effect of each variable on the data. If assumptions of normality and variable effects are not investigated prior to dimensionality reduction, ineffective or erroneous compound dimensionality reduction can impact downstream analyses. It is the aim of this manuscript to assess the impact of single and multivariable statistical models with and without the log transformation to volatilome dimensionality reduction prior to any supervised or unsupervised classification analysis. As a proof of concept, Shingleback lizard (Tiliqua rugosa) volatilomes were collected across their species distribution and from captivity and were assessed. Shingleback volatilomes are suspected to be influenced by multiple explanatory variables related to habitat (Bioregion), sex, parasite presence, total body volume, and captive status.
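
The distinction between additive and multiplicative effects can be made concrete with a small worked example. The notation below is illustrative and not taken from the paper: y is the peak area of one compound, the subscripts i and j index the levels of two explanatory variables (for instance Bioregion and sex), and the error terms are generic. This is a sketch of the general argument, not the authors' model.

```latex
% Additive effects on the raw scale (illustrative notation):
y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}

% Multiplicative effects on the raw scale, with all factors positive:
y_{ij} = \mu \, a_i \, b_j \, e_{ij}

% Taking logs of the multiplicative model gives an additive one:
\log y_{ij} = \log \mu + \log a_i + \log b_j + \log e_{ij}
```

Under these assumptions the log transformation turns multiplicative effects into additive ones, which suits a linear model. If the effects are already additive on the raw scale, the logarithm of the first equation does not separate into per-variable terms, so transforming changes how each variable appears to act on the data.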
This work determined that the exclusion of relevant multiple explanatory variables from analysis overestimates the effect of Bioregion and the identification of significant compounds. The log transformation increased the number of compounds that were identified as significant, as did analyses that assumed that residuals were normally distributed. Among the methods considered in this work, the most conservative form of dimensionality reduction was achieved through analyzing untransformed data using Monte Carlo tests with multiple explanatory variables.
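
To illustrate what a Monte Carlo test with more than one explanatory variable can look like, here is a minimal sketch in Python. It is not the procedure used in the paper, which the abstract does not specify. It swaps in a simple stratified permutation test: labels of the variable of interest are shuffled only within levels of a second variable, so no normality assumption on the residuals is needed. The function name, the variable names (`region`, `sex`, `peak_area`), and the synthetic data are all hypothetical.

```python
import numpy as np


def stratified_permutation_test(y, group, stratum, n_perm=999, seed=0):
    """Monte Carlo p-value for a group effect on y, adjusting for stratum.

    Labels in `group` are shuffled only within each level of `stratum`,
    so the second explanatory variable stays fixed under the null.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    group = np.asarray(group)
    stratum = np.asarray(stratum)

    # Remove stratum means so the statistic reflects only the group effect.
    centred = y.copy()
    strata_idx = [np.flatnonzero(stratum == s) for s in np.unique(stratum)]
    for idx in strata_idx:
        centred[idx] -= centred[idx].mean()

    def between_group_ss(labels):
        # Between-group sum of squares of the stratum-centred values.
        return sum(
            (labels == g).sum() * centred[labels == g].mean() ** 2
            for g in np.unique(labels)
        )

    observed = between_group_ss(group)
    n_extreme = 0
    for _ in range(n_perm):
        permuted = group.copy()
        for idx in strata_idx:
            permuted[idx] = rng.permutation(group[idx])
        if between_group_ss(permuted) >= observed:
            n_extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    return (n_extreme + 1) / (n_perm + 1)


# Synthetic peak areas (hypothetical): two regions, two sexes, 5 animals per cell.
region = np.repeat(["north", "south"], 10)
sex = np.tile(np.repeat(["F", "M"], 5), 2)
noise = np.random.default_rng(1).normal(0.0, 1.0, 20)
peak_area = 100.0 + 50.0 * (region == "south") + 20.0 * (sex == "M") + noise

p_region = stratified_permutation_test(peak_area, region, sex)
```

The same function can be called on `np.log(peak_area)` to compare results with and without the log transformation. Running it once per compound and keeping only compounds below a chosen threshold would be one way to carry out this kind of dimensionality reduction, though a correction for multiple testing would then be advisable.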
