Article Statistics & Probability

Robust analogs to the coefficient of variation

Chandima N. P. G. Arachchige, Luke A. Prendergast, Robert G. Staudte

Summary: This study explores the feasibility of using quantile-based measures of relative dispersion as an alternative to the coefficient of variation (CV). The findings suggest that using the interquartile range or the median absolute deviation as robust estimators can yield similar results to the CV, with better robustness to outliers and skewed distributions.

JOURNAL OF APPLIED STATISTICS (2022)
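
As a rough illustration of the idea in the summary above, the sketch below compares the classical CV with two quantile-based ratios. The exact definitions are an assumption: each robust measure is taken here as the unscaled IQR or MAD divided by the median, and any scaling constants the authors may apply are omitted. The function names and the data are made up for illustration.

```python
import statistics

def cv(xs):
    """Classical coefficient of variation: sample standard deviation / mean."""
    return statistics.stdev(xs) / statistics.mean(xs)

def iqr_over_median(xs):
    """Assumed quantile-based analog: interquartile range / median (unscaled)."""
    q1, _, q3 = statistics.quantiles(xs, n=4)  # default 'exclusive' method
    return (q3 - q1) / statistics.median(xs)

def mad_over_median(xs):
    """Assumed quantile-based analog: median absolute deviation / median (unscaled)."""
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    return mad / med

# Made-up data: the second sample replaces the largest value with an outlier.
clean = [8, 9, 10, 11, 12, 13, 14, 15, 16]
contaminated = [8, 9, 10, 11, 12, 13, 14, 15, 160]

for name, f in [("CV", cv),
                ("IQR/median", iqr_over_median),
                ("MAD/median", mad_over_median)]:
    print(f"{name}: clean={f(clean):.3f}, contaminated={f(contaminated):.3f}")
```

In this toy sample the single outlier inflates the CV several-fold, while both quantile-based ratios are unchanged, because neither the quartiles nor the median depend on the largest observation.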

Article Biochemical Research Methods

Designing Uncorrelated Address Constrain for DNA Storage by DMVO Algorithm

Ben Cao, Xue Li, Xiaokang Zhang, Bin Wang, Qiang Zhang, Xiaopeng Wei

Summary: A large amount of data is being produced every second, and DNA is considered a feasible storage solution due to its high storage density and long-term stability. However, errors are easily introduced during DNA synthesis and sequencing. To reduce the error rate, a novel address constraint method is proposed, and a DMVO algorithm is used to construct a set of DNA codes. Compared to previous work, the coding set obtained by the DMVO algorithm is larger and of higher quality.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2022)

Article Engineering, Environmental

Development of new machine learning model for streamflow prediction: case studies in Pakistan

Rana Muhammad Adnan, Reham R. Mostafa, Ahmed Elbeltagi, Zaher Mundher Yaseen, Shamsuddin Shahid, Ozgur Kisi

Summary: A novel hybrid method that uses the GBO algorithm to tune ANFIS hyperparameters was developed for accurate estimation of streamflow in mountainous river basins. The GBO algorithm improved ANFIS prediction accuracy relative to other benchmark methods, with significant gains in predicting monthly and peak streamflows. The ANFIS-GBO model also outperformed standalone ANFIS models when streamflows were estimated from nearby station data as input.

STOCHASTIC ENVIRONMENTAL RESEARCH AND RISK ASSESSMENT (2022)

Article Statistics & Probability

Employing long short-term memory and Facebook prophet model in air temperature forecasting

Toni Toharudin, Resa Septiani Pontoh, Rezzy Eko Caraka, Solichatus Zahroh, Youngjo Lee, Rung Ching Chen

Summary: Accurate prediction of air temperature is crucial for weather forecasting. LSTM neural networks and Facebook's Prophet model can adapt to unexpected fluctuations and to factors such as trend and seasonality in data. This study uses both LSTM and Prophet models to forecast daily air temperatures in Bandung for the next five years. The results show that each model performs better in different temperature ranges, with no significant difference in the RMSE values.

COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION (2023)

Review Engineering, Environmental

Exposure and health: A progress update by evaluation and scientometric analysis

Roshini Praveen Kumar, Steffi Joseph Perumpully, Cyril Samuel, Sneha Gautam

Summary: Efforts are being made worldwide to reduce exposure to air pollution, particularly in developing countries. This study applies a scientometric approach to analyze the current state and trends in air pollution exposure and health research, and identify research solutions and gaps. The analysis reveals the leading role of countries like China and the USA in this field. Topics such as the impact of pollution on climate and health, chemical characteristics and management practices, and health effects of exposure have been extensively researched in the past 5 years.

STOCHASTIC ENVIRONMENTAL RESEARCH AND RISK ASSESSMENT (2023)

Article Engineering, Environmental

A comparative study of mutual information-based input variable selection strategies for the displacement prediction of seepage-driven landslides using optimized support vector regression

Junwei Ma, Yankun Wang, Xiaoxu Niu, Sheng Jiang, Zhiyang Liu

Summary: This study proposes an input variable selection method based on mutual information and incorporates it into an optimized support vector regression model for predicting the displacement of seepage-driven landslides. The experimental results show that the optimized model based on mutual information can significantly improve prediction accuracy and stability.

STOCHASTIC ENVIRONMENTAL RESEARCH AND RISK ASSESSMENT (2022)

Article Biochemical Research Methods

Multi-Modality Fusion & Inductive Knowledge Transfer Underlying Non-Sparse Multi-Kernel Learning and Distribution Adaption

Yuanpeng Zhang, Kaijian Xia, Yizhang Jiang, Pengjiang Qian, Weiwei Cai, Chengyu Qiu, Khin Wee Lai, Dongrui Wu

Summary: With the development of sensors, there is an increasing amount of multimodal data in biomedical and bioinformatics fields. This study proposes a feature-level multimodal fusion model using multi-kernel learning and transfer learning, specifically designed for insufficient training samples. The model achieved better performance compared to baselines in multiple scenarios evaluated using epilepsy EEG data.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2023)

Article Economics

The triple difference estimator

Andreas Olden, Jarle Moen

Summary: Triple difference is a commonly used estimator in empirical economics research, but its theoretical foundation and assumptions are not universally agreed upon. This paper provides a comprehensive presentation of the triple difference estimator and shows that a causal interpretation can be achieved with just one parallel trend assumption.

ECONOMETRICS JOURNAL (2022)
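
The estimator described in the summary above can be read as the difference between two difference-in-differences. The sketch below computes it from eight cell means. The labels (a "treated" and a "control" state, a targeted group "B" and a non-targeted group "A") and all numbers are hypothetical, chosen only to make the arithmetic easy to follow; this is not the authors' code or data.

```python
def diff_in_diff(means, state):
    """Within one state: change for targeted group B minus change for group A."""
    change_b = means[(state, "B", "post")] - means[(state, "B", "pre")]
    change_a = means[(state, "A", "post")] - means[(state, "A", "pre")]
    return change_b - change_a

def triple_difference(means):
    """Difference-in-differences in the treated state minus that in the control state."""
    return diff_in_diff(means, "treated") - diff_in_diff(means, "control")

# Hypothetical cell means, keyed by (state, group, period).
cell_means = {
    ("treated", "B", "pre"): 10, ("treated", "B", "post"): 15,
    ("treated", "A", "pre"): 8,  ("treated", "A", "post"): 10,
    ("control", "B", "pre"): 9,  ("control", "B", "post"): 11,
    ("control", "A", "pre"): 7,  ("control", "A", "post"): 8,
}

ddd = triple_difference(cell_means)
print("DiD treated state:", diff_in_diff(cell_means, "treated"))
print("DiD control state:", diff_in_diff(cell_means, "control"))
print("Triple difference:", ddd)
```

With these made-up numbers, group B gains 3 more than group A in the treated state and 1 more in the control state, so the triple difference is 2. The estimate is causal only if that B-versus-A gap would have evolved in parallel across the two states absent treatment, which is the single assumption the summary refers to.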

Article Mathematics, Interdisciplinary Applications

Evaluating SEM Model Fit with Small Degrees of Freedom

Dexin Shi, Christine DiStefano, Alberto Maydeu-Olivares, Taehun Lee

Summary: Research has shown that the performance of root mean square error of approximation (RMSEA) is suboptimal for assessing structural equation models with small degrees of freedom (df). This study compares the performance of standardized root mean square residual (SRMR) and comparative fit index (CFI) in small df models with different factor loadings, sample sizes, and model misspecifications. The results indicate that in comparison to RMSEA, SRMR and CFI are less influenced by df and can provide more useful information in differentiating models with varying degrees of misfit. Researchers are advised to exercise caution when interpreting RMSEA for models with small df and rely more on SRMR and CFI.

MULTIVARIATE BEHAVIORAL RESEARCH (2022)
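
To make the role of df concrete, the sketch below uses common textbook formulas for RMSEA and CFI based on the model chi-square. These formulas are an assumption about the definitions in use (some software divides by N rather than N - 1), and the chi-square values are invented. SRMR is omitted because it requires the observed and model-implied covariance matrices.

```python
import math

def rmsea(chi2, df, n):
    """RMSEA = sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model, df_model, chi2_base, df_base):
    """CFI = 1 - max(chi2_M - df_M, 0) / max(chi2_M - df_M, chi2_B - df_B, 0)."""
    num = max(chi2_model - df_model, 0.0)
    den = max(chi2_model - df_model, chi2_base - df_base, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den

# Invented example: two models with the same excess chi-square (chi2 - df = 5)
# and the same sample size, differing only in degrees of freedom.
n_obs = 201
small_df = rmsea(chi2=6.0, df=1, n=n_obs)
large_df = rmsea(chi2=25.0, df=20, n=n_obs)
print(f"RMSEA with df=1:  {small_df:.3f}")
print(f"RMSEA with df=20: {large_df:.3f}")

# CFI compares the model's excess chi-square with that of a baseline model.
example_cfi = cfi(chi2_model=6.0, df_model=1, chi2_base=506.0, df_base=6)
print(f"CFI: {example_cfi:.3f}")
```

Because df sits in the denominator of RMSEA, the same excess chi-square yields a much larger value when df = 1 than when df = 20. This is only an arithmetic illustration of that sensitivity, not a reproduction of the study's simulations.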

Article Statistics & Probability

Estimating the change in soccer's home advantage during the Covid-19 pandemic using bivariate Poisson regression

Luke S. Benz, Michael J. Lopez

Summary: In the context of the Covid-19 pandemic, researchers have compared soccer games played in front of empty stadia to those played with fans to understand the impact of spectators. They argue that using the Poisson distribution and bivariate Poisson regression models provides more accurate estimates of the home advantage than linear regression models. Their findings show that the presence or absence of fans can have mixed effects on the home advantage in different soccer leagues, suggesting a complex causal mechanism.

ASTA-ADVANCES IN STATISTICAL ANALYSIS (2023)

Article Statistics & Probability

High-Dimensional Vector Autoregressive Time Series Modeling via Tensor Decomposition

Di Wang, Yao Zheng, Heng Lian, Guodong Li

Summary: This article proposes rearranging the transition matrices of a high-dimensional vector autoregressive model into a tensor and restricting the parameter space through tensor decomposition, which improves model interpretability and estimation efficiency. Different estimation algorithms are introduced for different dimensional cases.

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION (2022)

Article Biochemical Research Methods

Structure-aware protein-protein interaction site prediction using deep graph convolutional network

Qianmu Yuan, Jianwen Chen, Huiying Zhao, Yaoqi Zhou, Yuedong Yang

Summary: The deep graph convolutional network (GraphPPIS) for predicting protein-protein interacting sites significantly improves prediction performance and captures spatial correlation better. The results highlight the importance of spatially neighboring residues for interacting site prediction.

BIOINFORMATICS (2022)

Article Mathematical & Computational Biology

Robust Design and Analysis of Clinical Trials With Nonproportional Hazards: A Straw Man Guidance From a Cross-Pharma Working Group

Satrajit Roychoudhury, Keaven M. Anderson, Jiabu Ye, Pralay Mukhopadhyay

Summary: Loss of power and clear description of treatment differences are important issues in clinical trial design and analysis, especially in nonproportional hazard scenarios. The current ICH E9 (R1) addendum suggests designing trials with clinically relevant estimands, while a combination test proposed by a cross-pharma working group provides robust power under different alternative hypotheses.

STATISTICS IN BIOPHARMACEUTICAL RESEARCH (2023)

Article Biochemical Research Methods

LDICDL: LncRNA-Disease Association Identification Based on Collaborative Deep Learning

Wei Lan, Dehuan Lai, Qingfeng Chen, Ximin Wu, Baoshan Chen, Jin Liu, Jianxin Wang, Yi-Ping Phoebe Chen

Summary: It has been demonstrated that long noncoding RNAs (lncRNAs) play crucial roles in human diseases. This study proposes a computational model called LDICDL to identify lncRNA-disease associations using collaborative deep learning. The model uses an autoencoder to denoise multiple sources of lncRNA and disease feature information, and predicts potential associations through matrix decomposition.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2022)

Article Biochemical Research Methods

Make Interactive Complex Heatmaps in R

Zuguang Gu, Daniel Huebschmann

Summary: This work introduces InteractiveComplexHeatmap, a new R package that adds interactivity to the commonly used ComplexHeatmap package. With an easy-to-use interface, InteractiveComplexHeatmap allows for exporting static complex heatmaps to interactive Shiny web applications with just one additional line of code. It also provides flexible functionalities for integrating interactive heatmap widgets to build more complex and customized Shiny web applications.

BIOINFORMATICS (2022)

Article Economics

Structural Breaks in Interactive Effects Panels and the Stock Market Reaction to COVID-19

Yiannis Karavias, Paresh Kumar Narayan, Joakim Westerlund

Summary: Dealing with structural breaks is crucial in economic research. This article introduces a new toolbox for detecting structural breaks, which is applicable to various panel data and robust to heterogeneity. Applying this toolbox to panel data of 61 countries, a structural break dated to the first week of April is detected, indicating the impact of COVID-19 on stock returns. The reaction of the markets was negative before the break but zero thereafter, possibly influenced by quantitative easing programs announced by central banks.

JOURNAL OF BUSINESS & ECONOMIC STATISTICS (2023)

Article Biochemical Research Methods

DLAB: deep learning methods for structure-based virtual screening of antibodies

Constantin Schneider, Andrew Buchanan, Bruck Taddese, Charlotte M. Deane

Summary: DLAB can be used to improve antibody-antigen docking and structure-based virtual screening, enhancing pose ranking for antibody docking experiments and selection of accurate and correctly ranked antibody-antigen pairings.

BIOINFORMATICS (2022)

Article Statistics & Probability

Cross validation for uncertain autoregressive model

Zhe Liu, Xiangfeng Yang

Summary: This paper proposes three types of cross validation methods to choose the lag order in uncertain time series models, and derives corresponding calculation methods under the framework of uncertainty theory.

COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION (2022)

Article Computer Science, Theory & Methods

Finite-time stability of fractional-order fuzzy cellular neural networks with time delays

Feifei Du, Jun-Guo Lu

Summary: The finite-time stability of fractional-order fuzzy cellular neural networks with time delays is investigated. A new fractional-order Gronwall inequality with time delay is developed for the stability analysis of fractional-order delayed systems. Based on this inequality, a less conservative criterion for the finite-time stability of these networks is derived. Two examples demonstrate the effectiveness and reduced conservativeness of the proposed results.

FUZZY SETS AND SYSTEMS (2022)

Article Economics

Standard Synthetic Control Methods: The Case Of Using All Preintervention Outcomes Together With Covariates

Ashok Kaul, Stefan Kloessner, Gregor Pfeifer, Manuel Schieler

Summary: Using all outcome lags as predictors renders other covariates irrelevant, impacting estimation results and policy conclusions. Restricting the usage of outcome lags can result in other covariates obtaining positive weights.

JOURNAL OF BUSINESS & ECONOMIC STATISTICS (2022)