Statistics & Probability

Article Automation & Control Systems

Partial least trimmed squares regression

Zhonghao Xie, Xi'an Feng, Xiaojing Chen

Summary: This paper proposes a robust method for PLS based on the idea of least trimmed squares (LTS), which effectively deals with high-dimensional regressors. By formulating the LTS problem as a concave maximization problem, the complexity of solving LTS is simplified. The results from simulation and real data sets demonstrate the effectiveness and robustness of the proposed approach.

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS (2022)
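The least trimmed squares criterion mentioned in the summary above can be illustrated with a minimal sketch. It only evaluates the generic LTS objective (the sum of the h smallest squared residuals) and is not the paper's PLS-based algorithm or its concave maximization formulation; the function name `lts_objective` and the sample residuals are hypothetical.

```python
import numpy as np

def lts_objective(residuals, h):
    """Return the sum of the h smallest squared residuals (generic LTS criterion)."""
    squared = np.sort(np.asarray(residuals, dtype=float) ** 2)
    return float(squared[:h].sum())

# Hypothetical residuals; the last value mimics a gross outlier.
residuals = [0.5, -1.0, 0.2, 10.0]

trimmed = lts_objective(residuals, h=3)  # the outlier is trimmed away
full = lts_objective(residuals, h=4)     # ordinary sum of squared residuals
```

With h equal to the sample size the criterion reduces to ordinary least squares, so the choice of h controls how many observations may be treated as outliers.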

Article Automation & Control Systems

iAFPs-EnC-GA: Identifying antifungal peptides using sequential and evolutionary descriptors based multi-information fusion and ensemble learning approach

Ashfaq Ahmad, Shahid Akbar, Muhammad Tahir, Maqsood Hayat, Farman Ali

Summary: Fungal infections are a global health concern, and existing treatments have severe side effects. This study presents an intelligent learning approach to accurately predict antifungal peptides by exploring sequential and evolutionary features. The proposed iAFPs-EnC-GA model achieves a high prediction accuracy and has the potential to play a key role in drug development and academic research.

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS (2022)

Article Biochemical Research Methods

HyperAttentionDTI: improving drug-protein interaction prediction by sequence-based deep learning with attention mechanism

Qichang Zhao, Haochen Zhao, Kai Zheng, Jianxin Wang

Summary: This study introduces HyperAttentionDTI, a bio-inspired model based on convolutional neural networks and an attention mechanism, for predicting drug-target interactions. The model learns feature matrices of drugs and proteins with deep CNNs and uses the attention mechanism to model complex interactions among atoms and amino acids, achieving significantly improved performance compared to state-of-the-art baselines.

BIOINFORMATICS (2022)

Article Engineering, Environmental

Hybrid deep learning method for a week-ahead evapotranspiration forecasting

A. A. Masrur Ahmed, Ravinesh C. Deo, Qi Feng, Afshin Ghahramani, Nawin Raj, Zhenliang Yin, Linshan Yang

Summary: This study introduces a new hybrid deep learning approach that combines a convolutional neural network and a gated recurrent unit, and uses ant colony optimization to improve the crop evapotranspiration forecasting model. The results demonstrate the excellent capability of the hybrid model to forecast daily ETo accurately and with high efficiency.

STOCHASTIC ENVIRONMENTAL RESEARCH AND RISK ASSESSMENT (2022)

Article Engineering, Environmental

Modelling daily reference evapotranspiration based on stacking hybridization of ANN with meta-heuristic algorithms under diverse agro-climatic conditions

Ahmed Elbeltagi, Nand Lal Kushwaha, Jitendra Rajput, Dinesh Kumar Vishwakarma, Luc Cimusa Kulimushi, Manish Kumar, Jingwen Zhang, Chaitanya B. Pande, Pandurang Choudhari, Sarita Gajbhiye Meshram, Kusum Pandey, Parveen Sihag, Navsal Kumar, Ismail Abd-Elaty

Summary: This study investigated the performance of five AI-based models for ET0 estimation and found that ANN-M5P and ANN-Bagging algorithms performed well in different models.

STOCHASTIC ENVIRONMENTAL RESEARCH AND RISK ASSESSMENT (2022)

Article Statistics & Probability

Communication-Efficient Accurate Statistical Estimation

Jianqing Fan, Yongyi Guo, Kaizheng Wang

Summary: This article presents two communication-efficient accurate statistical estimators implemented through iterative algorithms for distributed optimization. The algorithms adapt to the similarity among loss functions on node machines and converge rapidly when each node machine has large enough sample size. The proposed method achieves statistical efficiency in finite steps in typical statistical applications.

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION (2023)

Article Mathematics

Differential geometric approach of Betchov-Da Rios soliton equation

Yanlin Li, Melek Erdogdu, Ayse Yavuz

Summary: In this paper, we investigate the differential geometric properties of the soliton surface associated with the Betchov-Da Rios equation. We provide derivative formulas for the Frenet frame of the unit speed curve and discuss the linear map of Weingarten type in the tangent space of the surface. We also obtain the necessary and sufficient conditions for the soliton surface to be a minimal surface and examine an application of the soliton surface associated with the Betchov-Da Rios equation.

HACETTEPE JOURNAL OF MATHEMATICS AND STATISTICS (2023)

Article Biochemical Research Methods

Prediction of antimicrobial resistance based on whole-genome sequencing and machine learning

Yunxiao Ren, Trinad Chakraborty, Swapnil Doijad, Linda Falgenhauer, Jane Falgenhauer, Alexander Goesmann, Anne-Christin Hauschild, Oliver Schwengers, Dominik Heider

Summary: This study evaluated logistic regression, support vector machine, random forest, and convolutional neural network for predicting antimicrobial resistance, demonstrating that random forests and convolutional neural networks generally outperform logistic regression and support vector machine in predicting antimicrobial resistance with high accuracy.

BIOINFORMATICS (2022)

Article Biochemical Research Methods

StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps

Mitchell R. Vollger, Peter Kerpedjiev, Adam M. Phillippy, Evan E. Eichler

Summary: StainedGlass is a new visualization tool that can depict the identity and orientation of multi-megabase tandem repeat structures at a genome-wide scale, aiding in the inference of evolutionary history for complex regions of genomes.

BIOINFORMATICS (2022)

Article Statistics & Probability

SPlit: An Optimal Method for Data Splitting

V. Roshan Joseph, Akhil Vakayil

Summary: In this article, an optimal method named SPlit for splitting a dataset into training and testing sets is proposed, which is based on the support points algorithm and can be applied to both regression and classification problems. The implementation on real datasets shows substantial improvement compared to the commonly used random splitting procedure.

TECHNOMETRICS (2022)

Article Biochemical Research Methods

A Deep Learning Model for RNA-Protein Binding Preference Prediction Based on Hierarchical LSTM and Attention Network

Zhen Shen, Qinhu Zhang, Kyungsook Han, De-Shuang Huang

Summary: An attention mechanism is used to find important information in the sequence, with a focus on regions of the RNA sequence that can bind to proteins. This study extracts correlation features and evaluates the importance of different sites in the RNA sequence using LSTM and the attention mechanism. The results show that this method outperforms traditional and deep learning-based methods. The study also explores the effects of other parameters on model performance.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2022)

Article Biochemical Research Methods

Identifying Protein Subcellular Locations With Embeddings-Based node2loc

Xiaoyong Pan, Lei Chen, Min Liu, Zhibin Niu, Tao Huang, Yu-Dong Cai

Summary: Identifying protein subcellular locations is important in protein function prediction. This study presents a network embedding-based method, node2loc, which effectively predicts subcellular locations by learning distributed embeddings of proteins in a protein-protein interaction network and utilizing a recurrent neural network. The results demonstrate the method's superior performance compared to baseline methods and its ability to classify protein subcellular locations.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2022)

Article Statistics & Probability

Unbiased variable importance for random forests

Markus Loecher

Summary: This article proposes a simple solution to the misleading behavior of Gini importance in random forests, which can be viewed as an over-fitting problem. Instead of computing the loss reduction on the in-bag training samples, the author suggests computing it on the out-of-bag samples.

COMMUNICATIONS IN STATISTICS-THEORY AND METHODS (2022)
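The out-of-bag idea in the summary above can be sketched as follows. This is a simplified illustration under stated assumptions, not the author's estimator: a single fixed split is scored by its Gini impurity reduction, once on hypothetical in-bag labels and once on hypothetical out-of-bag labels routed through the same split. The helper names `gini` and `impurity_reduction` are made up for this example.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a label vector (0.0 for an empty node)."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return float(1.0 - np.sum(p ** 2))

def impurity_reduction(labels, goes_left):
    """Parent impurity minus the size-weighted impurity of the two children."""
    labels = np.asarray(labels)
    goes_left = np.asarray(goes_left, dtype=bool)
    left, right = labels[goes_left], labels[~goes_left]
    weighted = (left.size * gini(left) + right.size * gini(right)) / labels.size
    return gini(labels) - weighted

# Hypothetical data: the split separates the in-bag labels perfectly ...
in_bag_gain = impurity_reduction([0, 0, 1, 1], [True, True, False, False])
# ... but the same split carries no signal on the out-of-bag labels.
oob_gain = impurity_reduction([0, 1, 0, 1], [True, True, False, False])
```

In this toy case the split looks maximally useful on the data it was fitted to and useless on held-out data, which is the over-fitting effect the summary refers to.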

Article Statistics & Probability

Beta ridge regression estimators: simulation and application

Mohamed R. Abonazel, Ibrahim M. Taha

Summary: The paper proposes ridge estimators for the beta regression model to address the problem of multicollinearity and improve estimation efficiency. By comparing the performance of ridge estimators to the ML estimator through simulation and empirical application, it is found that the proposed estimators outperform the ML estimator in terms of mean squared error and mean absolute error.

COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION (2023)

Article Statistics & Probability

Modified ridge-type for the Poisson regression model: simulation and application

Adewale F. Lukman, Benedicta Aladeitan, Kayode Ayinde, Mohamed R. Abonazel

Summary: This study proposes a new estimator for the regression coefficients of the Poisson regression model when multicollinearity is present. The theoretical comparison, simulation study, and application results demonstrate that the proposed estimator outperforms other estimators.

JOURNAL OF APPLIED STATISTICS (2022)

Article Engineering, Environmental

Sensitivity of normalized difference vegetation index (NDVI) to land surface temperature, soil moisture and precipitation over district Gautam Buddh Nagar, UP, India

Manish Sharma, Pargin Bangotra, Alok Sagar Gautam, Sneha Gautam

Summary: This study investigated the trends in MODIS/TERRA derived NDVI and its correlation with LST, SM, and precipitation in Gautam Buddh Nagar, India from 2005 to 2018. The research found that NDVI showed higher correlation with LST compared to SM and precipitation, indicating it is more sensitive to temperature changes. Additionally, NDVI values were highest during the winter season, followed by monsoon, post-monsoon, and pre-monsoon seasons.

STOCHASTIC ENVIRONMENTAL RESEARCH AND RISK ASSESSMENT (2022)

Article Statistics & Probability

Statistical models of near-accident event and pedestrian behavior at non-signalized intersections

Xun Shen, Pongsathorn Raksincharoensak

Summary: This study proposes an innovative framework for modeling the statistical properties of near-accident events and pedestrian behavior at non-signalized intersections using a Poisson process and logistic regression. The models are validated through generative simulation and are intended to assist in the development of traffic simulators or safety control designs that take pedestrian-vehicle interactions into account.

JOURNAL OF APPLIED STATISTICS (2022)

Article Statistics & Probability

On the estimation of Bell regression model using ridge estimator

Muhammad Amin, Muhammad Nauman Akram, Abdul Majid

COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION (2023)

Article Statistics & Probability

An effective deep residual network based class attention layer with bidirectional LSTM for diagnosis and classification of COVID-19

Denis A. Pustokhin, Irina V. Pustokhina, Phuoc Nguyen Dinh, Son Van Phan, Gia Nhu Nguyen, Gyanendra Prasad Joshi, K. Shankar

Summary: This paper presents a new RCAL-BiLSTM model based on ResNet and Class Attention Layer for COVID-19 diagnosis. The model incorporates bilateral filtering preprocessing, feature extraction using ResNet and Bi-LSTM, and softmax-based classification. Experimental results on the Chest-X-Ray dataset demonstrate the superior performance of the RCAL-BiLSTM model.

JOURNAL OF APPLIED STATISTICS (2023)

Article Biochemical Research Methods

ggtranscript: an R package for the visualization and interpretation of transcript isoforms using ggplot2

Emil K. Gustavsson, David Zhang, Regina H. Reynolds, Sonia Garcia-Ruiz, Mina Ryten

Summary: ggtranscript is a fast and flexible tool for visualizing and comparing transcripts, inheriting the functionality and familiarity of ggplot2 for easy use.

BIOINFORMATICS (2022)