Article
Biochemical Research Methods
Givanna H. Putri, Simon Anders, Paul Theodor Pyl, John E. Pimanda, Fabio Zanini
Summary: HTSeq 2.0 provides an expanded application programming interface, including a new representation for sparse genomic data, enhancements for htseq-count to accommodate single-cell omics, a new script for data using cell and molecular barcodes, improved documentation, testing and deployment, bug fixes, and Python 3 support.
Article
Statistics & Probability
Yaowu Liu, Zilin Li, Xihong Lin
Summary: In this article, a minimax optimal ridge-type set test (MORST) is proposed for testing a global hypothesis. MORST has higher power than classical tests when the signals are weak or moderate, at only a slight increase in computation. Extensive simulations demonstrate the robustness of MORST, and it performs well in analyzing real data.
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
(2022)
Article
Statistics & Probability
Yaqing Chen, Zhenhua Lin, Hans-Georg Muller
Summary: This paper proposes a distribution-to-distribution regression model based on the Wasserstein metric for analyzing random object data that do not belong to vector spaces. By utilizing the geometric properties of the tangent bundles of the space of random measures, the distributions are mapped to tangent spaces, enabling regression modeling for distribution data. Through simulations and asymptotic convergence rate derivation, the performance of the model in predicting distributions and estimating regression operators is verified.
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
(2023)
Article
Statistics & Probability
Cynthia Rudin, Chaofan Chen, Zhi Chen, Haiyang Huang, Lesia Semenova, Chudi Zhong
Summary: This work highlights the fundamental principles of interpretable machine learning and identifies 10 technical challenge areas in this field, including optimizing sparse models, scoring systems, and adding constraints for better interpretability. It serves as a useful starting point for statisticians and computer scientists interested in interpretable machine learning.
STATISTICS SURVEYS
(2022)
Article
Biochemical Research Methods
Chaoran Chen, Sarah Nadeau, Michael Yared, Philippe Voinov, Ning Xie, Cornelius Roemer, Tanja Stadler
Summary: The CoV-Spectrum website provides support for identifying and tracking new SARS-CoV-2 variants, with flexible mutation search capabilities and analysis on various data sources to understand characteristics and transmission of different variants.
Article
Computer Science, Artificial Intelligence
V. Roshan Joseph
Summary: When splitting data for training and testing, the optimal training-to-testing ratio is √p : 1, where p is the number of parameters in a linear regression model.
STATISTICAL ANALYSIS AND DATA MINING
(2022)
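The splitting rule in the preceding summary (training to testing in the ratio of the square root of p to 1) can be turned into sample sizes with a few lines of arithmetic. The sketch below is only an illustration of that ratio: the function name `split_sizes` and the rounding and clamping choices are assumptions made here, not something taken from the paper.

```python
import math


def split_sizes(n, p):
    """Return (n_train, n_test) for n rows under a sqrt(p):1 train-to-test ratio.

    n: total number of observations.
    p: number of parameters in the linear regression model.
    Rounding to the nearest integer and keeping at least one row on each
    side are illustrative choices, not part of the published rule.
    """
    if n < 2 or p < 1:
        raise ValueError("need n >= 2 and p >= 1")
    # A ratio of sqrt(p):1 means the test share is 1 / (sqrt(p) + 1).
    test_fraction = 1.0 / (math.sqrt(p) + 1.0)
    n_test = round(n * test_fraction)
    # Keep both subsets non-empty.
    n_test = min(max(n_test, 1), n - 1)
    return n - n_test, n_test


if __name__ == "__main__":
    # With p = 9 parameters the ratio is 3:1, i.e. 25% of rows go to testing.
    print(split_sizes(1000, 9))
```

Under this rule the test share shrinks as the model grows: p = 1 gives an even split, while p = 16 leaves one fifth of the rows for testing.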
Article
Biochemical Research Methods
Nadav Brandes, Dan Ofer, Yam Peleg, Nadav Rappoport, Michal Linial
Summary: Self-supervised deep language modeling has achieved unprecedented success on natural language tasks. The authors introduce ProteinBERT, a deep language model designed specifically for proteins, which handles long sequences efficiently and performs comparably to, and sometimes better than, other methods, providing an effective framework for rapid training of protein predictors.
Article
Economics
Stefano DellaVigna, Elizabeth Linos
Summary: Nudge interventions have been widely implemented in both academic studies and government units. However, the measured impact of nudges differs substantially between these two settings. This study compares data from 126 randomized controlled trials (RCTs) run by Nudge Units with trials published in academic journals, and identifies three factors contributing to the differences: statistical power, characteristics of the interventions, and selective publication. The findings suggest that selective publication and low statistical power are the major contributors to the disparities, while variation in nudge characteristics explains the remaining differences.
Article
Economics
Clement de Chaisemartin, Xavier D'Haultfoeuille
Summary: Linear regressions with period and group fixed effects are commonly used to estimate the effects of policies. However, recent research has shown that these regressions may produce misleading estimates if the effects of policies vary between different groups or over time. Therefore, alternative estimators robust to heterogeneous effects have been proposed in a growing literature. This survey uses these alternative estimators to reexamine a previous study by Wolfers ().
ECONOMETRICS JOURNAL
(2023)
Article
Biochemical Research Methods
Pierre-Alain Chaumeil, Aaron J. Mussig, Philip Hugenholtz, Donovan H. Parks
Summary: This study presents an updated version of GTDB-Tk that uses a divide-and-conquer approach to reduce memory requirements while minimizing classification impact.
Article
Biochemical Research Methods
Yafeng Zhao, Zhen Chen, Xuan Gao, Wenlong Song, Qiang Xiong, Junfeng Hu, Zhichao Zhang
Summary: The study explores the use of DoubleGAN to generate images of unhealthy plant leaves in order to balance imbalanced datasets. A WGAN is used to obtain a pretrained model, and an SRGAN is used to generate high-resolution images. Compared to DCGAN, the images generated by DoubleGAN are clearer and achieve higher recognition accuracy.
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
(2022)
Article
Biochemical Research Methods
Tuan Nguyen, Giang T T Nguyen, Thin Nguyen, Duc-Hau Le
Summary: This study proposes a novel method called GraphDRP based on graph convolutional networks for drug response prediction and finds that graph representation can improve prediction performance.
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
(2022)
Article
Mathematical & Computational Biology
Amy D. Willis, Bryan D. Martin
Summary: Comparing ecological communities across environmental gradients is challenging, especially when there are many different taxonomic groups. Traditional diversity estimation methods, such as maximum likelihood estimates of the parameters of a multinomial model, have strict assumptions and do not account for ecological networks. In this article, the authors leverage models from the compositional data literature to estimate diversity indices, such as Shannon, Simpson, Bray-Curtis, and Euclidean. They find that their method performs best in strongly networked communities with many taxa, as shown in a case study on the microbiome of seafloor basalts.
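For reference, the traditional plug-in approach mentioned in the preceding summary replaces the unknown taxon proportions with their multinomial maximum likelihood estimates (observed count divided by total count) and evaluates the index directly. The sketch below shows that baseline for the Shannon and Simpson indices only. It is not the authors' network-aware method, the function name `plugin_diversity` is hypothetical, and the Simpson index is taken here as the sum of squared proportions, which is one of several conventions in use.

```python
import math


def plugin_diversity(counts):
    """Plug-in (multinomial MLE) Shannon and Simpson indices from taxon counts.

    counts: non-negative observed counts, one per taxon.
    Returns (shannon, simpson), where
      shannon = -sum(p_i * ln p_i)   (natural log)
      simpson =  sum(p_i ** 2)       (assumed convention; some texts use 1 - sum)
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    # MLE of each proportion; taxa with zero count contribute nothing.
    props = [c / total for c in counts if c > 0]
    shannon = -sum(p * math.log(p) for p in props)
    simpson = sum(p * p for p in props)
    return shannon, simpson


if __name__ == "__main__":
    # Four equally abundant taxa: Shannon equals ln(4), Simpson equals 0.25.
    print(plugin_diversity([10, 10, 10, 10]))
```

Because unobserved taxa are silently dropped, this estimator depends on the assumptions the summary criticizes, which is the gap the compositional-data models are meant to address.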
Article
Computer Science, Theory & Methods
Liang Cao, Deyin Yao, Hongyi Li, Wei Meng, Renquan Lu
Summary: This paper investigates the formation control issue of nonlinear multiagent systems with asymmetric input saturation and unmeasured states. A high-gain fuzzy observer is constructed to estimate the unavailable states, and a leader-follower formation control strategy is proposed. Two new dynamic event triggering mechanisms and dynamic rules of threshold parameters are established to reduce the communication between controller and actuator. Furthermore, a modified auxiliary system is developed to counteract the adverse effect of asymmetric input saturation.
FUZZY SETS AND SYSTEMS
(2023)
Article
Statistics & Probability
Patricia Mendes dos Santos, Marcelo Angelo Cirillo
Summary: This study improves a conventional construct validation indicator by using adaptive regressions and finds that the adaptive linear regression method is efficient for correctly specified models in formative structural models.
COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION
(2023)
Article
Economics
Dennis Egger, Johannes Haushofer, Edward Miguel, Paul Niehaus, Michael Walker
Summary: This study examines the effects of large economic stimuli on individuals and the overall economy, and provides meaningful insights through an experiment conducted in rural Kenya. The results show that cash transfers have significant impacts on consumption and assets for recipients, and also generate positive spillover effects on non-recipient households and firms, with minimal inflation.
Article
Statistics & Probability
Alberto Abadie, Jann Spiess
Summary: Nearest-neighbor matching is a useful tool for creating balance between treatment and control groups in observational studies, reducing the dependence on parametric modeling assumptions. Ignoring the matching step can lead to invalid standard errors, especially if matching is conducted with replacement or if the regression model is misspecified.
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
(2022)
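To make the matching step in the preceding summary concrete, here is a minimal sketch of one-to-one nearest-neighbor matching with replacement on a single covariate. It only produces a point estimate of the average effect on the treated; it does not compute standard errors and is not the authors' procedure. The function name `att_nearest_neighbor` and the data layout are assumptions made for illustration.

```python
def att_nearest_neighbor(treated, controls):
    """Average treated-minus-matched-control outcome difference.

    treated, controls: lists of (covariate, outcome) pairs.
    Each treated unit is matched to the control with the closest covariate
    value. Matching is with replacement, so one control may be reused for
    several treated units.
    """
    if not treated or not controls:
        raise ValueError("need at least one treated and one control unit")
    diffs = []
    for x_t, y_t in treated:
        # Closest control on the covariate; ties resolve to the first one.
        _, y_c = min(controls, key=lambda c: abs(c[0] - x_t))
        diffs.append(y_t - y_c)
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    treated = [(1.0, 5.0), (2.0, 7.0)]
    controls = [(1.1, 3.0), (5.0, 10.0)]
    # Both treated units are matched to the same control at x = 1.1.
    print(att_nearest_neighbor(treated, controls))
```

In the usage example the control at x = 1.1 is used twice. That reuse is the kind of dependence that, according to the summary, makes standard errors invalid when the matching step is ignored.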
Article
Biochemical Research Methods
Brennan Abanades, Guy Georges, Alexander Bujotzek, Charlotte M. Deane
Summary: In this study, the researchers developed a deep learning-based tool called ABlooper for predicting the structure of CDR loops in antibodies. ABlooper accurately predicts the structure of CDR-H3 loops, which are known for their sequence and structural variability. The tool provides high-accuracy predictions along with a confidence estimate for each prediction.
Article
Biochemical Research Methods
Shixiang Wang, Yi Xiong, Longfei Zhao, Kai Gu, Yin Li, Fei Zhao, Jianfeng Li, Mingjie Wang, Haitao Wang, Ziyu Tao, Tao Wu, Yichao Zheng, Xuejun Li, Xue-Song Liu
Summary: The UCSC Xena platform offers processed cancer omics data, while UCSCXenaShiny is an R Shiny package that allows users to quickly search, download, and explore the data. This tool provides important research opportunities for cancer researchers and clinicians with limited programming experience.
Article
Statistics & Probability
Mats J. Stensrud, Jessica G. Young, Vanessa Didelez, James M. Robins, Miguel A. Hernan
Summary: The presence of competing events complicates the definition of causal effects in time-to-event settings. This study proposes separable effects to examine the causal effect of a treatment on an event of interest. The separable direct effect is the treatment effect on the event of interest not mediated by its effect on the competing event. The separable indirect effect is the treatment effect on the event of interest only through its effect on the competing event. The assumption that the treatment can be decomposed into two distinct components is necessary for identifying the separable effects.
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
(2022)