4.3 Article

On the role of benchmarking data sets and simulations in method comparison studies

Related references

Note: Only some of the references are listed; download the original article for the complete reference information.
Article Statistics & Probability

Over-optimistic evaluation and reporting of novel cluster algorithms: an illustrative study

Theresa Ullmann et al.

Summary: When researchers publish new cluster algorithms, they often present their method as superior, which may not be confirmed in independent benchmark studies. This study demonstrates how authors consciously or unconsciously paint their algorithm's performance in an over-optimistic light, using the example of the recently published cluster algorithm Rock.

ADVANCES IN DATA ANALYSIS AND CLASSIFICATION (2023)

Review Mathematical & Computational Biology

Dealing with confounding in observational studies: A scoping review of methods evaluated in simulation studies with single-point exposure

Anita Natalia Varga et al.

Summary: The aim of this article is to review the methods available for dealing with confounding in analyzing the effect of health care treatments with single-point exposure in observational data. The results show that there are significant differences in performance between different methods, and the performance of a specific method is highly dependent on the estimator used.

STATISTICS IN MEDICINE (2023)

Article Genetics & Heredity

A statistical boosting framework for polygenic risk scores based on large-scale genotype data

Hannah Klinkhammer et al.

Summary: Polygenic risk scores (PRS) evaluate individual genetic liability and are important in clinical risk stratification. This study develops an efficient algorithm, snpboost, for fitting multivariable models to genetic data for improved PRS predictive performance. By iteratively working on smaller batches of variants most correlated with residuals, snpboost increases computational efficiency without sacrificing prediction accuracy. Results show competitive prediction accuracy and efficiency compared to other commonly used methods, making snpboost a valuable tool for constructing PRS.

FRONTIERS IN GENETICS (2023)

Article Economics

Improving the statistical power of economic experiments using adaptive designs

Sebastian Jobjoernsson et al.

Summary: This paper discusses the important issue of ensuring sufficient power to reject hypotheses in economic experiments and introduces methods for testing multiple hypotheses simultaneously in adaptive, two-stage designs to improve experiment power. The paper provides a concise overview of relevant theory and demonstrates the method in three different applications, including a simulation study and analysis of previous experiment data sets. Simulation results highlight the potential for reducing sample sizes while maintaining the power to reject at least one hypothesis and controlling the overall Type I error probability.

EXPERIMENTAL ECONOMICS (2023)

Article Computer Science, Artificial Intelligence

Leakage and the reproducibility crisis in machine-learning-based science

Sayash Kapoor et al.

Summary: Machine-learning (ML) methods have gained prominence in quantitative sciences, but face methodological pitfalls like data leakage. This study systematically investigates reproducibility issues in ML-based science and identifies 17 fields where leakage has been found, affecting 294 papers and leading to overoptimistic conclusions. A detailed taxonomy of eight types of leakage is introduced, and researchers are encouraged to test for each type by filling out model info sheets. A reproducibility study shows that complex ML models do not significantly outperform traditional statistical models when errors are corrected.

PATTERNS (2023)

Article Mathematical & Computational Biology

A benchmark for dose-finding studies with unknown ordering

Pavel Mozgunov et al.

Summary: The article introduces a new benchmark evaluation method that takes into account the uncertainty in dose ordering, providing a sharper upper bound on performance. This approach can be applied to trials with multiple endpoints with discrete or continuous distributions.

BIOSTATISTICS (2022)

Article Health Care Sciences & Services

Which test for crossing survival curves? A user's guideline

Ina Dormuth et al.

Summary: The exchange of knowledge between statisticians and clinicians is crucial for developing new methodology and applying it in clinical trials. This study focuses on the detection of survival differences in clinical trials with crossing hazards and compares various tests for this purpose.

BMC MEDICAL RESEARCH METHODOLOGY (2022)

Article Social Sciences, Mathematical Methods

Comparing the real-world performance of exponential-family random graph models and latent order logistic models for social network analysis

Duncan A. Clark et al.

Summary: The study found that Latent Order Logistic models (LOLOG) generally exhibit qualitative agreement with Exponential-family random graph models (ERGM) and provide at least as good model fit. Additionally, LOLOG models are typically faster and easier to fit to data, avoiding the degeneracy tendency often seen in ERGMs.

JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-STATISTICS IN SOCIETY (2022)

Article Computer Science, Artificial Intelligence

Over-optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results

Christina Niessl et al.

Summary: The importance of neutral benchmark studies in comparing computational methods has been increasingly recognized in recent years. The flexibility in design and analysis choices can lead to biased interpretations of benchmark results and researchers may engage in questionable research practices. An illustrative example demonstrates the impact of design and analysis options on benchmark results, highlighting the need for computational researchers and the scientific community to consider this issue for more reliable results.

WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY (2022)

Article Statistics & Probability

Is there a role for statistics in artificial intelligence?

Sarah Friedrich et al.

Summary: Statistics plays a significant role in both the theoretical and practical understanding of artificial intelligence (AI) and in its future development. It contributes to methodological development, planning and design of studies, assessment of data quality and data collection, differentiation of causality and associations, as well as evaluation of uncertainty in results. Integrating statistical aspects into AI teaching is crucial for schools and universities.

ADVANCES IN DATA ANALYSIS AND CLASSIFICATION (2022)

Article Statistics & Probability

Subgroup identification in individual participant data meta-analysis using model-based recursive partitioning

Cynthia Huber et al.

Summary: The article introduces a procedure called metaMOB for subgroup identification in IPD meta-analysis using the GLMM tree algorithm. The study found that metaMOB outperformed GLMM trees and MOB in terms of false discovery rates, accuracy of identified subgroups, and accuracy of estimated treatment effect.

ADVANCES IN DATA ANALYSIS AND CLASSIFICATION (2022)

Article Mathematical & Computational Biology

Against the one method fits all data sets philosophy for comparison studies in methodological research

Carolin Strobl et al.

Summary: This paper challenges the idea of striving to identify the best performing method in methodological comparison studies. The authors argue that this research question rests on unwarranted assumptions and suggest a more informative research question that can be fruitfully investigated.

BIOMETRICAL JOURNAL (2022)

Article Statistics & Probability

Let's practice what we preach: Planning and interpreting simulation studies with design and analysis of experiments

Hugh Chipman et al.

Summary: This article outlines the use of design and analysis of experiments (DAE) methods for planning and analyzing simulation studies. It also demonstrates the application of Taguchi robust parameter design for studying the robustness of methods to various uncontrollable population parameters.

CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE (2022)

Article Mathematical & Computational Biology

Comparing linear discriminant analysis and supervised learning algorithms for binary classification-A method comparison study

Ricarda Graf et al.

Summary: This study compares linear discriminant analysis (LDA) with several supervised learning algorithms and investigates the extent to which LDA's predictive performance relies on the assumption of multivariate normality. The results show that, while LDA is often outperformed by random forest (RF) in terms of overall performance for bimodal data, LDA still demonstrates strong discriminative ability. However, the model calibration of LDA tends to be worse compared to RF. Therefore, LDA is still recommended for this type of application.

BIOMETRICAL JOURNAL (2022)

Article Pharmacology & Pharmacy

Scientific and regulatory evaluation of mechanistic in silico drug and disease models in drug development: Building model credibility

Flora T. Musuamba et al.

Summary: This white paper proposes a risk-informed evaluation framework for mechanistic model credibility evaluation in drug development, with discussion on concepts such as context of use, regulatory impact and risk-based analysis to ensure common understanding. The feasibility of the approach is demonstrated through application to three real case examples, using a credibility matrix being tested as a quick-start tool by regulators.

CPT-PHARMACOMETRICS & SYSTEMS PHARMACOLOGY (2021)

Article Biochemical Research Methods

Popularity and performance of bioinformatics software: the case of gene set analysis

Chengshu Xie et al.

Summary: Gene Set Analysis (GSA) is considered the preferred method for functional interpretation of omics results. This paper examines the popularity and performance of GSA methodologies and software published over the past 20 years. Results show discrepancies between the most popular and the best performing GSA methods, raising questions about current tool selection procedures and the quality of functional interpretation in biomedical studies.

BMC BIOINFORMATICS (2021)

Article Agriculture, Multidisciplinary

Introducing digital twins to agriculture

Christos Pylianidis et al.

Summary: Digital twins are increasingly adopted by various industries, yet it is uncertain whether agriculture is making efforts to embrace this technology. This research investigates the added-value of digital twins for agriculture through a mixed-method approach, providing insights into adoption levels and suggesting a roadmap for implementation based on digital twin characteristics in agriculture and other disciplines.

COMPUTERS AND ELECTRONICS IN AGRICULTURE (2021)

Article Pharmacology & Pharmacy

Estimating and comparing adverse event probabilities in the presence of varying follow-up times and competing events

Regina Stegherr et al.

Summary: Safety analyses of adverse events play a crucial role in evaluating the benefit-risk of therapies. Different estimators for AE probabilities may lead to bias in group comparisons. Proper selection of the AE probability estimator is essential for accurate safety assessments.

PHARMACEUTICAL STATISTICS (2021)

Article Computer Science, Information Systems

Deep Learning for ECG Analysis: Benchmarks and Insights from PTB-XL

Nils Strodthoff et al.

Summary: The article discusses the progress in automatic ECG analysis and the application of deep learning-based classification algorithms in this field. It proposes the PTB-XL dataset as a benchmark resource for ECG analysis algorithms and discusses potential research directions.

IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS (2021)

Review Immunology

Digital Twins for Multiple Sclerosis

Isabel Voigt et al.

Summary: The development of digital twins offers important individualized innovation for the management of Multiple Sclerosis (MS), utilizing artificial intelligence to analyze multiple disease parameters and assist healthcare professionals in handling patient data, enhancing diagnosis, monitoring, and therapy to achieve personalized and effective care.

FRONTIERS IN IMMUNOLOGY (2021)

Article Health Care Sciences & Services

Comparison of six statistical methods for interrupted time series studies: empirical evaluation of 190 published series

Simon L. Turner et al.

Summary: The research showed that different statistical methods used in Interrupted Time Series studies could lead to significant variations in level and slope change point estimates, standard errors, confidence interval widths, and p-values. Statistical significance often differed across pairwise comparisons of methods, with disagreement ranging from 4 to 25%. Estimates of autocorrelation also varied depending on the method and length of the series.

BMC MEDICAL RESEARCH METHODOLOGY (2021)

Review Multidisciplinary Sciences

Industrial applications of digital twins

Yuchen Jiang et al.

Summary: A digital twin is a virtual replica continuously updated with data from its physical counterpart, serving as the pillar of Industry 4.0 and the foundation of future innovation. This article focuses on the use of digital twins in smart manufacturing, emphasizing plantwide optimization. The main capabilities of digital twins (mirroring, shadowing, threading) are discussed, with a perspective on the future provided.

PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES (2021)

Article Medicine, Research & Experimental

Survival analysis for AdVerse events with VarYing follow-up times (SAVVY)-estimation of adverse event risks

Regina Stegherr et al.

Summary: The SAVVY project aims to improve adverse event (AE) analyses in clinical trials by using survival techniques, finding that commonly used estimators may be biased. It recommends the Aalen-Johansen estimator with an appropriate definition of competing events (CEs) for more accurate quantification of AE risk.

TRIALS (2021)

Article Biotechnology & Applied Microbiology

On the optimistic performance evaluation of newly introduced bioinformatic methods

Stefan Buchka et al.

Summary: Many research articles claim that new data analysis methods outperform existing ones, but the veracity of such claims is questionable. This manuscript discusses the consequences of optimistic bias in evaluating novel data analysis methods, and quantitatively investigates this bias using an example from epigenetic analysis.

GENOME BIOLOGY (2021)

Article Mathematical & Computational Biology

Survival analysis for AdVerse events with VarYing follow-up times (SAVVY): Rationale and statistical concept of a meta-analytic study

Regina Stegherr et al.

Summary: The SAVVY project aims to assess the impact of methodology on safety evaluation conclusions through a meta-analytical study, providing theoretical rationale and implementation examples of statistical methods in a unified notation.

BIOMETRICAL JOURNAL (2021)

Article Infectious Diseases

Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial

Philippe Gautret et al.

INTERNATIONAL JOURNAL OF ANTIMICROBIAL AGENTS (2020)

Article Computer Science, Interdisciplinary Applications

More powerful logrank permutation tests for two-sample survival data

Marc Ditzhaus et al.

JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION (2020)

Article Mathematical & Computational Biology

Blinded continuous information monitoring of recurrent event endpoints with time trends in clinical trials

Tobias Muetze et al.

STATISTICS IN MEDICINE (2020)

Article Medicine, Research & Experimental

Causal inference methods for small non-randomized studies: Methods and an application in COVID-19

Sarah Friedrich et al.

CONTEMPORARY CLINICAL TRIALS (2020)

Article Medicine, General & Internal

Introduction to statistical simulations in health research

Anne-Laure Boulesteix et al.

BMJ OPEN (2020)

Article Mathematics, Interdisciplinary Applications

The wild bootstrap for multivariate Nelson-Aalen estimators

Tobias Bluhmki et al.

LIFETIME DATA ANALYSIS (2019)

Article Medicine, Research & Experimental

Combined test versus logrank/Cox test in 50 randomised trials

Patrick Royston et al.

TRIALS (2019)

Article Mathematical & Computational Biology

Bootstrapping complex time-to-event data without individual patient data, with a view toward time-dependent exposures

Tobias Bluhmki et al.

STATISTICS IN MEDICINE (2019)

Article Mathematical & Computational Biology

Exposure density sampling: Dynamic matching with respect to a time-dependent exposure

Kristin Ohneberg et al.

STATISTICS IN MEDICINE (2019)

Article Computer Science, Artificial Intelligence

A survey of 25 years of evaluation

Kenneth Ward Church et al.

NATURAL LANGUAGE ENGINEERING (2019)

Article Health Care Sciences & Services

Likelihood-based random-effects meta-analysis with few studies: empirical and simulation studies

Svenja E. Seide et al.

BMC MEDICAL RESEARCH METHODOLOGY (2019)

Article Mathematical & Computational Biology

Using simulation studies to evaluate statistical methods

Tim P. Morris et al.

STATISTICS IN MEDICINE (2019)

Review Biotechnology & Applied Microbiology

Essential guidelines for computational method benchmarking

Lukas M. Weber et al.

GENOME BIOLOGY (2019)

Proceedings Paper Engineering, Biomedical

Ultrasound segmentation using U-Net: learning from simulated data and testing on real data

Bahareh Behboodi et al.

2019 41ST ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC) (2019)

Review Biotechnology & Applied Microbiology

Guidelines for benchmarking of optimization-based approaches for fitting mathematical models

Clemens Kreutz

GENOME BIOLOGY (2019)

Article Statistics & Probability

OpenML: An R package to connect to the machine learning platform OpenML

Giuseppe Casalicchio et al.

COMPUTATIONAL STATISTICS (2019)

Article Computer Science, Theory & Methods

Gradient boosting for distributional regression: faster tuning and improved variable selection via noncyclical updates

Janek Thomas et al.

STATISTICS AND COMPUTING (2018)

Letter Mathematical & Computational Biology

On the necessity and design of studies comparing statistical methods

Anne-Laure Boulesteix et al.

BIOMETRICAL JOURNAL (2018)

Article Mathematical & Computational Biology

A studentized permutation test for three-arm trials in the 'gold standard' design

Tobias Muetze et al.

STATISTICS IN MEDICINE (2017)

Article Statistics & Probability

Permuting longitudinal data in spite of the dependencies

Sarah Friedrich et al.

JOURNAL OF MULTIVARIATE ANALYSIS (2017)

Article Statistics & Probability

Nonparametric estimation of pregnancy outcome probabilities

Sarah Friedrich et al.

ANNALS OF APPLIED STATISTICS (2017)

Article Computer Science, Interdisciplinary Applications

A wild bootstrap approach for nonparametric repeated measurements

Sarah Friedrich et al.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2017)

Article Pharmacology & Pharmacy

Statistical issues in the analysis of adverse events in time-to-event data

Arthur Allignol et al.

PHARMACEUTICAL STATISTICS (2016)

Article Pharmacology & Pharmacy

Biometrical issues in the analysis of adverse events within the benefit assessment of drugs

Ralf Bender et al.

PHARMACEUTICAL STATISTICS (2016)

Article Pharmacology & Pharmacy

Analysing adverse events by time-to-event models: the CLEOPATRA study

Tanja Proctor et al.

PHARMACEUTICAL STATISTICS (2016)

Editorial Material Pharmacology & Pharmacy

Statistical methods for the analysis of adverse event data

Meinhard Kieser

PHARMACEUTICAL STATISTICS (2016)

Article Mathematical & Computational Biology

Hartung-Knapp method is not always conservative compared with fixed-effect meta-analysis

Anna Wiksten et al.

STATISTICS IN MEDICINE (2016)

Article Statistics & Probability

A Statistical Framework for Hypothesis Testing in Real Data Comparison Studies

Anne-Laure Boulesteix et al.

AMERICAN STATISTICIAN (2015)

Editorial Material Biochemical Research Methods

Ten Simple Rules for Reducing Overoptimistic Reporting in Methodological Computational Research

Anne-Laure Boulesteix

PLOS COMPUTATIONAL BIOLOGY (2015)

Article Computer Science, Interdisciplinary Applications

Plasmode simulation for the evaluation of pharmacoepidemiologic methods in complex healthcare databases

Jessica M. Franklin et al.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2014)

Article Statistics & Probability

Benchmarking local classification methods

Bernd Bischl et al.

COMPUTATIONAL STATISTICS (2013)

Article Multidisciplinary Sciences

A Plea for Neutral Comparison Studies in Computational Sciences

Anne-Laure Boulesteix et al.

PLOS ONE (2013)

Article Mathematical & Computational Biology

Simulating biologically plausible complex survival data

Michael J. Crowther et al.

STATISTICS IN MEDICINE (2013)

Article Health Care Sciences & Services

Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan-Meier survival curves

Patricia Guyot et al.

BMC MEDICAL RESEARCH METHODOLOGY (2012)

Article Statistics & Probability

A Data-Generation Process for Data with Specified Risk Differences or Numbers Needed to Treat

Peter C. Austin

COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION (2010)

Article Health Care Sciences & Services

Aspects of Modernizing Drug Development Using Clinical Scenario Planning and Evaluation

Norbert Benda et al.

DRUG INFORMATION JOURNAL (2010)

Article Medicine, General & Internal

Three techniques for integrating data in mixed methods studies

Alicia O'Cathain et al.

BMJ-BRITISH MEDICAL JOURNAL (2010)

Article Mathematical & Computational Biology

Evaluating the Impact of Prior Assumptions in Bayesian Biostatistics

Satoshi Morita et al.

STATISTICS IN BIOSCIENCES (2010)

Article Mathematical & Computational Biology

Comparison of algorithms to generate event times conditional on time-dependent covariates

Marie-Pierre Sylvestre et al.

STATISTICS IN MEDICINE (2008)

Article Mathematical & Computational Biology

The performance of different propensity score methods for estimating marginal odds ratios

Peter C. Austin

STATISTICS IN MEDICINE (2007)

Article Biochemical Research Methods

Validating module network learning algorithms using simulated data

Tom Michoel et al.

BMC BIOINFORMATICS (2007)

Article Mathematical & Computational Biology

The design of simulation studies in medical statistics

Andrea Burton et al.

STATISTICS IN MEDICINE (2006)

Article Statistics & Probability

The design and analysis of benchmark experiments

T Hothorn et al.

JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS (2005)

Article Critical Care Medicine

In silico design of clinical trials: A method coming of age

G Clermont et al.

CRITICAL CARE MEDICINE (2004)