Reproducible big data science: A case study in continuous FAIRness

PLOS ONE (2019)

Journal

PLOS ONE

Volume 14, Issue 4

Publisher

Public Library of Science

DOI: 10.1371/journal.pone.0213013

Funding

NIH [1U54EB020406-01, 1OT3OD025458-01, 5R01HG009018]
DOE [DE-AC02-06CH11357]

Abstract

Big biomedical data create exciting opportunities for discovery, but they make it difficult to capture analyses and outputs in forms that are findable, accessible, interoperable, and reusable (FAIR). In response, we describe tools that make it easy to capture data and code, and to assign identifiers to them, throughout the data lifecycle. We illustrate the use of these tools via a case study involving a multi-step analysis that creates an atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing (DNase-seq) data. We show how the tools automate routine but complex tasks, capture analysis algorithms in understandable and reusable forms, and harness fast networks and powerful cloud computers to process data rapidly. They do all this without sacrificing usability or reproducibility, thus ensuring that big data are not hard-to-(re)use data. We evaluate our approach via a user study and show that 91% of participants were able to replicate a complex analysis involving considerable data volumes.