A Grammar for Reproducible and Painless Extract-Transform-Load Operations on Medium Data

Journal of Computational and Graphical Statistics (2019), Volume 28, Issue 2, Pages 256-264

Publisher: Taylor & Francis Inc

DOI: 10.1080/10618600.2018.1512867

Keywords: Databases; Data wrangling; Reproducibility; Statistical computing

Abstract

Many interesting datasets available on the Internet are of a medium size: too big to fit into a personal computer's memory, but small enough to fit comfortably on its hard disk. In the coming years, datasets of this magnitude will inform vital research in a wide array of application domains. However, due to a variety of constraints, they are cumbersome to ingest, wrangle, analyze, and share in a reproducible fashion. These obstructions hamper thorough peer review and thus disrupt the forward progress of science. We propose a predictable and pipeable framework for R (the state-of-the-art statistical computing environment) that leverages SQL (the venerable database architecture and query language) to make reproducible research on medium data a painless reality. Supplementary material for this article is available online.