4.6 Review

R Packages for Data Quality Assessments and Data Monitoring: A Software Scoping Review with Recommendations for Future Developments

Journal

APPLIED SCIENCES-BASEL
Volume 12, Issue 9, Pages -

Publisher

MDPI
DOI: 10.3390/app12094238

Keywords

data quality; data quality monitoring; data reporting; exploratory data analysis; initial data analysis; R project for statistical computing

Funding

  1. DFG [NFDI 13/1, SCHM 2744/9-1, SCHM 2744/3-1]
  2. European Union [825903]
  3. German Federal Ministry of Education and Research (BMBF) within the Medical Informatics Initiative (MIRACUM Consortium) [FKZ: 01ZZ1801A]

Data quality assessments are crucial for ensuring valid research results, but a systematic comparison of R packages for DQA is lacking. Out of over 140 R packages screened, only 27 were found to meet the criteria set by a DQ framework for observational health studies. These packages vary considerably in terms of functionalities and usability, highlighting the need for future developments in metadata utilization and user-friendliness enhancement.
Data quality assessments (DQA) are necessary to ensure valid research results. Despite the growing availability of tools relevant to DQA in the R language, a systematic comparison of their functionalities is missing. Therefore, we review R packages related to data quality (DQ) and assess their scope against a DQ framework for observational health studies. Based on a systematic search, we screened more than 140 R packages related to DQA in the Comprehensive R Archive Network. From these, we selected packages that target at least three of the four DQ dimensions (integrity, completeness, consistency, accuracy) in a reference framework. We evaluated the resulting 27 packages for general features (e.g., usability, metadata handling, output types, descriptive statistics) and for the breadth of assessments they make possible. To facilitate comparisons, we applied all packages to a publicly available dataset from a cohort study. We found that the packages' scope varies considerably regarding functionalities and usability. Only three packages follow a DQ concept, and some offer an extensive rule-based issue analysis. However, the reference framework does not include a few implemented functionalities, and it should be broadened accordingly. Improved use of metadata to empower DQA and enhanced user-friendliness, such as GUIs and reports that grade the severity of DQ issues, stand out as the main directions for future developments.

