Article

Experience with using the Parallel Workloads Archive

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2014)

Journal

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING

Volume 74, Issue 10, Pages 2967-2982

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE

DOI: 10.1016/j.jpdc.2014.06.013

Keywords

Workload log; Data quality; Parallel job scheduling

Category

Computer Science, Theory & Methods

Funding

Israel Science Foundation [219/99, 167/03]
Ministry of Science and Technology, Israel

Abstract

Science is based upon observation. The scientific study of complex computer systems should therefore be based on observation of how they are used in practice, as opposed to how they are assumed to be used or how they were designed to be used. In particular, detailed workload logs from real computer systems are invaluable for research on performance evaluation and for designing new systems. Regrettably, workload data may suffer from quality issues that might distort the study results, just as scientific observations in other fields may suffer from measurement errors. The cumulative experience with the Parallel Workloads Archive, a repository of job-level usage data from large-scale parallel supercomputers, clusters, and grids, has exposed many such issues. Importantly, these issues were not anticipated when the data was collected, and uncovering them was not trivial. As the data in this archive is used in hundreds of studies, it is necessary to describe and debate procedures that may be used to improve its data quality. Specifically, we consider issues like missing data, inconsistent data, erroneous data, system configuration changes during the logging period, and unrepresentative user behavior. Some of these may be countered by filtering out the problematic data items. In other cases, being cognizant of the problems may affect the decision of which datasets to use. While grounded in the specific domain of parallel jobs, our findings and suggested procedures can also inform similar situations in other domains. (C) 2014 Elsevier Inc. All rights reserved.
