Proceedings Paper

A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures

2014 IEEE INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS) (2014)

Journal

2014 IEEE INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS)

Volume -, Issue -, Pages 645-652

Publisher

IEEE

DOI: 10.1109/BigData.Congress.2014.137

Categories

Computer Science, Information Systems; Computer Science, Theory & Methods

Funding

NSF [OCI-1253644]

Abstract

Scientific problems that depend on processing large amounts of data require overcoming challenges in multiple areas: managing large-scale data distribution, co-placement and scheduling of data with compute resources, and storing and transferring large volumes of data. We analyze the ecosystems of the two prominent paradigms for data-intensive applications, hereafter referred to as the high-performance computing and the Apache-Hadoop paradigms. We propose a basis, common terminology, and functional factors upon which to analyze the two paradigms. We discuss the concept of Big Data Ogres and their facets as a means of understanding and characterizing the most common application workloads found across the two paradigms. We then discuss the salient features of the two paradigms, and compare and contrast them. Specifically, we examine common implementations and approaches of these paradigms, shed light on the reasons for their current architectures, and discuss some typical workloads that utilize them. In spite of the significant software distinctions, we believe there is architectural similarity. We discuss the potential integration of different implementations, across the different levels and components. Our comparison progresses from a fully qualitative examination of the two paradigms to a semi-quantitative methodology. We use a simple and broadly used Ogre (K-means clustering) and characterize its performance on a range of representative platforms, covering several implementations from both paradigms. Our experiments provide insight into the relative strengths of the two paradigms. We propose that the set of Ogres will serve as a benchmark to evaluate the two paradigms along different dimensions.
