Related references
Note: Only part of the references are listed.TiDB: A Raft-based HTAP Database
Dongxu Huang et al.
PROCEEDINGS OF THE VLDB ENDOWMENT (2020)
Distributed and Parallel Ensemble Classification for Big Data Based on Kullback-Leibler Random Sample Partition
Chenghao Wei et al.
ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2020, PT I (2020)
Big data and Spark: Comparison with Hadoop
Yassine Benlachmi et al.
PROCEEDINGS OF THE 2020 FOURTH WORLD CONFERENCE ON SMART TRENDS IN SYSTEMS, SECURITY AND SUSTAINABILITY (WORLDS4 2020) (2020)
Distributed Data Strategies to Support Large-Scale Data Analysis Across Geo-Distributed Data Centers
Tamer Z. Emara et al.
IEEE ACCESS (2020)
A Survey of Data Partitioning and Sampling Methods to Support Big Data Analysis
Mohammad Sultan Mahmud et al.
BIG DATA MINING AND ANALYTICS (2020)
Sampling Techniques for Big Data Analysis
Jae Kwang Kim et al.
INTERNATIONAL STATISTICAL REVIEW (2019)
A distributed data management system to support large-scale data analysis
Tamer Z. Emara et al.
JOURNAL OF SYSTEMS AND SOFTWARE (2019)
Random Sample Partition: A Distributed Data Model for Big Data Analysis
Salman Salloum et al.
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS (2019)
Wireless MapReduce Distributed Computing
Fan Li et al.
IEEE TRANSACTIONS ON INFORMATION THEORY (2019)
An Asymptotic Ensemble Learning Framework for Big Data Analysis
Salman Salloum et al.
IEEE ACCESS (2019)
Exploring and cleaning big data with random sample data blocks
Salman Salloum et al.
JOURNAL OF BIG DATA (2019)
TensorFlow on state-of-the-art HPC clusters: a machine learning use case
Guillem Ramirez-Gargallo et al.
2019 19TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID) (2019)
H2Hadoop: Improving Hadoop Performance Using the Metadata of Related Jobs
Hamoud Alshammari et al.
IEEE TRANSACTIONS ON CLOUD COMPUTING (2018)
HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges
Marco A. S. Netto et al.
ACM COMPUTING SURVEYS (2018)
Efficient Parallel Random Sampling-Vectorized, Cache-Efficient, and Online
Peter Sanders et al.
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE (2018)
Load balancing in reducers for skewed data in MapReduce systems by using scalable simple random sampling
Elaheh Gavagsaz et al.
JOURNAL OF SUPERCOMPUTING (2018)
Ensemble learning: A survey
Omer Sagi et al.
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY (2018)
DLoBD: A Comprehensive Study of Deep Learning over Big Data Stacks on HPC Clusters
Xiaoyi Lu et al.
IEEE TRANSACTIONS ON MULTI-SCALE COMPUTING SYSTEMS (2018)
Distributed stream clustering using micro-clusters on Apache Storm
Pasan Karunaratne et al.
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2017)
Big Data Processing Stacks
Sherif Sakr
IT Professional (2017)
I-sampling: A New Block-Based Sampling Method for Large-Scale Dataset
Yulin He et al.
2017 IEEE 6TH INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS 2017) (2017)
State Management in Apache Flink® Consistent Stateful Distributed Stream Processing
Paris Carbone et al.
PROCEEDINGS OF THE VLDB ENDOWMENT (2017)
Empirical Analysis of Asymptotic Ensemble Learning for Big Data
Salman Salloum et al.
2016 3RD IEEE/ACM INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING, APPLICATIONS AND TECHNOLOGIES (BDCAT) (2016)
Apache Flink: Stream Analytics at Scale
Asterios Katsifodimos et al.
2016 IEEE INTERNATIONAL CONFERENCE ON CLOUD ENGINEERING WORKSHOP (IC2EW) (2016)
Performance Optimization for Managing Massive Numbers of Small Files in Distributed File Systems
Songling Fu et al.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2015)
Hadoop, MapReduce and HDFS: A Developers Perspective
Mohd Rehan Ghazi et al.
INTERNATIONAL CONFERENCE ON COMPUTER, COMMUNICATION AND CONVERGENCE (ICCC 2015) (2015)
A Survey on Distributed File System Technology
J. Blomer
16TH INTERNATIONAL WORKSHOP ON ADVANCED COMPUTING AND ANALYSIS TECHNIQUES IN PHYSICS RESEARCH (ACAT2014) (2015)
Spark SQL: Relational Data Processing in Spark
Michael Armbrust et al.
SIGMOD'15: PROCEEDINGS OF THE 2015 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (2015)
A comprehensive view of Hadoop research-A systematic literature review
Ivanilton Polato et al.
JOURNAL OF NETWORK AND COMPUTER APPLICATIONS (2014)
SHadoop: Improving MapReduce performance by optimizing job execution mechanism in Hadoop clusters
Rong Gu et al.
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2014)
Challenges of Big Data analysis
Jianqing Fan et al.
NATIONAL SCIENCE REVIEW (2014)
Sampling for Big Data: A Tutorial
Graham Cormode et al.
PROCEEDINGS OF THE 20TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'14) (2014)
Parallel and distributed computing: Memories of Time Past and a Glimpse at the Future
Dan C. Marinescu
2014 IEEE 13TH INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING (ISPDC) (2014)
iMapReduce: A Distributed Computing Framework for Iterative Computation
Yanfeng Zhang et al.
JOURNAL OF GRID COMPUTING (2012)
The HaLoop approach to large-scale iterative data analysis
Yingyi Bu et al.
VLDB JOURNAL (2012)
The TianHe-1A Supercomputer: Its Hardware and Software
Xue-Jun Yang et al.
JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY (2011)
MapReduce: A Flexible Data Processing Tool
Jeffrey Dean et al.
COMMUNICATIONS OF THE ACM (2010)
DFS: A File System for Virtualized Flash Storage
William K. Josephson et al.
ACM TRANSACTIONS ON STORAGE (2010)
Hive - A Warehousing Solution Over a Map-Reduce Framework
Ashish Thusoo et al.
PROCEEDINGS OF THE VLDB ENDOWMENT (2009)
Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience
Alan F. Gates et al.
PROCEEDINGS OF THE VLDB ENDOWMENT (2009)
Mapreduce: Simplified data processing on large clusters
Jeffrey Dean et al.
COMMUNICATIONS OF THE ACM (2008)
A high performance algorithm for static task scheduling in heterogeneous distributed computing systems
Mohammad I. Daoud et al.
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2008)
A taxonomy and survey of grid resource management systems for distributed computing
K Krauter et al.
SOFTWARE-PRACTICE & EXPERIENCE (2002)