☆ 4.3 Article

A generic parallel processing model for facilitating data mining and integration

PARALLEL COMPUTING (2011)

Journal

PARALLEL COMPUTING

Volume 37, Issue 3, Pages 157-171

Publisher

ELSEVIER

DOI: 10.1016/j.parco.2011.02.006

Keywords

Pipeline streaming; Parallelism; Data mining and data integration (DMI); Workflow; Life sciences; OGSA-DAI

Category

Computer Science, Theory & Methods

Funding

EU [FP7-ICT-215024]
Engineering and Physical Sciences Research Council [EP/D079829/1] Funding Source: researchfish
EPSRC [EP/D079829/1] Funding Source: UKRI

Abstract

To facilitate data mining and integration (DMI) processes in a generic way, we investigate a parallel pipeline streaming model. We model a DMI task as a streaming data-flow graph: a directed acyclic graph (DAG) of Processing Elements (PEs). The composition mechanism links PEs via data streams, which may be held in memory, buffered on disk, or carried as inter-computer data flows. This makes it possible to build arbitrary DAGs with pipelining and both data and task parallelism, which provide room for performance enhancement. We have applied this approach to a real DMI case in the life sciences and implemented a prototype. To demonstrate the feasibility of the modelled DMI task and assess the efficiency of the prototype, we have also built a performance evaluation model. The experimental results show that linear speedup is achieved as the number of distributed computing nodes increases in this case study. © 2011 Elsevier B.V. All rights reserved.
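
The streaming idea in the abstract can be illustrated with a minimal sketch. This is not the authors' prototype and does not use OGSA-DAI; it assumes in-memory streams only, models each PE as a plain Python function running in its own thread, and covers just a linear chain (the simplest DAG). The names `run_pe`, `feed`, `run_pipeline` and `buffer_size` are hypothetical and chosen for illustration.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_END = object()  # sentinel marking the end of a stream


def run_pe(func: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """One PE (hypothetical helper): apply func to each item and forward the result."""
    while True:
        item = inbox.get()
        if item is _END:
            outbox.put(_END)  # propagate end-of-stream to the next PE
            return
        outbox.put(func(item))


def feed(source: Iterable[Any], outbox: queue.Queue) -> None:
    """Source PE: push every input item into the first stream, then close it."""
    for item in source:
        outbox.put(item)
    outbox.put(_END)


def run_pipeline(source: Iterable[Any],
                 stages: List[Callable[[Any], Any]],
                 buffer_size: int = 8) -> List[Any]:
    """Connect the PEs with bounded in-memory streams and collect the output.

    Each PE runs in its own thread, so a downstream PE can start on item k
    while an upstream PE is still producing item k+1 (pipelining).
    Error handling is omitted: an exception inside a PE would stall the chain.
    """
    streams = [queue.Queue(maxsize=buffer_size) for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=feed, args=(source, streams[0]))]
    threads += [
        threading.Thread(target=run_pe, args=(func, streams[i], streams[i + 1]))
        for i, func in enumerate(stages)
    ]
    for t in threads:
        t.start()

    results = []
    while True:  # the caller acts as the sink PE and drains the last stream
        item = streams[-1].get()
        if item is _END:
            break
        results.append(item)

    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Two PEs chained by streams: add one, then double.
    print(run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 2]))
```

With one thread per PE and FIFO streams, item order is preserved. The bounded queues stand in for the buffering role that streams play in the model. Data parallelism (several replicas of one PE sharing a stream) and inter-computer streams are left out of this sketch, and in CPython the threads illustrate the structure rather than deliver CPU-bound speedup.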
