Article

A generic parallel processing model for facilitating data mining and integration

Journal

PARALLEL COMPUTING
Volume 37, Issue 3, Pages 157-171

Publisher

ELSEVIER
DOI: 10.1016/j.parco.2011.02.006

Keywords

Pipeline streaming; Parallelism; Data mining and data integration (DMI); Workflow; Life sciences; OGSA-DAI

Funding

  1. EU [FP7-ICT-215024]
  2. Engineering and Physical Sciences Research Council [EP/D079829/1] Funding Source: researchfish
  3. EPSRC [EP/D079829/1] Funding Source: UKRI

To facilitate data mining and integration (DMI) processes in a generic way, we investigate a parallel pipeline streaming model. We model a DMI task as a streaming data-flow graph: a directed acyclic graph (DAG) of Processing Elements (PEs). The composition mechanism links PEs via data streams, which may be held in memory, buffered on disk, or carried as inter-computer data flows. This makes it possible to build arbitrary DAGs that combine pipelining with both data and task parallelism, which provides room for performance enhancement. We have applied this approach to a real DMI case in the life sciences and implemented a prototype. To demonstrate the feasibility of the modelled DMI task and assess the efficiency of the prototype, we have also built a performance evaluation model. The experimental results show that, in this case study, linear speedup is achieved as the number of distributed computing nodes increases. © 2011 Elsevier B.V. All rights reserved.
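The streaming model described in the abstract can be illustrated with a minimal sketch. It is not the authors' prototype and does not use the OGSA-DAI API. It assumes the simplest possible DAG (a linear chain of PEs), in-memory streams only, and one thread per PE. The names `run_pe`, `run_pipeline`, `buffer_size`, and `_END` are hypothetical and exist only for this illustration.

```python
import queue
import threading

_END = object()  # hypothetical sentinel marking the end of a stream


def run_pe(func, inbox, outbox):
    """Run one Processing Element: apply func to each item read from inbox."""
    while True:
        item = inbox.get()
        if item is _END:
            outbox.put(_END)  # forward the end marker downstream
            return
        outbox.put(func(item))


def run_pipeline(source, stages, buffer_size=8):
    """Link PEs with bounded in-memory streams and run each PE in its own thread."""
    # One stream in front of every stage, plus one for the final output.
    streams = [queue.Queue(maxsize=buffer_size) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=run_pe, args=(func, streams[i], streams[i + 1]))
        for i, func in enumerate(stages)
    ]
    for worker in workers:
        worker.start()

    def feed():
        # Push source items into the first stream, then signal end of stream.
        for item in source:
            streams[0].put(item)
        streams[0].put(_END)

    feeder = threading.Thread(target=feed)
    feeder.start()

    # Drain the last stream; all stages work concurrently while we read.
    results = []
    while True:
        item = streams[-1].get()
        if item is _END:
            break
        results.append(item)

    feeder.join()
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    # Two PEs chained by streams: add one, then double.
    print(run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 2]))
    # prints [2, 4, 6, 8, 10]
```

Because each stage starts work as soon as its first input arrives, the stages overlap in time, which is the pipelining the abstract refers to. The bounded queues keep a fast producer from running arbitrarily far ahead of a slow consumer. Data parallelism, task parallelism across DAG branches, and disk-buffered or inter-computer streams are outside the scope of this sketch.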
