Article

I/O efficient structural clustering and maintenance of clusters for large-scale graphs

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 168

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2020.114221

Keywords

Graph; Structural graph clustering; I/O-efficient algorithm; Cluster maintenance; Dynamic graph

Funding

  1. MSIT (Ministry of Science and ICT), Korea, through the NRF [2013M3A9C4078137]
  2. National Research Foundation of Korea - Korea government (MSIT) [2020R1A2C1004032]
  3. National Research Foundation of Korea [2020R1A2C1004032, 2013M3A9C4078137] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)

This study introduces pm-SCAN, an I/O-efficient structural clustering algorithm for large-scale graphs that works even when the graph does not fit in main memory, and proposes a cluster maintenance method for dynamic graphs that significantly outperforms both the existing method and reclustering from scratch.
In recent years, the size of graph data has increased significantly, but most existing graph clustering algorithms do not consider the case where main memory is too small to hold a large graph. Exploring the entire graph for clustering then causes many random disk accesses to data that are not loaded in memory, resulting in excessive disk I/O and thrashing. To address this problem, we propose an I/O-efficient algorithm for structural graph clustering, called pm-SCAN. In the proposed method, if memory is insufficient, the input graph is partitioned into several subgraphs, each smaller than memory, and clustering is first performed on each subgraph. The clusters from the subgraphs are then merged based on the connectivity between them, so that the result is global with respect to the original input graph. pm-SCAN scales to very large graphs, that is, to cases where available memory is severely limited, and it produces the same result as the original structural clustering algorithm, SCAN. We also propose a cluster maintenance method for large-scale dynamic graphs that change over time. Instead of reclustering the whole graph, the method first identifies the small set of nodes whose structural connectivity may change under a given update operation, then accesses only those nodes on disk and updates their clusters, which reduces maintenance cost. This dynamic graph handling mechanism shows significant performance improvement over both the existing method and a baseline that performs clustering from scratch.
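
The partition-then-merge idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pm-SCAN implementation: it keeps the whole adjacency map in memory, only simulates partitions, and clusters core nodes only (border and outlier handling is omitted). The structural similarity and core definitions follow the standard SCAN formulation; the names `scan_cores`, `partitioned_scan_cores`, `affected_nodes`, and the toy graph are hypothetical and chosen for illustration.

```python
from math import sqrt

def similarity(adj, u, v):
    # SCAN structural similarity over closed neighborhoods (node plus its neighbors).
    a, b = adj[u] | {u}, adj[v] | {v}
    return len(a & b) / sqrt(len(a) * len(b))

def is_core(adj, v, eps, mu):
    # v is a core if at least mu nodes of its closed neighborhood are eps-similar to it.
    return sum(similarity(adj, v, w) >= eps for w in adj[v] | {v}) >= mu

class DisjointSet:
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def link_cores(adj, edges, cores, eps, dsu):
    # Two cores joined by an eps-similar edge belong to the same cluster.
    for u, v in edges:
        if u in cores and v in cores and similarity(adj, u, v) >= eps:
            dsu.union(u, v)

def core_clusters(dsu, cores):
    groups = {}
    for v in cores:
        groups.setdefault(dsu.find(v), set()).add(v)
    return {frozenset(g) for g in groups.values()}

def scan_cores(adj, eps, mu):
    # Reference: cluster cores over the whole graph at once.
    cores = {v for v in adj if is_core(adj, v, eps, mu)}
    dsu = DisjointSet()
    link_cores(adj, [(u, v) for u in adj for v in adj[u] if u < v], cores, eps, dsu)
    return core_clusters(dsu, cores)

def partitioned_scan_cores(adj, parts, eps, mu):
    # Assumption: similarities are computed on the full graph; only the order of
    # edge processing is partitioned (internal edges first, cross edges last).
    cores = {v for v in adj if is_core(adj, v, eps, mu)}
    owner = {v: i for i, part in enumerate(parts) for v in part}
    dsu, cross = DisjointSet(), []
    for i, part in enumerate(parts):
        internal = []
        for u in part:
            for v in adj[u]:
                if u < v:
                    (internal if owner[v] == i else cross).append((u, v))
        link_cores(adj, internal, cores, eps, dsu)   # per-subgraph clustering
    link_cores(adj, cross, cores, eps, dsu)          # merge step across subgraphs
    return core_clusters(dsu, cores)

def affected_nodes(adj, u, v):
    # Inserting or deleting edge (u, v) changes only the neighborhoods of u and v,
    # so core status can change only for u, v, and their neighbors.
    return {u, v} | adj[u] | adj[v]

# Toy graph: two 4-cliques (0-3 and 4-7) joined by the bridge edge (3, 4).
edge_list = [(a, b) for c in ([0, 1, 2, 3], [4, 5, 6, 7]) for a in c for b in c if a < b]
edge_list.append((3, 4))
adj = {v: set() for v in range(8)}
for a, b in edge_list:
    adj[a].add(b)
    adj[b].add(a)

parts = [{0, 1, 4, 5}, {2, 3, 6, 7}]  # deliberately cuts through both cliques
global_result = scan_cores(adj, eps=0.7, mu=3)
merged_result = partitioned_scan_cores(adj, parts, eps=0.7, mu=3)
```

On this toy graph, `merged_result` equals `global_result` even though the partitions cut through both cliques, because the union operations are order-independent. The sketch only mirrors the property stated in the abstract; how pm-SCAN itself achieves this with bounded memory and limited disk I/O is not shown here.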
