4.7 Article

Clustering Large Probabilistic Graphs

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2011.243

Keywords

Uncertain data; probabilistic graphs; clustering algorithms; probabilistic databases

Funding

  1. US National Science Foundation (NSF) [IIS-0812309]
  2. NSF [1017529]
  3. Division Of Computer and Network Systems
  4. Direct For Computer & Info Scie & Enginr [1017529] Funding Source: National Science Foundation

Ask authors/readers for more resources

We study the problem of clustering probabilistic graphs. Similar to the problem of clustering standard graphs, probabilistic graph clustering has numerous applications, such as finding complexes in probabilistic protein-protein interaction (PPI) networks and discovering groups of users in affiliation networks. We extend the edit-distance-based definition of graph clustering to probabilistic graphs. We establish a connection between our objective function and correlation clustering to propose practical approximation algorithms for our problem. A benefit of our approach is that our objective function is parameter-free. Therefore, the number of clusters is part of the output. We also develop methods for testing the statistical significance of the output clustering and study the case of noisy clusterings. Using a real protein-protein interaction network and ground-truth data, we show that our methods discover the correct number of clusters and identify established protein relationships. Finally, we show the practicality of our techniques using a large social network of Yahoo! users consisting of one billion edges.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available