4.7 Article

Clustering Large Probabilistic Graphs

期刊

出版社

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2011.243

关键词

Uncertain data; probabilistic graphs; clustering algorithms; probabilistic databases

资金

  1. US National Science Foundation (NSF) [IIS-0812309]
  2. NSF [1017529]
  3. Division Of Computer and Network Systems
  4. Direct For Computer & Info Scie & Enginr [1017529] Funding Source: National Science Foundation

向作者/读者索取更多资源

We study the problem of clustering probabilistic graphs. Similar to the problem of clustering standard graphs, probabilistic graph clustering has numerous applications, such as finding complexes in probabilistic protein-protein interaction (PPI) networks and discovering groups of users in affiliation networks. We extend the edit-distance-based definition of graph clustering to probabilistic graphs. We establish a connection between our objective function and correlation clustering to propose practical approximation algorithms for our problem. A benefit of our approach is that our objective function is parameter-free. Therefore, the number of clusters is part of the output. We also develop methods for testing the statistical significance of the output clustering and study the case of noisy clusterings. Using a real protein-protein interaction network and ground-truth data, we show that our methods discover the correct number of clusters and identify established protein relationships. Finally, we show the practicality of our techniques using a large social network of Yahoo! users consisting of one billion edges.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据