Fuzzy c-Means Algorithms for Very Large Data

IEEE TRANSACTIONS ON FUZZY SYSTEMS (2012)

Journal

IEEE TRANSACTIONS ON FUZZY SYSTEMS

Volume 20, Issue 6, Pages 1130-1146

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TFUZZ.2012.2201485

Keywords

Big data; fuzzy c-means (FCM); kernel methods; scalable clustering; very large (VL) data

Categories

Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic

Funding

Radiomics of Non-Small Cell Lung Cancer from the National Institutes of Health [1U01CA143062-01]
Michigan State University High Performance Computing Center
Institute for Cyber Enabled Research
National Science Foundation [1019343]

Abstract

Very large (VL) data or big data are any data that you cannot load into your computer's working memory. This is not an objective definition, but it is easy to understand and practical: for any computer you might use, there is a dataset too big for it, and that dataset is VL data for you. Clustering is one of the primary tasks used in the pattern recognition and data mining communities to search VL databases (including VL images) in various applications, so clustering algorithms that scale well to VL data are important and useful. This paper compares the efficacy of three different implementations of techniques aimed at extending fuzzy c-means (FCM) clustering to VL data. Specifically, we compare methods based on 1) sampling followed by noniterative extension; 2) incremental techniques that make one sequential pass through subsets of the data; and 3) kernelized versions of FCM that provide approximations based on sampling, including three proposed algorithms. We use both loadable and VL datasets to conduct numerical experiments that facilitate comparisons based on time and space complexity, speed, quality of approximations to batch FCM (for loadable data), and assessment of matches between partitions and ground truth. Empirical results show that random sampling plus extension FCM, bit-reduced FCM, and approximate kernel FCM are good choices for approximating FCM on VL data. We conclude by demonstrating the VL algorithms on a dataset with 5 billion objects and presenting a set of recommendations regarding the use of different VL FCM clustering schemes.
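The abstract names random sampling plus extension FCM as one recommended approximation. A minimal sketch of that general idea, under stated assumptions and not the authors' code, follows: run ordinary batch FCM on a random sample, then extend memberships to every object in a single noniterative pass using the standard FCM membership formula with the sample-derived centers. It uses only the Python standard library. The function names (`fcm`, `memberships`, `rse_fcm`, `maximin_init`), the maximin initialization, the stopping tolerance, and the default fuzzifier `m=2` are illustrative assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(X, centers, m):
    """Standard FCM membership update, using squared distances:
    u_ik = 1 / sum_j (d2_ik / d2_jk) ** (1 / (m - 1))."""
    exp = 1.0 / (m - 1.0)
    U = []
    for x in X:
        d2 = [sq_dist(x, v) for v in centers]
        zero = [i for i, d in enumerate(d2) if d == 0.0]
        if zero:
            # Point coincides with one or more centers: split membership among them.
            row = [1.0 / len(zero) if i in zero else 0.0 for i in range(len(centers))]
        else:
            row = [1.0 / sum((d2[i] / d2[j]) ** exp for j in range(len(centers)))
                   for i in range(len(centers))]
        U.append(row)
    return U


def maximin_init(X, c, rng):
    """Assumed initialization: one random point, then repeatedly the farthest point."""
    centers = [list(rng.choice(X))]
    while len(centers) < c:
        far = max(X, key=lambda p: min(sq_dist(p, v) for v in centers))
        centers.append(list(far))
    return centers


def fcm(X, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Batch (literal) FCM on an in-memory list of points; returns the c centers."""
    rng = random.Random(seed)
    centers = maximin_init(X, c, rng)
    dim = len(X[0])
    for _ in range(max_iter):
        U = memberships(X, centers, m)
        new_centers = []
        for i in range(c):
            # Center update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m
            w = [U[k][i] ** m for k in range(len(X))]
            total = sum(w)
            new_centers.append([sum(w[k] * X[k][d] for k in range(len(X))) / total
                                for d in range(dim)])
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


def rse_fcm(X, c, sample_size, m=2.0, seed=0):
    """Sketch of random sampling plus extension: FCM on a sample, then one
    noniterative membership pass over all of X."""
    rng = random.Random(seed)
    sample = rng.sample(X, sample_size)
    centers = fcm(sample, c, m=m, seed=seed)
    U = memberships(X, centers, m)  # noniterative extension to every object
    return centers, U
```

Because each object's extended membership depends only on that object and the sample-derived centers, the extension step could be streamed over a VL dataset chunk by chunk; the sketch keeps `X` in memory only for brevity.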