Article

Spark-GHSOM: Growing Hierarchical Self-Organizing Map for large scale mixed attribute datasets

Journal

INFORMATION SCIENCES
Volume 496, Pages 572-591

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2018.12.007

Keywords

GHSOM; Self-Organizing Map; Clustering; Distributed; Big data

Funding

  1. European Commission [ICT-2013-612944, H2020-ICT-688797]
  2. Italian Ministry of Education, University and Research (MIUR) [ARS01_01259]
  3. Rotary Foundation

The Growing Hierarchical Self-Organizing Map (GHSOM) algorithm has shown its potential for tasks such as exploratory analysis, anomaly detection and forecasting in a variety of domains, including finance and cyber-security. GHSOM is a dynamic variant of the SOM algorithm that generates a multi-level hierarchy of SOM maps based solely on the input data. However, to generate this multi-level structure, GHSOM requires multiple iterations over the input dataset, which makes it intractable on large datasets. Moreover, the conventional GHSOM algorithm handles only datasets with numeric attributes. This is an important limitation, as most modern real-world datasets have mixed (numerical and categorical) attributes. In this work, we propose an extension of the conventional GHSOM algorithm, called Spark-GHSOM, which exploits the Spark platform to process massive datasets in a distributed manner. We also leverage a method known as the distance hierarchy approach to modify the optimization function of GHSOM so that it can coherently handle mixed-attribute datasets as well. We evaluate the new method in terms of accuracy, scalability and descriptive power. Results on different datasets demonstrate the superior predictive and descriptive capabilities of Spark-GHSOM, as well as its applicability to large-scale datasets that previously could not be analyzed. (C) 2018 Elsevier Inc. All rights reserved.
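The abstract only names the distance hierarchy approach. As an illustration, and not the authors' implementation, the Python sketch below assumes the usual formulation: each categorical attribute is a tree with weighted links, a point is an (anchor node, offset from root) pair, and the distance between points X and Y is d_X + d_Y - 2 * d_LCP, where LCP is the deepest point shared by the two root-to-point paths. The class `DistanceHierarchy`, the function `mixed_distance`, the toy product hierarchy, and the choice to combine pre-normalized numeric differences with hierarchy distances through a Euclidean norm are all hypothetical choices made for this example.

```python
from math import sqrt


class DistanceHierarchy:
    """Hypothetical minimal distance hierarchy for one categorical attribute.

    parent: maps each node to its parent node (the root maps to None).
    weight: maps each non-root node to the weight of its link to the parent;
            missing entries default to 1.0.
    A point is a pair (anchor, offset): it lies on the path from the root to
    the node `anchor`, at distance `offset` from the root.
    """

    def __init__(self, parent, weight=None):
        self.parent = parent
        self.weight = weight or {}

    def _path(self, node):
        # Walk up to the root, then return [(node, distance_from_root), ...]
        # ordered from the root down to `node`.
        chain = []
        while node is not None:
            chain.append(node)
            node = self.parent[node]
        chain.reverse()
        path, depth = [], 0.0
        for i, n in enumerate(chain):
            if i > 0:
                depth += self.weight.get(n, 1.0)
            path.append((n, depth))
        return path

    def leaf_point(self, value):
        # A categorical value is the point at its own node, at full depth.
        return (value, self._path(value)[-1][1])

    def distance(self, x, y):
        (ax, dx), (ay, dy) = x, y
        # Depth of the lowest common ancestor of the two anchors.
        d_lca = 0.0
        for (nx, d), (ny, _) in zip(self._path(ax), self._path(ay)):
            if nx != ny:
                break
            d_lca = d
        # Least common point: the deepest point on both root-to-x and root-to-y.
        d_lcp = min(dx, dy, d_lca)
        return dx + dy - 2.0 * d_lcp


def mixed_distance(x_num, y_num, x_cat, y_cat, hierarchies):
    # Assumption: numeric attributes are already scaled to comparable ranges,
    # and per-attribute distances are combined with a Euclidean norm.
    total = sum((a - b) ** 2 for a, b in zip(x_num, y_num))
    for h, px, py in zip(hierarchies, x_cat, y_cat):
        total += h.distance(px, py) ** 2
    return sqrt(total)


if __name__ == "__main__":
    # Toy hierarchy: Any -> {Drink -> {Coke, Pepsi}, Food -> {Bread}}.
    product = DistanceHierarchy(
        {"Any": None, "Drink": "Any", "Food": "Any",
         "Coke": "Drink", "Pepsi": "Drink", "Bread": "Food"}
    )
    coke, pepsi, bread = (product.leaf_point(v) for v in ("Coke", "Pepsi", "Bread"))
    print(product.distance(coke, pepsi))  # 2.0 (siblings under Drink)
    print(product.distance(coke, bread))  # 4.0 (only the root is shared)
    print(mixed_distance([0.2], [0.5], [coke], [pepsi], [product]))
```

The sketch accepts arbitrary offsets rather than only leaf points because, in a SOM setting, a prototype's categorical component can sit part-way along a path between the root and a value; values that share a parent (Coke and Pepsi) end up closer than values that share only the root (Coke and Bread).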