A scalable and flexible basket analysis system for big transaction data in Spark

Related references

Note: Only some of the references are listed.
Article Computer Science, Information Systems

Optimized hadoop map reduce system for strong analytics of cloud big product data on amazon web service

Shengying Yang et al.

Summary: Due to the rapid increase of data in AWS cloud, traditional methods of data analysis are not suitable. Non-traditional methods, such as concurrent/parallel techniques, have been proposed by data scientists to meet the performance and scalability requirements of big data analyses. This paper utilizes the Hadoop Map Reduce system, combined with five efficient data mining algorithms, to perform strong analytics on cloud big data. The proposed system is applied to product review data from AWS cloud, and is evaluated using important benchmarks and metrics. The experiments show that FCNB is effective in addressing the problem of big data.

INFORMATION PROCESSING & MANAGEMENT (2023)

Article Computer Science, Artificial Intelligence

Survey of Distributed Computing Frameworks for Supporting Big Data Analysis

Xudong Sun et al.

Summary: Distributed computing frameworks are essential for efficient processing of big data. The current MapReduce model is inadequate for complex analysis tasks on terabytes of data. New frameworks are needed to overcome these challenges.

BIG DATA MINING AND ANALYTICS (2023)

Article Computer Science, Information Systems

Approximate Clustering Ensemble Method for Big Data

Mohammad Sultan Mahmud et al.

Summary: This paper proposes a distributed computing framework to tackle the challenging task of clustering a big distributed dataset. The approach uses multiple random samples to compute an ensemble result as an estimation of the true result of the dataset. The framework proves to be efficient and scalable in clustering big datasets.

IEEE TRANSACTIONS ON BIG DATA (2023)

Article Computer Science, Information Systems

An Intelligent Cognitive-Inspired Computing with Big Data Analytics Framework for Sentiment Analysis and Classification

Deepak Kumar Jain et al.

Summary: Advancements in networking and information technology have led to the rise of Big Data Analytics (BDA), driven by the exponentially growing volume of data that people generate in their daily lives. Cognitive computing, an AI-based approach, helps reduce issues in BDA. In the proposed model, Sentiment Analysis (SA) is employed to interpret tweets and extract features, while Binary Brain Storm Optimization (BBSO) and Fuzzy Cognitive Maps (FCMs) are used for feature selection and classification. The model shows improved performance on benchmark datasets.

INFORMATION PROCESSING & MANAGEMENT (2022)

Article Education, Scientific Disciplines

Validity of a Market Basket Assessment Tool for Use in Supplemental Nutrition Assistance Program Education Healthy Retail Initiatives

Valisa E. Hedrick et al.

Summary: This study assessed the validity of the Market Basket Analysis Tool (MBAT) for measuring food environment quality in various retail settings compared to the Nutrition Environment Measures Survey in Stores (NEMS-S). The results showed significant correlations between MBAT and NEMS-S scores. The study suggests that MBAT offers a more streamlined data collection process and shorter training time compared to NEMS-S.

JOURNAL OF NUTRITION EDUCATION AND BEHAVIOR (2022)

Article Computer Science, Information Systems

PartEclat: an improved Eclat-based frequent itemset mining algorithm on spark clusters using partition technique

Shashi Raj et al.

Summary: Frequent itemset mining is a prominent technique for extracting knowledge from transactional databases, but efficiency becomes an issue with large datasets. Cluster-based FIM algorithms are chosen to address scalability, and Spark-based adaptations aim for efficiency with in-memory processing capabilities. However, challenges such as communication overhead persist despite Spark's advantages.

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS (2022)

Proceedings Paper Computer Science, Information Systems

Parallel Rule Discovery from Large Datasets by Sampling

Wenfei Fan et al.

Summary: This paper proposes a multi-round sampling strategy for rule discovery in large datasets to ensure the accuracy and extractability of rules through precision and recall rates. To improve recall, a tableau method is used to recover constant patterns, and deep Q-learning is used to select semantically relevant predicates.

PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22) (2022)

Article Statistics & Probability

Market basket analysis with association rules

Yuksel Akay Unvan

Summary: This study conducted a market basket analysis using association rules on supermarket sales data. It found that customers who buy milk, sweet relish, and pepperoni pizza also purchase eggs, with 24 customers in the dataset fitting this rule.

COMMUNICATIONS IN STATISTICS-THEORY AND METHODS (2021)

Article Computer Science, Hardware & Architecture

A Spark-based Apriori algorithm with reduced shuffle overhead

Shashi Raj et al.

Summary: This paper introduces a Spark-based Apriori algorithm called SARSO, which improves efficiency by reducing shuffle overhead caused by RDD operations. The method restricts the movement of key-value pairs across cluster nodes, reducing necessary communication and synchronization overhead incurred by the Spark shuffle operation.

JOURNAL OF SUPERCOMPUTING (2021)

Article Computer Science, Information Systems

PFIMD: a parallel MapReduce-based algorithm for frequent itemset mining

Yimin Mao et al.

Summary: This study presents an optimized parallel frequent itemset mining algorithm PFIMD based on MapReduce, addressing the time and space complexity as well as load balancing issues in existing parallel FIM algorithms. With the adoption of DiffNodeset structure and a 2-way comparison strategy, the algorithm's execution speed is accelerated. The algorithm is parallelized using the cloud computing platform Hadoop and the programming model MapReduce.

MULTIMEDIA SYSTEMS (2021)

Article Computer Science, Artificial Intelligence

A Survey of Utility-Oriented Pattern Mining

Wensheng Gan et al.

Summary: The main purpose of data mining and analytics is to discover novel and potentially useful patterns. Utility-oriented pattern mining (UPM) has become increasingly important in various applications. This survey provides an overview of state-of-the-art methods for UPM, including techniques, applications, and challenges in the field.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2021)

Article Computer Science, Theory & Methods

HBPFP-DC: A parallel frequent itemset mining using Spark

Yaling Xun et al.

Summary: The study introduces HBPFP-DC, an algorithm on the Spark platform for efficient and scalable frequent itemset mining. It balances the grouping of computation tasks and takes data correlation into account to boost mining efficiency.

PARALLEL COMPUTING (2021)

Article Computer Science, Information Systems

User-Defined SWOT analysis-A change mining perspective on user-generated content

Li-Chen Cheng et al.

Summary: This study proposes a change mining framework, based on consumer sentiment calculated from product reviews, to address four key shortcomings of traditional SWOT analysis. The approach also offers additional opportunities such as trend monitoring and flexibility in framing SWOT factors based on desired time periods. Further managerial insights are provided.

INFORMATION PROCESSING & MANAGEMENT (2021)

Article Computer Science, Information Systems

A Distributed Method for Fast Mining Frequent Patterns From Big Data

Peng-Yu Huang et al.

Summary: In recent years, knowledge discovery in databases has provided powerful capabilities for discovering meaningful information, making distributed data mining an important research area. The proposed algorithms, based on FP-growth, offer fast and scalable service in distributed computing environments, showing superior cost-effectiveness and performance.

IEEE ACCESS (2021)

Article Computer Science, Information Systems

A New Approximate Method For Mining Frequent Itemsets From Big Data

Timur Valiullin et al.

Summary: Frequent itemset mining is a critical step in finding association rules from transaction databases, and various efficient algorithms have been proposed for this task.

COMPUTER SCIENCE AND INFORMATION SYSTEMS (2021)

Article Computer Science, Information Systems

An extensive study on the evolution of context-aware personalized travel recommender systems

Shini Renjith et al.

INFORMATION PROCESSING & MANAGEMENT (2020)

Article Computer Science, Theory & Methods

Map-optimize-reduce: CAN tree assisted FP-growth algorithm for clusters based FP mining on Hadoop

J. Ragaventhiran et al.

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE (2020)

Article Computer Science, Information Systems

Heuristics for interesting class association rule mining a colorectal cancer database

Jose A. Delgado-Osuna et al.

INFORMATION PROCESSING & MANAGEMENT (2020)

Article Computer Science, Artificial Intelligence

EAFIM: efficient apriori-based frequent itemset mining algorithm on Spark for big transactional data

Shashi Raj et al.

KNOWLEDGE AND INFORMATION SYSTEMS (2020)

Article Computer Science, Information Systems

A market basket analysis of the US auto-repair industry

Hilde Patron et al.

JOURNAL OF BUSINESS ANALYTICS (2020)

Article Computer Science, Artificial Intelligence

A Survey of Data Partitioning and Sampling Methods to Support Big Data Analysis

Mohammad Sultan Mahmud et al.

BIG DATA MINING AND ANALYTICS (2020)

Article Computer Science, Information Systems

Exploiting GPU and cluster parallelism in single scan frequent itemset mining

Youcef Djenouri et al.

INFORMATION SCIENCES (2019)

Review Computer Science, Artificial Intelligence

Frequent itemset mining: A 25 years review

Jose Maria Luna et al.

WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY (2019)

Article Automation & Control Systems

Random Sample Partition: A Distributed Data Model for Big Data Analysis

Salman Salloum et al.

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS (2019)

Article Computer Science, Theory & Methods

Large-scale e-learning recommender system based on Spark and Hadoop

Karim Dahdouh et al.

JOURNAL OF BIG DATA (2019)

Article Computer Science, Information Systems

BIGMiner: a fast and scalable distributed frequent pattern miner for big data

Kang-Wook Chon et al.

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS (2018)

Article Computer Science, Information Systems

A survey towards an integration of big data analytics to big insights for value-creation

Mandeep Kaur Saggi et al.

INFORMATION PROCESSING & MANAGEMENT (2018)

Article Computer Science, Information Systems

Recommendation With Social Roles

Dugang Liu et al.

IEEE ACCESS (2018)

Article Computer Science, Information Systems

Frequent Itemset Mining in Big Data With Effective Single Scan Algorithms

Youcef Djenouri et al.

IEEE ACCESS (2018)

Article Computer Science, Theory & Methods

FiDoop-DP: Data Partitioning in Frequent Itemset Mining on Hadoop Clusters

Yaling Xun et al.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2017)

Article Computer Science, Artificial Intelligence

An efficient algorithm for mining high utility patterns from incremental databases with one database scan

Unil Yun et al.

KNOWLEDGE-BASED SYSTEMS (2017)

Proceedings Paper Computer Science, Interdisciplinary Applications

A Parallel FP-growth Algorithm Based on GPU

Hao Jiang et al.

2017 IEEE 14TH INTERNATIONAL CONFERENCE ON E-BUSINESS ENGINEERING (ICEBE 2017) (2017)

Article Computer Science, Information Systems

A distributed frequent itemset mining algorithm using Spark for Big Data analytics

Feng Zhang et al.

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS (2015)

Article Biochemical Research Methods

A primer to frequent itemset mining for bioinformatics

Stefan Naulaerts et al.

BRIEFINGS IN BIOINFORMATICS (2015)

Article Computer Science, Information Systems

Two scalable algorithms for associative text classification

Yongwook Yoon et al.

INFORMATION PROCESSING & MANAGEMENT (2013)

Article Computer Science, Information Systems

MapReduce indexing strategies: Studying scalability and efficiency

Richard McCreadie et al.

INFORMATION PROCESSING & MANAGEMENT (2012)

Article Computer Science, Artificial Intelligence

Isolated items discarding strategy for discovering high utility itemsets

Yu-Chiang Li et al.

DATA & KNOWLEDGE ENGINEERING (2008)