Soft and Declarative Fishing of Information in Big Data Lake

IEEE TRANSACTIONS ON FUZZY SYSTEMS (2018)

Journal

IEEE TRANSACTIONS ON FUZZY SYSTEMS

Volume 26, Issue 5, Pages 2732-2747

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TFUZZ.2018.2812157

Keywords

Big data; biomedical data analysis; cloud computing; declarative languages; fuzzy logic; querying

Funding

Microsoft Research within Microsoft Azure
Statutory Research Funds of Institute of Informatics, Silesian University of Technology, Gliwice, Poland [BK/213/RAU2/2018]

Abstract

In recent years, many fields have experienced a sudden proliferation of data, which increases both the volume of data that must be processed and the variety of formats in which the data are stored. This puts pressure on existing compute infrastructures and data analysis methods, as more and more data are regarded as a useful source of information for making critical decisions in particular fields. Among these fields are several areas related to human life, e.g., various branches of medicine, where the uncertainty of data complicates data analysis and where the inclusion of fuzzy expert knowledge in data processing brings many advantages. In this paper, we show how fuzzy techniques can be incorporated into big data analytics carried out with the declarative U-SQL language over a big data lake located in the cloud. We define the concept of a big data lake together with the Extract, Process, and Store process performed while schematizing and processing data from the data lake and while storing the results of the processing. Our solution, developed as a Fuzzy Search Library for Data Lake, introduces the possibility of massively parallel, declarative querying of a big data lake with simple and complex fuzzy search criteria, using fuzzy linguistic terms in various data transformations, and fuzzy grouping. The presented ideas are exemplified by a distributed analysis of large volumes of biomedical data on the Microsoft Azure cloud. The results of the performed tests confirm that the presented solution is highly scalable in the cloud and is a successful step toward soft and declarative processing of data on a large scale. The solution presented in this paper directly addresses three characteristics of big data, i.e., volume, variety, and velocity, and indirectly addresses veracity and value.
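
To make the notion of a fuzzy search criterion concrete, the following is a minimal Python sketch, not the paper's U-SQL code and not the API of the Fuzzy Search Library for Data Lake. It assumes trapezoidal membership functions, the minimum operator for combining criteria (a common choice of fuzzy AND), and a membership threshold for selecting rows. The names `Trapezoid` and `fuzzy_filter`, the field names, and all numeric breakpoints are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Trapezoid:
    """Trapezoidal membership function with support [a, d] and core [b, c]."""
    a: float
    b: float
    c: float
    d: float

    def membership(self, x: float) -> float:
        # Full membership inside the core.
        if self.b <= x <= self.c:
            return 1.0
        # No membership outside the support.
        if x <= self.a or x >= self.d:
            return 0.0
        # Linear rise on (a, b), linear fall on (c, d).
        if x < self.b:
            return (x - self.a) / (self.b - self.a)
        return (self.d - x) / (self.d - self.c)


def fuzzy_filter(
    rows: Iterable[dict],
    criteria: Dict[str, Trapezoid],
    threshold: float,
) -> Iterator[Tuple[dict, float]]:
    """Yield (row, degree) for rows whose combined membership meets the threshold.

    Criteria are combined with min, i.e. a fuzzy AND (an assumption of this sketch).
    """
    for row in rows:
        degree = min(term.membership(row[field]) for field, term in criteria.items())
        if degree >= threshold:
            yield row, degree


# Hypothetical linguistic terms; the breakpoints are illustrative, not clinical.
middle_aged = Trapezoid(35, 45, 55, 65)
elevated_pressure = Trapezoid(120, 140, 160, 180)

# Hypothetical records standing in for rows extracted from a data lake.
patients = [
    {"id": 1, "age": 50, "systolic": 150},
    {"id": 2, "age": 40, "systolic": 130},
    {"id": 3, "age": 70, "systolic": 150},
]

matches = [
    (row["id"], degree)
    for row, degree in fuzzy_filter(
        patients,
        {"age": middle_aged, "systolic": elevated_pressure},
        threshold=0.5,
    )
]
print(matches)
```

In a declarative setting, the same idea would appear as a membership-degree function called inside a query's selection condition, so that the engine can evaluate it in parallel over partitions of the data.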
