4.8 Article

Soft and Declarative Fishing of Information in Big Data Lake

Journal

IEEE TRANSACTIONS ON FUZZY SYSTEMS
Volume 26, Issue 5, Pages 2732-2747

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TFUZZ.2018.2812157

Keywords

Big data; biomedical data analysis; cloud computing; declarative languages; fuzzy logic; querying

Funding

  1. Microsoft Research within Microsoft Azure
  2. Statutory Research Funds of Institute of Informatics, Silesian University of Technology, Gliwice, Poland [BK/213/RAU2/2018]


In recent years, many fields have experienced a sudden proliferation of data, which increases both the volume of data that must be processed and the variety of formats in which the data are stored. This puts pressure on existing compute infrastructures and data analysis methods, as more and more data are considered a useful source of information for making critical decisions in particular fields. Several of these fields relate to human life, e.g., various branches of medicine, where the uncertainty of data complicates the analysis and where the inclusion of fuzzy expert knowledge in data processing brings many advantages. In this paper, we show how fuzzy techniques can be incorporated in big data analytics carried out with the declarative U-SQL language over a big data lake located in the cloud. We define the concept of a big data lake together with the Extract, Process, and Store process performed while schematizing and processing data from the Data Lake and while storing the results of the processing. Our solution, developed as a Fuzzy Search Library for Data Lake, introduces the possibility of massively parallel, declarative querying of a big data lake with simple and complex fuzzy search criteria, using fuzzy linguistic terms in various data transformations, and fuzzy grouping. The presented ideas are exemplified by a distributed analysis of large volumes of biomedical data on the Microsoft Azure cloud. The results of the performed tests confirm that the presented solution is highly scalable in the cloud and is a successful step toward soft and declarative processing of data on a large scale. The solution presented in this paper directly addresses three characteristics of big data, i.e., volume, variety, and velocity, and indirectly addresses veracity and value.
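
To make the idea of a fuzzy search criterion concrete, the following is a minimal Python sketch, not the paper's Fuzzy Search Library and not U-SQL. It assumes a trapezoidal membership function for a linguistic term and a minimum-based fuzzy AND; the names Trapezoid, fuzzy_and, fuzzy_filter, the sample records, and all numeric boundaries are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trapezoid:
    """Trapezoidal membership function: feet at a and d, core between b and c."""
    a: float
    b: float
    c: float
    d: float

    def degree(self, x: float) -> float:
        # Full membership inside the core [b, c].
        if self.b <= x <= self.c:
            return 1.0
        # No membership outside the support (a, d).
        if x <= self.a or x >= self.d:
            return 0.0
        # Linear rise on (a, b), linear fall on (c, d).
        if x < self.b:
            return (x - self.a) / (self.b - self.a)
        return (self.d - x) / (self.d - self.c)


def fuzzy_and(*degrees: float) -> float:
    """Combine criteria with the minimum t-norm (an assumed choice)."""
    return min(degrees)


def fuzzy_filter(records, criteria, threshold=0.5):
    """Return (record, degree) pairs whose combined degree meets the threshold.

    criteria maps a field name to the Trapezoid describing the linguistic term.
    """
    selected = []
    for record in records:
        degree = fuzzy_and(*(term.degree(record[field])
                             for field, term in criteria.items()))
        if degree >= threshold:
            selected.append((record, degree))
    return selected


# Hypothetical linguistic terms; the boundaries are illustrative only.
NORMAL_SYSTOLIC = Trapezoid(90, 100, 120, 130)
MIDDLE_AGED = Trapezoid(35, 45, 55, 65)

# Hypothetical sample records.
RECORDS = [
    {"id": 1, "systolic": 110, "age": 50},
    {"id": 2, "systolic": 125, "age": 50},
    {"id": 3, "systolic": 128, "age": 50},
    {"id": 4, "systolic": 110, "age": 38},
]

if __name__ == "__main__":
    criteria = {"systolic": NORMAL_SYSTOLIC, "age": MIDDLE_AGED}
    for record, degree in fuzzy_filter(RECORDS, criteria, threshold=0.5):
        print(record["id"], round(degree, 2))
```

In this sketch, a simple criterion uses a single term and a complex criterion combines several terms; records are kept when their combined membership degree reaches the chosen threshold. A declarative, massively parallel engine would evaluate the same kind of predicate per row across partitions of the data lake.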

