4.5 Article

Detecting coherent explorations in SQL workloads

Journal

INFORMATION SYSTEMS
Volume 92, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.is.2019.101479

Keywords

-

Ask authors/readers for more resources

This paper presents a proposal aiming at better understanding a workload of SQL queries and detecting coherent explorations hidden within the workload. In particular, our work investigates SQLShare (Jain et al., 2016), a database-as-a-service platform targeting scientists and data scientists with minimal database experience, whose workload was made available to the research community. According to the authors of (Jain et al., 2016), this workload is the only one containing primarily ad-hoc handwritten queries over user-uploaded datasets. We analyzed this workload by extracting features that characterize SQL queries and we investigate three different machine learning approaches to use these features to separate sequences of SQL queries into meaningful explorations. The first approach is unsupervised and based only on similarity between contiguous queries. The second approach uses transfer learning to apply a model trained over a dataset where ground truth is available. The last approach uses weak labeling to predict the most probable segmentation from heuristics meant to label a training set. We ran several tests over various query workloads to evaluate and compare the proposed methods. (C) 2019 Elsevier Ltd. All rights reserved.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.5
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available