4.7 Article

Creating CREATE queries with multi-task deep neural networks

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 266

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2023.110416

Keywords

Natural language processing; Deep neural networks; Transfer learning; Multi-task learning; Database schema creation


This paper introduces a new application of text-to-SQL research in which users create database models from natural language, and it presents the first dataset for this task. The authors propose a framework of three modular components that predict column data types and constraints, establish foreign key relationships between tables, and generate CREATE queries. They evaluate several baseline models and demonstrate the importance of contextualized word representations for classifying column data types and constraints. The experiments also show that a multi-task BERT model effectively addresses the training-time and model-size issues of pre-trained models.
Text-to-SQL is the task of mapping natural language utterances to structured query language (SQL). Prior studies focus on the information-retrieval aspect of this task. In this paper, we demonstrate a new use case for text-to-SQL in which a user can create database models from natural language, and we introduce the first dataset for this task. Furthermore, we propose a framework that consists of three modular components: (1) a classifier component, which predicts the data type and constraints of a column; (2) a constraint component, which establishes foreign key relationships between tables; and (3) a query component, which generates a series of CREATE queries through a slot-filling approach. We propose various baseline models to evaluate the classifier component from different aspects. Each model is based on a state-of-the-art pre-trained language model, which allows us to assess contextualized word representations on the table creation task. The results show that such representations play a vital role in correctly classifying column data types and constraints. Downsides of pre-trained models are their training time and model size. Our experiments reveal that a multi-task BERT model, which achieves 75% and 96% accuracy on the data type and constraint prediction tasks respectively, effectively addresses both problems. (c) 2023 Elsevier B.V. All rights reserved.
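As an illustration of the classifier component described in the abstract, the sketch below assumes a multi-task model: a shared BERT encoder with one classification head for data types and one for constraints. The label sets, input format, and class names here (DATA_TYPES, CONSTRAINTS, MultiTaskColumnClassifier) are hypothetical assumptions for illustration, not the paper's exact setup.

```python
# Sketch of a multi-task column classifier: one shared BERT encoder,
# two task-specific heads (data type, constraint). Label sets are assumed.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

DATA_TYPES = ["int", "varchar", "date", "float", "boolean"]    # assumed label set
CONSTRAINTS = ["none", "primary_key", "not_null", "unique"]    # assumed label set

class MultiTaskColumnClassifier(nn.Module):
    def __init__(self, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)        # shared encoder
        hidden = self.encoder.config.hidden_size
        self.type_head = nn.Linear(hidden, len(DATA_TYPES))         # task 1 head
        self.constraint_head = nn.Linear(hidden, len(CONSTRAINTS))  # task 2 head

    def forward(self, input_ids, attention_mask):
        # The [CLS] token representation summarizes the column description.
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]
        return self.type_head(cls), self.constraint_head(cls)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = MultiTaskColumnClassifier()

# A column mention taken from a natural-language schema description.
batch = tokenizer(["each student has a unique registration number"],
                  return_tensors="pt", padding=True)
type_logits, constraint_logits = model(batch["input_ids"], batch["attention_mask"])

# One possible joint objective: the sum of the per-task cross-entropies.
loss = (nn.functional.cross_entropy(type_logits, torch.tensor([0]))          # "int"
        + nn.functional.cross_entropy(constraint_logits, torch.tensor([1]))) # "primary_key"
```

Sharing one encoder across both tasks is what lets a single model stand in for two task-specific ones, which is how a multi-task setup can reduce total training time and model size.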
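The query component's slot-filling step can likewise be pictured as filling a CREATE TABLE template with predicted column names, data types, and constraints. The function and field names below (Column, fill_create_template) are illustrative assumptions, not the paper's implementation.

```python
# Sketch of slot filling for the query component: predicted column slots
# fill a CREATE TABLE template. Names here are illustrative, not the paper's API.
from dataclasses import dataclass, field

@dataclass
class Column:
    name: str
    data_type: str                                    # from the classifier component
    constraints: list = field(default_factory=list)   # e.g. ["PRIMARY KEY"]

def fill_create_template(table: str, columns: list) -> str:
    """Join per-column slots into a single CREATE TABLE statement."""
    defs = [" ".join([c.name, c.data_type] + c.constraints) for c in columns]
    return f"CREATE TABLE {table} (\n  " + ",\n  ".join(defs) + "\n);"

# "Each student has a unique registration number and a name."
print(fill_create_template("student", [
    Column("registration_number", "INT", ["PRIMARY KEY"]),
    Column("name", "VARCHAR(255)", ["NOT NULL"]),
]))
# CREATE TABLE student (
#   registration_number INT PRIMARY KEY,
#   name VARCHAR(255) NOT NULL
# );
```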
