Proceedings Paper

Automated Data Science for Relational Data

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/ICDE51399.2021.00305

Feature engineering is a crucial but time-consuming task in data science projects, requiring up to 80% of the total time. The OneBM system demonstrated in this work helps data scientists increase efficiency by automating relational data feature engineering tasks, saving time and labor costs.
Feature engineering is a crucial but tedious task that requires up to 80% of the total time in data science projects. A significant challenge arises when the data consists of tables from different data sources, so data scientists must carefully join and aggregate tables while performing feature engineering. In this work, we demonstrate a novel system called OneBM (One Button Machine) that enables data scientists to increase their efficiency with automated feature engineering for relational data. OneBM takes as input a relational dataset with multiple tables and its entity relation diagram (ERD), which can be declared with a novel, easy-to-use drag-and-drop graphical user interface. The system then automatically identifies and executes relevant joins and aggregates in the data, and generates new features with a rich set of transformations for various types of data, including but not limited to time series, sequences, number sets, and itemsets. The generated features can then be used by automated model selection and hyper-parameter optimization algorithms to complete a fully end-to-end automated data science (AutoDS) workflow. A follow-up user evaluation illustrated how data scientists can perform multi-table feature engineering tasks in minutes using our system, whereas repeatedly coding SQL-like queries to transform and aggregate relational data requires weeks of manual labor for comparable performance. In the live demos we plan to show two use cases with real-world datasets (video demos are available at the links in the footnote): sale prediction(1) and call center user experience(2). Pre-registered participants can try these use cases and the given datasets via Watson Studio on the cloud.
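To make the join-and-aggregate idea in the abstract concrete, here is a minimal sketch in plain Python. It is not OneBM's implementation or API. The table names, column names, and the `aggregate_child` helper are all hypothetical, and the aggregates shown (count, sum, mean, max, distinct items) are only a small, assumed sample of the kind of transformations the abstract mentions for number sets and itemsets.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy tables; names and columns are illustrative assumptions,
# not taken from the paper's datasets.
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 51},
    {"customer_id": 3, "age": 27},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0, "product": "A"},
    {"order_id": 11, "customer_id": 1, "amount": 30.0, "product": "B"},
    {"order_id": 12, "customer_id": 2, "amount": 5.0, "product": "A"},
]


def aggregate_child(main_rows, child_rows, key, numeric_col, item_col):
    """Join child rows to main rows on `key`, then collapse each group of
    child rows into one row of features per main entity."""
    # Group the child table by the join key (the one-to-many side of the ERD edge).
    groups = defaultdict(list)
    for row in child_rows:
        groups[row[key]].append(row)

    enriched = []
    for main in main_rows:
        matched = groups.get(main[key], [])
        values = [r[numeric_col] for r in matched]
        features = dict(main)
        # Number-set aggregates over the numeric column.
        features[f"{numeric_col}_count"] = len(values)
        features[f"{numeric_col}_sum"] = sum(values)
        features[f"{numeric_col}_mean"] = mean(values) if values else None
        features[f"{numeric_col}_max"] = max(values) if values else None
        # Itemset aggregate: number of distinct items seen for this entity.
        features[f"{item_col}_distinct"] = len({r[item_col] for r in matched})
        enriched.append(features)
    return enriched


feature_table = aggregate_child(customers, orders, "customer_id", "amount", "product")
```

Each row of `feature_table` keeps the original customer columns and adds one column per aggregate, so it can be passed directly to a model-selection step. A system like the one described would presumably derive the join key and the candidate aggregates from the declared ERD rather than from hand-written arguments as in this sketch.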
