Journal
SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA
Volume -, Issue -, Pages 1951-1966
Publisher
ASSOC COMPUTING MACHINERY
DOI: 10.1145/3318464.3389726
Keywords
data lakes; table search; interactive data science; notebooks
Funding
- NSF [III-1910108, ACI-1547360]
- NIH [1U01EB020954]
Many modern data science applications build on data lakes, schema-agnostic repositories of data files and data products that offer limited organization and management capabilities. There is a need to build data lake search capabilities into data science environments, so scientists and analysts can find tables, schemas, workflows, and datasets useful to their task at hand. We develop search and management solutions for the Jupyter Notebook data science platform, to enable scientists to augment training data, find potential features to extract, clean data, and find joinable or linkable tables. Our core methods also generalize to other settings where computational tasks involve execution of programs or scripts.