☆ 4.7 Article

Relational data embeddings for feature enrichment with background information

MACHINE LEARNING (2023)

期刊

MACHINE LEARNING

卷 112, 期 2, 页码 687-720

出版社

SPRINGER

DOI: 10.1007/s10994-022-06277-7

关键词

Feature engineering; Feature enrichment; Knowledge graph embedding

类别

Computer Science, Artificial Intelligence

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

For many machine learning tasks, improving performance requires augmenting the data table with features derived from external sources. This study proposes replacing manually crafted features with vector representations of entities, such as cities, that capture relevant information. The research shows the importance of modeling entity relationships and capturing numerical attributes for creating effective feature vectors from relational data.

For many machine-learning tasks, augmenting the data table at hand with features built from external sources is key to improving performance. For instance, estimating housing prices benefits from background information on the location, such as the population density or the average income. However, this information must often be assembled across many tables, requiring time and expertise from the data scientist. Instead, we propose to replace human-crafted features by vectorial representations of entities (e.g. cities) that capture the corresponding information. We represent the relational data on the entities as a graph and adapt graph-embedding methods to create feature vectors for each entity. We show that two technical ingredients are crucial: modeling well the different relationships between entities, and capturing numerical attributes. We adapt knowledge graph embedding methods that were primarily designed for graph completion. Yet, they model only discrete entities, while creating good feature vectors from relational data also requires capturing numerical attributes. For this, we introduce KEN: Knowledge Embedding with Numbers. We thoroughly evaluate approaches to enrich features with background information on 7 prediction tasks. We show that a good embedding model coupled with KEN can perform better than manually handcrafted features, while requiring much less human effort. It is also competitive with combinatorial feature engineering methods, but much more scalable. Our approach can be applied to huge databases, creating general-purpose feature vectors reusable in various downstream tasks.

Relational data embeddings for feature enrichment with background information

期刊

MACHINE LEARNING

出版社

SPRINGER

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Relational data embeddings for feature enrichment with background information

期刊

MACHINE LEARNING

出版社

SPRINGER

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文