A scalable robust and automatic propositionalization approach for Bayesian classification of large mixed numerical and categorical data

MACHINE LEARNING (2019)

Journal

MACHINE LEARNING

Volume 108, Issue 2, Pages 229-266

Publisher

SPRINGER

DOI: 10.1007/s10994-018-5746-9

Keywords

Relational data mining; Propositionalization; Feature construction; Regularization; Supervised classification

Abstract

Companies want to extract value from their relational databases. This is the aim of relational data mining. Propositionalization is one possible approach to relational data mining. Propositionalization adds new attributes, called features, to the main table, leading to an attribute-value representation, a single table, on which a propositional learner can be applied. However, current relational databases are large and composed of mixed, numerical and categorical, data. Moreover, the specificity of relational data is to involve one-to-many relationships. As an example of such data, consider customers purchasing products: each customer can purchase several products. Therefore, there is a need for techniques able to learn complex aggregates. Learning such features means to explore a combinatorial, possibly infinite, space and such an approach is prone to overfitting. We introduce a propositionalization approach dedicated to a robust Bayesian classifier. It efficiently samples a given number of features in the language bias, following a distribution over the complex aggregates. This distribution is also used to penalize complex aggregates in the regularization of the robust Bayesian classifier. Experiments show that it performs better than state-of-the-art methods on most investigated benchmarks and can deal with large datasets more easily. A new real, large, mixed relational dataset is introduced which confirms the ability of our approach to learn complex aggregates.
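
To make the idea of propositionalization concrete, the following is a minimal sketch of flattening the customers/purchases example from the abstract into a single attribute-value table. It is not the authors' method: the paper samples features from a distribution over a language bias and uses that distribution for regularization, whereas this sketch only applies a few hand-picked aggregates. The table contents, the column names, the function `propositionalize`, and the parameter `price_threshold` are all hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical toy data: a main table (customers) and a secondary table
# (purchases) linked by a one-to-many relationship on customer_id.
customers = [
    {"customer_id": 1, "label": "loyal"},
    {"customer_id": 2, "label": "churn"},
    {"customer_id": 3, "label": "churn"},
]
purchases = [
    {"customer_id": 1, "category": "book", "price": 12.0},
    {"customer_id": 1, "category": "book", "price": 30.0},
    {"customer_id": 1, "category": "music", "price": 80.0},
    {"customer_id": 2, "category": "music", "price": 60.0},
]


def propositionalize(main_rows, secondary_rows, key, price_threshold=50.0):
    """Return one flat feature row per main record (illustrative only)."""
    # Group the secondary records by the foreign key.
    by_key = defaultdict(list)
    for row in secondary_rows:
        by_key[row[key]].append(row)

    flat = []
    for main in main_rows:
        related = by_key.get(main[key], [])
        prices = [r["price"] for r in related]
        categories = Counter(r["category"] for r in related)

        features = dict(main)
        # Simple aggregates over the one-to-many relationship.
        features["count_purchases"] = len(related)
        features["mean_price"] = mean(prices) if prices else None
        features["mode_category"] = (
            categories.most_common(1)[0][0] if categories else None
        )
        # A "complex" aggregate: a selection followed by an aggregation.
        features["count_price_gt_threshold"] = sum(
            1 for p in prices if p > price_threshold
        )
        flat.append(features)
    return flat


table = propositionalize(customers, purchases, key="customer_id")
for row in table:
    print(row)
```

Each resulting row holds the original customer attributes plus the constructed features, so any propositional learner can consume it. Composing selections with aggregates, as in the last feature, is what makes the space of candidate features combinatorial, which is why the abstract stresses sampling and regularization.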
