Article

Handling missing values and imbalanced classes in machine learning to predict consumer preference: Demonstrations and comparisons to prominent methods

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 237

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2023.121694

Keywords

Missing values; Imbalanced classes; Machine learning; Consumer preference

Consumer preference prediction aims to predict consumers' future purchases based on their historical behavior-level data. Using machine learning algorithms, the prediction results provide evidence for conducting commercial activities and further improving consumer experiences. However, missing values and imbalanced classes in consumer behavioral data often make machine learning algorithms ineffective. While several methods have been proposed to address missing data or imbalanced class problems, few works have considered the relationships among missing mechanisms, imputation algorithms, imbalanced class methods, and the effectiveness of classification algorithms that use imputed data. In this study, we propose an adaptive process for selecting the optimal combination of amputation, imputation, imbalance treatment, and classification based on classification performance. Our research extends the literature by showing significant interaction effects between 1) the amputation mechanism and imputation algorithms, 2) imputation and imbalance treatments, and 3) imbalance treatments and classification algorithms. Using three consumer behavioral datasets from the UCI Machine Learning Repository, we empirically show that, among different classification methods, the overall performance of Random Forest is better than that of Logit, SVM, or Decision Tree. Moreover, Logit, the most widely used classification method, suffers most from imbalance issues in real-world datasets. Furthermore, MetaCost is consistently the best imbalance treatment across different imputation techniques and missing value mechanisms.
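The adaptive process described in the abstract can be sketched as a grid search over imputation, imbalance treatment, and classifier combinations, scored by cross-validated classification performance. This is a minimal illustration, not the paper's exact pipeline: MetaCost is not available in scikit-learn, so cost-sensitive class weighting stands in for the imbalance treatment, and the amputation step is simulated here as MCAR (missing completely at random) cell deletion on synthetic data.

```python
# Sketch of combination selection: imputation x imbalance treatment x
# classifier, picking the best by mean cross-validated F1.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline


def select_best_combination(X, y, seed=0):
    """Return the (imputer, treatment, classifier) triple with the best F1."""
    imputers = {
        "mean": SimpleImputer(strategy="mean"),
        "knn": KNNImputer(n_neighbors=5),
    }
    # class_weight="balanced" approximates a cost-sensitive treatment;
    # the paper's MetaCost wrapper would relabel training data instead.
    classifiers = {
        ("logit", "none"): LogisticRegression(max_iter=1000),
        ("logit", "weighted"): LogisticRegression(max_iter=1000,
                                                  class_weight="balanced"),
        ("rf", "none"): RandomForestClassifier(random_state=seed),
        ("rf", "weighted"): RandomForestClassifier(random_state=seed,
                                                   class_weight="balanced"),
    }
    best, best_f1 = None, -1.0
    for imp_name, imputer in imputers.items():
        for (clf_name, treatment), clf in classifiers.items():
            pipe = Pipeline([("impute", imputer), ("clf", clf)])
            f1 = cross_val_score(pipe, X, y, cv=3, scoring="f1").mean()
            if f1 > best_f1:
                best, best_f1 = (imp_name, treatment, clf_name), f1
    return best, best_f1


# Imbalanced synthetic data (~10% positives) with MCAR amputation:
# roughly 10% of cells are deleted uniformly at random.
X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.10] = np.nan

combo, f1 = select_best_combination(X, y)
```

Because imputation sits inside the `Pipeline`, it is refit on each training fold, so the selected combination is not biased by imputing with held-out information.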

