Article

Multi-Objective Evolutionary Simultaneous Feature Selection and Outlier Detection for Regression

Journal

IEEE ACCESS
Volume 9, Pages 135675-135688

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2021.3115848

Keywords

Feature extraction; Anomaly detection; Optimization; Machine learning algorithms; Task analysis; Evolutionary computation; Water pollution; Outlier detection; feature selection; evolutionary computation; multi-objective optimization; underground water contamination

Funding

  1. Spanish Ministry of Science, Innovation and Universities through the SITUS Project [RTI2018-094832-B-I00]
  2. European Fund for Regional Development (EFRD, FEDER)
  3. Emilia-Romagna (Italy) Regional Project 'New Mathematical and Computer Science Methods for the Water and Food Resources Exploitation Optimization'
  4. University of Ferrara under the FIR Program through the Project 'Artificial Intelligence for Improving the Exploitation of Water and Food Resources'

This paper introduces a comprehensive optimization model for detecting the causes of contamination in underground water wells, addressing the issues of selecting the best predicting variables and detecting outliers simultaneously. The results demonstrate that the proposed model can generate reliable, interpretable, and clean regression models.
When investigating the causes of contamination in specific contexts, such as in underground water wells, multivariate regression is commonly used to establish possible links between the chemical-physical values of the samples and the levels of contaminant. Two issues often arise from such a statistical analysis: selecting the best predicting variables and detecting the instances that can be suspected to be outliers. In this paper, we propose a comprehensive, integrated, and general optimization model that solves these two problems simultaneously, so that outliers are detected with reference to the specific variables selected for the regression, and we implement this optimization model with a well-known evolutionary algorithm. We test our proposal on data extracted from a project whose aim is to establish the causes of the contamination of underground water wells in a very specific area of northeastern Italy. The results show that our variable selection and outlier detection algorithm allows the synthesis of very reliable, interpretable, and clean regression models.
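
To make the idea of a joint search concrete, the following is a minimal sketch of how one candidate solution might be scored. It is not the authors' formulation: the two-mask encoding, the three objectives (fit error on retained instances, number of selected variables, number of flagged outliers), and the names `evaluate`, `dominates`, and `solve` are all assumptions made for illustration. An evolutionary algorithm would evolve many such mask pairs and keep the non-dominated ones; only the scoring step is shown here.

```python
def solve(A, b):
    """Solve the square system A x = b (Gauss-Jordan, partial pivoting)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("singular system")
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(X, y, feat_mask, out_mask):
    """Score one candidate (hypothetical objectives, all to be minimized).

    feat_mask[j] == 1 keeps variable j; out_mask[i] == 1 flags instance i
    as an outlier and drops it from the least-squares fit.
    Returns (mse on retained rows, n selected variables, n flagged outliers).
    """
    cols = [j for j, s in enumerate(feat_mask) if s]
    rows = [i for i, o in enumerate(out_mask) if not o]
    p = len(cols) + 1  # selected variables plus intercept
    counts = (len(cols), sum(out_mask))
    if len(rows) <= p:
        return (float("inf"),) + counts  # too few rows to fit
    D = [[1.0] + [X[i][j] for j in cols] for i in rows]
    t = [y[i] for i in rows]
    # Normal equations (D^T D) w = D^T t for ordinary least squares.
    A = [[sum(d[a] * d[b] for d in D) for b in range(p)] for a in range(p)]
    v = [sum(d[a] * ti for d, ti in zip(D, t)) for a in range(p)]
    try:
        w = solve(A, v)
    except ValueError:
        return (float("inf"),) + counts
    sq = [(sum(wi * di for wi, di in zip(w, d)) - ti) ** 2 for d, ti in zip(D, t)]
    return (sum(sq) / len(rows),) + counts


def dominates(u, v):
    """Pareto dominance: u is no worse everywhere and better somewhere."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


# Toy data (invented): y = 2*x0 + 1, x1 is irrelevant, last row is corrupted.
X = [[0, 5], [1, 3], [2, 8], [3, 1], [4, 7], [5, 2]]
y = [1, 3, 5, 7, 9, 40]

a = evaluate(X, y, [1, 0], [0, 0, 0, 0, 0, 1])  # x0 only, last row flagged
b = evaluate(X, y, [1, 1], [0, 0, 0, 0, 0, 0])  # both variables, nothing flagged
print(a, b, dominates(a, b), dominates(b, a))
```

In this toy case neither candidate dominates the other: the first fits the retained rows far better but pays for it by flagging one instance, which is the kind of trade-off a multi-objective search leaves to the analyst. Because the fit is recomputed for each pair of masks, whether an instance looks like an outlier depends on which variables are selected, which matches the coupling the abstract describes.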
