Article

Improving partial mutual information-based input variable selection by consideration of boundary issues associated with bandwidth estimation

Journal

ENVIRONMENTAL MODELLING & SOFTWARE
Volume 71, Pages 78-96

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.envsoft.2015.05.013

Keywords

Artificial neural networks; Data-driven models; Partial mutual information; Kernel density estimation; Kernel bandwidth; Boundary issues; Hydrology and water resources; Input variable selection

Input variable selection (IVS) is vital in the development of data-driven models. Among different IVS methods, partial mutual information (PMI) has shown significant promise, although its performance has been found to deteriorate for non-Gaussian and non-linear data. In this paper, the effectiveness of different approaches to improving PMI performance is investigated, focussing on boundary issues associated with bandwidth estimation. Boundary issues, associated with kernel-based density and residual computations within PMI, arise from the extension of symmetrical kernels beyond the feasible bounds of potential inputs, and result in an underestimation of kernel-based marginal and joint probability distribution functions in the PMI. In total, the effectiveness of 16 different approaches is tested on synthetically generated data and the results are used to develop preliminary guidelines for PMI IVS. By using the proposed guidelines, the correct inputs can be identified in 100% of trials, even if the data are highly non-linear or non-Gaussian. (C) 2015 Elsevier Ltd. All rights reserved.
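
The boundary effect described in the abstract can be illustrated with a small one-dimensional sketch. It is not the paper's code and does not reproduce any of its 16 approaches. It assumes a Gaussian kernel, a Gaussian reference bandwidth (the 1.06 factor), synthetic exponential data bounded below at zero, and reflection about the bound as one common correction. The function names `gaussian_kde` and `reflected_kde` are hypothetical. The sketch shows how a symmetrical kernel places probability mass below the feasible bound, so the estimated density inside the feasible range is too low.

```python
import numpy as np


def gaussian_kde(x_eval, samples, bandwidth):
    """Plain Gaussian kernel density estimate evaluated at the points x_eval."""
    z = (x_eval[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=1) / bandwidth


def reflected_kde(x_eval, samples, bandwidth, lower=0.0):
    """KDE with the samples mirrored about the lower bound.

    Reflection is an assumed, commonly used boundary correction; it is
    not claimed to be the method recommended in the paper.
    """
    mirrored = 2.0 * lower - samples
    return (gaussian_kde(x_eval, samples, bandwidth)
            + gaussian_kde(x_eval, mirrored, bandwidth))


def integrate(values, step):
    """Trapezoidal rule on a uniform grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1])) * step)


rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=1000)  # feasible range: x >= 0

# Gaussian reference bandwidth for one dimension (an assumption of this sketch).
bandwidth = 1.06 * samples.std() * samples.size ** (-1.0 / 5.0)

# Evaluate only inside the feasible range [0, 20].
grid = np.linspace(0.0, 20.0, 2001)
step = grid[1] - grid[0]

plain = gaussian_kde(grid, samples, bandwidth)
reflected = reflected_kde(grid, samples, bandwidth)

mass_plain = integrate(plain, step)          # mass kept inside the bound
mass_reflected = integrate(reflected, step)  # mass after reflection

print(f"bandwidth:                 {bandwidth:.3f}")
print(f"mass in [0, 20], plain:     {mass_plain:.3f}")
print(f"mass in [0, 20], reflected: {mass_reflected:.3f}")
print(f"density at 0, plain:        {plain[0]:.3f}")
print(f"density at 0, reflected:    {reflected[0]:.3f}")
```

In this sketch the plain estimate integrates to less than one over the feasible range, because part of each kernel near zero lies outside it. The reflected estimate folds that mass back inside. The shortfall grows with the bandwidth, which is why the boundary issue and bandwidth estimation are treated together.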
