Journal
ENVIRONMENTAL MODELLING & SOFTWARE
Volume 71, Pages 78-96
Publisher
ELSEVIER SCI LTD
DOI: 10.1016/j.envsoft.2015.05.013
Keywords
Artificial neural networks; Data-driven models; Partial mutual information; Kernel density estimation; Kernel bandwidth; Boundary issues; Hydrology and water resources; Input variable selection
Input variable selection (IVS) is vital in the development of data-driven models. Among different IVS methods, partial mutual information (PMI) has shown significant promise, although its performance has been found to deteriorate for non-Gaussian and non-linear data. In this paper, the effectiveness of different approaches to improving PMI performance is investigated, focussing on boundary issues associated with bandwidth estimation. Boundary issues, associated with kernel-based density and residual computations within PMI, arise from the extension of symmetrical kernels beyond the feasible bounds of potential inputs, and result in an underestimation of kernel-based marginal and joint probability distribution functions in the PMI. In total, the effectiveness of 16 different approaches is tested on synthetically generated data and the results are used to develop preliminary guidelines for PMI IVS. By using the proposed guidelines, the correct inputs can be identified in 100% of trials, even if the data are highly non-linear or non-Gaussian. (C) 2015 Elsevier Ltd. All rights reserved.
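The boundary issue described in the abstract can be illustrated with a small sketch. It is not taken from the paper: the reflection correction shown here is only one common boundary treatment and is not claimed to be among the 16 approaches tested, and the function names, the exponential test data, the lower bound of 0 and the Gaussian reference bandwidth rule are all assumptions chosen for illustration. The sketch estimates the density of non-negative data with a symmetric Gaussian kernel, measures how much probability mass remains inside the feasible range, and compares that with a reflected estimate.

```python
import numpy as np

def gaussian_kde(x_eval, samples, h):
    """Plain Gaussian kernel density estimate with bandwidth h."""
    z = (x_eval[:, None] - samples[None, :]) / h
    kernel_sum = np.exp(-0.5 * z**2).sum(axis=1)
    return kernel_sum / (len(samples) * h * np.sqrt(2.0 * np.pi))

def reflected_kde(x_eval, samples, h, lower=0.0):
    """Gaussian KDE with reflection about a lower bound (illustrative only).

    Each sample is mirrored across `lower`, so kernel mass that would leak
    below the bound is folded back into the feasible range.
    """
    mirrored = 2.0 * lower - samples
    dens = gaussian_kde(x_eval, samples, h) + gaussian_kde(x_eval, mirrored, h)
    return np.where(x_eval >= lower, dens, 0.0)

def mass_on_grid(density, grid):
    """Trapezoidal integral of a density tabulated on a grid."""
    return float(np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(grid)))

# Assumed synthetic data: exponential samples, which are bounded below by 0.
rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=500)

# Gaussian reference rule for the bandwidth (an assumption for this sketch).
h = 1.06 * samples.std(ddof=1) * len(samples) ** (-0.2)

# Evaluate only on the feasible range [0, 15].
grid = np.linspace(0.0, 15.0, 3001)
plain_mass = mass_on_grid(gaussian_kde(grid, samples, h), grid)
reflected_mass = mass_on_grid(reflected_kde(grid, samples, h), grid)

print(f"bandwidth h               : {h:.3f}")
print(f"mass in range, plain KDE  : {plain_mass:.3f}")
print(f"mass in range, reflected  : {reflected_mass:.3f}")
```

With the plain estimator, part of each kernel centred near 0 extends below the bound, so the density integrated over the feasible range falls short of 1. This is the underestimation the abstract refers to. The reflected estimator keeps that mass inside the range.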