4.7 Article

Tree-based iterative input variable selection for hydrological modeling

Journal

WATER RESOURCES RESEARCH
Volume 49, Issue 7, Pages 4295-4310

Publisher

AMER GEOPHYSICAL UNION
DOI: 10.1002/wrcr.20339

Keywords

input variable selection; tree-based methods; streamflow prediction

Funding

  1. Singapore-Delft Water Alliance (SDWA) Multi-Objective Multiple-Reservoir Management research program [R-303-001-005-272]

Ask authors/readers for more resources

Input variable selection is an important issue associated with the development of several hydrological applications. Determining the optimal input vector from a large set of candidates to characterize a preselected output might result in a more accurate, parsimonious, and, possibly, physically interpretable model of the natural process. In the hydrological context, the modeled system often exhibits nonlinear dynamics and multiple interrelated variables. Moreover, the number of candidate inputs can be very large and redundant, especially when the model reproduces the spatial variability of the physical process. The ideal input selection algorithm should therefore provide modeling flexibility, computational efficiency in dealing with high dimension data set, scalability with respect to input dimensionality and minimum redundancy. In this paper, we propose the tree-based iterative input variable selection algorithm, a novel hybrid model-based/model-free approach specifically designed to fulfill these four requirements. The algorithm structure provides robustness against redundancy, while the tree-based nature of the underlying model ensures the other key properties. The approach is first tested on a well-known benchmark case study to validate its accuracy and subsequently applied to a real-world streamflow prediction problem in the upper Ticino River Basin (Switzerland). Results indicate that the algorithm is capable of selecting the most significant and nonredundant inputs in different testing conditions, including the real-world large data set characterized by the presence of several redundant variables. This permits one to identify a compact representation of the observational data set, which is key to improving the model performance and assisting with the interpretation of the underlying physical processes.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available