Article

Feature selection approaches for predictive modelling of groundwater nitrate pollution: An evaluation of filters, embedded and wrapper methods

Journal

SCIENCE OF THE TOTAL ENVIRONMENT
Volume 624, Pages 661-672

Publisher

ELSEVIER
DOI: 10.1016/j.scitotenv.2017.12.152

Keywords

Machine learning algorithms; Feature selection; Embedded methods; Wrapper methods; Groundwater; Nitrates

Funding

  1. Spanish Ministerio de Economia, Industria y Competitividad [CGL2017-84739-R]
  2. FCT-MEC Postdoctoral Grant [SFRH/BDP/110346/2015]


Recognising the various sources of nitrate pollution and understanding system dynamics are fundamental to tackling groundwater quality problems. A comprehensive GIS database of twenty parameters describing hydrogeological and hydrological features and driving forces was used as input for predictive models of nitrate pollution. Additionally, key variables extracted from remotely sensed Normalised Difference Vegetation Index (NDVI) time series were included in the database to provide indications of agroecosystem dynamics. Many approaches can be used to evaluate the importance of features related to groundwater pollution caused by nitrates. Filters, wrappers and embedded methods are used to rank feature importance according to the probability of nitrate concentrations in groundwater exceeding a threshold value. Machine learning algorithms (MLA) such as Classification and Regression Trees (CART), Random Forest (RF) and Support Vector Machines (SVM) are used as wrappers with four different sequential search approaches: sequential backward selection (SBS), sequential forward selection (SFS), sequential forward floating selection (SFFS) and sequential backward floating selection (SBFS). Feature importance obtained from RF and CART was used as an embedded approach. RF with SFFS gave the best performance (mean misclassification error, mmce = 0.12; AUC = 0.92) and good interpretability, selecting three features related to polluted groundwater areas: i) a rating of industries and facilities according to their production capacity and total nitrogen emissions to water within a 3 km buffer, ii) a rating of livestock farms by manure production within a 5 km buffer, and iii) cumulated NDVI for the post-maximum month, used as a proxy for vegetation productivity and crop yield. (C) 2017 Elsevier B.V. All rights reserved.
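The abstract names four sequential search strategies used with CART, RF and SVM as wrappers. As a hedged illustration of the floating variant (SFFS) only, the sketch below implements the generic search in plain Python: an inclusion step that adds the best new feature, followed by conditional exclusion that drops a feature only while doing so beats the best subset already recorded for the smaller size. It is not the authors' implementation. The function `sffs`, the `toy_criterion`, the feature names `a` to `e` and their integer scores are all hypothetical; in a real wrapper the criterion would be a cross-validated performance measure (for example AUC) of a classifier trained on the candidate subset.

```python
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

Subset = FrozenSet[str]


def sffs(
    features: Sequence[str],
    criterion: Callable[[Subset], float],
    k_max: int,
) -> Dict[int, Tuple[float, Subset]]:
    """Sequential forward floating selection maximising `criterion`.

    Returns the best (score, subset) pair found for each subset size 0..k_max.
    """
    selected: Subset = frozenset()
    best: Dict[int, Tuple[float, Subset]] = {0: (criterion(selected), selected)}

    while len(selected) < k_max:
        # Inclusion step: add the feature that gives the highest criterion value.
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        f_add = max(candidates, key=lambda f: criterion(selected | {f}))
        selected = selected | {f_add}
        score = criterion(selected)
        k = len(selected)
        if k not in best or score > best[k][0]:
            best[k] = (score, selected)

        # Conditional exclusion: remove the least useful feature only while the
        # reduced subset strictly beats the best subset recorded for its size.
        while len(selected) > 2:
            f_rm = max(sorted(selected), key=lambda f: criterion(selected - {f}))
            reduced = selected - {f_rm}
            reduced_score = criterion(reduced)
            if reduced_score > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (reduced_score, reduced)
            else:
                break
    return best


# Hypothetical integer scores standing in for a cross-validated metric.
WEIGHTS = {"a": 30, "b": 25, "c": 25, "d": 3, "e": 0}


def toy_criterion(subset: Subset) -> int:
    score = sum(WEIGHTS[f] for f in subset) - 2 * len(subset)  # size penalty
    if {"b", "c"} <= subset:
        score += 20  # b and c are complementary
    if {"a", "b", "c"} <= subset:
        score -= 40  # a becomes redundant once b and c are both present
    return score


if __name__ == "__main__":
    best = sffs(["a", "b", "c", "d", "e"], toy_criterion, k_max=4)
    score, subset = max(best.values(), key=lambda t: t[0])
    print(score, sorted(subset))  # prints: 67 ['b', 'c', 'd']
```

In this toy run, `a` is picked first because it scores best on its own, but the conditional exclusion step drops it once `b` and `c` are both selected. A plain SFS search, which never removes features, would stay with `{a, b, c}` at size three.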
