Journal
METHODS IN ECOLOGY AND EVOLUTION
Volume 3, Issue 1, Pages 116-128
Publisher
WILEY
DOI: 10.1111/j.2041-210X.2011.00124.x
Keywords
benthic macroinvertebrates; diversity; fish; generalized additive models; richness; spatial autocorrelation; streams
Funding
- Smithsonian Institution
- US Environmental Protection Agency National Center for Environmental Research (NCER) [R831369]
- Interdisciplinary Center for Clinical Research (IZKF) at the University Hospital of the University of Erlangen-Nuremberg [J11]
- EPA [908959, R831369] Funding Source: Federal RePORTER
1. Issues with ecological data (e.g. non-normality of errors, nonlinear relationships and autocorrelation of variables) and modelling (e.g. overfitting, variable selection and prediction) complicate regression analyses in ecology. Flexible models, such as generalized additive models (GAMs), can address data issues, and machine learning techniques (e.g. gradient boosting) can help resolve modelling issues. Gradient boosted GAMs do both. Here, we illustrate the advantages of this technique using data on benthic macroinvertebrates and fish from 1573 small streams in Maryland, USA.
2. We assembled a predictor matrix of 15 watershed attributes (e.g. ecoregion and land use), 15 stream attributes (e.g. width and habitat quality) and location (latitude and longitude). We built boosted and conventionally estimated GAMs for macroinvertebrate richness and for the relative abundances of macroinvertebrates in the Orders Ephemeroptera, Plecoptera and Trichoptera (% EPT); individuals that cling to substrate (% Clingers); and individuals in the collector/gatherer functional feeding group (% Collectors). For fish, models were constructed for taxonomic richness, benthic species richness, biomass and the relative abundance of tolerant individuals (% Tolerant Fish).
3. For several of the responses, boosted GAMs had lower pseudo R² values than conventional GAMs for in-sample data but larger pseudo R² values for out-of-bootstrap data, suggesting boosted GAMs do not overfit the data and have higher prediction accuracy than conventional GAMs. The models explained most variation in fish richness (pseudo R² = 0.97), least variation in % Clingers (pseudo R² = 0.28) and intermediate amounts of variation in the other responses (pseudo R² values between 0.41 and 0.60). Many relationships of macroinvertebrate responses to anthropogenic measures and natural watershed attributes were nonlinear. Fish responses were related to system size and local habitat quality.
4. For impervious surface, models predicted below model-average macroinvertebrate richness at levels above c. 3.0%, lower % EPT above c. 1.5%, and lower % Clingers for levels above c. 2.0%. Impervious surface did not affect % Collectors or any fish response. Prediction functions for % EPT and fish richness increased linearly with log10(watershed area), % Tolerant Fish decreased with log10(watershed area), and benthic fish richness and biomass both increased nonlinearly with log10(watershed area).
5. Gradient boosting optimizes the predictive accuracy of GAMs while preserving the structure of conventional GAMs, so that predictor-response relationships are more interpretable than with other machine learning methods. Boosting also avoids overfitting the data (by shrinking effect estimates towards zero and by performing variable selection), thus avoiding spurious predictor effects.
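The abstract gives no implementation details, so the following is only a minimal sketch of the ideas it names: component-wise gradient boosting of an additive model, shrinkage of effect estimates, implicit variable selection, and a comparison of in-sample with out-of-bootstrap fit. Everything specific here is an assumption made for illustration: squared-error loss, step-function (binned) base learners in place of the penalized smooths a real GAM would use, plain R² = 1 − SSE/SST as a stand-in for the paper's pseudo R², synthetic data, and all function names and parameter values (`boost_additive`, `nu`, `n_bins`, and so on).

```python
import math
import random
from bisect import bisect_right


def quantile_edges(values, n_bins):
    """Interior cut points that split `values` into roughly equal-sized bins."""
    s = sorted(values)
    return [s[k * len(s) // n_bins] for k in range(1, n_bins)]


def bin_means(bin_ids, resid, n_bins):
    """Base learner: mean residual within each bin (a crude step-function smoother)."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for b, r in zip(bin_ids, resid):
        sums[b] += r
        counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def boost_additive(X, y, n_bins=8, n_iter=200, nu=0.1):
    """Component-wise L2 boosting of an additive model with shrinkage `nu`."""
    n, p = len(y), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    edges = [quantile_edges(c, n_bins) for c in cols]
    bins = [[bisect_right(edges[j], v) for v in cols[j]] for j in range(p)]
    intercept = sum(y) / n
    effects = [[0.0] * n_bins for _ in range(p)]  # one partial effect per predictor
    fitted = [intercept] * n
    for _ in range(n_iter):
        resid = [yi - fi for yi, fi in zip(y, fitted)]
        best = None
        # Fit every base learner to the residuals; keep only the best one.
        for j in range(p):
            means = bin_means(bins[j], resid, n_bins)
            sse = sum((r - means[b]) ** 2 for r, b in zip(resid, bins[j]))
            if best is None or sse < best[0]:
                best = (sse, j, means)
        _, j, means = best
        # Shrunken update: only a fraction `nu` of the fitted effect is added.
        effects[j] = [e + nu * m for e, m in zip(effects[j], means)]
        fitted = [f + nu * means[b] for f, b in zip(fitted, bins[j])]
    return intercept, edges, effects


def predict(model, row):
    """Intercept plus the sum of the per-predictor step functions."""
    intercept, edges, effects = model
    return intercept + sum(
        effects[j][bisect_right(edges[j], v)] for j, v in enumerate(row)
    )


def r_squared(model, X, y):
    """Plain R^2 = 1 - SSE/SST (assumed stand-in for the paper's pseudo R^2)."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - predict(model, row)) ** 2 for row, yi in zip(X, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


# Synthetic data: only the first of three predictors carries signal.
random.seed(0)
n = 300
X = [[random.uniform(-3, 3) for _ in range(3)] for _ in range(n)]
y = [2.0 * math.sin(row[0]) + random.gauss(0.0, 0.3) for row in X]

# One bootstrap replicate: fit on the resample, test on rows never drawn.
boot = [random.randrange(n) for _ in range(n)]
drawn = set(boot)
oob = [i for i in range(n) if i not in drawn]

model = boost_additive([X[i] for i in boot], [y[i] for i in boot])
r2_in = r_squared(model, [X[i] for i in boot], [y[i] for i in boot])
r2_oob = r_squared(model, [X[i] for i in oob], [y[i] for i in oob])
print(f"in-sample R^2 = {r2_in:.2f}, out-of-bootstrap R^2 = {r2_oob:.2f}")
```

Because each iteration updates a single predictor's effect by a small step, predictors that never win the selection keep an effect of exactly zero, and the fitted model remains a sum of one-dimensional functions that can be plotted and read like conventional GAM partial effects. In practice the number of iterations would be tuned on out-of-bootstrap or cross-validated error, not fixed in advance as it is here.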