4.7 Article

The generalizability of pre-processing techniques on the accuracy and fairness of data-driven building models: A case study

Journal

ENERGY AND BUILDINGS
Volume 268, Issue -, Pages -

Publisher

ELSEVIER SCIENCE SA
DOI: 10.1016/j.enbuild.2022.112204

Keywords

Fairness; Generalizability; Accuracy; Data-driven Model; Building

In recent years, the development and application of data-driven building models have been a hot research topic due to the massive data collected from buildings. This study proposes a sequentially balanced sampling (SBS) technique to address the uneven accuracy and fairness problems caused by variation in data volume. SBS is compared with four existing pre-processing techniques and performs comparably to reversed preferential sampling (RPS), improving accuracy for minority classes and increasing the recall rate.
In recent years, the massive data collected from buildings have made the development and application of data-driven building models (DDBMs) a hot research topic. Because data volume varies across conditions, existing DDBMs can show distinctly different accuracy for different users or periods, which may in turn create fairness problems. Balancing the training dataset between conditions with pre-processing techniques could help address these issues. In this study, a sequentially balanced sampling (SBS) technique is proposed. Its generalizability in improving fairness and preserving accuracy of DDBMs is compared with four existing pre-processing techniques: random sampling (RS), sequential sampling (SS), reversed preferential sampling (RPS), and sequential preferential sampling (SPS). In total, 4960 cases are carried out, in which these pre-processing techniques are applied to the training dataset before developing 4 types of classifiers for one-week-ahead lighting status prediction of 155 lights in 16 apartments over a year. The collected data show 5 distribution modes. The newly proposed SBS shows performance comparable to RPS: both significantly improve predictive accuracy for minority classes but decrease accuracy for majority classes. In contrast, SS and SPS show a slight accuracy improvement for minority classes at an acceptable cost in accuracy for majority classes. In terms of fairness improvement, SBS, RS, and RPS could effectively increase the recall rate; however, RS and RPS have a more negative effect on accuracy rate and specificity rate. The results of this study provide guidance for researchers in selecting proper pre-processing techniques to improve the preferred predictive performance under different data distributions. (c) 2022 Elsevier B.V. All rights reserved.
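
To make the idea of balancing a training dataset and then checking accuracy, recall, and specificity more concrete, here is a minimal Python sketch. It is not the paper's method: the abstract does not define how RS, SS, RPS, SPS, or SBS work, so the sketch assumes plain random oversampling of the smaller classes as one common reading of "random sampling". The function names `random_balance` and `binary_rates` are hypothetical, and treating "light on" as the positive class (label 1) is also an assumption.

```python
import random
from collections import Counter


def random_balance(features, labels, seed=0):
    """Randomly oversample every class up to the size of the largest class.

    Assumption: this is a generic random-oversampling scheme used for
    illustration only, not the RS or SBS procedure from the paper.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(rows) for rows in by_class.values())

    out_x, out_y = [], []
    for y, rows in by_class.items():
        # Draw extra samples with replacement until the class reaches `target`.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def binary_rates(y_true, y_pred, positive=1):
    """Return accuracy, recall, and specificity for a binary prediction."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Recall: share of actual positives that were predicted positive.
        "recall": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of actual negatives that were predicted negative.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


if __name__ == "__main__":
    # Toy data: 8 "off" samples (label 0) and 2 "on" samples (label 1).
    toy_features = list(range(10))
    toy_labels = [0] * 8 + [1] * 2
    _, balanced_labels = random_balance(toy_features, toy_labels)
    print(Counter(balanced_labels))
    print(binary_rates([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

Reporting recall and specificity side by side is what exposes the trade-off described in the abstract: a balancing step may raise recall on the minority class while lowering specificity or overall accuracy.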
