4.7 Article

The generalizability of pre-processing techniques on the accuracy and fairness of data-driven building models: A case study

Journal

ENERGY AND BUILDINGS
Volume 268

Publisher

ELSEVIER SCIENCE SA
DOI: 10.1016/j.enbuild.2022.112204

Keywords

Fairness; Generalizability; Accuracy; Data-driven Model; Building

In recent years, the development and application of data-driven building models have become a hot research topic because of the massive data collected from buildings. This study proposes a sequentially balanced sampling (SBS) technique to address the problems of data volume variation and fairness. SBS is compared with four existing pre-processing techniques and performs comparably to reversed preferential sampling (RPS) in improving accuracy and fairness.
In recent years, the massive volume of data collected from buildings has made the development and application of data-driven building models (DDBMs) a hot research topic. Because data volume varies across conditions, existing DDBMs can show different accuracy for different users or periods, which may in turn create fairness problems. Balancing the training dataset across conditions with pre-processing techniques could help address these issues. This study proposes a sequentially balanced sampling (SBS) technique and compares its generalizability in improving fairness and preserving accuracy of DDBMs with that of four existing pre-processing techniques: random sampling (RS), sequential sampling (SS), reversed preferential sampling (RPS), and sequential preferential sampling (SPS). In total, 4960 cases are run, in which these pre-processing techniques are applied to the training dataset before developing 4 types of classifiers for one-week-ahead lighting status prediction of 155 lights in 16 apartments over a year. The collected data show 5 distribution modes. The newly proposed SBS performs comparably to RPS: both significantly improve predictive accuracy for minority classes but decrease accuracy for majority classes. In contrast, SS and SPS slightly improve accuracy for minority classes at an acceptable cost in accuracy for majority classes. In terms of fairness improvement, SBS, RS, and RPS effectively increase the recall rate; however, RS and RPS have a larger negative effect on the accuracy rate and specificity rate. The results provide guidance for researchers in selecting suitable pre-processing techniques to improve the preferred predictive performance under different data distributions. © 2022 Elsevier B.V. All rights reserved.
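The abstract does not describe how SBS or the four baseline techniques are implemented, so the following is only a minimal sketch of the general idea of balancing a training set before fitting a classifier, and of the three rates the abstract reports (accuracy, recall, specificity). It uses plain random undersampling to the smallest class, which is a generic stand-in and not necessarily the paper's RS. The function names `balance_by_random_undersampling` and `binary_rates`, and the choice of label `1` as the positive class, are assumptions made for illustration.

```python
import random
from collections import defaultdict


def balance_by_random_undersampling(samples, labels, seed=0):
    """Randomly undersample every class down to the size of the smallest class.

    Generic illustration only; not the SBS, RS, SS, RPS, or SPS procedure
    from the paper, whose details are not given in the abstract.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # Size of the minority class sets the per-class quota.
    n_min = min(len(xs) for xs in by_class.values())

    balanced = []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):
            balanced.append((x, y))
    rng.shuffle(balanced)

    return [x for x, _ in balanced], [y for _, y in balanced]


def binary_rates(y_true, y_pred, positive=1):
    """Return accuracy, recall, and specificity for a binary prediction task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Recall: share of true positives that were predicted positive.
        "recall": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of true negatives that were predicted negative.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }
```

In a setting like the one described, `labels` might hold a light's on/off status and the balanced output would be passed to a classifier; comparing `binary_rates` before and after balancing shows the trade-off between recall on the minority class and accuracy or specificity on the majority class.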
