Article

GCN-ST-MDIR: Graph Convolutional Network-Based Spatial-Temporal Missing Air Pollution Data Pattern Identification and Recovery

Journal

IEEE TRANSACTIONS ON BIG DATA
Volume 9, Issue 5, Pages 1347-1364

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TBDATA.2023.3277710

Keywords

Air pollution; Data models; Atmospheric modeling; Monitoring; Training; Convolutional neural networks; Big Data; Air pollution data; graph convolutional network; transfer learning; automatic; missing data pattern identification; missing data pattern recovery; similarity matrix; spatial-temporal

GCN-ST-MDIR is a Graph Convolutional Network (GCN)-based framework for Missing Data Pattern Identification and Recovery (MDIR) that identifies daily missing data patterns and automatically selects the best recovery method. It improves the data representation for MDIR through a new graph construction that draws on domain-specific knowledge. The model outperforms the baselines, reaching an accuracy of 88.48% for general missing data recovery.
Missing data pattern identification and recovery (MDIR) is vital for accurate air pollution monitoring. To recover missing air pollution data, GCN-ST-MDIR, a Graph Convolutional Network (GCN)-based MDIR framework, is proposed to identify daily missing data patterns and automatically select the best recovery method. GCN-ST-MDIR presents four novelties: (1) A new graph construction improves the GCN data representation for MDIR using a spatial-temporal (S-T) similarity matrix and domain-specific knowledge (e.g., weekend/weekday). (2) A transfer learning (TL) component is used to pre-train the LSCE and ILSCE models. (3) A GCN structure outputs a selection indicator that determines the dominant missing pattern for each daily input. The accuracy of the pre-trained data recovery model is incorporated into the GCN loss function to penalize a wrong indicator. (4) The output of the GCN structure is used as a score to combine LSCE and ILSCE. Results show that domain-specific S-T regularity and irregularity can serve as prior information for both the GCN and ILSCE/LSCE to enhance feature extraction. The model considerably improves recovery performance compared with the baselines, achieving an accuracy of 88.48% for general missing data recovery that covers both consecutively and sporadically missing data. GCN-ST-MDIR can be extended to many other S-T MDIR challenges.
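
Novelty (4), using the GCN output as a score to combine LSCE and ILSCE, can be illustrated with a minimal sketch. The abstract does not say how the combination is computed, so the convex blend below is an assumption, as is the convention that the score weights the LSCE output and its complement weights the ILSCE output. The function `blend_recoveries` and all of its parameter names are hypothetical and are not taken from the paper.

```python
from typing import List, Optional


def blend_recoveries(
    observed: List[Optional[float]],
    lsce_pred: List[float],
    ilsce_pred: List[float],
    score: float,
) -> List[float]:
    """Fill the gaps in `observed` with a score-weighted mix of two predictions.

    Assumption (not from the paper): `score` lies in [0, 1] and is the weight
    given to the LSCE prediction; (1 - score) is given to the ILSCE prediction.
    Missing readings are marked with None.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if not (len(observed) == len(lsce_pred) == len(ilsce_pred)):
        raise ValueError("all series must have the same length")

    recovered = []
    for obs, lsce_value, ilsce_value in zip(observed, lsce_pred, ilsce_pred):
        if obs is None:
            # Missing reading: blend the two model outputs.
            recovered.append(score * lsce_value + (1.0 - score) * ilsce_value)
        else:
            # Observed reading: keep the measurement unchanged.
            recovered.append(obs)
    return recovered


# Hypothetical one-day series with two missing readings.
day = [10.0, None, 12.0, None]
lsce = [0.0, 20.0, 0.0, 40.0]
ilsce = [0.0, 30.0, 0.0, 60.0]
print(blend_recoveries(day, lsce, ilsce, score=0.5))
```

Under this assumed scheme, a score near 0 or 1 amounts to selecting a single recovery model for the day, while intermediate scores mix the two, which is one way a selection indicator could double as a combination weight.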
