4.7 Article

Estimation of missing values in astronomical survey data: An improved local approach using cluster directed neighbor selection

Journal

Information Processing & Management

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.ipm.2022.102881

Keywords

Astronomy; Sky survey; Missing value; Imputation; Clustering

This paper aims to develop new imputation methods to handle missing values in astronomical data analysis, particularly in the classification of transient events in a sky survey. The proposed Iterative-CKNN and Iterative-CLLS models extend the cluster directed selection of neighbors framework and achieve better performance than baseline models and Bayesian Principal Component Analysis. These methods have practical implications for classifying transients.
The work presented in this paper aims to develop new imputation methods to better handle missing values encountered in astronomical data analysis, especially in the classification of transient events in a sky survey from the Gravitational wave Optical Transient Observatory (GOTO) project. In particular, the framework of cluster directed selection of neighbors, which has proven effective for the benchmark local imputation techniques KNNimpute and LLSimpute, is extended to new multi-stage models. The proposed models, namely Iterative-CKNN and Iterative-CLLS, are novel and are applied here for the first time to sky survey data. They bring the advantages of both local approaches, in which estimates are summarized from neighbors in the same data cluster, into an iterative process that refines previous guesses. In experiments with simulated datasets corresponding to different survey sizes and missing rates between 1% and 20%, they usually outperform the baseline models and Bayesian Principal Component Analysis (BPCA), a well-known global technique. For instance, at a 10% missing rate, Iterative-CLLS appears to be the most accurate, with an NRMSE score of 0.190, while BPCA and the best of the baseline models reach 0.351 and 0.249, respectively. In terms of practical implications, these methods have proven effective for classifying transients with common algorithms such as KNN, Naive Bayes, and Random Forest.
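The abstract describes the core idea only in words: missing entries are estimated from neighbors drawn from the same data cluster, and those estimates are refined over several rounds, with accuracy reported as NRMSE. The Python sketch below illustrates that idea for a KNN-style variant and computes an NRMSE score. It is not the authors' Iterative-CKNN implementation. The function names, the plain k-means clustering, the column-mean initial guess, the inverse-distance weighting, the use of current estimates when measuring distances, and the NRMSE definition (RMSE over the missing entries divided by the standard deviation of their true values) are all assumptions made for illustration.

```python
import numpy as np


def kmeans_labels(X, k, n_iter=50, seed=0):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # copy of k random rows
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center: shape (n, k)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # leave an empty cluster's center unchanged
                centers[c] = X[labels == c].mean(axis=0)
    return labels


def iterative_cluster_knn_impute(X, k_clusters=3, k_neighbors=5, n_rounds=5, seed=0):
    """Hypothetical sketch of cluster-directed KNN imputation repeated over rounds.

    X: 2-D array (objects x features) with np.nan marking missing entries.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Assumed initial guess: each missing entry gets its column mean.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_rounds):
        # Re-cluster on the current estimates, so each round can refine the last.
        labels = kmeans_labels(filled, k_clusters, seed=seed)
        updated = filled.copy()
        for i in np.where(missing.any(axis=1))[0]:
            # Candidate neighbors: other rows in the same cluster only.
            same = np.where(labels == labels[i])[0]
            same = same[same != i]
            if same.size == 0:
                continue  # singleton cluster: keep the previous guess
            d = np.linalg.norm(filled[same] - filled[i], axis=1)
            order = np.argsort(d)[:k_neighbors]
            w = 1.0 / (d[order] + 1e-12)  # assumed inverse-distance weights
            cols = missing[i]
            updated[i, cols] = (w @ filled[same[order]][:, cols]) / w.sum()
        filled = updated  # observed entries are never modified
    return filled


def nrmse(X_true, X_imputed, missing):
    """Assumed NRMSE: RMSE over missing entries / std of their true values."""
    err = X_imputed[missing] - X_true[missing]
    return np.sqrt(np.mean(err ** 2)) / np.std(X_true[missing])


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic data: three well-separated groups of 40 objects, 6 features each
    centers = rng.normal(0.0, 5.0, size=(3, 6))
    X_true = np.vstack([c + rng.normal(0.0, 0.5, size=(40, 6)) for c in centers])
    missing = rng.random(X_true.shape) < 0.10  # roughly 10% missing rate
    X_obs = np.where(missing, np.nan, X_true)

    est = iterative_cluster_knn_impute(X_obs)
    mean_fill = np.where(missing, np.nanmean(X_obs, axis=0), X_obs)
    print("NRMSE, cluster KNN :", nrmse(X_true, est, missing))
    print("NRMSE, column mean :", nrmse(X_true, mean_fill, missing))
```

Restricting candidates to the same cluster is what distinguishes this from plain KNN imputation, and re-clustering on each round's estimates is what makes it iterative. Replacing the weighted average with a least-squares fit on the same-cluster neighbors would give an analogous sketch in the spirit of the LLS variant; that too would be an illustration, not the paper's Iterative-CLLS.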
