Article

Unsupervised star, galaxy, QSO classification: Application of HDBSCAN

Journal

ASTRONOMY & ASTROPHYSICS
Volume 633

Publisher

EDP SCIENCES S A
DOI: 10.1051/0004-6361/201936648

Keywords

stars: general; galaxies: general; galaxies: active; methods: data analysis; surveys

Funding

  1. Swiss National Science Foundation
  2. ESO programme [179.A-2006, 179.A-2004, 177.A-3016, 177.A-3017, 177.A-3018]
  3. Alfred P. Sloan Foundation
  4. National Science Foundation
  5. U.S. Department of Energy Office of Science
  6. University of Arizona
  7. Brazilian Participation Group
  8. Brookhaven National Laboratory
  9. Carnegie Mellon University
  10. University of Florida
  11. French Participation Group
  12. German Participation Group
  13. Harvard University
  14. Instituto de Astrofisica de Canarias
  15. Michigan State/Notre Dame/JINA Participation Group
  16. Johns Hopkins University
  17. Lawrence Berkeley National Laboratory
  18. Max Planck Institute for Astrophysics
  19. Max Planck Institute for Extraterrestrial Physics
  20. New Mexico State University
  21. New York University
  22. Ohio State University
  23. Pennsylvania State University
  24. University of Portsmouth
  25. Princeton University
  26. Spanish Participation Group
  27. University of Tokyo
  28. University of Utah
  29. Vanderbilt University
  30. University of Virginia
  31. University of Washington
  32. Yale University
  33. NOVA grant
  34. NWO-M grant
  35. Department of Physics & Astronomy of the University of Padova
  36. Deutsche Forschungsgemeinschaft
  37. ERC
  38. Department of Physics of Univ. Federico II (Naples)
  39. National Aeronautics and Space Administration
  40. ESO Very Large Telescope, under the Large Programme [182.A-0886]
  41. STFC (UK)
  42. ARC (Australia)
  43. AAO
  44. NSF [AST-0607701, AST-0908246, AST-0908442, AST-0908354]
  45. NASA [Spitzer-1356708, 08-ADP08-0019, NNX09AC95G]
  46. STFC [ST/R000700/1] Funding Source: UKRI


Context. Classification will be an important first step for upcoming surveys aimed at detecting billions of new sources, such as LSST and Euclid, as well as DESI, 4MOST, and MOONS. Traditional methods of model fitting and colour-colour selection will face significant computational constraints, while machine-learning methods offer a viable approach to datasets of that volume.

Aims. While supervised learning methods can prove very useful for classification tasks, creating representative and accurate training sets consumes a great deal of resources and time. We present a viable alternative that uses an unsupervised machine-learning method to separate stars, galaxies, and QSOs using photometric data.

Methods. The heart of our work uses Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) to find the star, galaxy, and QSO clusters in a multidimensional colour space. We optimized the hyperparameters and input attributes of three separate HDBSCAN runs, each to select a particular object class, and thus treat the output of each separate run as a binary classifier. We subsequently consolidated the output to give our final classifications, optimized on the basis of their F1 scores. We explored the use of Random Forest and PCA as part of the pre-processing stage for feature selection and dimensionality reduction.

Results. Using our dataset of ~50 000 spectroscopically labelled objects, we obtain F1 scores of 98.9, 98.9, and 93.13, respectively, for star, galaxy, and QSO selection with our unsupervised learning method. We find that careful attribute selection is a vital part of accurate classification with HDBSCAN. We applied our classification to a subset of the SDSS spectroscopic catalogue and demonstrated the potential of our approach for correcting misclassified spectra, which is useful for DESI and 4MOST. Finally, we created a multiwavelength catalogue of 2.7 million sources using the KiDS, VIKING, and ALLWISE surveys and published corresponding classifications and photometric redshifts.
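
As a rough illustration of the consolidation and scoring steps described in the abstract, the sketch below merges three per-class binary selections into one label per object and scores each class with F1. The boolean flags stand in for the outputs of the three HDBSCAN runs. The toy data, the function names, and the consolidation rule (keep a class only when exactly one run selects the object) are assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List, Sequence

CLASSES = ("star", "galaxy", "qso")


def f1_score(truth: Sequence[bool], pred: Sequence[bool]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN) for a single binary classifier."""
    tp = sum(t and p for t, p in zip(truth, pred))
    fp = sum((not t) and p for t, p in zip(truth, pred))
    fn = sum(t and (not p) for t, p in zip(truth, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def consolidate(flags: Dict[str, Sequence[bool]]) -> List[str]:
    """Assumed rule: accept a class only if exactly one run selects the object."""
    n_objects = len(flags[CLASSES[0]])
    final = []
    for i in range(n_objects):
        hits = [c for c in CLASSES if flags[c][i]]
        final.append(hits[0] if len(hits) == 1 else "unclassified")
    return final


# Toy spectroscopic labels (hypothetical, six objects).
labels = ["star", "star", "galaxy", "galaxy", "qso", "qso"]

# Hypothetical binary outputs, one list per class-specific clustering run.
flags = {
    "star":   [True, True, False, False, False, True],
    "galaxy": [False, False, True, True, False, False],
    "qso":    [False, False, False, True, True, True],
}

final = consolidate(flags)
scores = {
    c: f1_score([lab == c for lab in labels], [f == c for f in final])
    for c in CLASSES
}

for c in CLASSES:
    print(f"{c}: F1 = {scores[c]:.3f}")
```

The abstract quotes F1 as percentages, whereas this sketch returns fractions between 0 and 1. Objects selected by more than one run, or by none, are left as "unclassified" here; other tie-breaking rules are possible.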
