4.7 Article

Comparing Multiclass, Binary, and Hierarchical Machine Learning Classification schemes for variable stars

Journal

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/mnras/stz1999

Keywords

methods: data analysis; methods: statistical; stars: variables: general

Funding

  1. UK Newton Fund as part of the Development in Africa with Radio Astronomy (DARA) Big Data project
  2. European Research Council (ERC) under the European Union [694745]
  3. Imperial President's PhD Scholarship
  4. STFC [ST/R001898/1, ST/T001399/1, ST/P000649/1] Funding Source: UKRI


Upcoming synoptic surveys are set to generate an unprecedented amount of data. This requires an automatic framework that can quickly and efficiently provide classification labels for several new object classification challenges. Using data describing 11 types of variable stars from the Catalina Real-Time Transient Survey (CRTS), we illustrate how to capture the most important information from computed features and describe detailed methods of how to robustly use information theory for feature selection and evaluation. We apply three machine learning algorithms and demonstrate how to optimize these classifiers via cross-validation techniques. For the CRTS data set, we find that the random forest classifier performs best in terms of balanced accuracy and geometric means. We demonstrate substantially improved classification results by converting the multiclass problem into a binary classification task, achieving a balanced-accuracy rate of ~99 per cent for the classification of delta Scuti and anomalous Cepheids. Additionally, we describe how classification performance can be improved by converting a 'flat multiclass' problem into a hierarchical taxonomy. We develop a new hierarchical structure and propose a new set of classification features, enabling the accurate identification of subtypes of Cepheids, RR Lyrae, and eclipsing binary stars in CRTS data.
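
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal Python sketch and not the authors' code. It assumes balanced accuracy is the arithmetic mean of per-class recall and that the geometric mean is the geometric mean of the same per-class recalls, which is a common convention for imbalanced data but may differ in detail from the paper's definition. The function names, the `to_binary` helper, and the toy labels are all hypothetical.

```python
from math import prod


def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    totals, hits = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        if t == p:
            hits[t] = hits.get(t, 0) + 1
    return {c: hits.get(c, 0) / n for c, n in totals.items()}


def balanced_accuracy(y_true, y_pred):
    """Arithmetic mean of per-class recall (assumed definition)."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return sum(recalls) / len(recalls)


def geometric_mean_score(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition)."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return prod(recalls) ** (1 / len(recalls))


def to_binary(labels, positive):
    """Recast multiclass labels as one class (1) versus the rest (0)."""
    return [1 if c == positive else 0 for c in labels]


# Toy, made-up labels: a rare class that is partly misclassified.
y_true = ["DSCT", "DSCT", "ACEP", "ACEP", "ACEP", "ACEP"]
y_pred = ["DSCT", "ACEP", "ACEP", "ACEP", "ACEP", "ACEP"]

print(balanced_accuracy(y_true, y_pred))
print(geometric_mean_score(y_true, y_pred))
print(to_binary(y_true, "DSCT"))
```

In this toy case plain accuracy is 5/6, while both metrics are lower because they weight the rare class equally with the common one. That is why such metrics are often preferred when class sizes are very uneven.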
