Article

Machine learning for transient discovery in Pan-STARRS1 difference imaging

Journal

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/mnras/stv292

Keywords

methods: data analysis; methods: statistical; techniques: image processing; surveys; supernovae: general

Funding

  1. National Aeronautics and Space Administration [NNX08AR22G]
  2. European Research Council under the European Union [291222]
  3. RCUK STFC [ST/I001123/1, ST/L000709/1]
  4. DEL
  5. EPSRC [EP/G034303/1, EP/J006238/1, EP/K004379/1, EP/H049606/1] Funding Source: UKRI
  6. STFC [ST/L000709/1, ST/I001123/1] Funding Source: UKRI
  7. Engineering and Physical Sciences Research Council [EP/H049606/1, EP/J006238/1, EP/G034303/1, EP/K004379/1] Funding Source: researchfish
  8. Science and Technology Facilities Council [ST/I001123/1, ST/L000709/1] Funding Source: researchfish
  9. Direct For Mathematical & Physical Scien
  10. Division Of Astronomical Sciences [1238877] Funding Source: National Science Foundation


Efficient identification and follow-up of astronomical transients is hindered by the need for humans to manually select promising candidates from data streams that contain many false positives. These artefacts arise in the difference images that are produced by most major ground-based time-domain surveys with large-format CCD cameras. This dependence on humans to reject bogus detections is unsustainable for next-generation all-sky surveys, and significant effort is now being invested to solve the problem computationally. In this paper, we explore a simple machine learning approach to real-bogus classification by constructing a training set from the image data of ~32 000 real astrophysical transients and bogus detections from the Pan-STARRS1 Medium Deep Survey. We derive our feature representation from the pixel intensity values of a 20 × 20 pixel stamp around the centre of the candidates. This differs from previous work in that it works directly on the pixels rather than relying on catalogued domain knowledge for feature design or selection. Three machine learning algorithms are trained (artificial neural networks, support vector machines and random forests) and their performances are tested on a held-out subset of 25 per cent of the training data. We find the best results from the random forest classifier and demonstrate that by accepting a false positive rate of 1 per cent, the classifier initially suggests a missed detection rate of around 10 per cent. However, we also find that a combination of bright star variability, nuclear transients and uncertainty in human labelling means that our best estimate of the missed detection rate is approximately 6 per cent.
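
The trade-off quoted in the abstract (a missed detection rate reported at a fixed 1 per cent false positive rate) can be illustrated with a short sketch. This is not the authors' code: the function name `threshold_at_fpr`, the convention that higher scores mean "more likely real", and the toy scores below are all assumptions made for illustration. The abstract does not say how the decision threshold was chosen, so the sketch simply picks the lowest threshold on held-out scores that keeps the false positive rate within the budget.

```python
def threshold_at_fpr(scores, labels, max_fpr=0.01):
    """Pick the lowest score threshold whose false positive rate <= max_fpr.

    Hypothetical helper, not taken from the paper.
    scores: classifier outputs on held-out data (assumed: higher = more likely real).
    labels: 1 for a real transient, 0 for a bogus detection.
    A candidate is accepted as real when score >= threshold.
    Returns (threshold, false_positive_rate, missed_detection_rate).
    """
    bogus = [s for s, y in zip(scores, labels) if y == 0]
    real = [s for s, y in zip(scores, labels) if y == 1]
    if not bogus or not real:
        raise ValueError("need at least one real and one bogus example")

    # Candidate thresholds: every observed score, plus one that rejects everything.
    candidates = sorted(set(scores))
    candidates.append(candidates[-1] + 1.0)

    for t in candidates:  # ascending, so the first hit is the lowest valid threshold
        fpr = sum(1 for s in bogus if s >= t) / len(bogus)
        if fpr <= max_fpr:
            mdr = sum(1 for s in real if s < t) / len(real)
            return t, fpr, mdr


# Toy held-out set (fabricated for illustration only, not survey data):
# 100 bogus detections with evenly spread scores and 10 real transients.
demo_bogus = [i / 100 for i in range(100)]
demo_real = [0.50, 0.985, 0.99, 0.995, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
demo_scores = demo_bogus + demo_real
demo_labels = [0] * len(demo_bogus) + [1] * len(demo_real)

t, fpr, mdr = threshold_at_fpr(demo_scores, demo_labels, max_fpr=0.01)
print(f"threshold={t}, FPR={fpr:.2%}, MDR={mdr:.2%}")
```

In a pipeline like the one described, the scores would come from a classifier applied to the 400 pixel values of each 20 × 20 stamp in the held-out 25 per cent. Lowering the false positive budget pushes the threshold up and increases the missed detection rate.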
