Paper title

Photometric redshift-aided classification using ensemble learning

Authors

Cunha, P. A. C., Humphrey, A.

Abstract

We present SHEEP, a new machine learning approach to the classic problem of astronomical source classification, which combines the outputs of the XGBoost, LightGBM, and CatBoost learning algorithms to create stronger classifiers. A novel step in our pipeline is that, before performing the classification, SHEEP first estimates photometric redshifts, which are then added to the data set as an additional feature for classification model training; this significantly improves the subsequent classification performance. SHEEP contains two distinct classification methodologies: (i) multi-class and (ii) one-versus-all with correction by a meta-learner. We demonstrate the performance of SHEEP for the classification of stars, galaxies, and quasars using a data set composed of SDSS and WISE photometry of 3.5 million astronomical sources. The resulting F1-scores are 0.992 for galaxies, 0.967 for quasars, and 0.985 for stars. In terms of the F1-scores for all three classes, SHEEP is found to outperform a recent RandomForest-based classification approach using an essentially identical data set. Our methodology also facilitates model and data set explainability via feature importances, and it allows the selection of sources whose uncertain classifications may make them interesting targets for follow-up observations.
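The two-stage idea in the abstract can be sketched in plain Python. First a photometric redshift is estimated; then it is fed to the classifiers as an extra feature. This is a toy illustration under stated assumptions, not the SHEEP implementation:

- `toy_photoz` and `make_toy_learner` are hypothetical stand-ins for a trained redshift regressor and for trained XGBoost/LightGBM/CatBoost models.
- Combining the learners by averaging their class probabilities (soft voting) is our assumption.
- The redshift cuts and the 0.7 confidence cut used to flag follow-up candidates are arbitrary.

Only the multi-class route is shown; the one-versus-all variant with a meta-learner is omitted. The last helper computes the per-class F1-score quoted in the abstract.

```python
"""Toy sketch of a photo-z-aided classification flow (not the SHEEP code)."""

CLASSES = ("GALAXY", "QSO", "STAR")


def toy_photoz(row):
    # Hypothetical stand-in for a trained photo-z regressor:
    # a crude colour-based proxy, clipped at zero.
    return max(0.0, 0.5 * (row[0] - row[1]))


def add_photoz_feature(rows, photoz_model):
    # Stage 1: append the estimated redshift as an extra feature column.
    return [list(row) + [photoz_model(row)] for row in rows]


def make_toy_learner(bias):
    # Hypothetical stand-in for one trained base learner (in SHEEP,
    # an XGBoost, LightGBM or CatBoost model). It returns normalised
    # class probabilities driven by the appended photo-z feature.
    def learner(row):
        z = row[-1]
        if z > 1.0:
            scores = {"GALAXY": 0.1, "QSO": 0.8 + bias, "STAR": 0.1}
        elif z > 0.05:
            scores = {"GALAXY": 0.8 + bias, "QSO": 0.1, "STAR": 0.1}
        else:
            scores = {"GALAXY": 0.3, "QSO": 0.2, "STAR": 0.5 + bias}
        total = sum(scores.values())
        return {c: s / total for c, s in scores.items()}
    return learner


def average_probabilities(prob_dicts):
    # Soft voting: mean probability per class across base learners.
    # This combination rule is an assumption, not necessarily SHEEP's.
    n = len(prob_dicts)
    return {c: sum(p[c] for p in prob_dicts) / n for c in CLASSES}


def classify(rows, photoz_model, learners, threshold=0.7):
    # Stage 2: classify the photo-z-augmented rows and flag sources whose
    # winning probability falls below `threshold` as uncertain.
    results = []
    for row in add_photoz_feature(rows, photoz_model):
        probs = average_probabilities([learn(row) for learn in learners])
        label = max(probs, key=probs.get)
        results.append((label, probs[label], probs[label] < threshold))
    return results


def f1_per_class(y_true, y_pred):
    # Per-class F1 = 2TP / (2TP + FP + FN), treating each class one-vs-rest.
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for c in CLASSES:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    # Two hypothetical colour features per source.
    sources = [[3.0, 0.5], [1.0, 0.6], [0.2, 0.3]]
    learners = [make_toy_learner(b) for b in (0.0, 0.1, -0.2)]
    results = classify(sources, toy_photoz, learners)
    for label, prob, uncertain in results:
        print(label, round(prob, 3), "follow-up?" if uncertain else "")
    truth = ["QSO", "GALAXY", "GALAXY"]
    print(f1_per_class(truth, [r[0] for r in results]))
```

In a real pipeline the stand-ins would be replaced by fitted models. The photo-z regressor would typically be trained on sources with spectroscopic redshifts. The sketch only shows where the extra feature enters, how several learners' outputs can be merged, and how low-confidence sources can be singled out for follow-up.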