Paper title

Identifying outliers in astronomical images with unsupervised machine learning

Authors

Han, Yang; Zou, Zhiqiang; Li, Nan; Chen, Yanli

Abstract

Astronomical outliers, such as unusual, rare, or unknown types of astronomical objects or phenomena, constantly lead to the discovery of genuinely unforeseen knowledge in astronomy. In principle, more unpredictable outliers will be uncovered as the coverage and quality of upcoming survey data increase. However, mining rare and unexpected targets from enormous datasets by human inspection is a severe challenge because of the workload involved. Supervised learning is also unsuitable for this purpose, since designing proper training sets for unanticipated signals is unworkable. Motivated by these challenges, we adopt unsupervised machine learning approaches to identify outliers in galaxy image data, in order to explore paths for detecting astronomical outliers. For comparison, we construct three methods, built respectively upon k-nearest neighbors (KNN), Convolutional Auto-Encoder (CAE) + KNN, and CAE + KNN + Attention Mechanism (attCAE KNN). Testing sets are created from the Galaxy Zoo image data published online to evaluate the performance of these methods. Results show that attCAE KNN achieves the best recall (78%), which is 53% higher than the classical KNN method and 22% higher than CAE + KNN. The efficiency of attCAE KNN (10 minutes) is also superior to that of KNN (4 hours) and equal to that of CAE + KNN (10 minutes) for accomplishing the same task. Thus, we believe it is feasible to detect astronomical outliers in galaxy image data in an unsupervised manner. Next, we will apply attCAE KNN to available survey datasets to assess its applicability and reliability.
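To make the KNN step and the recall metric concrete, here is a minimal sketch, not the authors' implementation. It assumes each image has already been reduced to a feature vector (in the paper this would come from the CAE or attCAE encoder; here synthetic random vectors stand in). It scores each sample by its mean distance to its k nearest neighbours, which is one common KNN outlier score and may differ from the rule used in the paper. The function names `knn_outlier_scores` and `recall_at_n` are hypothetical.

```python
import math
import random


def knn_outlier_scores(features, k=5):
    """Score each sample by its mean Euclidean distance to its k nearest neighbours.

    Larger scores mean the sample sits in a sparser region of feature space.
    Brute-force O(n^2) search, fine for a small illustration.
    """
    scores = []
    for i, x in enumerate(features):
        dists = sorted(math.dist(x, y) for j, y in enumerate(features) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def recall_at_n(scores, true_outliers, n):
    """Fraction of known outliers found among the n highest-scoring samples."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n])
    return len(flagged & set(true_outliers)) / len(true_outliers)


# Synthetic stand-in for encoder features (assumption: 8-dimensional vectors).
rng = random.Random(0)
inliers = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
# Five isolated points, each displaced far along a different axis.
outliers = [[20.0 if d == i else 0.0 for d in range(8)] for i in range(5)]

features = inliers + outliers
true_outliers = list(range(200, 205))

scores = knn_outlier_scores(features, k=5)
print("recall among top 5:", recall_at_n(scores, true_outliers, n=5))
```

Recall here is measured against a labelled test set only for evaluation; the scoring itself uses no labels, which is what makes the approach unsupervised.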
