Paper title

Data mining techniques on astronomical spectra data. I: Clustering Analysis

Authors

Yang, Haifeng; Shi, Chenhui; Cai, Jianghui; Zhou, Lichan; Yang, Yuqing; Zhao, Xujun; He, Yanting; Hao, Jing

Abstract

Clustering is an effective tool for astronomical spectral analysis, to mine clustering patterns among data. With the implementation of large sky surveys, many clustering methods have been applied to tackle spectroscopic and photometric data effectively and automatically. Meanwhile, the performance of clustering methods under different data characteristics varies greatly. With the aim of summarizing astronomical spectral clustering algorithms and laying the foundation for further research, this work gives a review of clustering methods applied to astronomical spectra data in three parts. First, many clustering methods for astronomical spectra are investigated and analysed theoretically, looking at algorithmic ideas, applications, and features. Secondly, experiments are carried out on unified datasets constructed using three criteria (spectra data type, spectra quality, and data volume) to compare the performance of typical algorithms; spectra data are selected from the Large Sky Area Multi-Object Fibre Spectroscopic Telescope (LAMOST) survey and Sloan Digital Sky Survey (SDSS). Finally, source codes of the comparison clustering algorithms and manuals for usage and improvement are provided on GitHub.
