Paper title
Data-driven photometric redshift estimation from type Ia supernovae light curves
Paper authors
Paper abstract
Redshift measurement has always been a constant need in modern astronomy and cosmology. As new surveys provide an immense amount of data on astronomical objects, processing such data automatically is becoming increasingly necessary. In this article, we use simulated data from the Dark Energy Survey and, starting from a pipeline originally created to classify supernovae, develop a linear regression algorithm optimized through novel automated machine learning (AutoML) frameworks. It achieves a better error score than ordinary data pre-processing methods when compared with other modern algorithms (such as XGBoost). Numerically, the photometric prediction RMSE for type Ia supernova events was reduced from 0.16 to 0.09, and the RMSE for all supernova types decreased from 0.20 to 0.14. Our pipeline consists of four steps: first, we interpolate the light curve from spectroscopic data points using a Gaussian process fitting algorithm; second, we extract the most important features of these curves using a wavelet transform; third, we reduce the dimensionality of these features through principal component analysis; and finally, we apply super learning techniques (stacked ensemble methods) through an AutoML framework dedicated to optimizing the parameters of several different machine learning models, better resolving the problem. As a final check, we obtained probability distribution functions (PDFs) using Gaussian kernel density estimation over the predictions of more than 50 models trained and optimized by AutoML. Those PDFs were calculated to replicate the original curves that used the SALT2 model, the model used to simulate the raw data itself.
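The four pipeline steps and the final kernel density check can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the toy light-curve generator, the time grid, the hand-written Haar transform standing in for the wavelet step, scikit-learn's `StackingRegressor` standing in for the AutoML framework, and the individual trees of a random forest standing in for the 50+ optimized models.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
GRID = np.linspace(0.0, 100.0, 64)  # common time grid in days (assumed)


def toy_light_curve(z):
    """Hypothetical light curve: a pulse whose width stretches with (1 + z)."""
    t = np.sort(rng.uniform(0.0, 100.0, 20))
    flux = np.exp(-0.5 * ((t - 40.0) / (8.0 * (1.0 + z))) ** 2)
    return t, flux + rng.normal(0.0, 0.02, t.size)


def gp_interpolate(t, flux):
    """Step 1: fit a Gaussian process and evaluate it on the common grid."""
    kernel = RBF(length_scale=10.0) + WhiteKernel(noise_level=0.01)
    # Kernel hyperparameters are kept fixed (optimizer=None) to keep the sketch fast.
    gp = GaussianProcessRegressor(kernel=kernel, optimizer=None, normalize_y=True)
    gp.fit(t[:, None], flux)
    return gp.predict(GRID[:, None])


def haar_features(signal, levels=3):
    """Step 2: multilevel orthonormal Haar coefficients (stand-in wavelet transform)."""
    approx, details = np.asarray(signal, dtype=float), []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return np.concatenate([approx] + details[::-1])


# Build a toy data set: one wavelet feature vector per simulated object.
z_true = rng.uniform(0.1, 1.0, 120)
X = np.array([haar_features(gp_interpolate(*toy_light_curve(z))) for z in z_true])

# Step 3: reduce dimensionality with PCA fitted on the training split only.
pca = PCA(n_components=10).fit(X[:90])
X_train, X_test = pca.transform(X[:90]), pca.transform(X[90:])

# Step 4: stacked ensemble with a linear meta-learner (stand-in for AutoML).
base_models = [
    ("ridge", Ridge(alpha=1.0)),
    ("forest", RandomForestRegressor(n_estimators=50, random_state=0)),
]
stack = StackingRegressor(estimators=base_models, final_estimator=LinearRegression())
stack.fit(X_train, z_true[:90])
z_pred = stack.predict(X_test)
rmse = float(np.sqrt(np.mean((z_pred - z_true[90:]) ** 2)))

# Final check: Gaussian KDE over many model predictions for a single object.
forest = stack.named_estimators_["forest"]
member_preds = np.array([tree.predict(X_test[:1])[0] for tree in forest.estimators_])
pdf = gaussian_kde(member_preds)
z_axis = np.linspace(0.0, 1.2, 200)
density = pdf(z_axis)  # estimated redshift PDF for the first test object
```

Fitting the PCA on the training split only keeps the held-out RMSE free of information from the test objects. In a real setting, the toy generator would be replaced by the survey light curves, and the stacked ensemble by whichever AutoML framework is in use.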