Paper title
Harvesting the Lyα forest with convolutional neural networks
Paper authors
Paper abstract
We develop a machine-learning-based algorithm using a convolutional neural network (CNN) to identify low HI column density Ly$α$ absorption systems ($\log{N_{\mathrm{HI}}}/{\rm cm}^{-2}<17$) in the Ly$α$ forest, and to predict their physical properties: HI column density ($\log{N}_{\mathrm{HI}}/{\rm cm}^{-2}$), redshift ($z_{\mathrm{HI}}$), and Doppler width ($b_{\mathrm{HI}}$). Our CNN models are trained on simulated spectra (S/N $\simeq10$), and we test their performance on high-quality spectra of quasars at redshift $z\sim2.5-2.9$ observed with the High Resolution Echelle Spectrometer on the Keck I telescope. We find that $\sim78\%$ of the systems identified by our algorithm are listed in the manual Voigt profile fitting catalogue. We demonstrate that the performance of our CNN is stable and consistent for all simulated and observed spectra with S/N $\gtrsim10$. Our model can therefore be used consistently to analyse the large volume of both low and high S/N data available from current and future facilities. Our CNN provides state-of-the-art predictions within the range $12.5\leq\log{N_{\mathrm{HI}}}/\mathrm{cm^{-2}}<15.5$, with mean absolute errors of $Δ(\log{N}_{\mathrm{HI}}/{\rm cm}^{-2})=0.13$, $Δ(z_{\mathrm{HI}})=2.7\times{10}^{-5}$, and $Δ(b_{\mathrm{HI}})=4.1\ \mathrm{km\ s^{-1}}$. The CNN prediction takes $<3$ minutes per model per spectrum of 120\,000 pixels on a laptop computer. We demonstrate that CNNs can significantly increase the efficiency of analysing Ly$α$ forest spectra, and thereby greatly increase the statistics of Ly$α$ absorbers.
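To give a sense of scale for the quoted errors, the following is a minimal Python sketch, not the authors' code. It shows how a mean absolute error is computed from paired values and how a small redshift error maps to a velocity offset through the standard relation $Δv \approx c\,Δz/(1+z)$. The function names and the toy column-density values are hypothetical, and evaluating at $z=2.7$ (the middle of the tested range) is an assumption made only for illustration.

```python
C_KM_S = 299_792.458  # speed of light in km/s (exact, by definition)


def mean_absolute_error(predicted, reference):
    """Mean of |predicted - reference| over paired values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def redshift_error_to_velocity(delta_z, z):
    """Velocity offset in km/s for a small redshift error delta_z at redshift z."""
    return C_KM_S * delta_z / (1.0 + z)


# Hypothetical toy values for illustration only (not data from the paper).
predicted_log_n = [13.1, 14.0, 12.8]
reference_log_n = [13.0, 14.2, 12.9]
mae = mean_absolute_error(predicted_log_n, reference_log_n)

# Quoted redshift error, evaluated at the assumed redshift z = 2.7.
dv = redshift_error_to_velocity(2.7e-5, 2.7)
```

Under these assumptions, the quoted $Δ(z_{\mathrm{HI}})=2.7\times10^{-5}$ corresponds to a velocity offset of about 2.2 km/s, smaller than the quoted Doppler-width error of 4.1 km/s.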