Exploring X-ray variability with unsupervised machine learning I. Self-organizing maps applied to XMM-Newton data

Paper title

Exploring X-ray variability with unsupervised machine learning I. Self-organizing maps applied to XMM-Newton data

Authors

Kovačević, Miloš; Pasquato, Mario; Marelli, Martino; De Luca, Andrea; Salvaterra, Ruben; Mondoni, Andrea Belfiore

Abstract

XMM-Newton provides unprecedented insight into the X-ray Universe, recording variability information for hundreds of thousands of sources. Manually searching for interesting patterns in light curves is impractical, so characterizing the sources requires an automated data-mining approach. Straightforward fitting of temporal models to light curves is not a sure way to identify them, especially with noisy data. We used unsupervised machine learning to distill a large data set of light-curve parameters, revealing its clustering structure in preparation for anomaly detection and subsequent searches for specific source behaviors (e.g., flares, eclipses). Self-organizing maps (SOMs) achieve dimensionality reduction and clustering within a single framework. They are a type of artificial neural network trained to approximate the data with a two-dimensional grid of discrete interconnected units, which can later be visualized on the plane. We trained our SOM on temporal-only parameters computed from more than 100,000 detections from the EXTraS catalog. The resulting map reveals that the roughly 2500 most variable sources cluster according to their temporal characteristics. We find distinctive regions of the SOM associated with flares, eclipses, dips, linear light curves, and others. Each group contains sources that appear similar by eye. We single out a handful of interesting sources for further study. The condensed view of our data set provided by SOMs allowed us to identify groups of similar sources, speeding up manual characterization by orders of magnitude. Our method also highlights problems with fitting simple temporal models to light curves and can be used to mitigate them to an extent. This will be crucial for fully exploiting the high data volume expected from upcoming X-ray surveys, and may also help with interpreting supervised classification models.
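To make the SOM idea in the abstract concrete, the following is a minimal NumPy sketch of a self-organizing map: a two-dimensional grid of units whose weight vectors are pulled toward the data, with neighbouring units updated together. It is not the authors' implementation. The grid size, learning-rate and neighbourhood schedules, and the synthetic four-dimensional "light-curve parameter" vectors are all assumptions chosen for illustration, and the function names `best_matching_unit` and `train_som` are hypothetical.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Return the (row, col) grid index of the unit whose weights are closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)


def train_som(data, rows=6, cols=6, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a minimal rectangular SOM; returns weights of shape (rows, cols, n_features).

    All hyperparameters here are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    # Initialise each unit with a randomly chosen training sample.
    picks = rng.integers(0, len(data), rows * cols)
    weights = data[picks].reshape(rows, cols, n_features).astype(float)
    # Grid coordinates of every unit, shape (rows, cols, 2).
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5      # shrinking neighbourhood radius
        x = data[rng.integers(len(data))]        # one random training sample
        bmu = best_matching_unit(weights, x)
        # Gaussian neighbourhood around the best-matching unit, measured on the grid.
        d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        # Pull every unit toward the sample, weighted by its neighbourhood value.
        weights += lr * h[..., None] * (x - weights)
    return weights


# Synthetic stand-in for light-curve parameters: two well-separated groups in 4-D.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=0.0, scale=0.3, size=(200, 4))
group_b = rng.normal(loc=5.0, scale=0.3, size=(200, 4))
features = np.vstack([group_a, group_b])

som = train_som(features)
bmu_a = best_matching_unit(som, np.zeros(4))
bmu_b = best_matching_unit(som, np.full(4, 5.0))
print("unit for group A centre:", bmu_a)
print("unit for group B centre:", bmu_b)
```

In this toy setting the two groups map to different units of the grid, which is the property that lets distinct regions of a trained map be associated with distinct source behaviors. Real feature vectors would normally be standardized before training, since the best-matching unit is found with a plain Euclidean distance.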
