Paper Title

AtmoDist: Self-supervised Representation Learning for Atmospheric Dynamics

Authors

Sebastian Hoffmann, Christian Lessig

Abstract

Representation learning has proven to be a powerful methodology in a wide variety of machine learning applications. For atmospheric dynamics, however, it has so far not been considered, arguably due to the lack of large-scale, labeled datasets that could be used for training. In this work, we show that the difficulty is benign and introduce a self-supervised learning task that defines a categorical loss for a wide variety of unlabeled atmospheric datasets. Specifically, we train a neural network on the simple yet intricate task of predicting the temporal distance between atmospheric fields from distinct but nearby times. We demonstrate that training with this task on ERA5 reanalysis leads to internal representations that capture intrinsic aspects of atmospheric dynamics. We do so by introducing a data-driven distance metric for atmospheric states. When employed as a loss function in other machine learning applications, this AtmoDist distance leads to improved results compared to the classical $\ell_2$-loss. For example, for downscaling, one obtains higher-resolution fields that match the true statistics more closely than previous approaches, and for the interpolation of missing or occluded data, the AtmoDist distance leads to results that contain more realistic fine-scale features. Since it is derived from observational data, AtmoDist also provides a novel perspective on atmospheric predictability.
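
The self-supervised task described in the abstract can be illustrated with a minimal sketch. It draws two fields from nearby times in an unlabeled sequence, uses their time-step offset as a class label, and evaluates a categorical (cross-entropy) loss on a classifier's output. The abstract does not specify the sampling scheme, the maximum offset, or the network, so the names `sample_pair`, `cross_entropy`, and `max_offset`, the toy data, and the uniform placeholder logits are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def sample_pair(fields, max_offset, rng):
    """Draw two fields from nearby times and the class label for their offset.

    `fields` is any time-ordered sequence of atmospheric states (hypothetical
    stand-in for a reanalysis dataset). The label is offset - 1, so classes
    run from 0 to max_offset - 1.
    """
    offset = rng.randint(1, max_offset)              # inclusive on both ends
    start = rng.randrange(len(fields) - max_offset)  # keeps start + offset in range
    return fields[start], fields[start + offset], offset - 1


def cross_entropy(logits, label):
    """Categorical cross-entropy for one sample, from unnormalized logits."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[label]


rng = random.Random(0)
# Toy data (assumption): each "field" is a one-element list holding its time index.
fields = [[float(t)] for t in range(100)]
first, second, label = sample_pair(fields, max_offset=8, rng=rng)

# Placeholder for a network's output on (first, second): uniform logits.
logits = [0.0] * 8
loss = cross_entropy(logits, label)
```

With uniform logits the loss equals log(8), the value an untrained classifier over eight offset classes would be expected to start from. No labels are needed beyond the timestamps already present in the data, which is what makes the task self-supervised.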
