Paper title
Energy-Based Models for Functional Data using Path Measure Tilting
Paper authors
Paper abstract
Energy-Based Models (EBMs) have proven to be a highly effective approach for modelling densities on finite-dimensional spaces. Their ability to incorporate domain-specific choices and constraints into the structure of the model through composition makes EBMs an appealing candidate for applications in physics, biology, computer vision, and various other fields. Recently, Energy-Based Processes (EBPs) for modelling stochastic processes were proposed for \textit{unconditional} exchangeable data (e.g., point clouds). In this work, we present a novel subclass of EBPs, called $\mathcal{F}$-EBM, for \textit{conditional} exchangeable data, which is able to learn distributions of functions (such as curves or surfaces) from functional samples evaluated at finitely many points. Two unique challenges arise in the functional context. Firstly, training data is often not evaluated along a fixed set of points. Secondly, steps must be taken to control the behaviour of the model between evaluation points, to mitigate overfitting. The proposed model is an energy-based model on function space that is decomposed spectrally, where a Gaussian Process path measure is used to reweight the distribution to capture smoothness properties of the underlying process being modelled. The resulting model can utilize irregularly sampled training data and can output predictions at any resolution, providing an effective approach to up-scaling functional data. We demonstrate the efficacy of our proposed approach for modelling a range of datasets, including data collected from Standard and Poor's 500 (S\&P) and the UK National Grid.
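As a hedged illustration of what reweighting by a Gaussian Process path measure could look like (a sketch under stated assumptions, not the formulation given in the paper): let $\mu$ denote a Gaussian Process measure on function space and $E_\theta$ a learned energy functional. Both symbols, along with $P_\theta$, $Z_\theta$, $c_k$ and $\phi_k$ below, are introduced here purely for illustration. A tilted distribution $P_\theta$ would then have a density with respect to $\mu$ of the form
\[
\frac{dP_\theta}{d\mu}(f) = \frac{\exp\bigl(-E_\theta(f)\bigr)}{Z_\theta},
\qquad
Z_\theta = \int \exp\bigl(-E_\theta(f)\bigr)\, d\mu(f),
\]
and a spectral decomposition could expand $f = \sum_{k} c_k \phi_k$ in a basis $\{\phi_k\}$ (for example, a Karhunen-Lo\`eve basis of the GP covariance), so that the energy acts on the coefficients $c_k$ while $\mu$ controls the smoothness of $f$ between evaluation points.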