Dealing with missing data using attention and latent space regularization

Paper title

Dealing with missing data using attention and latent space regularization

Authors

Penny-Dimri, Jahan C.; Bergmeir, Christoph; Smith, Julian

Abstract

Most practical data science problems encounter missing data. A wide variety of solutions exist, each with strengths and weaknesses that depend upon the missingness-generating process. Here we develop a theoretical framework for training and inference using only observed variables, enabling modeling of incomplete datasets without imputation. Using an information- and measure-theoretic argument, we construct models with latent space representations that regularize against the potential bias introduced by missing data. The theoretical properties of this approach are demonstrated empirically using a synthetic dataset. The performance of this approach is tested on 11 benchmarking datasets with missingness and 18 datasets corrupted across three missingness patterns, with comparison against a state-of-the-art model and industry-standard imputation. We show that our proposed method overcomes the weaknesses of imputation methods and outperforms the current state-of-the-art.
