VAEs in the Presence of Missing Data

Paper Title

VAEs in the Presence of Missing Data

Authors

Mark Collier, Alfredo Nazabal, Christopher K. I. Williams

Abstract

Real-world datasets often contain entries with missing elements; for example, in a medical dataset, a patient is unlikely to have taken all possible diagnostic tests. Variational Autoencoders (VAEs) are popular generative models often used for unsupervised learning. Despite their widespread use, it is unclear how best to apply VAEs to datasets with missing data. We develop a novel latent variable model of a corruption process which generates missing data, and derive a corresponding tractable evidence lower bound (ELBO). Our model is straightforward to implement, can handle both missing completely at random (MCAR) and missing not at random (MNAR) data, scales to high-dimensional inputs, and gives both the VAE encoder and decoder principled access to indicator variables for whether a data element is missing or not. On the MNIST and SVHN datasets, we demonstrate improved marginal log-likelihood of observed data and better missing data imputation, compared to existing approaches.
