Paper title
Dealing with missing data using attention and latent space regularization
Paper authors
Paper abstract
Most practical data science problems involve missing data. A wide variety of solutions exist, each with strengths and weaknesses that depend on the missingness-generating process. Here we develop a theoretical framework for training and inference using only observed variables, which enables modeling of incomplete datasets without imputation. Using an information-theoretic and measure-theoretic argument, we construct models with latent space representations that regularize against the potential bias introduced by missing data. The theoretical properties of this approach are demonstrated empirically on a synthetic dataset. Its performance is tested on 11 benchmarking datasets with missingness and on 18 datasets corrupted across three missingness patterns, with comparison against a state-of-the-art model and industry-standard imputation. We show that our proposed method overcomes the weaknesses of imputation methods and outperforms the current state of the art.
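To make the idea of working with observed variables only more concrete, the following is a minimal sketch of attention pooling with a missingness mask. It is not the paper's architecture: the function name `masked_attention_pool`, the per-feature embedding matrix `embed`, the query vector `query`, and the value-times-embedding tokenization are all assumptions chosen for illustration. It only shows how unobserved features can be given exactly zero attention weight, so that no imputed value influences the latent representation.

```python
import numpy as np


def masked_attention_pool(values, observed, embed, query):
    """Pool per-feature tokens into one latent vector using only observed features.

    values:   (n_features,) raw feature values; unobserved entries are ignored
    observed: (n_features,) boolean mask, True where the feature is observed
    embed:    (n_features, d) hypothetical per-feature embedding vectors
    query:    (d,) hypothetical learned query vector
    """
    if not observed.any():
        raise ValueError("at least one feature must be observed")
    # Zero out unobserved entries only to keep the arithmetic finite (NaN would
    # propagate otherwise). They get zero attention weight below, so this
    # placeholder never reaches the output and is not an imputation.
    safe_values = np.where(observed, values, 0.0)
    tokens = safe_values[:, None] * embed            # (n_features, d)
    scores = tokens @ query                          # (n_features,)
    scores = np.where(observed, scores, -np.inf)     # mask unobserved features
    scores = scores - scores[observed].max()         # stabilise the softmax
    weights = np.exp(scores)                         # exp(-inf) == 0.0
    weights = weights / weights.sum()
    return weights @ tokens, weights


# Illustrative usage with random (untrained) parameters.
rng = np.random.default_rng(0)
embed = rng.normal(size=(3, 4))
query = rng.normal(size=4)
x = np.array([1.0, np.nan, 2.0])      # second feature is missing
mask = ~np.isnan(x)
latent, weights = masked_attention_pool(x, mask, embed, query)
```

Because the mask is applied before the softmax, the latent vector is unchanged whatever value sits in a missing position, which is the property that lets a model of this kind skip imputation. The latent space regularization described in the abstract is not covered by this sketch.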