Okapi: Generalising Better by Making Statistical Matches Match

Paper title

Okapi: Generalising Better by Making Statistical Matches Match

Authors

Myles Bartlett, Sara Romiti, Viktoriia Sharmanska, Novi Quadrianto

Abstract

We propose Okapi, a simple, efficient, and general method for robust semi-supervised learning based on online statistical matching. Our method uses a nearest-neighbours-based matching procedure to generate cross-domain views for a consistency loss, while eliminating statistical outliers. In order to perform the online matching in a runtime- and memory-efficient way, we draw upon the self-supervised literature and combine a memory bank with a slow-moving momentum encoder. The consistency loss is applied within the feature space, rather than on the predictive distribution, making the method agnostic to both the modality and the task in question. We experiment on the WILDS 2.0 datasets (Sagawa et al.), which significantly expand the range of modalities, applications, and shifts available for studying and benchmarking real-world unsupervised adaptation. Contrary to Sagawa et al., we show that it is in fact possible to leverage additional unlabelled data to improve upon empirical risk minimisation (ERM) results with the right method. Our method outperforms the baseline methods in terms of out-of-distribution (OOD) generalisation on the iWildCam (a multi-class classification task) and PovertyMap (a regression task) image datasets as well as the CivilComments (a binary classification task) text dataset. Furthermore, from a qualitative perspective, we show that the matches obtained from the learned encoder are strongly semantically related. Code for our paper is publicly available at https://github.com/wearepal/okapi/.
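
The pipeline the abstract describes (cross-domain nearest-neighbour matching against a memory bank, outlier removal, a slow-moving momentum encoder, and a feature-space consistency loss) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the released code is at the URL above. The function names, the Euclidean metric, the fixed `max_dist` threshold used as the outlier rule, and the squared-error form of the loss are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def ema_update(target, online, momentum=0.99):
    """Move the slow (momentum) encoder weights a small step toward the online weights."""
    return momentum * target + (1.0 - momentum) * online


def cross_domain_matches(queries, query_domains, bank, bank_domains, max_dist):
    """Find, for each query, its nearest memory-bank entry from a different domain.

    Returns (match_idx, keep). keep is False where no cross-domain candidate
    exists or the match is farther than max_dist (assumed outlier rule).
    """
    # Pairwise Euclidean distances, shape (n_queries, n_bank).
    dists = np.linalg.norm(queries[:, None, :] - bank[None, :, :], axis=-1)
    # Mask same-domain entries so every retained pair is a cross-domain view.
    same_domain = query_domains[:, None] == bank_domains[None, :]
    dists = np.where(same_domain, np.inf, dists)
    match_idx = dists.argmin(axis=1)
    match_dist = dists[np.arange(len(queries)), match_idx]
    keep = match_dist <= max_dist
    return match_idx, keep


def consistency_loss(features, matched, keep):
    """Mean squared feature-space distance over the retained matches (assumed form)."""
    if not keep.any():
        return 0.0
    diffs = features[keep] - matched[keep]
    return float((diffs ** 2).sum(axis=1).mean())


# Toy example: two queries from domain 0, a three-entry bank with mixed domains.
queries = np.array([[0.0, 0.0], [5.0, 5.0]])
query_domains = np.array([0, 0])
bank = np.array([[0.0, 1.0], [0.1, 0.0], [2.0, 0.0]])
bank_domains = np.array([1, 0, 1])

idx, keep = cross_domain_matches(queries, query_domains, bank, bank_domains, max_dist=2.0)
loss = consistency_loss(queries, bank[idx], keep)
```

In the toy example the closest bank entry to the first query shares its domain and is therefore skipped, while the second query has no cross-domain neighbour within `max_dist` and is excluded from the loss. Because the loss compares feature vectors rather than predictions, the same sketch applies unchanged to classification and regression tasks, which is the modality- and task-agnostic property the abstract highlights.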
