MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts and Training Conflicts

Paper Title

MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts and Training Conflicts

Authors

Weixin Liang, James Zou

Abstract

Understanding the performance of machine learning models across diverse data distributions is critically important for reliable applications. Motivated by this, there is a growing focus on curating benchmark datasets that capture distribution shifts. While valuable, the existing benchmarks are limited: many of them contain only a small number of shifts, and they lack systematic annotation of what differs from one shift to another. We present MetaShift, a collection of 12,868 sets of natural images across 410 classes, to address this challenge. We leverage the natural heterogeneity of Visual Genome and its annotations to construct MetaShift. The key construction idea is to cluster images using their metadata, which provides context for each image (e.g. "cats with cars" or "cats in bathroom"); each context represents a distinct data distribution. MetaShift has two important benefits. First, it contains orders of magnitude more natural data shifts than previously available. Second, it provides explicit explanations of what is unique about each of its data sets and a distance score that measures the amount of distribution shift between any two of its data sets. We demonstrate the utility of MetaShift in benchmarking several recent proposals for training models to be robust to data shifts. We find that simple empirical risk minimization performs the best when shifts are moderate and that no method has a systematic advantage for large shifts. We also show how MetaShift can help to visualize conflicts between data subsets during model training.
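The construction idea in the abstract, grouping images of one class by the context found in their metadata, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy `annotations` dictionary, the `build_subsets` helper, and the `class(context)` naming are all assumptions made for illustration, and the distance score between subsets is not covered here.

```python
from collections import defaultdict

# Hypothetical toy metadata: image id -> set of object labels in that image.
# Real Visual Genome annotations are far richer; this only mimics their shape.
annotations = {
    "img1": {"cat", "car"},
    "img2": {"cat", "bathroom", "sink"},
    "img3": {"cat", "bathroom"},
    "img4": {"dog", "car"},
}


def build_subsets(annotations, target_class):
    """Group images containing target_class by each co-occurring label.

    Returns a dict mapping a subset name such as "cat(bathroom)" to the
    set of image ids that contain both the class and that context label.
    """
    subsets = defaultdict(set)
    for image_id, labels in annotations.items():
        if target_class not in labels:
            continue  # image does not show the class of interest
        for context in labels - {target_class}:
            subsets[f"{target_class}({context})"].add(image_id)
    return dict(subsets)


cat_subsets = build_subsets(annotations, "cat")
```

In this sketch an image can belong to several subsets at once (here `img2` falls under both `cat(bathroom)` and `cat(sink)`), so the subsets overlap rather than partition the images.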
