GRIT: General Robust Image Task Benchmark

Paper Title

GRIT: General Robust Image Task Benchmark

Authors

Tanmay Gupta, Ryan Marten, Aniruddha Kembhavi, Derek Hoiem

Abstract

Computer vision models excel at making predictions when the test distribution closely resembles the training distribution. Such models have yet to match the ability of biological vision to learn from multiple sources and generalize to new data sources and tasks. To facilitate the development and evaluation of more general vision systems, we introduce the General Robust Image Task (GRIT) benchmark. GRIT evaluates the performance, robustness, and calibration of a vision system across a variety of image prediction tasks, concepts, and data sources. The seven tasks in GRIT are selected to cover a range of visual skills: object categorization, object localization, referring expression grounding, visual question answering, segmentation, human keypoint detection, and surface normal estimation. GRIT is carefully designed to enable the evaluation of robustness under image perturbations, image source distribution shift, and concept distribution shift. By providing a unified platform for thorough assessment of skills and concepts learned by a vision model, we hope GRIT catalyzes the development of performant and robust general purpose vision systems.
