Paper Title
Learning to Evaluate Perception Models Using Planner-Centric Metrics
Paper Authors
Paper Abstract
Variants of accuracy and precision are the gold standard by which the computer vision community measures progress of perception algorithms. One reason for the ubiquity of these metrics is that they are largely task-agnostic; we generally seek to detect zero false negatives or positives. The downside of these metrics is that, at worst, they penalize all incorrect detections equally without conditioning on the task or scene, and at best, heuristics need to be chosen to ensure that different mistakes count differently. In this paper, we propose a principled metric for 3D object detection specifically for the task of self-driving. The core idea behind our metric is to isolate the task of object detection and measure the impact the produced detections would induce on the downstream task of driving. Without hand-designing it to, we find that our metric penalizes many of the mistakes that other metrics penalize by design. In addition, our metric downweights detections based on additional factors, such as the distance from a detection to the ego car and the speed of the detection, in intuitive ways that other detection metrics do not. For human evaluation, we generate scenes in which standard metrics and our metric disagree and find that humans side with our metric 79% of the time. Our project page, including an evaluation server, can be found at https://nv-tlabs.github.io/detection-relevance.
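The core idea of a planner-centric metric can be illustrated with a minimal toy sketch. This is not the paper's actual learned planner or its exact formulation; the candidate trajectories, the distance-based scoring rule, and the function names below are illustrative assumptions. The sketch scores detection errors by how much they shift a planner's distribution over candidate ego trajectories, measured as a KL divergence between the plan induced by ground-truth detections and the plan induced by a model's detections:

```python
import math

# Toy candidate ego-trajectory endpoints (x, y) in meters (an assumption,
# standing in for a real planner's trajectory set).
CANDIDATE_ENDPOINTS = [(-2.0, 10.0), (0.0, 10.0), (2.0, 10.0)]

def planner_distribution(detections):
    """Softmax over candidate trajectories; each trajectory is penalized by
    proximity to the nearest detection (a stand-in for a learned planner)."""
    scores = []
    for ex, ey in CANDIDATE_ENDPOINTS:
        if detections:
            nearest = min(math.hypot(ex - dx, ey - dy) for dx, dy in detections)
            # Closer obstacles make a trajectory less attractive.
            scores.append(-1.0 / (nearest + 1e-6))
        else:
            scores.append(0.0)
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]

def planner_kl(gt_detections, pred_detections):
    """KL(plan | ground-truth detections || plan | predicted detections):
    how much the detection errors change what the planner would do."""
    p = planner_distribution(gt_detections)
    q = planner_distribution(pred_detections)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Missing a detection directly in the ego path shifts the plan noticeably...
costly = planner_kl(gt_detections=[(0.0, 8.0)], pred_detections=[])
# ...while missing a distant object barely changes it.
benign = planner_kl(gt_detections=[(0.0, 80.0)], pred_detections=[])
```

The comparison at the bottom reflects the behavior the abstract describes: identical errors (here, a single false negative) are penalized differently depending on their relevance to driving, with nearby misses costing far more than distant ones.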