Paper Title
Finding Label and Model Errors in Perception Data With Learned Observation Assertions
Paper Authors
Paper Abstract
ML is being deployed in complex, real-world scenarios where errors have impactful consequences. In these systems, thorough testing of ML pipelines is critical. A key component in ML deployment pipelines is the curation of labeled training data. Common practice in the ML literature assumes that labels are the ground truth. However, in our experience in a large autonomous vehicle development center, we have found that vendors often provide erroneous labels, which can lead to downstream safety risks in trained models. To address these issues, we propose a new abstraction, learned observation assertions, and implement it in a system called Fixy. Fixy leverages existing organizational resources, such as existing (possibly noisy) labeled datasets or previously trained ML models, to learn a probabilistic model for finding errors in human- or model-generated labels. Given user-provided features and these existing resources, Fixy learns feature distributions that specify likely and unlikely values (e.g., that a speed of 30mph is likely but 300mph is unlikely). It then uses these feature distributions to score labels for potential errors. We show that Fixy can automatically rank potential errors in real datasets with up to 2$\times$ higher precision compared to recent work on model assertions and standard techniques such as uncertainty sampling.
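The abstract's core idea, learning feature distributions from existing labeled data and then scoring new labels by how unlikely their feature values are, can be illustrated with a minimal sketch. This is a hypothetical toy example (a simple Gaussian over one feature, with a z-score standing in for a likelihood-based error score); Fixy's actual probabilistic model is more sophisticated:

```python
import statistics

def learn_feature_distribution(speeds):
    """Fit a simple Gaussian to one feature (speed, mph) from existing labels."""
    mu = statistics.mean(speeds)
    sigma = statistics.stdev(speeds)
    return mu, sigma

def error_score(speed, mu, sigma):
    """Higher score = less likely under the learned distribution = more suspicious."""
    return abs(speed - mu) / sigma  # z-score as a stand-in for negative log-likelihood

# Existing (possibly noisy) labeled speeds drawn from prior organizational data.
observed_speeds = [28, 31, 35, 27, 33, 30, 29, 32]
mu, sigma = learn_feature_distribution(observed_speeds)

# Rank candidate labels by error score: the 300 mph label surfaces first
# as a likely label error, matching the 30mph-vs-300mph example in the abstract.
candidates = [30, 45, 300]
ranked = sorted(candidates, key=lambda s: error_score(s, mu, sigma), reverse=True)
print(ranked)  # [300, 45, 30]
```

A reviewer would then inspect the top-ranked labels first, which is how ranking by feature-distribution likelihood concentrates human effort on probable errors.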