Paper Title
GSCLIP: A Framework for Explaining Distribution Shifts in Natural Language
Paper Authors
Paper Abstract
Helping end users comprehend abstract distribution shifts can greatly facilitate AI deployment. Motivated by this, we propose a novel task, dataset explanation. Given two image datasets, dataset explanation aims to automatically point out their dataset-level distribution shifts in natural language. Current techniques for monitoring distribution shifts provide inadequate information for understanding datasets with the goal of improving data quality. Therefore, we introduce GSCLIP, a training-free framework to solve the dataset explanation task. In GSCLIP, we propose the selector as the first quantitative evaluation method to identify explanations that properly summarize dataset shifts. Furthermore, we leverage this selector to demonstrate the superiority of a generator based on language model generation. Systematic evaluation on natural data shifts verifies that GSCLIP, a combined system of a hybrid generator group and an efficient selector, is not only easy to use but also powerful for dataset explanation at scale.