Paper Title
Can Machines Help Us Answering Question 16 in Datasheets, and In Turn Reflecting on Inappropriate Content?
Paper Authors
Paper Abstract
Large datasets underlying much of current machine learning raise serious issues concerning inappropriate content, such as material that is offensive, insulting, threatening, or might otherwise cause anxiety. This calls for increased dataset documentation, e.g., using datasheets. Among other topics, datasheets encourage reflection on the composition of a dataset. So far, however, this documentation has been done manually and can therefore be tedious and error-prone, especially for large image datasets. Here we ask the arguably "circular" question of whether a machine can help us reflect on inappropriate content and thereby answer Question 16 in Datasheets. To this end, we propose to use the information stored in pre-trained transformer models to assist in the documentation process. Specifically, prompt-tuning based on a dataset of socio-moral values steers CLIP to identify potentially inappropriate content, thereby reducing human labor. We then document the inappropriate images found using word clouds built from captions generated by a vision-language model. The documentation of two popular, large-scale computer vision datasets (ImageNet and OpenImages) produced in this way suggests that machines can indeed help dataset creators answer Question 16 on inappropriate image content.
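To make the core mechanism concrete, below is a minimal sketch of zero-shot flagging with CLIP via the Hugging Face transformers library. It is not the authors' implementation: the paper prompt-tunes CLIP on a dataset of socio-moral values, whereas this sketch uses hand-written prompt texts and a fixed decision threshold, both of which are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's code): score an image
# against an "inappropriate" vs. an "appropriate" text prompt with CLIP
# and flag it when the inappropriate prompt wins by a clear margin.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical hand-written prompts; the paper instead learns soft prompts
# from a socio-moral values dataset.
prompts = [
    "This image shows something offensive or harmful.",
    "This image shows something harmless and appropriate.",
]

def flag_image(path: str, threshold: float = 0.7) -> bool:
    """Return True if the 'inappropriate' prompt clearly dominates."""
    image = Image.open(path).convert("RGB")
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, num_prompts)
    probs = logits.softmax(dim=-1)[0]
    return probs[0].item() > threshold  # index 0 = "inappropriate" prompt
```

Images flagged this way could then be captioned with a vision-language model and summarized as word clouds, as the abstract describes, but those steps are omitted here.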