Easy, Reproducible and Quality-Controlled Data Collection with Crowdaq

Paper title

Easy, Reproducible and Quality-Controlled Data Collection with Crowdaq

Authors

Ning, Qiang; Wu, Hao; Dasigi, Pradeep; Dua, Dheeru; Gardner, Matt; Logan IV, Robert L.; Marasovic, Ana; Nie, Zhen

Abstract

High-quality and large-scale data are key to success for AI systems. However, large-scale data annotation efforts are often confronted with a set of common challenges: (1) designing a user-friendly annotation interface; (2) training enough annotators efficiently; and (3) reproducibility. To address these problems, we introduce Crowdaq, an open-source platform that standardizes the data collection pipeline with customizable user-interface components, automated annotator qualification, and saved pipelines in a re-usable format. We show that Crowdaq simplifies data annotation significantly on a diverse set of data collection use cases and we hope it will be a convenient tool for the community.
