Paper Title
Easy, Reproducible and Quality-Controlled Data Collection with Crowdaq
Paper Authors
Abstract
High-quality and large-scale data are key to success for AI systems. However, large-scale data annotation efforts are often confronted with a set of common challenges: (1) designing a user-friendly annotation interface; (2) training enough annotators efficiently; and (3) reproducibility. To address these problems, we introduce Crowdaq, an open-source platform that standardizes the data collection pipeline with customizable user-interface components, automated annotator qualification, and saved pipelines in a re-usable format. We show that Crowdaq simplifies data annotation significantly on a diverse set of data collection use cases and we hope it will be a convenient tool for the community.