Paper title

Data Augmentation for Intent Classification

Authors

Derek Chen, Claire Yin

Abstract

Training accurate intent classifiers requires labeled data, which can be costly to obtain. Data augmentation methods may ameliorate this issue, but the quality of the generated data varies significantly across techniques. We study the process of systematically producing pseudo-labeled data given a small seed set using a wide variety of data augmentation techniques, including mixing methods together. We find that while certain methods dramatically improve qualitative and quantitative performance, other methods have minimal or even negative impact. We also analyze key considerations when implementing data augmentation methods in production.
