Style Transfer as Data Augmentation: A Case Study on Named Entity Recognition

Paper Title

Style Transfer as Data Augmentation: A Case Study on Named Entity Recognition

Authors

Shuguang Chen, Leonardo Neves, Thamar Solorio

Abstract

In this work, we take the named entity recognition task in the English language as a case study and explore style transfer as a data augmentation method to increase the size and diversity of training data in low-resource scenarios. We propose a new method to effectively transform the text from a high-resource domain to a low-resource domain by changing its style-related attributes to generate synthetic data for training. Moreover, we design a constrained decoding algorithm along with a set of key ingredients for data selection to guarantee the generation of valid and coherent data. Experiments and analysis on five different domain pairs under different data regimes demonstrate that our approach can significantly improve results compared to current state-of-the-art data augmentation methods. Our approach is a practical solution to data scarcity, and we expect it to be applicable to other NLP tasks.
Paper Title