Pay Attention to Your Tone: Introducing a New Dataset for Polite Language Rewrite

Paper title

Pay Attention to Your Tone: Introducing a New Dataset for Polite Language Rewrite

Paper authors

Xun Wang, Tao Ge, Allen Mao, Yuki Li, Furu Wei, Si-Qing Chen

Abstract

We introduce PoliteRewrite, a dataset for polite language rewrite, which is a novel sentence rewrite task. Previous text style transfer tasks can mostly be addressed by slight token- or phrase-level edits. Polite language rewrite, by contrast, requires deep understanding and extensive sentence-level edits of an offensive, impolite sentence to deliver the same message euphemistically and politely. This makes it more challenging, not only for NLP models but also for human annotators, who must rewrite with effort. To alleviate the human effort needed for efficient annotation, we first propose a novel annotation paradigm in which human annotators and GPT-3.5 collaborate to annotate PoliteRewrite. The released dataset has 10K polite sentence rewrites annotated collaboratively by GPT-3.5 and humans, which can be used as a gold standard for training, validation, and testing, and 100K high-quality polite sentence rewrites produced by GPT-3.5 without human review. We hope this work (the dataset, 10K+100K, will be released soon) contributes to research on more challenging sentence rewrite tasks and provokes more thought on resource annotation paradigms that draw on large-scale pretrained models.
