Paper Title

PERT: A New Solution to Pinyin to Character Conversion Task

Authors

Jinghui Xiao, Qun Liu, Xin Jiang, Yuanfeng Xiong, Haiteng Wu, Zhe Zhang

Abstract

The Pinyin-to-Character (P2C) conversion task is the key task of the Input Method Engine (IME) in commercial input software for Asian languages such as Chinese, Japanese and Thai. It is usually treated as a sequence labelling task and solved with a language model, such as an n-gram model or an RNN. However, the low capacity of n-gram and RNN models restricts their performance. This paper introduces a new solution named PERT, which stands for bidirectional Pinyin Encoder Representations from Transformers. PERT achieves a significant performance improvement over the baselines. Furthermore, we combine PERT with an n-gram model under a Markov framework, which improves performance further. Lastly, an external lexicon is incorporated into PERT to resolve the OOD issue of IME.
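To make the "PERT plus n-gram under a Markov framework" idea concrete, here is a minimal sketch, not the paper's actual formulation. It assumes the encoder outputs a per-position distribution over candidate characters for each pinyin syllable, assumes a bigram model supplies P(c_i | c_{i-1}), and combines them log-linearly with a hypothetical weight `lam`, decoding with the Viterbi algorithm. The function name `viterbi_decode`, the parameters `lam` and `floor`, and the toy probabilities are all illustrative assumptions, not values from the paper.

```python
import math
from typing import Dict, List, Tuple

# Assumed encoder output: emission[i][char] = P(char | whole pinyin input) at position i.
Emission = List[Dict[str, float]]
# Assumed bigram model: bigram[(prev, cur)] = P(cur | prev).
Bigram = Dict[Tuple[str, str], float]


def viterbi_decode(emission: Emission, bigram: Bigram,
                   lam: float = 1.0, floor: float = 1e-8) -> List[str]:
    """Find the best character sequence under a first-order Markov assumption.

    Each step is scored as log P_enc(c_i) + lam * log P_bigram(c_i | c_{i-1});
    unseen bigrams fall back to the hypothetical `floor` probability.
    """
    # best[c] = (score of the best path ending in character c, that path)
    best = {c: (math.log(p), [c]) for c, p in emission[0].items()}
    for dist in emission[1:]:
        new_best = {}
        for cur, p_cur in dist.items():
            # Extend every surviving path by `cur` and keep the highest-scoring one.
            new_best[cur] = max(
                ((score + math.log(p_cur)
                  + lam * math.log(bigram.get((prev, cur), floor)), path + [cur])
                 for prev, (score, path) in best.items()),
                key=lambda t: t[0],
            )
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]


# Toy (made-up) scores for the pinyin input "zhong guo".
toy_emission: Emission = [{"中": 0.6, "钟": 0.4}, {"国": 0.45, "过": 0.55}]
toy_bigram: Bigram = {("中", "国"): 0.3, ("钟", "过"): 0.01}

if __name__ == "__main__":
    print("".join(viterbi_decode(toy_emission, toy_bigram, lam=1.0)))  # 中国
    print("".join(viterbi_decode(toy_emission, toy_bigram, lam=0.0)))  # 中过
```

With `lam=0.0` the decoder trusts only the per-position encoder scores and picks 中过; with `lam=1.0` the bigram term favours the fluent pair 中国. This illustrates why adding an n-gram on top of a per-position model can help, under the assumptions stated above.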
