Paper Title
SDCL: Self-Distillation Contrastive Learning for Chinese Spell Checking
Paper Authors
Paper Abstract
Due to the ambiguity of homophones, Chinese Spell Checking (CSC) has widespread applications. Existing systems typically utilize BERT for text encoding. However, CSC requires the model to account for both phonetic and graphemic information. To adapt BERT to the CSC task, we propose a token-level self-distillation contrastive learning method. We employ BERT to encode both the corrupted sentence and its corresponding correct sentence. Then, we use a contrastive loss to regularize the hidden states of corrupted tokens to be closer to their counterparts in the correct sentence. On three CSC datasets, we confirm that our method provides significant improvements over baselines.
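
The training signal described in the abstract can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration, not the authors' released code: it assumes a shared Hugging Face `bert-base-chinese` encoder, an InfoNCE-style token-level contrastive loss, and a stop-gradient on the correct-sentence branch to realize the self-distillation view. The function name `token_contrastive_loss`, the temperature value, and the use of other tokens in the same correct sentence as negatives are all illustrative assumptions.

```python
# Minimal sketch (assumptions noted above) of token-level self-distillation
# contrastive learning for CSC with a shared BERT encoder.
import torch
import torch.nn.functional as F
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")


def token_contrastive_loss(h_corrupt, h_correct, mask, temperature=0.1):
    """InfoNCE-style loss pulling each corrupted token's hidden state toward
    its counterpart in the correct sentence; the other tokens of the correct
    sentence serve as negatives (an assumption, not specified by the abstract)."""
    h_c = F.normalize(h_corrupt, dim=-1)           # (B, T, H) student branch
    h_t = F.normalize(h_correct.detach(), dim=-1)  # stop-gradient: teacher branch
    logits = torch.matmul(h_c, h_t.transpose(1, 2)) / temperature  # (B, T, T)
    # The positive for token i is the token at the same position i.
    targets = torch.arange(logits.size(1), device=logits.device)
    targets = targets.unsqueeze(0).expand(logits.size(0), -1)      # (B, T)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    )
    # Average only over real (non-padding) tokens.
    return (loss * mask.reshape(-1)).sum() / mask.sum()


# Homophone typo example: 平 (ping) corrupted in place of 苹 (ping).
corrupt = tokenizer(["我爱吃平果"], return_tensors="pt")  # corrupted sentence
correct = tokenizer(["我爱吃苹果"], return_tensors="pt")  # correct sentence

h_corrupt = encoder(**corrupt).last_hidden_state
h_correct = encoder(**correct).last_hidden_state
loss = token_contrastive_loss(
    h_corrupt, h_correct, corrupt["attention_mask"].float()
)
```

In practice this auxiliary loss would presumably be combined with the standard character-level correction objective; the weighting between the two, and whether the loss is applied to all tokens or only corrupted positions, are design choices the abstract does not specify.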