Paper title
Domain-shift Conditioning using Adaptable Filtering via Hierarchical Embeddings for Robust Chinese Spell Check
Paper authors
Paper abstract
Spell check is a useful application that processes noisy human-generated text. Spell check for Chinese poses unresolved problems due to the large number of characters, the sparse distribution of errors, and the dearth of resources with sufficient coverage of heterogeneous and shifting error domains. For Chinese spell check, filtering using confusion sets narrows the search space and makes finding corrections easier. However, most, if not all, confusion sets used to date are fixed and thus do not include new, shifting error domains. We propose a scalable adaptable filter that exploits hierarchical character embeddings to (1) obviate the need to handcraft confusion sets, and (2) resolve sparsity problems related to infrequent errors. Our approach compares favorably with competitive baselines and obtains state-of-the-art results on the 2014 and 2015 Chinese Spelling Check Bake-off datasets.
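To make the contrast in the abstract concrete, the following minimal sketch compares a fixed confusion-set filter with a similarity-based filter over character embeddings. It is an illustration under stated assumptions, not the authors' implementation: the names CONFUSION_SET, EMBEDDINGS, fixed_filter and similarity_filter are hypothetical, the two-dimensional vectors are invented, and plain cosine similarity with a threshold stands in for whatever the hierarchical embedding model actually computes.

```python
import math

# Hypothetical fixed confusion set. It covers only one character,
# mimicking a handcrafted resource with limited coverage.
CONFUSION_SET = {
    "己": {"已", "巳"},
}

# Toy character embeddings (invented 2-d vectors, for illustration only).
EMBEDDINGS = {
    "己": (1.0, 0.1),
    "已": (0.9, 0.2),
    "巳": (0.8, 0.3),
    "末": (0.1, 1.0),
    "未": (0.2, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def fixed_filter(char, candidates):
    """Keep only candidates listed in the fixed confusion set for `char`."""
    allowed = CONFUSION_SET.get(char, set())
    return [c for c in candidates if c in allowed]


def similarity_filter(char, candidates, threshold=0.9):
    """Keep candidates whose embedding is close to that of `char`."""
    source = EMBEDDINGS[char]
    return [
        c
        for c in candidates
        if c != char
        and c in EMBEDDINGS
        and cosine(source, EMBEDDINGS[c]) >= threshold
    ]


candidates = ["已", "巳", "末", "未"]
covered_fixed = fixed_filter("己", candidates)
covered_similar = similarity_filter("己", candidates)
uncovered_fixed = fixed_filter("末", candidates)
uncovered_similar = similarity_filter("末", candidates)
```

In this toy setup both filters keep 已 and 巳 as candidates for 己, but only the similarity-based filter proposes 未 for 末, because 末 has no entry in the fixed set. That is the kind of coverage gap the abstract attributes to fixed confusion sets. The 0.9 threshold is arbitrary and would need tuning in any real system.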