Paper title
The Paradigm Discovery Problem
Paper authors
Paper abstract
This work treats the paradigm discovery problem (PDP), the task of learning an inflectional morphological system from unannotated sentences. We formalize the PDP and develop evaluation metrics for judging systems. Using currently available resources, we construct datasets for the task. We also devise a heuristic benchmark for the PDP and report empirical results on five diverse languages. Our benchmark system first uses word embeddings and string similarity to cluster forms by cell and by paradigm. We then bootstrap a neural transducer on top of the clustered data to predict the words that realize the empty paradigm slots. An error analysis of our system suggests that clustering by cell across different inflection classes is the most pressing challenge for future work. Our code and data are available for public use.
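To make the paradigm-clustering step more concrete, the following is a minimal sketch and not the authors' benchmark system. It groups word forms into candidate paradigms using string similarity alone; the word-embedding signal, the clustering by cell, and the neural transducer described in the abstract are all omitted. The function names `similarity` and `cluster_forms`, the greedy single-link strategy, the 0.6 threshold, and the toy English word list are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], using difflib's ratio."""
    return SequenceMatcher(None, a, b).ratio()


def cluster_forms(forms, threshold=0.6):
    """Greedy single-link clustering (hypothetical stand-in for the paper's method).

    Each form joins the first existing cluster that contains at least one
    member whose similarity to it meets the threshold; otherwise it starts
    a new cluster. The threshold value is an assumption, not a tuned setting.
    """
    clusters = []
    for form in forms:
        for cluster in clusters:
            if any(similarity(form, member) >= threshold for member in cluster):
                cluster.append(form)
                break
        else:
            # No sufficiently similar cluster was found.
            clusters.append([form])
    return clusters


# Toy input: unannotated word forms from two English verbs.
forms = ["walk", "walks", "walked", "walking", "jump", "jumps", "jumped"]
clusters = cluster_forms(forms)
for cluster in clusters:
    print(cluster)
```

String similarity of this kind can only separate paradigms whose forms share most of their characters. It cannot tell which forms fill the same cell across paradigms (for example, that "walked" and "jumped" are both past-tense forms), which is the step the abstract identifies as the most pressing challenge.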