Paper Title

Unsupervised Discontinuous Constituency Parsing with Mildly Context-Sensitive Grammars

Authors

Songlin Yang, Roger P. Levy, Yoon Kim

Abstract

We study grammar induction with mildly context-sensitive grammars for unsupervised discontinuous parsing. Using the probabilistic linear context-free rewriting system (LCFRS) formalism, our approach fixes the rule structure in advance and focuses on parameter learning with maximum likelihood. To reduce the computational complexity of both parsing and parameter estimation, we restrict the grammar formalism to LCFRS-2 (i.e., binary LCFRS with fan-out two) and further discard rules that require O(n^6) time to parse, reducing inference to O(n^5). We find that using a large number of nonterminals is beneficial and thus make use of tensor-decomposition-based rank-space dynamic programming with an embedding-based parameterization of rule probabilities to scale up the number of nonterminals. Experiments on German and Dutch show that our approach is able to induce linguistically meaningful trees with continuous and discontinuous structures.
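The abstract mentions rank-space dynamic programming based on tensor decomposition but does not spell out the mechanics. The sketch below illustrates only the general algebraic idea, under stated assumptions: a binary rule tensor `T[A][B][C]` is assumed to factor as a sum over `rank` components of `U[A][r] * V[B][r] * W[C][r]` (a CP-style decomposition), so that combining two child score vectors can be done through the factors without ever building `T`. All names (`U`, `V`, `W`, `naive_combine`, `rank_space_combine`) are hypothetical, the factors are random numbers rather than normalized rule probabilities, and the span indexing that an LCFRS-2 parser needs is omitted. This is not the authors' implementation.

```python
import random


def random_matrix(rows, cols, rng):
    """Return a rows x cols matrix of random non-negative floats (illustrative only)."""
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


def build_tensor(U, V, W):
    """Materialize T[a][b][c] = sum_r U[a][r] * V[b][r] * W[c][r] (for checking only)."""
    m, rank = len(U), len(U[0])
    return [
        [
            [sum(U[a][r] * V[b][r] * W[c][r] for r in range(rank)) for c in range(m)]
            for b in range(m)
        ]
        for a in range(m)
    ]


def naive_combine(T, left, right):
    """Direct contraction: out[a] = sum_{b,c} T[a][b][c] * left[b] * right[c]."""
    m = len(T)
    return [
        sum(T[a][b][c] * left[b] * right[c] for b in range(m) for c in range(m))
        for a in range(m)
    ]


def rank_space_combine(U, V, W, left, right):
    """Same result via the factors: project children into rank space, then map back."""
    m, rank = len(U), len(U[0])
    left_r = [sum(V[b][r] * left[b] for b in range(m)) for r in range(rank)]
    right_r = [sum(W[c][r] * right[c] for c in range(m)) for r in range(rank)]
    return [
        sum(U[a][r] * left_r[r] * right_r[r] for r in range(rank)) for a in range(m)
    ]


rng = random.Random(0)
m, rank = 6, 3  # assumed toy sizes: 6 nonterminals, rank-3 factorization
U, V, W = (random_matrix(m, rank, rng) for _ in range(3))
left = [rng.random() for _ in range(m)]   # stand-in scores for the left child
right = [rng.random() for _ in range(m)]  # stand-in scores for the right child

naive = naive_combine(build_tensor(U, V, W), left, right)
fast = rank_space_combine(U, V, W, left, right)
print(max(abs(x - y) for x, y in zip(naive, fast)))  # tiny floating-point difference
```

In this toy setting the direct contraction costs on the order of m^3 operations per combination, while the factored version costs on the order of m * rank, which is why a factorization of this kind makes a large number of nonterminals affordable.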
