Paper title
Quantifying syntax similarity with a polynomial representation of dependency trees
Authors
Abstract
We introduce a graph polynomial that distinguishes tree structures to represent dependency grammar and a measure based on the polynomial representation to quantify syntax similarity. The polynomial encodes accurate and comprehensive information about the dependency structure and dependency relations of words in a sentence. We apply the polynomial-based methods to analyze sentences in the Parallel Universal Dependencies treebanks. Specifically, we compare the syntax of sentences and their translations in different languages, and we perform a syntactic typology study of available languages in the Parallel Universal Dependencies treebanks. We also demonstrate and discuss the potential of the methods in measuring syntax diversity of corpora.
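To make the idea of a tree-distinguishing polynomial concrete, here is a minimal sketch under stated assumptions. The abstract does not give the polynomial's definition, so the recursion below (a leaf maps to x, an internal node maps to y plus the product of its children's polynomials) is one known construction for unlabeled rooted trees and is only assumed here for illustration; it ignores dependency relation labels. The names `tree_polynomial`, `poly_mul`, and `coefficient_distance` are hypothetical, and the distance is a simple coefficient-wise comparison, not necessarily the measure used in the paper.

```python
from collections import Counter

# A polynomial in x and y is stored as {(power_of_x, power_of_y): coefficient}.

def poly_mul(p, q):
    """Multiply two bivariate polynomials stored as exponent -> coefficient maps."""
    out = Counter()
    for (a, b), c in p.items():
        for (d, e), f in q.items():
            out[(a + d, b + e)] += c * f
    return out

def tree_polynomial(children, node):
    """Assumed recursion: leaf -> x; internal node -> y + product of child polynomials.

    `children` maps each head to the list of its dependents.
    """
    kids = children.get(node, [])
    if not kids:
        return Counter({(1, 0): 1})      # a leaf contributes x
    prod = Counter({(0, 0): 1})          # start from the constant 1
    for kid in kids:
        prod = poly_mul(prod, tree_polynomial(children, kid))
    prod[(0, 1)] += 1                    # add y for the internal node
    return prod

def coefficient_distance(p, q):
    """Hypothetical similarity measure: sum of absolute coefficient differences."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)

# Two toy unlabeled dependency trees over three words each.
star = {"reads": ["She", "books"]}       # root with two leaf dependents
chain = {"w0": ["w1"], "w1": ["w2"]}     # root -> dependent -> dependent

p_star = tree_polynomial(star, "reads")  # x^2 + y
p_chain = tree_polynomial(chain, "w0")   # x + 2y
print(dict(p_star), dict(p_chain), coefficient_distance(p_star, p_chain))
```

The two trees have the same number of words but different shapes, and their polynomials differ accordingly. A labeled variant would need additional variables for dependency relations, which this sketch leaves out.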