Title
A Topological Method for Comparing Document Semantics
Authors
Abstract
Comparing document semantics is one of the toughest tasks in both Natural Language Processing and Information Retrieval. To date, tools for this task remain rare, and most relevant methods are devised from a statistical or vector space model perspective, with almost none taking a topological perspective. In this paper, we offer a different approach: a novel algorithm based on topological persistence for comparing the semantic similarity between two documents. Our experiments are conducted on a document dataset annotated with human judgments, and a collection of state-of-the-art methods is selected for comparison. The experimental results show that our algorithm produces results highly consistent with human judgments and outperforms most state-of-the-art methods, although it ties with NLTK.