Paper Title
A Pattern-mining Driven Study on Differences of Newspapers in Expressing Temporal Information
Paper Authors
Paper Abstract
This paper studies how different types of newspapers differ in expressing temporal information, a topic that has received little attention. Techniques from temporal processing and pattern mining are employed to investigate it. First, we create a corpus annotated with temporal information. Then, we extract from the corpus sequences of temporal information tags mixed with part-of-speech tags. The TKS algorithm is used to mine skip-gram patterns from these sequences. From these patterns, we obtain signatures for four newspapers. To make each signature uniquely characterize its newspaper, we revise the signatures by removing reference patterns. By examining the number of patterns in the original and revised signatures, the proportion of patterns that contain temporal information tags, and the specific patterns that contain such tags, we find that newspapers differ in how they express temporal information.