Paper Title


A decomposition of book structure through ousiometric fluctuations in cumulative word-time

Authors

Fudolig, Mikaela Irene, Alshaabi, Thayer, Cramer, Kathryn, Danforth, Christopher M., Dodds, Peter Sheridan

Abstract


While quantitative methods have been used to examine changes in word usage in books, studies have focused on overall trends, such as the shapes of narratives, which are independent of book length. We instead look at how words change over the course of a book as a function of the number of words, rather than the fraction of the book, completed at any given point; we define this measure as "cumulative word-time". Using ousiometrics, a reinterpretation of the valence-arousal-dominance framework of meaning obtained from semantic differentials, we convert text into time series of power and danger scores in cumulative word-time. Each time series is then decomposed using empirical mode decomposition into a sum of constituent oscillatory modes and a non-oscillatory trend. By comparing the decomposition of the original power and danger time series with those derived from shuffled text, we find that shorter books exhibit only a general trend, while longer books have fluctuations in addition to the general trend. These fluctuations typically have a period of a few thousand words regardless of the book length or library classification code, but vary depending on the content and structure of the book. Our findings suggest that, in the ousiometric sense, longer books are not expanded versions of shorter books, but are more similar in structure to a concatenation of shorter texts. Further, they are consistent with editorial practices that require longer texts to be broken down into sections, such as chapters. Our method also provides a data-driven denoising approach that works for texts of various lengths, in contrast to the more traditional approach of using large window sizes that may inadvertently smooth out relevant information, especially for shorter texts. These results open up avenues for future work in computational literary analysis, particularly the measurement of a basic unit of narrative.
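The core analysis step above, decomposing a score time series into oscillatory modes plus a non-oscillatory trend via empirical mode decomposition, can be illustrated with a minimal sketch. This is not the authors' code: it is a toy sifting loop in Python (assuming NumPy and SciPy are available) with no boundary handling or standard IMF stopping criteria, applied here to a synthetic signal rather than real power/danger scores.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x):
    """One sifting pass: subtract the mean of the cubic-spline envelopes."""
    t = np.arange(len(x))
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 4 or len(minima) < 4:
        return None  # too few extrema for a spline envelope: treat x as the trend
    upper = CubicSpline(maxima, x[maxima])(t)
    lower = CubicSpline(minima, x[minima])(t)
    return x - (upper + lower) / 2.0

def emd(x, max_imfs=8, n_sifts=10):
    """Decompose x into oscillatory IMFs plus a residual (non-oscillatory) trend."""
    imfs, residual = [], np.asarray(x, dtype=float).copy()
    for _ in range(max_imfs):
        h = residual.copy()
        for _ in range(n_sifts):
            h_new = sift_once(h)
            if h_new is None:
                break
            h = h_new
        if np.array_equal(h, residual) and sift_once(h) is None:
            break  # nothing oscillatory left; the residual is the trend
        imfs.append(h)
        residual = residual - h
    return imfs, residual

# Toy "score series": a fast oscillation riding on a slow trend, standing in
# for a power or danger series in cumulative word-time.
t = np.linspace(0.0, 1.0, 2000)
signal = np.sin(2 * np.pi * 25 * t) + 3.0 * t**2
imfs, trend = emd(signal)
# By construction, the extracted IMFs and the trend sum back to the input.
```

The paper's significance test, comparing each mode against decompositions of shuffled text, could then be sketched by applying `emd` to `np.random.permutation(signal)` and comparing the resulting mode energies.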
