Paper Title

Entropy- and Distance-Based Predictors From GPT-2 Attention Patterns Predict Reading Times Over and Above GPT-2 Surprisal

Paper Authors

Byung-Doh Oh, William Schuler

Paper Abstract

Transformer-based large language models are trained to make predictions about the next word by aggregating representations of previous tokens through their self-attention mechanism. In the field of cognitive modeling, such attention patterns have recently been interpreted as embodying the process of cue-based retrieval, in which attention over multiple targets is taken to generate interference and latency during retrieval. Under this framework, this work first defines an entropy-based predictor that quantifies the diffuseness of self-attention, as well as distance-based predictors that capture the incremental change in attention patterns across timesteps. Moreover, following recent studies that question the informativeness of attention weights, we also experiment with alternative methods for incorporating vector norms into attention weights. Regression experiments using predictors calculated from the GPT-2 language model show that these predictors deliver a substantially better fit to held-out self-paced reading and eye-tracking data over a rigorous baseline including GPT-2 surprisal. Additionally, the distance-based predictors generally demonstrated higher predictive power, with effect sizes of up to 6.59 ms per standard deviation on self-paced reading times (compared to 2.82 ms for surprisal) and 1.05 ms per standard deviation on eye-gaze durations (compared to 3.81 ms for surprisal).
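To make the two families of predictors concrete, below is a minimal Python sketch (using the Hugging Face transformers library) of how one might compute an attention-entropy predictor and a between-timestep attention-distance predictor from GPT-2. The choice of the final layer, head-averaged attention weights, an L1 distance, and the example sentence are illustrative assumptions rather than the paper's exact formulation, and the norm-based reweighting of attention mentioned in the abstract is omitted here.

```python
import torch
from transformers import GPT2TokenizerFast, GPT2Model

# Minimal sketch (assumptions, not the authors' code): extract GPT-2 attention
# and compute (1) an entropy-based predictor quantifying how diffuse each
# timestep's attention is, and (2) a distance-based predictor quantifying how
# much the attention distribution changes between consecutive timesteps.
# Illustrative choices: final layer, head-averaged weights, L1 distance.

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_attentions=True)
model.eval()

text = "The editor recommended by the author was checking the proofs."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions holds one (1, n_heads, seq_len, seq_len) tensor per layer.
attn = outputs.attentions[-1].mean(dim=1).squeeze(0)  # (seq_len, seq_len)

eps = 1e-12
# Entropy predictor: Shannon entropy of each timestep's attention distribution.
entropy = -(attn * (attn + eps).log()).sum(dim=-1)

# Distance predictor: L1 distance between the attention distributions of
# consecutive timesteps, restricted to the context they share (positions < t).
distance = torch.zeros(attn.size(0))
for t in range(1, attn.size(0)):
    distance[t] = (attn[t, :t] - attn[t - 1, :t]).abs().sum()

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for tok, h, d in zip(tokens, entropy.tolist(), distance.tolist()):
    print(f"{tok:>12s}  entropy={h:.3f}  distance={d:.3f}")
```

In a regression setting like the one described in the abstract, per-token quantities such as these would be aligned to word-level self-paced reading or eye-tracking measures and entered alongside GPT-2 surprisal and the other baseline predictors.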
