Paper Title

Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting

Authors

Yong Liu, Haixu Wu, Jianmin Wang, Mingsheng Long

Abstract

Transformers have shown great power in time series forecasting due to their global-range modeling ability. However, their performance can degenerate terribly on non-stationary real-world data in which the joint distribution changes over time. Previous studies primarily adopt stationarization to attenuate the non-stationarity of original series for better predictability. But the stationarized series deprived of inherent non-stationarity can be less instructive for real-world bursty events forecasting. This problem, termed over-stationarization in this paper, leads Transformers to generate indistinguishable temporal attentions for different series and impedes the predictive capability of deep models. To tackle the dilemma between series predictability and model capability, we propose Non-stationary Transformers as a generic framework with two interdependent modules: Series Stationarization and De-stationary Attention. Concretely, Series Stationarization unifies the statistics of each input and converts the output with restored statistics for better predictability. To address the over-stationarization problem, De-stationary Attention is devised to recover the intrinsic non-stationary information into temporal dependencies by approximating distinguishable attentions learned from raw series. Our Non-stationary Transformers framework consistently boosts mainstream Transformers by a large margin, which reduces MSE by 49.43% on Transformer, 47.34% on Informer, and 46.89% on Reformer, making them the state-of-the-art in time series forecasting. Code is available at this repository: https://github.com/thuml/Nonstationary_Transformers.
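To make the two modules concrete, here is a minimal PyTorch sketch of the ideas the abstract describes. This is not the authors' implementation (see the linked repository for that): the tensor shapes, the per-channel window statistics, and the scalar-tau/additive-delta rescaling of the attention logits are assumptions loosely based on the abstract, shown only to illustrate the mechanism.

```python
# Minimal sketch of Series Stationarization and De-stationary Attention.
# Assumptions (not from the official repo): inputs are (batch, length,
# channels); tau and delta are given rescaling terms learned elsewhere.
import torch
import torch.nn as nn


class SeriesStationarization(nn.Module):
    """Normalize each input window by its own statistics, and restore
    those statistics on the model output so forecasts keep the original
    scale while the model sees a stationarized series."""

    def normalize(self, x):
        # Per-instance, per-channel mean/std over the time dimension.
        self.mean = x.mean(dim=1, keepdim=True)
        self.std = x.std(dim=1, keepdim=True) + 1e-5
        return (x - self.mean) / self.std

    def denormalize(self, y):
        # Convert the output back with the restored statistics.
        return y * self.std + self.mean


def destationary_attention(q, k, v, tau, delta):
    """Attention over stationarized queries/keys whose logits are
    rescaled by a learned factor tau and shift delta, approximating the
    attention the model would have produced on the raw series.
    q, k, v: (batch, heads, length, d_k); tau and delta must broadcast
    against the (batch, heads, length, length) score matrix."""
    d_k = q.size(-1)
    scores = (tau * (q @ k.transpose(-2, -1)) + delta) / d_k ** 0.5
    return torch.softmax(scores, dim=-1) @ v
```

A typical usage pattern under these assumptions: call `normalize` on the input window, run the Transformer (with `destationary_attention` in place of vanilla attention) on the stationarized series, then call `denormalize` on the prediction to restore the original scale.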
