Paper Title
HyperMixer: An MLP-based Low Cost Alternative to Transformers
Paper Authors
Paper Abstract
Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning.
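To illustrate the core idea described above, the sketch below shows how a token-mixing layer could generate its mixing weights dynamically with hypernetworks, in contrast to MLPMixer's static token-mixing MLP. This is a minimal illustration under assumptions, not the authors' reference implementation: the module name `HyperTokenMixing`, the hypernetwork shapes (`hyper_w1`, `hyper_w2`), and the hyperparameters `d_model` and `d_hidden` are chosen for this example, and position information, normalization, and the channel-mixing MLP are omitted.

```python
import torch
import torch.nn as nn


class HyperTokenMixing(nn.Module):
    """Illustrative token mixing whose MLP weights are produced by
    hypernetworks from the token representations themselves."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # Position-wise hypernetworks: each emits one row of the
        # token-mixing weight matrices per input token.
        self.hyper_w1 = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_hidden)
        )
        self.hyper_w2 = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_hidden)
        )
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, d_model)
        w1 = self.hyper_w1(x)  # (batch, num_tokens, d_hidden), generated per input
        w2 = self.hyper_w2(x)  # (batch, num_tokens, d_hidden), generated per input
        # Mix information across the token dimension using the generated weights.
        hidden = self.act(w1.transpose(1, 2) @ x)  # (batch, d_hidden, d_model)
        return w2 @ hidden                          # (batch, num_tokens, d_model)
```

Note that both matrix products in `forward` scale linearly with the number of tokens (for a fixed `d_hidden`), consistent with the abstract's contrast to the quadratic cost of self-attention.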