Paper Title
HyperMixer: An MLP-based Low Cost Alternative to Transformers
Paper Authors
Paper Abstract
Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning.
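To illustrate the core idea described above, the sketch below shows how a token-mixing layer could generate its mixing weights dynamically with hypernetworks, in contrast to MLPMixer's static token-mixing MLP. This is a minimal illustration under assumptions, not the authors' reference implementation: the module name `HyperTokenMixing`, the hypernetwork shapes (`hyper_w1`, `hyper_w2`), and the hyperparameters `d_model` and `d_hidden` are chosen for this example, and position information, normalization, and the channel-mixing MLP are omitted.

```python
import torch
import torch.nn as nn


class HyperTokenMixing(nn.Module):
    """Illustrative token mixing whose MLP weights are produced by
    hypernetworks from the token representations themselves."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # Position-wise hypernetworks: each emits one row of the
        # token-mixing weight matrices per input token.
        self.hyper_w1 = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_hidden)
        )
        self.hyper_w2 = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_hidden)
        )
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, d_model)
        w1 = self.hyper_w1(x)  # (batch, num_tokens, d_hidden), generated per input
        w2 = self.hyper_w2(x)  # (batch, num_tokens, d_hidden), generated per input
        # Mix information across the token dimension using the generated weights.
        hidden = self.act(w1.transpose(1, 2) @ x)  # (batch, d_hidden, d_model)
        return w2 @ hidden                          # (batch, num_tokens, d_model)
```

Note that both matrix products in `forward` scale linearly with the number of tokens (for a fixed `d_hidden`), consistent with the abstract's contrast to the quadratic cost of self-attention.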