Paper Title

DeLighT: Deep and Light-weight Transformer

Paper Authors

Sachin Mehta, Marjan Ghazvininejad, Srinivasan Iyer, Luke Zettlemoyer, Hannaneh Hajishirzi

Paper Abstract

We introduce a deep and light-weight transformer, DeLighT, that delivers similar or better performance than standard transformer-based models with significantly fewer parameters. DeLighT more efficiently allocates parameters both (1) within each Transformer block using the DeLighT transformation, a deep and light-weight transformation, and (2) across blocks using block-wise scaling, which allows for shallower and narrower DeLighT blocks near the input and wider and deeper DeLighT blocks near the output. Overall, DeLighT networks are 2.5 to 4 times deeper than standard transformer models and yet have fewer parameters and operations. Experiments on benchmark machine translation and language modeling tasks show that DeLighT matches or improves the performance of baseline Transformers with 2 to 3 times fewer parameters on average. Our source code is available at https://github.com/sacmehta/delight.
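As a reading aid, here is a minimal Python sketch of the block-wise scaling idea named in the abstract: block depth grows from the input side of the network to the output side. The linear interpolation rule and the names num_blocks, min_depth, and max_depth are illustrative assumptions, not the exact formulation from the paper or the released code.

# Illustrative sketch of block-wise scaling: blocks near the input are
# shallower, blocks near the output are deeper. The linear rule below is
# an assumption for illustration only.

def blockwise_depths(num_blocks: int, min_depth: int, max_depth: int) -> list[int]:
    """Assign each block a depth that grows linearly from input to output."""
    if num_blocks == 1:
        return [max_depth]
    return [
        round(min_depth + (max_depth - min_depth) * b / (num_blocks - 1))
        for b in range(num_blocks)
    ]

if __name__ == "__main__":
    # e.g., 6 blocks scaling from depth 4 near the input to depth 8 near the output
    print(blockwise_depths(num_blocks=6, min_depth=4, max_depth=8))
    # -> [4, 5, 6, 6, 7, 8]

Per the abstract, width is scaled alongside depth in the same input-to-output direction; this sketch only shows how a depth could be assigned per block.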
