Paper Title
FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference
Authors
Abstract
Fusion-in-Decoder (FiD) is a powerful retrieval-augmented language model that sets the state-of-the-art on many knowledge-intensive NLP tasks. However, the architecture used for FiD was chosen by making minimal modifications to a standard T5 model, which our analysis shows to be highly suboptimal for a retrieval-augmented model. In particular, FiD allocates the bulk of FLOPs to the encoder, while the majority of inference time results from memory bandwidth constraints in the decoder. We propose two simple changes to the FiD architecture to alleviate memory bandwidth constraints, and speed up inference by 7x. This allows us to use a much larger decoder at modest cost. We denote FiD with the above modifications as FiDO, and show that it strongly improves performance over existing FiD models for a wide range of inference budgets. For example, FiDO-Large-XXL performs faster inference than FiD-Base and achieves better performance than FiD-Large.
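The abstract's central observation, that the encoder dominates FLOPs while the decoder dominates wall-clock time, follows from a roofline-style argument: the encoder processes thousands of retrieved-passage tokens per weight load, while each incremental decoding step processes a single token per weight load. The sketch below illustrates this with back-of-the-envelope arithmetic; all hardware and shape numbers (100 TFLOP/s, 1 TB/s bandwidth, 100 passages of 250 tokens, bf16 weights) are illustrative assumptions, not figures from the paper.

```python
# Hedged roofline sketch (assumed numbers, not from the paper): compare the
# arithmetic intensity (FLOPs per byte of weights streamed from memory) of a
# batched encoder pass vs. a single incremental decoder step.

BYTES_PER_PARAM = 2  # bf16 weights (assumption)

def arithmetic_intensity(tokens_per_weight_load):
    """FLOPs performed per byte of weights read from memory.

    A matmul layer does ~2 FLOPs per parameter per token, and must stream
    each parameter (BYTES_PER_PARAM bytes) at least once per forward pass.
    """
    return 2 * tokens_per_weight_load / BYTES_PER_PARAM

# Illustrative accelerator: 100 TFLOP/s peak compute, 1 TB/s memory bandwidth.
# Below this intensity a kernel is bandwidth-bound; above it, compute-bound.
RIDGE_POINT = 100e12 / 1e12  # = 100 FLOPs/byte

enc_tokens = 100 * 250  # 100 retrieved passages of 250 tokens, encoded at once
dec_tokens = 1          # incremental decoding emits one token per step

enc_intensity = arithmetic_intensity(enc_tokens)  # 25,000 FLOPs/byte
dec_intensity = arithmetic_intensity(dec_tokens)  # 1 FLOP/byte

print(enc_intensity > RIDGE_POINT)  # True: encoder is compute-bound
print(dec_intensity < RIDGE_POINT)  # True: decoder step is bandwidth-bound
```

Under these assumptions the encoder sits far above the ridge point (its cost is set by FLOPs), while each decoder step sits far below it (its cost is set by memory bandwidth), which is why shrinking or restructuring the decoder's memory traffic, rather than its FLOPs, is what speeds up FiD inference.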