Paper Title

SALO: An Efficient Spatial Accelerator Enabling Hybrid Sparse Attention Mechanisms for Long Sequences

Authors

Guan Shen, Jieru Zhao, Quan Chen, Jingwen Leng, Chao Li, Minyi Guo

Abstract

The attention mechanisms of transformers effectively extract pertinent information from the input sequence. However, the quadratic complexity of self-attention w.r.t. the sequence length incurs heavy computational and memory burdens, especially for tasks with long sequences. Existing accelerators face performance degradation in these tasks. To this end, we propose SALO to enable hybrid sparse attention mechanisms for long sequences. SALO contains a data scheduler to map hybrid sparse attention patterns onto hardware and a spatial accelerator to perform efficient attention computation. We show that SALO achieves 17.66x and 89.33x speedup on average compared to GPU and CPU implementations, respectively, on typical workloads, i.e., Longformer and ViL.
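The hybrid sparse attention patterns the abstract refers to combine a (dilated) sliding-window mask with a few global tokens, as in Longformer and ViL. Below is a minimal PyTorch sketch of what such a pattern looks like functionally; the names `hybrid_sparse_mask` and `sparse_attention` and their parameters are illustrative assumptions, and the dense masked computation is only a software reference, not SALO's hardware dataflow, which skips the masked-out positions entirely instead of materializing the full score matrix.

```python
import torch

def hybrid_sparse_mask(seq_len, window=4, dilation=1, global_tokens=(0,)):
    # True = query (row) may attend to key (column). Hypothetical helper.
    idx = torch.arange(seq_len)
    offset = idx[None, :] - idx[:, None]  # offset[i, j] = j - i
    # (Dilated) sliding window: keys within +/- window steps of the
    # query, sampled every `dilation` positions.
    mask = (offset.abs() <= window * dilation) & (offset % dilation == 0)
    # Global tokens attend to, and are attended by, every position.
    for g in global_tokens:
        mask[g, :] = True
        mask[:, g] = True
    return mask

def sparse_attention(q, k, v, mask):
    # Dense reference: mask disallowed scores to -inf before softmax.
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Example: 16 tokens, window of 2, token 0 global (e.g., a [CLS] token).
q = k = v = torch.randn(16, 8)
out = sparse_attention(q, k, v, hybrid_sparse_mask(16, window=2))
```

Because every query attends to only O(window + |global_tokens|) keys rather than all seq_len of them, the work grows linearly with sequence length; SALO's data scheduler maps exactly this kind of structured sparsity onto its spatial array.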
