Paper Title

HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression

Paper Authors

Jiaqi Gu, Ben Keller, Jean Kossaifi, Anima Anandkumar, Brucek Khailany, David Z. Pan

Paper Abstract

Transformers have attained superior performance in natural language processing and computer vision. Their self-attention and feedforward layers are overparameterized, limiting inference speed and energy efficiency. Tensor decomposition is a promising technique to reduce parameter redundancy by leveraging tensor algebraic properties to express the parameters in a factorized form. Prior efforts used manual or heuristic factorization settings without hardware-aware customization, resulting in poor hardware efficiencies and large performance degradation. In this work, we propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions and automates the choice of tensorization shape and decomposition rank with hardware-aware co-optimization. We jointly investigate tensor contraction path optimizations and a fused Einsum mapping strategy to bridge the gap between theoretical benefits and real hardware efficiency improvement. Our two-stage knowledge distillation flow resolves the trainability bottleneck and thus significantly boosts the final accuracy of factorized Transformers. Overall, we experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss and achieve a better efficiency-accuracy Pareto frontier than hand-tuned and heuristic baselines.
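
The factorized-layer idea in the abstract can be made concrete with a small sketch. Below is a minimal, illustrative example, not the paper's actual method or chosen settings: a 768x768 weight matrix is tensorized with an assumed shape (768 = 32x24 on both the input and output side) and replaced by a rank-8 CP-style (sum-of-Kronecker) factorization, evaluated as a single einsum with an optimized contraction path. The shapes, the rank, and the factorization form are all assumptions for illustration; HEAT searches over such choices automatically.

```python
import numpy as np

# Illustrative settings (assumptions, not the paper's auto-selected ones):
# a 768x768 BERT-style weight, tensorized as 768 = 32 * 24 on each side,
# with a rank-8 CP-style factorization.
batch = 8
i1, i2, o1, o2 = 32, 24, 32, 24   # tensorization shape: 768 = i1*i2 = o1*o2
r = 8                             # decomposition rank

x = np.random.randn(batch, i1, i2)   # input of shape (batch, 768) reshaped
a = np.random.randn(i1, o1, r)       # first factor
b = np.random.randn(i2, o2, r)       # second factor

# The dense weight would need i1*i2*o1*o2 = 589,824 parameters; the two
# factors use (i1*o1 + i2*o2) * r = 12,800 -- the parameter-redundancy
# reduction the abstract refers to. optimize='optimal' lets einsum search
# for a low-cost contraction order.
y = np.einsum('bij,ior,jpr->bop', x, a, b, optimize='optimal')
print(y.shape)  # (8, 32, 24) -> reshape back to (8, 768)
```

NumPy's `optimize='optimal'` is loosely analogous to the tensor contraction path optimization the abstract describes; on real hardware, the paper additionally considers a fused Einsum mapping strategy, which this sketch does not model.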
