Paper Title
MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions
Paper Authors
Paper Abstract
Deep neural networks are widely used in personalized recommendation systems. Unlike regular DNN inference workloads, recommendation inference is memory-bound due to the many random memory accesses needed to look up the embedding tables. The inference is also heavily constrained in terms of latency because producing a recommendation for a user must be done in about tens of milliseconds. In this paper, we propose MicroRec, a high-performance inference engine for recommendation systems. MicroRec accelerates recommendation inference by (1) redesigning the data structures involved in the embeddings to reduce the number of lookups needed and (2) taking advantage of the availability of High-Bandwidth Memory (HBM) in FPGA accelerators to tackle latency by enabling parallel lookups. We have implemented the resulting design on an FPGA board, covering both the embedding lookup step and the complete inference process. Compared to an optimized CPU baseline (16 vCPUs, AVX2 enabled), MicroRec achieves a 13.8~14.7x speedup on embedding lookups alone and a 2.5~5.4x speedup for end-to-end recommendation inference in terms of throughput. As for latency, CPU-based engines need milliseconds to infer a recommendation while MicroRec takes only microseconds, a significant advantage in real-time recommendation systems.
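To make the data-structure idea concrete, below is a minimal sketch of one way redesigning embedding storage can reduce the number of random lookups: two small tables are merged into their Cartesian product so that a single contiguous read replaces two random reads. The abstract does not specify MicroRec's exact scheme, so the table sizes, names (table_a, table_b, lookup_merged), and the Cartesian-product construction here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Hypothetical example: two small embedding tables. A recommendation
# model normally performs one random memory access per table per sample.
np.random.seed(0)
d1, d2 = 4, 4
table_a = np.random.rand(10, d1).astype(np.float32)  # 10 categories
table_b = np.random.rand(20, d2).astype(np.float32)  # 20 categories

def lookup_baseline(id_a, id_b):
    # Baseline: two random accesses, one into each table.
    return np.concatenate([table_a[id_a], table_b[id_b]])

# Redesigned structure (illustrative): precompute the Cartesian product
# of the two tables so every concatenated vector is stored contiguously.
# Memory grows from 10 + 20 rows to 10 * 20 rows, which is only
# worthwhile when the tables are small.
merged = np.concatenate(
    [np.repeat(table_a, len(table_b), axis=0),   # row i repeated 20x
     np.tile(table_b, (len(table_a), 1))],       # rows 0..19 cycled
    axis=1,
)  # shape: (200, d1 + d2)

def lookup_merged(id_a, id_b):
    # One random access replaces two.
    return merged[id_a * len(table_b) + id_b]

assert np.allclose(lookup_baseline(3, 7), lookup_merged(3, 7))
```

Because recommendation inference is memory-bound, trading modest extra capacity for fewer random accesses like this directly reduces lookup latency; the abstract's second lever, issuing the remaining lookups in parallel across HBM channels on the FPGA, attacks the same bottleneck from the bandwidth side.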