论文标题
用于大型推荐系统嵌入的频率感知软件缓存
A Frequency-aware Software Cache for Large Recommendation System Embeddings
论文作者
论文摘要
深度学习推荐模型(DLRMS)已被广泛应用于互联网公司。 DLRM的嵌入表太大,无法完全适合GPU内存。我们通过利用目标数据集的ID频率统计信息来动态管理CPU和GPU内存空间中的嵌入式表的基于GPU的软件缓存方法。我们提出的软件缓存以同步更新方式有效地在GPU上培训整个DLRM。与广泛使用的混合平行训练方法相结合,它也将其缩放到多个GPU。评估我们的原型系统表明,我们只能保留GPU中嵌入参数的1.5%,以获得体面的端到端训练速度。
Deep learning recommendation models (DLRMs) have been widely applied in Internet companies. The embedding tables of DLRMs are too large to fit on GPU memory entirely. We propose a GPU-based software cache approaches to dynamically manage the embedding table in the CPU and GPU memory space by leveraging the id's frequency statistics of the target dataset. Our proposed software cache is efficient in training entire DLRMs on GPU in a synchronized update manner. It is also scaled to multiple GPUs in combination with the widely used hybrid parallel training approaches. Evaluating our prototype system shows that we can keep only 1.5% of the embedding parameters in the GPU to obtain a decent end-to-end training speed.