Paper Title
Learning Compressed Embeddings for On-Device Inference
Paper Authors
Paper Abstract
In deep learning, embeddings are widely used to represent categorical entities such as words, apps, and movies. An embedding layer maps each entity to a unique vector, so the layer's memory requirement is proportional to the number of entities. In the recommendation domain, a given category can have hundreds of thousands of entities, and its embedding layer can take gigabytes of memory. The scale of these networks makes them difficult to deploy in resource-constrained environments. In this paper, we propose a novel approach for reducing the size of an embedding table while still mapping each entity to its own unique embedding. Rather than maintaining the full embedding table, we construct each entity's embedding "on the fly" using two separate embedding tables. The first table employs hashing to force multiple entities to share an embedding. The second table contains one trainable weight per entity, allowing the model to distinguish between entities that share the same embedding. Since the two tables are trained jointly, the network can learn a unique embedding per entity, helping it maintain a discriminative capability similar to that of a model with an uncompressed embedding table. We call this approach MEmCom (Multi-Embedding Compression). We compare against state-of-the-art model compression techniques on multiple problem classes, including classification and ranking. On four popular recommender system datasets, MEmCom incurred only a 4% relative loss in nDCG while compressing the input embedding sizes of our recommendation models by 16x, 4x, 12x, and 40x. MEmCom outperforms the state-of-the-art techniques, which had 16%, 6%, 10%, and 8% relative losses in nDCG at the respective compression ratios. Additionally, MEmCom is able to compress the RankNet ranking model by 32x on a dataset with millions of users' interactions with games while incurring only a 1% relative loss in nDCG.
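To make the two-table construction concrete, below is a minimal PyTorch sketch of a MEmCom-style compressed embedding layer, based only on the description in the abstract. The modulo hash, the multiplicative combination of the per-entity scalar with the shared vector, and all names (`MEmComEmbedding`, `num_buckets`, etc.) are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn as nn

class MEmComEmbedding(nn.Module):
    """Sketch of a MEmCom-style compressed embedding: a small shared
    embedding table indexed by a hash of the entity id, plus one
    trainable scalar weight per entity to disambiguate collisions.
    (Assumed formulation; the paper's hash function and combination
    rule may differ.)"""

    def __init__(self, num_entities: int, num_buckets: int, embedding_dim: int):
        super().__init__()
        # First table: compressed and shared via hashing
        # (num_buckets << num_entities).
        self.shared = nn.Embedding(num_buckets, embedding_dim)
        # Second table: one trainable weight per entity (one float each).
        self.per_entity = nn.Embedding(num_entities, 1)
        self.num_buckets = num_buckets

    def forward(self, entity_ids: torch.Tensor) -> torch.Tensor:
        # Simple modulo hash; entities that collide into the same bucket
        # share the first-table vector.
        bucket_ids = entity_ids % self.num_buckets
        shared_vec = self.shared(bucket_ids)    # (..., embedding_dim)
        weight = self.per_entity(entity_ids)    # (..., 1)
        # Scaling the shared vector by a per-entity weight yields a
        # distinct embedding for each entity; both tables train jointly.
        return weight * shared_vec


# Usage with hypothetical sizes: 500k entities hashed into 16,384 shared
# vectors of dimension 64. The embedding parameter count drops from
# 500,000 x 64 floats to 16,384 x 64 + 500,000 x 1 floats (roughly 20x).
emb = MEmComEmbedding(num_entities=500_000, num_buckets=16_384, embedding_dim=64)
ids = torch.tensor([3, 42, 499_999])
vectors = emb(ids)  # shape: (3, 64)
```

Note that because the two tables are trained end to end with the rest of the network, the per-entity scalars can recover much of the discriminative capacity lost to hash collisions, which is consistent with the small nDCG losses reported above.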