Paper Title
Superbloom: Bloom filter meets Transformer
Paper Authors
Paper Abstract
We extend the idea of word pieces in natural language models to machine learning tasks on opaque ids. This is achieved by applying hash functions to map each id to multiple hash tokens in a much smaller space, similar to a Bloom filter. We show that by applying a multi-layer Transformer to these Bloom filter digests, we are able to obtain models with high accuracy. They outperform models of a similar size without hashing and, by a large margin, models of a much larger size trained using sampled softmax with the same computational budget. Our key observation is that it is important to use a multi-layer Transformer on the Bloom filter digests to remove the ambiguity in the hashed input. We believe this provides an alternative method for solving problems with a large vocabulary size.
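To illustrate the hashing step the abstract describes, below is a minimal sketch of a Bloom-filter-style digest: each opaque id is mapped to several hash tokens in a much smaller vocabulary. The function name `bloom_digest`, the SHA-256-based hash construction, and the parameter values are illustrative assumptions, not the paper's actual implementation.

```python
import hashlib

def bloom_digest(item_id: str, num_hashes: int = 2, vocab_size: int = 50_000) -> list[int]:
    """Map an opaque id to multiple hash tokens in a small vocabulary.

    Analogous to a Bloom filter, which sets multiple bit positions per item,
    each id is represented by `num_hashes` token ids in [0, vocab_size).
    (Illustrative sketch; the paper's exact hash scheme may differ.)
    """
    tokens = []
    for seed in range(num_hashes):
        # Derive independent hash functions by salting with the seed.
        digest = hashlib.sha256(f"{seed}:{item_id}".encode()).digest()
        tokens.append(int.from_bytes(digest[:8], "big") % vocab_size)
    return tokens

# Example: one id becomes a short sequence of hash tokens, which can then be
# embedded and fed to a multi-layer Transformer alongside other ids.
print(bloom_digest("user_1234567"))  # e.g. two token ids in [0, 50000)
```

Because distinct ids can share individual hash tokens, any single token is ambiguous; the abstract's key point is that a multi-layer Transformer over the full digest can resolve this ambiguity from the co-occurring tokens.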