Paper Title
Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning
Paper Authors
Paper Abstract
Self-supervised representation learning follows a paradigm of withholding some part of the data and tasking the network to predict it from the remaining part. Among many techniques, data augmentation lies at the core for creating the information gap. Towards this end, masking has emerged as a generic and powerful tool where content is withheld along the sequential dimension, e.g., spatial in images, temporal in audio, and syntactic in language. In this paper, we explore the orthogonal channel dimension for generic data augmentation by exploiting precision redundancy. The data for each channel is quantized through a non-uniform quantizer, with the quantized value sampled randomly within randomly sampled quantization bins. From another perspective, quantization is analogous to channel-wise masking, as it removes the information within each bin, but preserves the information across bins. Our approach significantly surpasses existing generic data augmentation methods, while showing on par performance against modality-specific augmentations. We comprehensively evaluate our approach on vision, audio, 3D point clouds, as well as the DABS benchmark which is comprised of various data modalities. The code is available at https://github.com/microsoft/random_quantize.
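The abstract describes the augmentation as per-channel quantization with randomly sampled non-uniform bins, where each value is then replaced by a random value drawn from within its bin. Below is a minimal NumPy sketch of that idea; the function name, the bin-count range, and the uniform sampling of bin edges are illustrative assumptions, not the authors' exact implementation (see the linked repository for that).

```python
import numpy as np

def randomized_quantize(x, min_bins=2, max_bins=8, rng=None):
    """Sketch of per-channel randomized quantization.

    For each channel, bin boundaries are sampled at random (a non-uniform
    quantizer), and every value is replaced by a random draw from within
    its bin -- removing within-bin information while preserving the
    coarse, cross-bin structure. Hyperparameters here are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=np.float64)  # shape: (channels, ...)
    out = np.empty_like(x)
    for c in range(x.shape[0]):
        ch = x[c]
        lo, hi = ch.min(), ch.max()
        # randomly sample the number of bins and the interior edges
        n_bins = int(rng.integers(min_bins, max_bins + 1))
        edges = np.sort(rng.uniform(lo, hi, size=n_bins - 1))
        edges = np.concatenate(([lo], edges, [hi]))
        # assign each value to a bin
        idx = np.clip(np.searchsorted(edges, ch, side="right") - 1,
                      0, n_bins - 1)
        # replace each value with a random value inside its bin
        out[c] = rng.uniform(edges[idx], edges[idx + 1])
    return out
```

The channel-wise loop makes the masking analogy concrete: within a bin all inputs become statistically indistinguishable (information removed), while values in different bins remain separated (information preserved).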