Paper Title


Pre-Trained Language Transformers are Universal Image Classifiers

Paper Authors

Rahul Goel, Modar Sulaiman, Kimia Noorbakhsh, Mahdi Sharifi, Rajesh Sharma, Pooyan Jamshidi, Kallol Roy

Paper Abstract

Facial images disclose many hidden personal traits such as age, gender, race, health, emotion, and psychology. Understanding these traits helps to classify people by different attributes. In this paper, we present a novel method for classifying images using a pretrained transformer model. We apply the pretrained transformer to the binary classification of facial images into criminal and non-criminal classes. The pretrained GPT-2 transformer is trained to generate text and then fine-tuned to classify facial images. During the fine-tuning process with images, most of the layers of GPT-2 are frozen during backpropagation, and the resulting model is a frozen pretrained transformer (FPT). The FPT acts as a universal image classifier, and this paper shows the application of the FPT to facial images. We also use our FPT on encrypted images for classification. Our FPT shows high accuracy on both raw facial images and encrypted images. We hypothesize, with theory and experiments, that the FPT gains its meta-learning capacity from its large size and large-scale training. GPT-2, trained to generate a single word token at a time through an autoregressive process, is forced toward a heavy-tailed distribution. The FPT then uses this heavy-tail property as its meta-learning capacity for classifying images. Our work shows one way to avoid bias during the machine classification of images. The FPT encodes worldly knowledge because of its pretraining on text, which it uses during classification. The statistical error of classification is reduced because of the added context gained from the text. Our paper shows the ethical dimension of using encrypted data for classification. Criminal images are sensitive to share across boundaries, but encryption largely evades this ethical concern. The FPT's good classification accuracy on encrypted images shows promise for further research on privacy-preserving machine learning.
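The core of the FPT recipe described in the abstract is selective freezing: during fine-tuning, the pretrained self-attention and feed-forward weights are kept frozen, and only a small set of parameters (typically the input projection, layer norms, and output head in the frozen-transformer literature) receives gradient updates. The following is a minimal sketch of that selection logic; the parameter names and the exact set of layers left trainable are illustrative assumptions, not the authors' published configuration:

```python
# Hypothetical sketch of the FPT layer-freezing recipe.
# Assumption: only the input projection, layer norms, positional
# embeddings, and output head stay trainable; every pretrained
# attention/MLP block is frozen, as in common frozen-transformer setups.

def fpt_trainable(param_name: str) -> bool:
    """Return True if this parameter should receive gradient updates."""
    trainable_markers = ("input_proj", "ln", "pos_emb", "output_head")
    return any(marker in param_name for marker in trainable_markers)

# A toy listing of parameter names in a GPT-2-like model (illustrative only).
params = [
    "input_proj.weight",       # maps image patches into the token space
    "pos_emb.weight",          # positional embeddings
    "block0.attn.qkv.weight",  # pretrained attention weights (frozen)
    "block0.mlp.fc.weight",    # pretrained feed-forward weights (frozen)
    "block0.ln1.weight",       # layer norm (trainable)
    "output_head.weight",      # new binary classification head
]

frozen = [p for p in params if not fpt_trainable(p)]
trainable = [p for p in params if fpt_trainable(p)]
```

In a real PyTorch implementation, the same decision would be applied by setting `param.requires_grad = False` on each frozen parameter before constructing the optimizer, so that backpropagation only updates the small trainable subset.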
