Paper Title
MiniALBERT: Model Distillation via Parameter-Efficient Recursive Transformers
Paper Authors
Paper Abstract
Pre-trained Language Models (LMs) have become an integral part of Natural Language Processing (NLP) in recent years, due to their superior performance in downstream applications. In spite of this resounding success, the usability of LMs is constrained by their computational and time complexity, along with their increasing size, an issue that has been referred to as `overparameterisation'. Different strategies have been proposed in the literature to alleviate these problems, with the aim of creating effective compact models that match the performance of their bloated counterparts with negligible performance loss. One of the most popular techniques in this area of research is model distillation. Another potent but underutilised technique is cross-layer parameter sharing. In this work, we combine these two strategies and present MiniALBERT, a technique for distilling the knowledge of fully parameterised LMs (such as BERT) into a compact recursive student. In addition, we investigate the application of bottleneck adapters for layer-wise adaptation of our recursive student, and also explore the efficacy of adapter tuning for fine-tuning compact models. We test the proposed models on a number of general and biomedical NLP tasks to demonstrate their viability and compare them with the state-of-the-art and other existing compact models. All the code used in the experiments is available at https://github.com/nlpie-research/MiniALBERT. Our pre-trained compact models can be accessed from https://huggingface.co/nlpie.
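
The two compression ideas the abstract combines, cross-layer parameter sharing (one transformer layer reused recursively, as in ALBERT) and lightweight bottleneck adapters for layer-wise adaptation of the recursive student, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation (see the linked repository for that); the class names, hidden sizes, and the num_recursions and bottleneck_dim hyperparameters below are illustrative assumptions.

# Minimal sketch (assumptions, not the paper's exact architecture) of a
# recursive encoder: one shared transformer layer reused at every depth,
# with a small bottleneck adapter per recursion step providing layer-wise
# parameters.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Down-project -> non-linearity -> up-project, with a residual connection."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class RecursiveEncoder(nn.Module):
    """One shared transformer layer applied num_recursions times.

    Cross-layer parameter sharing keeps the student compact; the per-step
    adapters add a small number of layer-specific parameters so each
    recursion can behave slightly differently.
    """

    def __init__(self, hidden_dim: int = 768, num_heads: int = 12,
                 num_recursions: int = 6, bottleneck_dim: int = 64):
        super().__init__()
        # A single layer whose weights are reused at every depth.
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=num_heads, batch_first=True)
        # One lightweight adapter per recursion step (layer-wise adaptation).
        self.adapters = nn.ModuleList(
            [BottleneckAdapter(hidden_dim, bottleneck_dim)
             for _ in range(num_recursions)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for adapter in self.adapters:
            x = adapter(self.shared_layer(x))
        return x


if __name__ == "__main__":
    model = RecursiveEncoder()
    hidden_states = torch.randn(2, 16, 768)  # (batch, sequence, hidden)
    print(model(hidden_states).shape)        # torch.Size([2, 16, 768])

Under this sketch, adapter tuning for downstream tasks would amount to freezing the shared transformer weights and leaving only the adapters (plus a task head) trainable, which is what makes fine-tuning the compact model cheap in both memory and time.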