Paper Title

Small Character Models Match Large Word Models for Autocomplete Under Memory Constraints

Paper Authors

Ganesh Jawahar, Subhabrata Mukherjee, Debadeepta Dey, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Caio Cesar Teodoro Mendes, Gustavo Henrique de Rosa, Shital Shah

Paper Abstract

Autocomplete is a task where the user inputs a piece of text, termed a prompt, on which the model conditions to generate a semantically coherent continuation. Existing work on this task has primarily focused on datasets (e.g., email, chat) with high-frequency user prompt patterns (or focused prompts), where word-based language models have been quite effective. In this work, we study the more challenging open-domain setting consisting of low-frequency user prompt patterns (or broad prompts, e.g., a prompt about the 93rd Academy Awards) and demonstrate the effectiveness of character-based language models. We study this problem under memory-constrained settings (e.g., edge devices and smartphones), where character-based representations are effective in reducing the overall model size (in terms of parameters). We use the WikiText-103 benchmark to simulate broad prompts and demonstrate that character models rival word models in exact-match accuracy on the autocomplete task when controlling for model size. For instance, we show that a 20M-parameter character model performs comparably to an 80M-parameter word model in the vanilla setting. We further propose novel methods to improve character models by incorporating inductive bias in the form of compositional information and by transferring representations from large word models. Datasets and code used in this work are available at https://github.com/UBC-NLP/char_autocomplete.
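
To make two of the abstract's quantitative notions concrete, below is a minimal, illustrative Python sketch (not taken from the paper's released codebase): it shows how vocabulary size drives the embedding-table parameter count, which is why character-level representations shrink the model, and how exact-match accuracy, the metric reported above, can be computed. All sizes (`d_model`, the vocabulary counts) and the helper names are assumed round numbers and hypothetical functions, not the paper's actual configurations.

```python
# Illustrative sketch only -- assumed round numbers, not the paper's
# actual model configurations.

def embedding_params(vocab_size: int, d_model: int) -> int:
    """Parameter count of a single embedding table."""
    return vocab_size * d_model

# WikiText-103 has roughly 267k word types, while a character
# vocabulary needs only a few hundred symbols (assumed 256 here).
d_model = 512
print(f"word embeddings: {embedding_params(267_000, d_model):,}")  # 136,704,000
print(f"char embeddings: {embedding_params(256, d_model):,}")      # 131,072


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of generated continuations that equal the reference exactly."""
    assert len(predictions) == len(references) and references
    return sum(p == r for p, r in zip(predictions, references)) / len(references)

print(exact_match(["academy awards", "oscar"], ["academy awards", "oscars"]))  # 0.5
```

In practice, word models mitigate this embedding cost with subword vocabularies or adaptive embeddings, but the contrast illustrates where a character model's parameter budget can instead go toward deeper or wider layers under a fixed memory limit.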
