Paper Title
A Scalable Model Specialization Framework for Training and Inference using Submodels and its Application to Speech Model Personalization
Paper Authors
Paper Abstract
Model fine-tuning and adaptation have become a common approach to model specialization for downstream tasks or domains. Fine-tuning the entire model, or a subset of the parameters using lightweight adaptation, has shown considerable success across different specialization tasks. Fine-tuning a model for a large number of domains typically requires starting a new training job for every domain, which poses scaling limitations. Once these models are trained, deploying them also poses significant scalability challenges for inference in real-time applications. In this paper, building upon prior lightweight adaptation techniques, we propose a modular framework that substantially improves scalability for model training and inference. We introduce Submodels that can be quickly and dynamically loaded for on-the-fly inference. We also propose multiple approaches for training these Submodels in parallel, within the same training job, using an embedding space. We test our framework on an extreme use case, speech model personalization for atypical speech, which requires a Submodel for each user. We obtain 128x Submodel throughput with a fixed computation budget and no loss of accuracy. We also show that learning a speaker-embedding space can scale further and reduce the amount of personalization training data required per speaker.
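To make the Submodel idea concrete, below is a minimal sketch (not the paper's implementation, which is not detailed in the abstract): a frozen base layer shared across all speakers, plus a small per-speaker parameter set that can be swapped in dynamically at inference time. The names `AdaptedLayer`, `load_submodel`, and `submodel_store`, as well as the residual-adapter form, are illustrative assumptions.

```python
# Sketch of per-speaker Submodels over a frozen base layer, assuming a
# lightweight residual-adapter parameterization (illustrative only).
import numpy as np

class AdaptedLayer:
    def __init__(self, base_weight: np.ndarray):
        self.base_weight = base_weight   # frozen, shared across all speakers
        self.adapter = None              # per-speaker Submodel parameters

    def load_submodel(self, adapter: np.ndarray) -> None:
        # Dynamically attach a small set of speaker-specific parameters,
        # without reloading or retraining the base model.
        self.adapter = adapter

    def forward(self, x: np.ndarray) -> np.ndarray:
        y = x @ self.base_weight
        if self.adapter is not None:
            y = y + x @ self.adapter     # lightweight speaker-specific correction
        return y

# Store mapping speaker IDs to their trained Submodel parameters
# (hypothetical IDs and shapes, for illustration).
submodel_store = {
    "speaker_007": 0.01 * np.random.randn(16, 8),
}

layer = AdaptedLayer(base_weight=np.random.randn(16, 8))
layer.load_submodel(submodel_store["speaker_007"])   # on-the-fly personalization
out = layer.forward(np.random.randn(4, 16))
print(out.shape)  # (4, 8)
```

In this sketch only the small adapter array is stored and loaded per speaker, which is what makes serving many personalized models tractable; the parallel-training variants in the paper additionally index such per-speaker parameters through a learned embedding space within a single training job.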