Paper Title
TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities
Paper Authors
Paper Abstract
Recently, the success of pre-training in the text domain has been fully extended to vision, audio, and cross-modal scenarios. The proposed pre-training models of different modalities show a rising trend of homogeneity in their model structures, which brings the opportunity to implement different pre-training models within a uniform framework. In this paper, we present TencentPretrain, a toolkit supporting pre-training models of different modalities. The core feature of TencentPretrain is its modular design. The toolkit uniformly divides pre-training models into 5 components: embedding, encoder, target embedding, decoder, and target. As almost all common modules are provided in each component, users can choose the desired modules from different components to build a complete pre-training model. The modular design enables users to efficiently reproduce existing pre-training models or build brand-new ones. We test the toolkit on text, vision, and audio benchmarks and show that it can match the performance of the original implementations.
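The five-component split described in the abstract can be illustrated with a small, self-contained sketch. This is not TencentPretrain's actual API: the registries, the module names ("shift", "double", "add_memory", "sum"), the `Model` class, and `build_model` are all hypothetical, and the modules are toy list operations standing in for real neural layers. The sketch only shows the idea of picking one module per component by name and composing them into a complete model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Vec = List[float]

# Hypothetical registries, one per component named in the abstract.
# Each entry is a toy function standing in for a real neural module.
EMBEDDINGS: Dict[str, Callable[[Vec], Vec]] = {
    "identity": lambda x: list(x),
    "shift": lambda x: [v + 1.0 for v in x],
}
ENCODERS: Dict[str, Callable[[Vec], Vec]] = {
    "double": lambda h: [2.0 * v for v in h],
}
TARGET_EMBEDDINGS: Dict[str, Callable[[Vec], Vec]] = {
    "identity": lambda y: list(y),
}
DECODERS: Dict[str, Callable[[Vec, Vec], Vec]] = {
    # Combines the target-side input with the encoder output ("memory").
    "add_memory": lambda y, mem: [a + b for a, b in zip(y, mem)],
}
TARGETS: Dict[str, Callable[[Vec], float]] = {
    # Reduces the decoder output to a single scalar, like a loss would.
    "sum": lambda h: sum(h),
}


@dataclass
class Model:
    """A model assembled from exactly one module per component."""
    embedding: Callable[[Vec], Vec]
    encoder: Callable[[Vec], Vec]
    target_embedding: Callable[[Vec], Vec]
    decoder: Callable[[Vec, Vec], Vec]
    target: Callable[[Vec], float]

    def forward(self, src: Vec, tgt: Vec) -> float:
        memory = self.encoder(self.embedding(src))
        hidden = self.decoder(self.target_embedding(tgt), memory)
        return self.target(hidden)


def build_model(config: Dict[str, str]) -> Model:
    """Look up each component's module by the name given in the config."""
    return Model(
        embedding=EMBEDDINGS[config["embedding"]],
        encoder=ENCODERS[config["encoder"]],
        target_embedding=TARGET_EMBEDDINGS[config["target_embedding"]],
        decoder=DECODERS[config["decoder"]],
        target=TARGETS[config["target"]],
    )


config = {
    "embedding": "shift",
    "encoder": "double",
    "target_embedding": "identity",
    "decoder": "add_memory",
    "target": "sum",
}
model = build_model(config)
print(model.forward([1.0, 2.0], [0.5, 0.5]))  # prints 11.0
```

Changing a single config entry (for example, "embedding" from "shift" to "identity") swaps that one module and leaves the other four components untouched, which is the kind of reuse the modular design is meant to enable.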