Paper Title
Table-To-Text generation and pre-training with TabT5
Paper Authors
Paper Abstract
Encoder-only transformer models have been successfully applied to different table-understanding tasks, as in TAPAS (Herzig et al., 2020). A major limitation of these architectures is that they are constrained to classification-like tasks such as cell selection or entailment detection. We present TABT5, an encoder-decoder model that generates natural language text based on tables and textual inputs. TABT5 overcomes the encoder-only limitation by incorporating a decoder component, and it leverages the input structure with table-specific embeddings and pre-training. TABT5 achieves new state-of-the-art results in several domains, including spreadsheet formula prediction with a 15% increase in sequence accuracy, QA with a 2.5% increase in sequence accuracy, and data-to-text generation with a 2.5% increase in BLEU.
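The abstract's key architectural idea is feeding table structure into an encoder-decoder through table-specific embeddings. Below is a minimal, hypothetical sketch of one common way to do this (TAPAS-style row/column index embeddings summed with token embeddings); it is an illustration under that assumption, not the authors' released code, and all names (`TableEmbeddings`, `linearize`) are invented for the example.

```python
# Minimal sketch (illustrative assumption, not the TabT5 implementation):
# linearize a table row by row, track each token's row/column index, and
# sum token + row + column embeddings so a T5-like encoder sees the
# 2-D structure of the flattened input.
import torch
import torch.nn as nn

class TableEmbeddings(nn.Module):
    def __init__(self, vocab_size=32128, d_model=512, max_rows=64, max_cols=32):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.row = nn.Embedding(max_rows, d_model)  # index 0 reserved for non-table text
        self.col = nn.Embedding(max_cols, d_model)

    def forward(self, token_ids, row_ids, col_ids):
        # Element-wise sum of the three embedding tables, as in TAPAS.
        return self.tok(token_ids) + self.row(row_ids) + self.col(col_ids)

def linearize(question_tokens, table):
    """Flatten a table cell by cell, recording row/column indices per token.

    `table` is a list of rows; each row is a list of cells; each cell is a
    list of token ids (tokenization is assumed to have happened already).
    """
    tokens = list(question_tokens)
    rows = [0] * len(question_tokens)   # question tokens get row/col index 0
    cols = [0] * len(question_tokens)
    for r, row in enumerate(table, start=1):
        for c, cell in enumerate(row, start=1):
            for tok in cell:
                tokens.append(tok)
                rows.append(r)
                cols.append(c)
    return tokens, rows, cols

# Usage: embed a toy 2-token question plus a 2x2 table, check the shape.
toks, rows, cols = linearize([11, 12], [[[101], [102]], [[103], [104, 105]]])
emb = TableEmbeddings()
out = emb(torch.tensor([toks]), torch.tensor([rows]), torch.tensor([cols]))
print(out.shape)  # torch.Size([1, 7, 512])
```

The resulting embedded sequence would feed a standard encoder-decoder stack, whose decoder autoregressively generates the target text (a formula, an answer, or a description), which is what lifts the model past the classification-only limitation of encoder-only designs.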