Paper title
Hybrid Ranking Network for Text-to-SQL
Paper authors
Paper abstract
In this paper, we study how to leverage pre-trained language models in Text-to-SQL. We argue that previous approaches underutilize the base language models by concatenating all columns together with the natural-language (NL) question and feeding them into the base language model in the encoding stage. We propose a neat approach called Hybrid Ranking Network (HydraNet), which breaks the problem down into column-wise ranking and decoding and then assembles the column-wise outputs into a SQL query by straightforward rules. In this approach, the encoder is given an NL question and one individual column, which aligns with the tasks BERT/RoBERTa were originally trained on, and hence we avoid the ad-hoc pooling or additional encoding layers that prior approaches require. Experiments on the WikiSQL dataset show that the proposed approach is very effective, achieving the top place on the leaderboard.
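As a rough illustration of the column-wise idea described in the abstract, the following Python sketch pairs the question with each column and then assembles a query from per-column predictions by simple rules. It is not the authors' implementation: the `ColumnPrediction` fields, the `build_pair` and `assemble_sql` helpers, the 0.5 threshold, and the example table are all hypothetical, and the scores are hard-coded where a fine-tuned BERT/RoBERTa encoder would normally produce them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ColumnPrediction:
    """Hypothetical per-column output of the ranking/decoding step."""
    column: str
    select_score: float  # how likely the column belongs in SELECT
    where_score: float   # how likely the column belongs in WHERE
    aggregation: str     # "" means no aggregation, else e.g. "MAX"
    operator: str        # comparison operator for a WHERE condition
    value: str           # condition value taken from the question


def build_pair(question: str, column: str) -> Tuple[str, str]:
    """Form the (column, question) pair that would be fed to the encoder.

    Each column is encoded separately with the question, instead of
    concatenating every column of the table into one long input.
    """
    return (column, question)


def assemble_sql(table: str, preds: List[ColumnPrediction],
                 where_threshold: float = 0.5) -> str:
    """Assemble a SQL string from column-wise predictions.

    Assumed rules: the top-ranked column by select_score goes into SELECT,
    and every column whose where_score exceeds the threshold contributes
    one WHERE condition. Values are quoted naively, with no escaping.
    """
    select = max(preds, key=lambda p: p.select_score)
    if select.aggregation:
        select_expr = f"{select.aggregation}({select.column})"
    else:
        select_expr = select.column

    conditions = [
        f"{p.column} {p.operator} '{p.value}'"
        for p in preds
        if p.where_score > where_threshold
    ]

    sql = f"SELECT {select_expr} FROM {table}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


if __name__ == "__main__":
    question = "Which player is from Spain?"
    columns = ["player", "country"]
    pairs = [build_pair(question, c) for c in columns]  # one encoder input per column

    # Hard-coded stand-ins for what a trained model would predict per pair.
    preds = [
        ColumnPrediction("player", 0.9, 0.1, "", "=", ""),
        ColumnPrediction("country", 0.2, 0.8, "", "=", "Spain"),
    ]
    print(assemble_sql("players", preds))
```

Because every column is scored independently, the encoder input stays a plain sentence pair regardless of how many columns the table has; only the final rule-based assembly step looks at all columns together.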