Paper Title
Towards Generalizable and Robust Text-to-SQL Parsing
Paper Authors
Paper Abstract
Text-to-SQL parsing tackles the problem of mapping natural language questions to executable SQL queries. In practice, text-to-SQL parsers often encounter various challenging scenarios, requiring them to be generalizable and robust. While most existing work addresses one particular generalization or robustness challenge, we aim to study these challenges in a more comprehensive manner. Specifically, we believe that text-to-SQL parsers should be (1) generalizable at three levels, namely i.i.d., zero-shot, and compositional, and (2) robust against input perturbations. To enhance these capabilities of the parser, we propose a novel TKK framework consisting of Task decomposition, Knowledge acquisition, and Knowledge composition to learn text-to-SQL parsing in stages. By dividing the learning process into multiple stages, our framework improves the parser's ability to acquire general SQL knowledge instead of capturing spurious patterns, making it more generalizable and robust. Experimental results under various generalization and robustness settings show that our framework is effective in all scenarios and achieves state-of-the-art performance on the Spider, SParC, and CoSQL datasets. Code can be found at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/tkk.