Paper Title

StruBERT: Structure-aware BERT for Table Search and Matching

Authors

Mohamed Trabelsi, Zhiyu Chen, Shuo Zhang, Brian D. Davison, Jeff Heflin

Abstract

A large amount of information is stored in data tables. Users can search for data tables using a keyword-based query. A table is composed primarily of data values that are organized in rows and columns providing implicit structural information. A table is usually accompanied by secondary information such as the caption, page title, etc., that form the textual information. Understanding the connection between the textual and structural information is an important yet neglected aspect in table retrieval as previous methods treat each source of information independently. In addition, users can search for data tables that are similar to an existing table, and this setting can be seen as a content-based table retrieval. In this paper, we propose StruBERT, a structure-aware BERT model that fuses the textual and structural information of a data table to produce context-aware representations for both textual and tabular content of a data table. StruBERT features are integrated in a new end-to-end neural ranking model to solve three table-related downstream tasks: keyword- and content-based table retrieval, and table similarity. We evaluate our approach using three datasets, and we demonstrate substantial improvements in terms of retrieval and classification metrics over state-of-the-art methods.
