Paper title
Large Discourse Treebanks from Scalable Distant Supervision
Paper authors
Paper abstract
Discourse parsing is an essential upstream task in Natural Language Processing with strong implications for many real-world applications. Despite its widely recognized role, most recent discourse parsers (and consequently downstream tasks) still rely on small-scale human-annotated discourse treebanks, trying to infer general-purpose discourse structures from very limited data in a few narrow domains. To overcome this dire situation and allow discourse parsers to be trained on larger, more diverse and domain-independent datasets, we propose a framework to generate "silver-standard" discourse trees from distant supervision on the auxiliary task of sentiment analysis.