Paper Title

Comparing Computational Architectures for Automated Journalism

Authors

Sym, Yan V., Campos, João Gabriel M., José, Marcos M., Cozman, Fabio G.

Abstract

The majority of NLG systems have been designed following either a template-based or a pipeline-based architecture. Recent neural models for data-to-text generation have been proposed with an end-to-end deep learning flavor, which handles non-linguistic input in natural language without explicit intermediary representations. This study compares the most often employed methods for generating Brazilian Portuguese texts from structured data. Results suggest that explicit intermediate steps in the generation process produce better texts than the ones generated by neural end-to-end architectures, avoiding data hallucination while better generalizing to unseen inputs. Code and corpus are publicly available.
