Paper Title
MACSum: Controllable Summarization with Mixed Attributes
Paper Authors
Abstract
Controllable summarization allows users to generate customized summaries with specified attributes. However, due to the lack of designated annotations of controlled summaries, existing works have to craft pseudo datasets by adapting generic summarization benchmarks. Furthermore, most research focuses on controlling single attributes individually (e.g., a short summary or a highly abstractive summary) rather than controlling a mix of attributes together (e.g., a short and highly abstractive summary). In this paper, we propose MACSum, the first human-annotated summarization dataset for controlling mixed attributes. It contains source texts from two domains, news articles and dialogues, with human-annotated summaries controlled by five designed attributes (Length, Extractiveness, Specificity, Topic, and Speaker). We propose two simple and effective parameter-efficient approaches for the new task of mixed controllable summarization based on hard prompt tuning and soft prefix tuning. Results and analysis demonstrate that hard prompt models yield the best performance on all metrics and human evaluations. However, mixed-attribute control is still challenging for summarization tasks. Our dataset and code are available at https://github.com/psunlpgroup/MACSum.
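To make the idea of hard-prompt control over mixed attributes concrete, the following is a minimal sketch of how several control attributes might be serialized into a textual prefix and prepended to the source document. The prompt template, the separators, the attribute values, and the names `Controls` and `build_hard_prompt` are all assumptions made for illustration; they are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from typing import Optional

# Attribute names follow the abstract; the example values are assumptions.
ATTRIBUTE_ORDER = ("length", "extractiveness", "specificity", "topic", "speaker")


@dataclass
class Controls:
    """Hypothetical container for the five control attributes."""
    length: Optional[str] = None
    extractiveness: Optional[str] = None
    specificity: Optional[str] = None
    topic: Optional[str] = None
    speaker: Optional[str] = None


def build_hard_prompt(source: str, controls: Controls, sep: str = " ; ") -> str:
    """Prepend the specified attributes to the source as plain text.

    Unset attributes are skipped, so one function covers single-attribute
    and mixed-attribute control. The template is illustrative only.
    """
    parts = []
    for name in ATTRIBUTE_ORDER:
        value = getattr(controls, name)
        if value:
            parts.append(f"{name.capitalize()}: {value}")
    prefix = sep.join(parts)
    return f"{prefix} | {source}" if prefix else source


if __name__ == "__main__":
    controls = Controls(length="short", extractiveness="high", topic="budget")
    print(build_hard_prompt("Alice: We need to cut costs.", controls))
```

The resulting string would then be passed to a sequence-to-sequence summarizer as its input. Soft prefix tuning, by contrast, would learn continuous prefix vectors rather than concatenating readable text, which is not shown here.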