Paper Title

Evaluating and Improving Factuality in Multimodal Abstractive Summarization

Paper Authors

David Wan, Mohit Bansal

Paper Abstract

Current metrics for evaluating factuality for abstractive document summarization have achieved high correlations with human judgment, but they do not account for the vision modality and thus are not adequate for vision-and-language summarization. We propose CLIPBERTScore, a simple weighted combination of CLIPScore and BERTScore to leverage the robustness and strong factuality detection performance between image-summary and document-summary, respectively. Next, due to the lack of meta-evaluation benchmarks to evaluate the quality of multimodal factuality metrics, we collect human judgments of factuality with respect to documents and images. We show that this simple combination of two metrics in the zero-shot setting achieves higher correlations than existing factuality metrics for document summarization, outperforms an existing multimodal summarization metric, and performs competitively with strong multimodal factuality metrics specifically fine-tuned for the task. Our thorough analysis demonstrates the robustness and high correlation of CLIPBERTScore and its components on four factuality metric-evaluation benchmarks. Finally, we demonstrate two practical downstream applications of our CLIPBERTScore metric: for selecting important images to focus on during training, and as a reward for reinforcement learning to improve the factuality of multimodal summary generation w.r.t. automatic and human evaluation. Our data and code are publicly available at https://github.com/meetdavidwan/faithful-multimodal-summ.
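To make the "simple weighted combination" concrete, below is a minimal Python sketch of the scoring rule the abstract describes, assuming the two component scores have already been computed. The function name clipbertscore and the weight alpha are illustrative assumptions, not the paper's tuned formulation; see the linked repository for the official implementation.

# Minimal sketch of CLIPBERTScore as described in the abstract: a weighted
# combination of an image-summary CLIPScore and a document-summary BERTScore.
# NOTE: `alpha` and the pre-computed component scores are illustrative
# assumptions, not the paper's tuned values.

def clipbertscore(clip_score: float, bert_score: float, alpha: float = 0.5) -> float:
    """Combine the two factuality components into one score.

    clip_score: CLIPScore between the input image and the generated summary.
    bert_score: BERTScore between the source document and the generated summary.
    alpha: interpolation weight between the two components (assumed 0.5 here).
    """
    return alpha * clip_score + (1.0 - alpha) * bert_score

# Example usage with made-up component scores:
# score = clipbertscore(clip_score=0.71, bert_score=0.88)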
