APOLLO: An Optimized Training Approach for Long-form Numerical Reasoning

Paper title

APOLLO: An Optimized Training Approach for Long-form Numerical Reasoning

Authors

Sun, Jiashuo; Zhang, Hang; Lin, Chen; Su, Xiangdong; Gong, Yeyun; Guo, Jian

Abstract

Long-form numerical reasoning in financial analysis aims to generate a reasoning program that calculates the correct answer to a given question. Previous work follows a retriever-generator framework, in which the retriever selects key facts from a long-form document and the generator produces a reasoning program based on the retrieved facts. However, these methods treat all facts equally, without considering the different contributions of facts with and without numbers. Meanwhile, program consistency is ignored under supervised training, resulting in lower training accuracy and diversity. To solve these problems, we propose APOLLO to improve the long-form numerical reasoning framework. For the retriever, we adopt a number-aware negative sampling strategy that makes the retriever more discriminative on key numerical facts. For the generator, we design consistency-based reinforcement learning and target program augmentation strategies, both based on the consistency of program execution results. Experimental results on the FinQA and ConvFinQA leaderboards verify the effectiveness of the proposed method, which achieves a new state of the art.
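
The abstract names two ideas without giving details: sampling negatives with attention to whether a fact contains numbers, and judging programs by the consistency of their execution results. The following is a minimal illustrative sketch of how such ideas could look, not the authors' implementation. The function names, the `numeric_ratio` parameter, the `(op, a, b)` program format with `#i` back-references, and the binary reward are all assumptions made for this example.

```python
import random
import re

NUMBER_RE = re.compile(r"\d")


def has_number(fact: str) -> bool:
    """Return True if the fact text contains at least one digit."""
    return bool(NUMBER_RE.search(fact))


def sample_negatives(facts, gold_indices, k, numeric_ratio=0.75, seed=0):
    """Pick up to k non-gold fact indices, favouring facts that contain numbers.

    numeric_ratio is a hypothetical knob: the target share of negatives
    drawn from number-bearing facts.
    """
    rng = random.Random(seed)
    gold = set(gold_indices)
    numeric = [i for i, f in enumerate(facts) if i not in gold and has_number(f)]
    plain = [i for i, f in enumerate(facts) if i not in gold and not has_number(f)]
    n_numeric = min(len(numeric), round(k * numeric_ratio))
    n_plain = min(len(plain), k - n_numeric)
    # If there are too few number-free facts, top up with numeric ones.
    n_numeric = min(len(numeric), k - n_plain)
    return sorted(rng.sample(numeric, n_numeric) + rng.sample(plain, n_plain))


OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}


def execute(program):
    """Run a list of (op, a, b) steps; an argument '#i' means the result of step i."""
    results = []

    def resolve(x):
        if isinstance(x, str) and x.startswith("#"):
            return results[int(x[1:])]
        return float(x)

    for op, a, b in program:
        results.append(OPS[op](resolve(a), resolve(b)))
    return results[-1]


def consistency_reward(program, gold_answer, tol=1e-4):
    """Assumed binary reward: 1.0 if the program executes to the gold answer."""
    try:
        return 1.0 if abs(execute(program) - gold_answer) <= tol else 0.0
    except (KeyError, IndexError, ValueError, ZeroDivisionError):
        return 0.0  # malformed or non-executable programs earn nothing
```

Under this sketch, two differently written programs that execute to the same value receive the same reward, which is the sense of "consistency" assumed here. For example, `[("subtract", 120, 80), ("divide", "#0", 80)]` and `[("divide", 120, 80), ("subtract", "#0", 1)]` both evaluate to 0.5.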
