Paper Title
Two-stage Modeling for Prediction with Confidence
Authors
Abstract
Neural networks have been very successful in a wide variety of applications. However, it has recently been observed that their performance is difficult to generalize under distributional shift. Several efforts have been made to identify potential out-of-distribution inputs. Although the existing literature has made significant progress on image and text data, finance has been overlooked. This paper investigates distribution shift in credit scoring, one of the most important applications in finance. To address potential distribution shift, we propose a novel two-stage model. In the first stage, an out-of-distribution detection method separates the data into confident and unconfident sets. In the second stage, we combine domain knowledge with a mean-variance optimization to provide reliable bounds for the unconfident samples. Empirical results show that our model offers reliable predictions for the vast majority of datasets. Only a small portion of the dataset is inherently difficult to judge, and we leave those samples to human judgment. With the two-stage model, predictions are made with high confidence and the potential risks associated with the model are significantly reduced.
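The two-stage idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the out-of-distribution detector or give the form of the mean-variance optimization, so everything below is an assumption made for illustration: the Mahalanobis-distance score, the 95% quantile threshold, and the mean plus or minus k standard deviations interval over several models' predictions are stand-ins, and the function names `mahalanobis_scores`, `split_by_confidence`, and `interval_bounds` are hypothetical. It should not be read as the paper's method.

```python
import numpy as np


def mahalanobis_scores(train_x, x):
    """Squared Mahalanobis distance of each row of x from the training data.

    Assumption: this distance stands in for an out-of-distribution score.
    """
    mean = train_x.mean(axis=0)
    # A small ridge term keeps the covariance matrix invertible.
    cov = np.cov(train_x, rowvar=False) + 1e-6 * np.eye(train_x.shape[1])
    inv_cov = np.linalg.inv(cov)
    diff = x - mean
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)


def split_by_confidence(train_x, x, quantile=0.95):
    """Stage one (sketch): flag samples whose score is typical of training data.

    Returns a boolean mask: True = confident, False = unconfident.
    The quantile threshold is an illustrative choice.
    """
    threshold = np.quantile(mahalanobis_scores(train_x, train_x), quantile)
    return mahalanobis_scores(train_x, x) <= threshold


def interval_bounds(member_preds, k=2.0):
    """Stage two (stand-in): bounds from the mean and spread of predictions.

    member_preds has shape (n_models, n_samples) and holds predicted
    probabilities. This simple interval is NOT the paper's mean-variance
    optimization; it only shows what "bounds for unconfident samples"
    could look like.
    """
    mean = member_preds.mean(axis=0)
    std = member_preds.std(axis=0)
    lower = np.clip(mean - k * std, 0.0, 1.0)
    upper = np.clip(mean + k * std, 0.0, 1.0)
    return lower, upper
```

In such a setup, samples flagged as confident would receive the model's point prediction, samples flagged as unconfident would receive an interval, and samples whose interval is too wide to act on could be passed to a human reviewer.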