Title
Reduced Robust Random Cut Forest for Out-Of-Distribution detection in machine learning models
Authors
Abstract
Most machine learning-based regressors extract information from data collected via past observations of limited length to make predictions about the future. Consequently, when the input to such a trained model has statistical properties significantly different from those of the training data, there is no guarantee of accurate prediction. Using these models on out-of-distribution input data may therefore produce a predicted outcome completely different from the desired one, which is not only erroneous but can also be hazardous in some cases. Successful deployment of these machine learning models in any system requires a detection mechanism that can distinguish out-of-distribution data from in-distribution data (i.e., data similar to the training data). In this paper, we introduce a novel approach to this detection process using a Reduced Robust Random Cut Forest (RRRCF) data structure, which can be used on both small and large data sets. Like the Robust Random Cut Forest (RRCF), the RRRCF is a structured but reduced representation of the training-data subspace in the form of cut trees. Empirical results on both low- and high-dimensional data show that inference about whether data lie in or out of the training distribution can be made efficiently, and that the model is easy to train, with no difficult hyper-parameter tuning. The paper discusses two different use cases for testing and validating the results.
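To make the core idea concrete, the following is a minimal, illustrative sketch (not the authors' RRRCF implementation) of out-of-distribution scoring with a forest of random cut trees. Each tree stores per-node bounding boxes of the training data; a query point is scored by the expected depth at which a random cut would separate it from the rest of the tree, following the RRCF cut rule (cut dimension chosen with probability proportional to its span). All function names and parameters here are hypothetical choices for illustration.

```python
import random

def build_tree(points, depth=0, max_depth=10):
    """Recursively build a random cut tree; each node keeps its bounding box."""
    dims = len(points[0])
    bbox = [(min(p[d] for p in points), max(p[d] for p in points))
            for d in range(dims)]
    node = {"bbox": bbox, "leaf": True}
    spans = [hi - lo for lo, hi in bbox]
    if len(points) <= 1 or depth >= max_depth or sum(spans) == 0:
        return node
    # RRCF-style cut: pick a dimension with probability proportional to its span,
    # then a uniform cut position within that dimension's range.
    d = random.choices(range(dims), weights=spans)[0]
    cut = bbox[d][0] + random.random() * spans[d]
    left = [p for p in points if p[d] <= cut]
    right = [p for p in points if p[d] > cut]
    if not left or not right:  # degenerate cut; keep as leaf
        return node
    node.update(leaf=False, dim=d, cut=cut,
                left=build_tree(left, depth + 1, max_depth),
                right=build_tree(right, depth + 1, max_depth))
    return node

def expected_isolation_depth(node, x, depth=1):
    """Expected depth at which inserting x would cut it off from the whole
    subtree; a shallow depth means x lies far outside the training data."""
    bbox = node["bbox"]
    # Spans of the bounding box after extending it to include x
    ext_spans = [max(hi, x[d]) - min(lo, x[d]) for d, (lo, hi) in enumerate(bbox)]
    extension = sum(ext_spans) - sum(hi - lo for lo, hi in bbox)
    total = sum(ext_spans)
    # Probability that a random cut on the extended box isolates x here
    p = extension / total if total > 0 else 0.0
    if node["leaf"]:
        return depth
    child = node["left"] if x[node["dim"]] <= node["cut"] else node["right"]
    return p * depth + (1 - p) * expected_isolation_depth(child, x, depth + 1)

def ood_score(forest, x):
    """Average expected isolation depth over the forest (lower = more OOD)."""
    return sum(expected_isolation_depth(t, x) for t in forest) / len(forest)
```

A quick sanity check: train a small forest on Gaussian data around the origin and compare a nearby point with a far-away one; the distant point should receive a markedly lower (shallower) score.

```python
random.seed(0)
train = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(256)]
forest = [build_tree(random.sample(train, 128)) for _ in range(20)]
print(ood_score(forest, (8.0, 8.0)) < ood_score(forest, (0.0, 0.0)))  # True
```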