Paper Title

Understanding Influence Functions and Datamodels via Harmonic Analysis

Paper Authors

Nikunj Saunshi, Arushi Gupta, Mark Braverman, Sanjeev Arora

Paper Abstract

Influence functions estimate the effect of individual data points on a model's predictions on test data and were adapted to deep learning in Koh and Liang [2017]. They have been used for detecting data poisoning, detecting helpful and harmful examples, estimating the influence of groups of datapoints, etc. Recently, Ilyas et al. [2022] introduced a linear regression method they termed datamodels to predict the effect of training points on a model's outputs on test data. The current paper seeks to provide a better theoretical understanding of such interesting empirical phenomena. The primary tools are harmonic analysis and the idea of noise stability. Contributions include: (a) an exact characterization of the learnt datamodel in terms of Fourier coefficients; (b) an efficient method to estimate the residual error and quality of the optimum linear datamodel without having to train the datamodel; (c) new insights into when influences of groups of datapoints may or may not add up linearly.
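
The datamodel construction the abstract refers to can be illustrated with a small, self-contained sketch: sample random training subsets, record the model's output on one fixed test point per subset, then fit a linear map from subset-inclusion indicators to that output. This is not the authors' pipeline: the toy dataset, the logistic-regression stand-in model, the subset fraction ALPHA, the number of subsets NUM_SUBSETS, and the Ridge fit are all illustrative assumptions (Ilyas et al. [2022] work at far larger scale and use a sparse, LASSO-style regression).

```python
# Minimal sketch of the datamodel idea (illustrative assumptions throughout,
# not the pipeline of Ilyas et al. [2022] or of this paper).
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(0)

# Synthetic training set and one fixed test point (stand-ins for a real dataset/model).
n_train, dim = 200, 10
X_train = rng.normal(size=(n_train, dim))
w_true = rng.normal(size=dim)
y_train = (X_train @ w_true + 0.5 * rng.normal(size=n_train) > 0).astype(int)
x_test = rng.normal(size=dim)

ALPHA = 0.5          # each subset keeps ~50% of the training points (assumed value)
NUM_SUBSETS = 1000   # number of (subset, output) pairs used to fit the datamodel

masks, outputs = [], []
for _ in range(NUM_SUBSETS):
    mask = rng.random(n_train) < ALPHA           # inclusion indicators in {0,1}^n
    if mask.sum() < 2 or len(set(y_train[mask])) < 2:
        continue                                 # need both classes to train the toy model
    clf = LogisticRegression(max_iter=1000).fit(X_train[mask], y_train[mask])
    # Record a real-valued output on the fixed test point (here: the decision margin).
    outputs.append(clf.decision_function(x_test.reshape(1, -1))[0])
    masks.append(mask.astype(float))

# Linear datamodel: predict the test output from the subset-inclusion vector.
datamodel = Ridge(alpha=1.0).fit(np.array(masks), np.array(outputs))
theta = datamodel.coef_                          # per-training-point weight ("influence")
print("Most helpful training points:", np.argsort(-theta)[:5])
print("Most harmful training points:", np.argsort(theta)[:5])
```

In the paper's harmonic-analysis view, the map from an inclusion vector to the model's output is treated as a function on the (biased) Boolean cube, and a fitted linear datamodel like the one above is characterized by that function's Fourier coefficients; contribution (b) concerns estimating how well the best such linear fit can do, i.e., its residual error, without running the regression at all.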
