Paper title
Efficient estimation of the ANOVA mean dimension, with an application to neural net classification
Paper authors
Paper abstract
The mean dimension of a black box function of $d$ variables is a convenient way to summarize the extent to which it is dominated by high or low order interactions. It is expressed in terms of $2^d-1$ variance components, but it can be written as a sum of $d$ Sobol' indices that can be estimated by leave-one-out methods. We compare the variance of these leave-one-out methods: a Gibbs sampler called winding stairs, a radial sampler that changes each variable one at a time from a baseline, and a naive sampler that never reuses function evaluations and so costs about double the other methods. For an additive function, the radial and winding stairs samplers are the most efficient. For a multiplicative function, the naive method can easily be the most efficient if the factors have high kurtosis. As an illustration, we consider the mean dimension of a neural network classifier of digits from the MNIST data set. The classifier is a function of $784$ pixels. For that problem, winding stairs is the best algorithm. We find that inputs to the final softmax layer have mean dimensions ranging from $1.35$ to $2.0$.
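To make the estimand concrete, here is a minimal sketch of the radial sampler mentioned in the abstract. It is not the paper's implementation; it only illustrates the standard identity that the mean dimension equals the sum over variables of the unnormalized total Sobol' indices divided by the variance, where each total index is estimated from squared differences obtained by changing one coordinate of a baseline point at a time. The function names and the additive test function are illustrative choices, not from the paper.

```python
import numpy as np

def mean_dimension_radial(f, d, n, rng):
    """Estimate the ANOVA mean dimension of f on [0,1]^d with a radial sampler.

    Uses the identity: mean dimension = sum_j tau_j^2 / sigma^2, where
    tau_j^2 = E[(f(x) - f(x with x_j replaced by z_j))^2] / 2 is the
    unnormalized total Sobol' index of variable j, and sigma^2 = Var(f(x)).
    """
    x = rng.random((n, d))   # baseline points
    z = rng.random((n, d))   # alternate points supplying replacement coordinates
    fx = f(x)
    tau2 = np.empty(d)
    for j in range(d):
        xj = x.copy()
        xj[:, j] = z[:, j]   # change one variable at a time from the baseline
        tau2[j] = np.mean((fx - f(xj)) ** 2) / 2.0
    sigma2 = np.var(fx)
    return tau2.sum() / sigma2

rng = np.random.default_rng(0)
# Additive test function: a sum of coordinates has mean dimension exactly 1,
# consistent with the abstract's claim that low mean dimension reflects
# domination by low order (here, purely main) effects.
md = mean_dimension_radial(lambda x: x.sum(axis=1), d=5, n=4000, rng=rng)
```

With an additive $f$ the estimate should be close to $1$; for functions with strong interactions it moves toward $d$.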