Paper Title
Temporal Action Localization with Variance-Aware Networks
Authors
Abstract
This work addresses temporal action localization with Variance-Aware Networks (VAN), i.e., DNNs that use second-order statistics in the input and/or the output of regression tasks. We first propose a network (VANp) that, when presented with second-order statistics of the input (each sample has a mean and a variance), propagates the mean and the variance throughout the network to deliver outputs with second-order statistics. In this framework, both the input and the output can be interpreted as Gaussians. To do so, we derive differentiable analytic solutions, or reasonable approximations, for propagating the mean and the variance across commonly used NN layers. To train the network, we define a differentiable loss based on the KL-divergence between the predicted Gaussian and a Gaussian around the ground-truth action borders, and use standard back-propagation. Importantly, variance propagation in VANp does not require any additional parameters and, during testing, does not require any additional computations either. In action localization, the means and the variances of the input are computed at the pooling operations, which are typically used to bring arbitrarily long videos to a vector with fixed dimensions. Second, we propose two alternative formulations that augment the first (respectively, the last) layer of a regression network with additional parameters so as to take in at the input (respectively, predict at the output) both means and variances. Results on the action localization problem show that incorporating second-order statistics improves over the baseline network, and that VANp surpasses the accuracy of virtually all other two-stage networks without involving any additional parameters.
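The three ingredients named in the abstract (pooling into a mean and a variance, propagating both through a layer, and a KL-based loss between Gaussians) can be illustrated with a minimal sketch. It is not the paper's implementation: the function names are hypothetical, the input dimensions are assumed independent (diagonal covariance), only a linear layer is covered, and the direction of the KL term and the variance placed around the ground-truth border are assumptions made for illustration.

```python
import math


def pool_mean_var(features):
    """Pool a variable-length list of per-frame feature vectors into a
    fixed-size per-dimension mean and (population) variance."""
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in features) / n for d in range(dim)]
    return mean, var


def linear_propagate(mean, var, weights, bias):
    """Propagate mean and variance through y = W x + b.

    Assumes independent input dimensions, so that
    E[y_j] = sum_i W_ji * mean_i + b_j and Var[y_j] = sum_i W_ji^2 * var_i.
    The same weights serve both paths; no extra parameters are introduced.
    """
    out_mean = [sum(w * m for w, m in zip(row, mean)) + b
                for row, b in zip(weights, bias)]
    out_var = [sum(w * w * v for w, v in zip(row, var)) for row in weights]
    return out_mean, out_var


def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(N(mu_p, var_p) || N(mu_q, var_q)) for univariate Gaussians."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)


# Toy usage: two frames with 2-D features, one linear output (e.g. a border offset).
frames = [[1.0, 2.0], [3.0, 6.0]]
mu_in, var_in = pool_mean_var(frames)
mu_out, var_out = linear_propagate(mu_in, var_in, [[1.0, 1.0]], [0.5])
# Hypothetical ground-truth Gaussian: mean 6.0, variance 1.0 (illustrative values).
loss = kl_gaussian(mu_out[0], var_out[0], 6.0, 1.0)
```

Because the variance path reuses the weights of the mean path, the sketch is consistent with the abstract's statement that variance propagation adds no parameters; nonlinear layers would need their own analytic or approximate propagation rules, which are not shown here.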