Paper title
Stabilizing Spiking Neuron Training
Paper authors
Paper abstract
Stability arguments are often used to prevent learning algorithms from having ever-increasing activity and weights that hinder generalization. However, stability conditions can clash with the sparsity required to improve the energy efficiency of spiking neurons. Nonetheless, they can also provide solutions. Spiking Neuromorphic Computing uses binary activity to improve the energy efficiency of Artificial Intelligence. However, its non-smoothness requires approximate gradients, known as Surrogate Gradients (SG), to close the performance gap with Deep Learning. Several SGs have been proposed in the literature, but it remains unclear how to determine the best SG for a given task and network. Thus, we aim to define the best SG theoretically, through stability arguments, to reduce the need for grid search. We show that more complex tasks and networks need a more careful choice of SG, even if overall the derivative of the fast sigmoid tends to outperform the others for a wide range of learning rates. We therefore design a stability-based theoretical method to choose the initialization and SG shape before training on the most common spiking neuron, the Leaky Integrate and Fire (LIF). Since our stability method suggests the use of high firing rates at initialization, which is non-standard in the neuromorphic literature, we show that high initial firing rates, combined with a sparsity-encouraging loss term introduced gradually, can lead to better generalization, depending on the SG shape. Our stability-based theoretical solution finds an SG and initialization that experimentally result in improved accuracy. We show how it can be used to reduce the need for an extensive grid search over the dampening, sharpness and tail-fatness of the SG. We also show that our stability concepts can be extended to different LIF variants, such as DECOLLE and fluctuations-driven initializations.
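To make the terms in the abstract concrete, here is a minimal sketch of a discrete-time LIF neuron and a surrogate gradient given by the derivative of the fast sigmoid. It is an illustration under stated assumptions, not the paper's implementation. The function names, the subtractive reset, the default values, and the exact way dampening and sharpness enter the formula are all hypothetical choices. Tail-fatness is not modeled.

```python
def fast_sigmoid_sg(v, threshold=1.0, dampening=1.0, sharpness=1.0):
    """Surrogate derivative of the spike with respect to the voltage.

    Uses the derivative of the fast sigmoid x / (1 + |x|), which is
    1 / (1 + |x|)^2. Assumption: dampening scales the peak height and
    sharpness scales the distance to the threshold.
    """
    centered = v - threshold
    return dampening / (1.0 + sharpness * abs(centered)) ** 2


def lif_forward(inputs, decay=0.9, threshold=1.0):
    """Simulate a single LIF neuron over a sequence of input currents.

    Assumption: leaky integration with a constant decay factor and a
    subtractive reset after each spike. Returns (voltages, spikes).
    """
    v = 0.0
    voltages, spikes = [], []
    for x in inputs:
        v = decay * v + x                        # leak, then integrate the input
        s = 1.0 if v >= threshold else 0.0       # binary, non-smooth spike
        voltages.append(v)
        spikes.append(s)
        v = v - s * threshold                    # subtractive reset on a spike
    return voltages, spikes


# Forward pass uses the hard threshold; a backward pass would substitute
# fast_sigmoid_sg(v) for the undefined derivative of the step function.
voltages, spikes = lif_forward([0.6, 0.6, 0.6])
surrogate = [fast_sigmoid_sg(v, dampening=0.3, sharpness=2.0) for v in voltages]
```

In this sketch, the dampening sets the surrogate's value at the threshold, and the sharpness sets how quickly it decays away from the threshold. These are two of the quantities the abstract says are usually tuned by grid search.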