Paper Title
Sparse deep neural networks for modeling aluminum electrolysis dynamics
Paper Authors
Paper Abstract
Deep neural networks have become very popular in modeling complex nonlinear processes due to their extraordinary ability to fit arbitrary nonlinear functions from data with minimal expert intervention. However, they are almost always overparameterized and challenging to interpret due to their internal complexity. Furthermore, the optimization process for finding the learned model parameters can be unstable because it can get stuck in local minima. In this work, we demonstrate the value of sparse regularization techniques in significantly reducing model complexity. We demonstrate this for an aluminium extraction process, which is a highly nonlinear system with many interrelated subprocesses. We trained a densely connected deep neural network to model the process and then compared the effects of sparsity-promoting L1 regularization on generalizability, interpretability, and training stability. We found that the regularization significantly reduces model complexity compared to a corresponding dense neural network. We argue that this makes the model more interpretable, and show that training an ensemble of sparse neural networks with different parameter initializations often converges to similar model structures with similar learned input features. Furthermore, the empirical study shows that the resulting sparse models generalize better from small training sets than their dense counterparts.
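To make the core technique concrete, below is a minimal sketch of training a dense feedforward network with a sparsity-promoting L1 penalty on its weights, in the spirit of the abstract. This is not the paper's actual model or configuration: the layer sizes, penalty weight lam, pruning threshold, and synthetic data are all illustrative assumptions.

import torch
import torch.nn as nn

# Assumed architecture: a small dense MLP (sizes are placeholders).
model = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = nn.MSELoss()
lam = 1e-4  # strength of the L1 penalty (assumed value, not from the paper)

# Synthetic stand-in for process data: 8 input features, 1 output.
x = torch.randn(256, 8)
y = torch.randn(256, 1)

for epoch in range(1000):
    optimizer.zero_grad()
    pred = model(x)
    # L1 norm of all weight matrices; its gradient drives many weights toward zero.
    l1 = sum(p.abs().sum() for name, p in model.named_parameters() if "weight" in name)
    loss = mse(pred, y) + lam * l1
    loss.backward()
    optimizer.step()

# After training, near-zero weights can be pruned to expose an explicitly
# sparse structure, which is what makes the learned model easier to inspect.
with torch.no_grad():
    for name, p in model.named_parameters():
        if "weight" in name:
            p[p.abs() < 1e-3] = 0.0

Inspecting which input columns of the first pruned weight matrix remain nonzero gives a rough picture of the learned input features, which is how an ensemble of such sparse models can be compared for structural similarity as the abstract describes.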