Paper Title


The Tree Ensemble Layer: Differentiability meets Conditional Computation

Paper Authors

Hussein Hazimeh, Natalia Ponomareva, Petros Mol, Zhenyu Tan, Rahul Mazumder

Paper Abstract


Neural networks and tree ensembles are state-of-the-art learners, each with its unique statistical and computational advantages. We aim to combine these advantages by introducing a new layer for neural networks, composed of an ensemble of differentiable decision trees (a.k.a. soft trees). While differentiable trees demonstrate promising results in the literature, they are typically slow in training and inference as they do not support conditional computation. We mitigate this issue by introducing a new sparse activation function for sample routing, and implement true conditional computation by developing specialized forward and backward propagation algorithms that exploit sparsity. Our efficient algorithms pave the way for jointly training over deep and wide tree ensembles using first-order methods (e.g., SGD). Experiments on 23 classification datasets indicate over 10x speed-ups compared to the differentiable trees used in the literature and over 20x reduction in the number of parameters compared to gradient boosted trees, while maintaining competitive performance. Moreover, experiments on CIFAR, MNIST, and Fashion MNIST indicate that replacing dense layers in CNNs with our tree layer reduces the test loss by 7-53% and the number of parameters by 8x. We provide an open-source TensorFlow implementation with a Keras API.
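To make the abstract's idea concrete, below is a minimal sketch of a single soft (differentiable) decision tree as a TensorFlow/Keras layer. The class name SoftTreeLayer, the hyperparameter names (depth, output_dim, gamma), and the dense level-by-level routing are illustrative assumptions; the routing uses a smooth-step-style clipped cubic in the spirit of the paper's sparse activation, but this sketch is not the authors' released implementation and omits their specialized sparse forward and backward propagation algorithms.

import tensorflow as tf

class SoftTreeLayer(tf.keras.layers.Layer):
    """A single soft (differentiable) decision tree of fixed depth.

    Internal nodes route samples with a smooth-step function; leaves hold
    learnable output vectors. Illustrative dense implementation only: it does
    not exploit sparsity for conditional computation as the paper does.
    """

    def __init__(self, depth=3, output_dim=10, gamma=1.0, **kwargs):
        super().__init__(**kwargs)
        self.depth = depth
        self.output_dim = output_dim
        self.gamma = gamma                      # width of the smooth-step transition region
        self.num_internal = 2 ** depth - 1
        self.num_leaves = 2 ** depth

    def build(self, input_shape):
        d = int(input_shape[-1])
        # One hyperplane (weights + bias) per internal node.
        self.w = self.add_weight(name="w", shape=(d, self.num_internal),
                                 initializer="glorot_uniform", trainable=True)
        self.b = self.add_weight(name="b", shape=(self.num_internal,),
                                 initializer="zeros", trainable=True)
        # One learnable output vector per leaf.
        self.leaves = self.add_weight(name="leaves",
                                      shape=(self.num_leaves, self.output_dim),
                                      initializer="zeros", trainable=True)

    def smooth_step(self, t):
        # Cubic that is exactly 0 for t <= -gamma/2 and exactly 1 for t >= gamma/2,
        # so many routing probabilities are exactly zero -- the property that
        # enables conditional computation.
        g = self.gamma
        t = tf.clip_by_value(t, -g / 2.0, g / 2.0)
        return (-2.0 / g ** 3) * t ** 3 + (3.0 / (2.0 * g)) * t + 0.5

    def call(self, x):
        # Probability of branching left at every internal node: (batch, num_internal).
        p_left = self.smooth_step(tf.matmul(x, self.w) + self.b)
        # Probability of reaching each node, computed level by level.
        reach = tf.ones((tf.shape(x)[0], 1), dtype=x.dtype)
        for level in range(self.depth):
            start = 2 ** level - 1                   # first internal node at this level
            p = p_left[:, start:start + 2 ** level]  # (batch, nodes at this level)
            # Children of node j are (left, right); interleave to keep heap order.
            reach = tf.reshape(tf.stack([reach * p, reach * (1.0 - p)], axis=-1),
                               (-1, 2 ** (level + 1)))
        # Output = probability-weighted sum of leaf vectors: (batch, output_dim).
        return tf.matmul(reach, self.leaves)

A possible usage, mirroring the abstract's experiment of replacing a dense layer with a tree layer (the architecture below is a toy example, not one from the paper):

inputs = tf.keras.Input(shape=(28, 28))
h = tf.keras.layers.Flatten()(inputs)
h = tf.keras.layers.Dense(64, activation="relu")(h)
logits = SoftTreeLayer(depth=3, output_dim=10)(h)   # in place of a Dense(10) head
model = tf.keras.Model(inputs, logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])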
