Title

Automatic differentiation is no panacea for phylogenetic gradient computation

Authors

Mathieu Fourment, Christiaan J. Swanepoel, Jared G. Galloway, Xiang Ji, Karthik Gangavarapu, Marc A. Suchard, Frederick A. Matsen IV

Abstract

Gradients of probabilistic model likelihoods with respect to their parameters are essential for modern computational statistics and machine learning. These calculations are readily available for arbitrary models via automatic differentiation implemented in general-purpose machine-learning libraries such as TensorFlow and PyTorch. Although these libraries are highly optimized, it is not clear if their general-purpose nature will limit their algorithmic complexity or implementation speed for the phylogenetic case compared to phylogenetics-specific code. In this paper, we compare six gradient implementations of the phylogenetic likelihood functions, in isolation and also as part of a variational inference procedure. We find that although automatic differentiation can scale approximately linearly in tree size, it is much slower than the carefully-implemented gradient calculation for tree likelihood and ratio transformation operations. We conclude that a mixed approach combining phylogenetic libraries with machine learning libraries will provide the optimal combination of speed and model flexibility moving forward.
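For readers unfamiliar with the approach being benchmarked, the sketch below illustrates what "gradient via automatic differentiation" means in PyTorch for a toy two-taxon Jukes-Cantor likelihood. This is a minimal illustration only, not code from the paper or its six benchmarked implementations; the site counts and variable names are made-up assumptions.

```python
# Minimal sketch (illustrative only): gradient of a toy Jukes-Cantor
# log-likelihood with respect to a branch length, via PyTorch autodiff.
import torch

# Hypothetical data: numbers of matching / mismatching aligned sites.
n_same = torch.tensor(90.0)
n_diff = torch.tensor(10.0)

# Branch length (expected substitutions per site) is the parameter we differentiate.
branch_length = torch.tensor(0.1, requires_grad=True)

# JC69 transition probabilities after `branch_length` substitutions per site.
p_same = 0.25 + 0.75 * torch.exp(-4.0 * branch_length / 3.0)
p_diff = 0.25 - 0.25 * torch.exp(-4.0 * branch_length / 3.0)

# Log-likelihood (up to an additive constant) and its gradient by reverse-mode autodiff.
log_lik = n_same * torch.log(p_same) + n_diff * torch.log(p_diff)
log_lik.backward()
print(branch_length.grad)  # d log L / d branch_length
```

The paper's comparison concerns the same kind of derivative, but over full trees with many branch lengths, where hand-derived analytic gradients in phylogenetics-specific code are reported to be substantially faster than this autodiff route.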
