Paper Title

CPU- and GPU-based Distributed Sampling in Dirichlet Process Mixtures for Large-scale Analysis

Paper Authors

Or Dinari, Raz Zamir, John W. Fisher III, Oren Freifeld

Paper Abstract

In the realm of unsupervised learning, Bayesian nonparametric mixture models, exemplified by the Dirichlet Process Mixture Model (DPMM), provide a principled approach for adapting the complexity of the model to the data. Such models are particularly useful in clustering tasks where the number of clusters is unknown. Despite their potential and mathematical elegance, however, DPMMs have yet to become a mainstream tool widely adopted by practitioners. This is arguably due to a misconception that these models scale poorly, as well as to the lack of high-performance (and user-friendly) software tools that can handle large datasets efficiently. In this paper we bridge this practical gap by proposing a new, easy-to-use, statistical software package for scalable DPMM inference. More concretely, we provide efficient and easily-modifiable implementations of high-performance distributed sampling-based inference in DPMMs, where the user is free to choose between a multi-machine, multi-core CPU implementation (written in Julia) and a multi-stream GPU implementation (written in CUDA/C++). Both the CPU and GPU implementations come with a common (and optional) Python wrapper, providing the user with a single point of entry and the same interface. On the algorithmic side, our implementations leverage a leading DPMM sampler from Chang and Fisher III (2013). While Chang and Fisher III's implementation (written in MATLAB/C++) used only the CPU and was designed for a single multi-core machine, the packages we propose here distribute the computations efficiently across either multiple multi-core machines or multiple GPU streams. This leads to speedups, alleviates memory and storage limitations, and lets us fit DPMMs to significantly larger datasets, and of higher dimensionality, than was previously possible with either Chang and Fisher III (2013) or other DPMM methods.
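To make the sampling-based DPMM inference described above concrete, the sketch below is a minimal, self-contained collapsed (Chinese-restaurant-process) Gibbs sampler for a DPMM of isotropic Gaussians with a conjugate Normal prior on cluster means. It is only an illustration of the kind of inference problem involved: it is not the parallel split/merge subcluster sampler of Chang and Fisher III (2013) that the package implements, and it does not use the package's actual API. The function name and all hyperparameters (alpha, sigma2, mu0, tau2) are illustrative assumptions.

```python
# A minimal collapsed Gibbs sampler for a DPMM of isotropic Gaussians with a
# conjugate Normal prior on cluster means. Illustrative only: this is NOT the
# distributed split/merge subcluster sampler used by the package described in
# the paper, and the hyperparameters below are assumed values.
import numpy as np


def dpmm_gibbs(X, alpha=1.0, sigma2=0.1, mu0=0.0, tau2=1.0, n_iters=50, seed=0):
    """Run collapsed Gibbs sampling and return the final cluster assignments."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    z = np.zeros(N, dtype=int)        # all points start in a single cluster
    counts = {0: N}                   # cluster label -> number of points
    sums = {0: X.sum(axis=0)}         # cluster label -> sum of assigned points

    def log_pred(x, n, s):
        # Posterior-predictive log density of x for a cluster with sufficient
        # statistics (n, s); n == 0 corresponds to opening a new cluster.
        post_var = 1.0 / (1.0 / tau2 + n / sigma2)
        post_mean = post_var * (mu0 / tau2 + s / sigma2)
        v = post_var + sigma2
        return -0.5 * D * np.log(2.0 * np.pi * v) - 0.5 * np.sum((x - post_mean) ** 2) / v

    for _ in range(n_iters):
        for i in range(N):
            # Remove point i from its current cluster.
            k = z[i]
            counts[k] -= 1
            sums[k] = sums[k] - X[i]
            if counts[k] == 0:
                del counts[k], sums[k]
            # CRP-weighted predictive probabilities for existing and new clusters.
            ks = list(counts)
            logp = np.array(
                [np.log(counts[k]) + log_pred(X[i], counts[k], sums[k]) for k in ks]
                + [np.log(alpha) + log_pred(X[i], 0, np.zeros(D))]
            )
            p = np.exp(logp - logp.max())
            p /= p.sum()
            choice = rng.choice(len(ks) + 1, p=p)
            if choice < len(ks):
                k_new = ks[choice]
            else:
                k_new = max(counts) + 1 if counts else 0
            z[i] = k_new
            counts[k_new] = counts.get(k_new, 0) + 1
            sums[k_new] = sums.get(k_new, np.zeros(D)) + X[i]
    return z


if __name__ == "__main__":
    # Three well-separated 2-D Gaussian blobs; the sampler should find ~3 clusters.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(c, 0.2, size=(100, 2)) for c in [(-2, 0), (0, 2), (2, 0)]])
    labels = dpmm_gibbs(X, alpha=1.0, sigma2=0.05)
    print("inferred number of clusters:", len(np.unique(labels)))
```

The per-point assignment updates and per-cluster sufficient statistics that dominate this loop are the kind of computation the paper's package distributes across multiple machines, CPU cores, or GPU streams; the toy version above runs serially and is only meant to show what is being computed.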
