Paper Title

A Survey on Large-scale Machine Learning

Paper Authors

Meng Wang, Weijie Fu, Xiangnan He, Shijie Hao, Xindong Wu

Paper Abstract

Machine learning can provide deep insights into data, allowing machines to make high-quality predictions, and has been widely used in real-world applications such as text mining, visual classification, and recommender systems. However, most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data. This issue calls for Large-scale Machine Learning (LML), which aims to learn patterns from big data efficiently and with comparable performance. In this paper, we offer a systematic survey of existing LML methods to provide a blueprint for the future development of this area. We first divide these LML methods according to the ways they improve scalability: 1) model simplification on computational complexities, 2) optimization approximation on computational efficiency, and 3) computation parallelism on computational capabilities. We then categorize the methods in each perspective according to their targeted scenarios and introduce representative methods in line with their intrinsic strategies. Lastly, we analyze their limitations and discuss potential directions, as well as open issues that are promising to address in the future.
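To make the abstract's second scalability strategy (optimization approximation on computational efficiency) more concrete, below is a minimal illustrative sketch of mini-batch stochastic gradient descent, which replaces the full-data gradient with an estimate computed on a small random subset so the per-iteration cost no longer grows with the dataset size. This example is not taken from the survey itself; the function name, synthetic data, and hyperparameters are illustrative assumptions.

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.1, batch_size=64, epochs=10, seed=0):
    """Fit least-squares linear regression with mini-batch SGD.

    Each update uses only `batch_size` rows, so one step costs
    O(batch_size * d) instead of the O(n * d) of full-batch gradient descent.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # Shuffle once per epoch, then sweep over disjoint mini-batches.
        for idx in np.array_split(rng.permutation(n), max(1, n // batch_size)):
            Xb, yb = X[idx], y[idx]
            grad = 2.0 / len(idx) * Xb.T @ (Xb @ w - yb)  # gradient of the mean squared error on the batch
            w -= lr * grad
    return w

if __name__ == "__main__":
    # Synthetic regression problem for demonstration only.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 20))
    true_w = rng.normal(size=20)
    y = X @ true_w + 0.01 * rng.normal(size=10_000)
    w_hat = minibatch_sgd(X, y)
    print("max abs coefficient error:", np.max(np.abs(w_hat - true_w)))
```

The same trade-off, paying a small amount of per-step noise for per-step cost that is independent of n, underlies many of the optimization-approximation methods the survey covers.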
