Paper Title

An Efficient FPGA-based Accelerator for Deep Forest

Authors

Zhu, Mingyu; Luo, Jiapeng; Mao, Wendong; Wang, Zhongfeng

Abstract

Deep Forest is a prominent machine learning algorithm known for its high prediction accuracy. Compared with deep neural networks, Deep Forest requires almost no multiplication operations and performs better on small datasets. However, because of its deep structure and large number of forests, it incurs heavy computation and memory consumption. This paper proposes an efficient hardware accelerator for Deep Forest models, which is also the first work to implement Deep Forest on an FPGA. First, a carefully designed node computing unit (NCU) is introduced to improve inference speed. Second, building on the NCU, an efficient architecture and an adaptive dataflow are proposed to alleviate the imbalance of node computation during classification. Moreover, an optimized storage scheme further improves hardware utilization and power efficiency. The proposed design is implemented on an Intel Stratix V FPGA board and evaluated on two typical datasets, ADULT and Face Mask Detection. Experimental results show that it achieves around a 40x speedup over a 40-core high-performance x86 CPU.
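To illustrate why tree-based inference such as Deep Forest needs almost no multiplications, here is a minimal Python sketch of generic decision-tree inference. It assumes a binary tree stored as a flat node array and reduces a forest's output to per-class vote counts (Deep Forest itself combines class probability vectors). The `Node` layout, the `predict` and `forest_vote` helpers, and `EXAMPLE_TREE` are hypothetical illustrations, not the paper's NCU, dataflow, or storage scheme.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Node:
    """One node of a binary decision tree in a flat array (hypothetical layout)."""
    feature: int      # index of the tested feature; -1 marks a leaf
    threshold: float  # split threshold (ignored at leaves)
    left: int         # array index of the left child (-1 at leaves)
    right: int        # array index of the right child (-1 at leaves)
    label: int        # class predicted at a leaf (-1 at internal nodes)


def predict(tree: List[Node], x: Sequence[float]) -> int:
    """Walk from the root (index 0) to a leaf using only comparisons."""
    i = 0
    while tree[i].feature >= 0:
        node = tree[i]
        # One comparison per internal node; no multiplication is involved.
        i = node.left if x[node.feature] <= node.threshold else node.right
    return tree[i].label


def forest_vote(forest: List[List[Node]], x: Sequence[float], n_classes: int) -> List[int]:
    """Count how many trees predict each class (a simplified forest output)."""
    counts = [0] * n_classes
    for tree in forest:
        counts[predict(tree, x)] += 1
    return counts


# Hypothetical depth-2 tree over two features and three classes.
EXAMPLE_TREE = [
    Node(feature=0, threshold=0.5, left=1, right=2, label=-1),
    Node(feature=-1, threshold=0.0, left=-1, right=-1, label=0),
    Node(feature=1, threshold=2.0, left=3, right=4, label=-1),
    Node(feature=-1, threshold=0.0, left=-1, right=-1, label=1),
    Node(feature=-1, threshold=0.0, left=-1, right=-1, label=2),
]

if __name__ == "__main__":
    print(predict(EXAMPLE_TREE, [0.9, 1.0]))  # prints 1
    print(forest_vote([EXAMPLE_TREE, EXAMPLE_TREE], [0.9, 1.0], 3))  # prints [0, 2, 0]
```

Each internal node costs a single comparison, so the per-sample work grows with tree depth and forest size rather than with multiply-accumulate operations.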
