CFU Playground: Full-Stack Open-Source Framework for Tiny Machine Learning (tinyML) Acceleration on FPGAs

Paper Title

CFU Playground: Full-Stack Open-Source Framework for Tiny Machine Learning (tinyML) Acceleration on FPGAs

Authors

Prakash, Shvetank, Callahan, Tim, Bushagour, Joseph, Banbury, Colby, Green, Alan V., Warden, Pete, Ansell, Tim, Reddi, Vijay Janapa

Abstract

Need for the efficient processing of neural networks has given rise to the development of hardware accelerators. The increased adoption of specialized hardware has highlighted the need for more agile design flows for hardware-software co-design and domain-specific optimizations. In this paper, we present CFU Playground: a full-stack open-source framework that enables rapid and iterative design and evaluation of machine learning (ML) accelerators for embedded ML systems. Our tool provides a completely open-source end-to-end flow for hardware-software co-design on FPGAs and future systems research. This full-stack framework gives the users access to explore experimental and bespoke architectures that are customized and co-optimized for embedded ML. Our rapid, deploy-profile-optimization feedback loop lets ML hardware and software developers achieve significant returns out of a relatively small investment in customization. Using CFU Playground's design and evaluation loop, we show substantial speedups between 55$\times$ and 75$\times$. The soft CPU coupled with the accelerator opens up a new, rich design space between the two components that we explore in an automated fashion using Vizier, an open-source black-box optimization service.
