Paper Title

AutoML for Multilayer Perceptron and FPGA Co-design

Authors

Colangelo, Philip, Segal, Oren, Speicher, Alex, Margala, Martin

Abstract

State-of-the-art Neural Network Architectures (NNAs) are challenging to design and implement efficiently in hardware. In the past couple of years, this has led to an explosion in research and development of automatic Neural Architecture Search (NAS) tools. AutoML tools are now used to achieve state-of-the-art NNA designs and attempt to optimize for hardware usage and design. Much of the recent research in the auto-design of NNAs has focused on convolutional networks and image recognition, ignoring the fact that a significant part of the workload in data centers is general-purpose deep neural networks. In this work, we develop and test a general multilayer perceptron (MLP) flow that can take arbitrary datasets as input and automatically produce optimized NNAs and hardware designs. We test the flow on six benchmarks. Our results show we exceed the performance of currently published MLP accuracy results and are competitive with non-MLP-based results. We compare general and common GPU architectures with our scalable FPGA design and show we can achieve higher efficiency and higher throughput (outputs per second) for the majority of datasets. Further insights into the design space for both accurate networks and high-performing hardware show the power of co-design by correlating accuracy versus throughput, network size versus accuracy, and scaling to high-performance devices.
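The search loop the abstract describes — take a dataset, try candidate MLP architectures, and keep the best-performing one — can be sketched as follows. This is an illustrative toy, not the authors' actual flow: the dataset, the one-dimensional search space (hidden-layer width only), and the random-search strategy are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the "arbitrary dataset" input: XOR-like labels on 2-D points.
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)

def train_mlp(hidden, lr=0.5, epochs=300):
    """Train a 1-hidden-layer sigmoid MLP by gradient descent; return train accuracy."""
    W1 = rng.normal(scale=0.5, size=(2, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1)); b2 = np.zeros(1)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        h = sig(X @ W1 + b1)
        p = sig(h @ W2 + b2)
        # Gradient of binary cross-entropy w.r.t. the output logits is (p - y).
        d2 = (p - y) / len(X)
        dW2 = h.T @ d2; db2 = d2.sum(0)
        d1 = (d2 @ W2.T) * h * (1 - h)
        dW1 = X.T @ d1; db1 = d1.sum(0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    preds = sig(sig(X @ W1 + b1) @ W2 + b2) > 0.5
    return float((preds == y).mean())

# Architecture search over a single knob: hidden-layer width.
# A real co-design flow would also score hardware cost (throughput, resource use).
candidates = [2, 4, 8, 16]
results = {h: train_mlp(h) for h in candidates}
best = max(results, key=results.get)
print(f"best hidden width: {best}, accuracy: {results[best]:.2f}")
```

A co-design flow in the spirit of the paper would replace the accuracy-only scoring with a multi-objective one, e.g. trading accuracy against an FPGA resource or throughput estimate for each candidate network.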
