Paper Title

Best of Both Worlds: AutoML Codesign of a CNN and its Hardware Accelerator

Authors

Abdelfattah, Mohamed S., Dudziak, Łukasz, Chau, Thomas, Lee, Royson, Kim, Hyeji, Lane, Nicholas D.

Abstract

Neural architecture search (NAS) has been very successful at outperforming human-designed convolutional neural networks (CNN) in accuracy, and when hardware information is present, latency as well. However, NAS-designed CNNs typically have a complicated topology, therefore, it may be difficult to design a custom hardware (HW) accelerator for such CNNs. We automate HW-CNN codesign using NAS by including parameters from both the CNN model and the HW accelerator, and we jointly search for the best model-accelerator pair that boosts accuracy and efficiency. We call this Codesign-NAS. In this paper we focus on defining the Codesign-NAS multiobjective optimization problem, demonstrating its effectiveness, and exploring different ways of navigating the codesign search space. For CIFAR-10 image classification, we enumerate close to 4 billion model-accelerator pairs, and find the Pareto frontier within that large search space. This allows us to evaluate three different reinforcement-learning-based search strategies. Finally, compared to ResNet on its most optimal HW accelerator from within our HW design space, we improve on CIFAR-100 classification accuracy by 1.3% while simultaneously increasing performance/area by 41% in just~1000 GPU-hours of running Codesign-NAS.
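The abstract describes finding the Pareto frontier over enumerated model-accelerator pairs scored on two objectives (accuracy and performance/area). A minimal sketch of that Pareto-filtering step is below; the candidate pairs and their scores are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of the Pareto-frontier step: given model-accelerator
# pairs scored on accuracy ("acc") and performance/area ("ppa"), keep only
# pairs that no other pair beats (or ties) on both objectives.

def pareto_frontier(pairs):
    """Return the pairs not dominated on both (acc, ppa)."""
    frontier = []
    for p in pairs:
        dominated = any(
            q["acc"] >= p["acc"] and q["ppa"] >= p["ppa"] and q != p
            for q in pairs
        )
        if not dominated:
            frontier.append(p)
    return frontier

# Illustrative candidates (not from the paper's search space):
candidates = [
    {"name": "pair-a", "acc": 0.92, "ppa": 0.50},  # highest accuracy
    {"name": "pair-b", "acc": 0.90, "ppa": 0.70},  # highest perf/area
    {"name": "pair-c", "acc": 0.89, "ppa": 0.60},  # dominated by pair-b
]
print([p["name"] for p in pareto_frontier(candidates)])  # ['pair-a', 'pair-b']
```

In the paper's setting the same filter would run over the ~4 billion enumerated pairs (in practice with a more scalable sort-based sweep rather than this O(n^2) loop), and the reinforcement-learning search strategies are then judged by how close they get to this frontier.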
