Paper Title
TCN-CUTIE: A 1036 TOp/s/W, 2.72 uJ/Inference, 12.2 mW All-Digital Ternary Accelerator in 22 nm FDX Technology
Paper Authors
Paper Abstract
Tiny Machine Learning (TinyML) applications impose uJ/Inference constraints, with a maximum power consumption of tens of mW. Meeting these requirements at a reasonable accuracy level is extremely challenging. This work addresses the challenge with a flexible, fully digital Ternary Neural Network (TNN) accelerator in a RISC-V-based System-on-Chip (SoC). Besides supporting Ternary Convolutional Neural Networks, we introduce extensions to the accelerator design that enable the processing of time-dilated Temporal Convolutional Networks (TCNs). The design achieves 5.5 uJ/Inference, 12.2 mW, 8000 Inferences/sec at 0.5 V with 94.5 % accuracy on a Dynamic Vision Sensor (DVS) based TCN, and 2.72 uJ/Inference, 12.2 mW, 3200 Inferences/sec at 0.5 V with 86 % CIFAR-10 accuracy on a non-trivial 9-layer, 96-channels-per-layer convolutional network. The peak energy efficiency is 1036 TOp/s/W, outperforming state-of-the-art silicon-proven TinyML quantized accelerators by 1.67x while achieving competitive accuracy.
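To make the accelerator's workload concrete, below is a minimal Python/NumPy sketch of a causal, time-dilated 1-D convolution with ternary weights, i.e. the core operation of a TCN layer mapped onto a ternary engine. The threshold-based ternarization scheme, the function names (`ternarize`, `dilated_ternary_conv1d`), and all shapes are illustrative assumptions, not the TCN-CUTIE implementation.

```python
import numpy as np

def ternarize(w, threshold=0.05):
    """Quantize real-valued weights to {-1, 0, +1} with a fixed threshold
    (assumed scheme; the paper's training/quantization flow may differ)."""
    t = np.zeros(w.shape, dtype=np.int8)
    t[w > threshold] = 1
    t[w < -threshold] = -1
    return t

def dilated_ternary_conv1d(x, w_t, dilation=1):
    """Causal dilated 1-D convolution with ternary weights.

    x   : (C_in, T)        integer activations
    w_t : (C_out, C_in, K) ternary weights in {-1, 0, +1}
    Returns (C_out, T) pre-activation accumulators.
    """
    c_out, c_in, k = w_t.shape
    receptive = (k - 1) * dilation
    x_pad = np.pad(x, ((0, 0), (receptive, 0)))      # left-pad so the output stays causal
    y = np.zeros((c_out, x.shape[1]), dtype=np.int64)
    for t in range(x.shape[1]):
        # K input samples spaced `dilation` steps apart, ending at time t
        taps = x_pad[:, t : t + receptive + 1 : dilation]
        # Ternary weights reduce each MAC to an add, a subtract, or a skip
        y[:, t] = np.einsum('oik,ik->o', w_t, taps)
    return y

# Example: 16 input channels, 64 time steps, 32 output channels, kernel size 3
rng = np.random.default_rng(0)
x = rng.integers(-8, 8, size=(16, 64))
w = rng.normal(size=(32, 16, 3))
y = dilated_ternary_conv1d(x, ternarize(w), dilation=4)
print(y.shape)  # (32, 64)
```

Because ternary weights turn every multiply-accumulate into an addition, a subtraction, or a skip, the datapath can be kept extremely lightweight, which is the basis of the TOp/s/W figures reported above; the dilation factor lets a TCN's receptive field grow with depth without adding parameters.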