Paper Title
TCN-CUTIE: A 1036 TOp/s/W, 2.72 uJ/Inference, 12.2 mW All-Digital Ternary Accelerator in 22 nm FDX Technology
Paper Authors
Paper Abstract
Tiny Machine Learning (TinyML) applications impose uJ/Inference constraints, with a maximum power consumption of tens of mW. Meeting these requirements at a reasonable accuracy level is extremely challenging. This work addresses the challenge with a flexible, fully digital Ternary Neural Network (TNN) accelerator in a RISC-V-based System-on-Chip (SoC). Besides supporting Ternary Convolutional Neural Networks, we introduce extensions to the accelerator design that enable the processing of time-dilated Temporal Convolutional Networks (TCNs). The design achieves 5.5 uJ/Inference, 12.2 mW, 8000 Inferences/sec at 0.5 V with 94.5 % accuracy on a Dynamic Vision Sensor (DVS) based TCN, and 2.72 uJ/Inference, 12.2 mW, 3200 Inferences/sec at 0.5 V with 86 % CIFAR-10 accuracy on a non-trivial 9-layer, 96-channels-per-layer convolutional network. The peak energy efficiency is 1036 TOp/s/W, outperforming state-of-the-art silicon-proven TinyML quantized accelerators by 1.67x while achieving competitive accuracy.
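To make the accelerator's workload concrete, below is a minimal Python/NumPy sketch of a causal, time-dilated 1-D convolution with ternary weights, i.e. the core operation of a TCN layer mapped onto a ternary engine. The threshold-based ternarization scheme, the function names (`ternarize`, `dilated_ternary_conv1d`), and all shapes are illustrative assumptions, not the TCN-CUTIE implementation.

```python
import numpy as np

def ternarize(w, threshold=0.05):
    """Quantize real-valued weights to {-1, 0, +1} with a fixed threshold
    (assumed scheme; the paper's training/quantization flow may differ)."""
    t = np.zeros(w.shape, dtype=np.int8)
    t[w > threshold] = 1
    t[w < -threshold] = -1
    return t

def dilated_ternary_conv1d(x, w_t, dilation=1):
    """Causal dilated 1-D convolution with ternary weights.

    x   : (C_in, T)        integer activations
    w_t : (C_out, C_in, K) ternary weights in {-1, 0, +1}
    Returns (C_out, T) pre-activation accumulators.
    """
    c_out, c_in, k = w_t.shape
    receptive = (k - 1) * dilation
    x_pad = np.pad(x, ((0, 0), (receptive, 0)))      # left-pad so the output stays causal
    y = np.zeros((c_out, x.shape[1]), dtype=np.int64)
    for t in range(x.shape[1]):
        # K input samples spaced `dilation` steps apart, ending at time t
        taps = x_pad[:, t : t + receptive + 1 : dilation]
        # Ternary weights reduce each MAC to an add, a subtract, or a skip
        y[:, t] = np.einsum('oik,ik->o', w_t, taps)
    return y

# Example: 16 input channels, 64 time steps, 32 output channels, kernel size 3
rng = np.random.default_rng(0)
x = rng.integers(-8, 8, size=(16, 64))
w = rng.normal(size=(32, 16, 3))
y = dilated_ternary_conv1d(x, ternarize(w), dilation=4)
print(y.shape)  # (32, 64)
```

Because ternary weights turn every multiply-accumulate into an addition, a subtraction, or a skip, the datapath can be kept extremely lightweight, which is the basis of the TOp/s/W figures reported above; the dilation factor lets a TCN's receptive field grow with depth without adding parameters.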