Paper Title
Real-Time Scheduling of Machine Learning Operations on Heterogeneous Neuromorphic SoC
Paper Authors
Abstract
Neuromorphic Systems-on-Chip (NSoCs) are becoming heterogeneous by integrating general-purpose processors (GPPs) and neural processing units (NPUs) on the same SoC. For embedded systems, an NSoC may need to execute user applications built using a variety of machine learning models. We propose a real-time scheduler, called PRISM, which can schedule machine learning models on a heterogeneous NSoC either individually or concurrently to improve their system performance. PRISM consists of the following four key steps. First, it constructs an interprocessor communication (IPC) graph of a machine learning model from a mapping and a self-timed schedule. Second, it creates a transaction order for the communication actors and embeds this order into the IPC graph. Third, it schedules the graph on an NSoC by overlapping communication with computation. Finally, it uses a Hill Climbing heuristic to explore the design space of mapping operations on GPPs and NPUs to improve performance. Unlike existing schedulers, which use only the NPUs of an NSoC, PRISM improves performance by enabling batch, pipeline, and operation parallelism through exploiting the platform's heterogeneity. For use-cases with concurrent applications, PRISM uses a heuristic resource-sharing strategy and non-preemptive scheduling to reduce the expected wait time before concurrent operations can be scheduled on contending resources. Our extensive evaluations with 20 machine learning workloads show that PRISM significantly improves performance per watt for both individual applications and use-cases compared to state-of-the-art schedulers.
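To make the fourth step concrete, below is a minimal sketch of a Hill Climbing search over GPP/NPU mappings. The operation names, per-processor costs, and the load-based makespan objective are illustrative assumptions only, not PRISM's actual cost model; the paper's objective additionally accounts for IPC-graph edges, transaction ordering, and the communication/computation overlap described above.

```python
import random

# Hypothetical per-operation execution costs (ms) on each processor type.
# These numbers are illustrative; PRISM derives real costs from the
# machine learning model profiled on the target NSoC.
COSTS = {
    "conv1": {"GPP": 9.0, "NPU": 2.0},
    "conv2": {"GPP": 8.0, "NPU": 2.5},
    "pool":  {"GPP": 1.0, "NPU": 1.5},
    "fc":    {"GPP": 4.0, "NPU": 1.2},
}
OPS = list(COSTS)
PROCESSORS = ["GPP", "NPU"]

def makespan(mapping):
    """Toy objective: operations mapped to the same processor serialize,
    while the processors run in parallel, so the makespan is the load of
    the busiest processor."""
    load = {p: 0.0 for p in PROCESSORS}
    for op, proc in mapping.items():
        load[proc] += COSTS[op][proc]
    return max(load.values())

def hill_climb(steps=200, seed=0):
    rng = random.Random(seed)
    # Start from an all-NPU mapping, mirroring schedulers that use only NPUs.
    mapping = {op: "NPU" for op in OPS}
    best = makespan(mapping)
    for _ in range(steps):
        op = rng.choice(OPS)                  # neighbor: move one operation
        old = mapping[op]
        mapping[op] = "GPP" if old == "NPU" else "NPU"
        cost = makespan(mapping)
        if cost < best:                       # accept only improving moves
            best = cost
        else:
            mapping[op] = old                 # revert worsening moves
    return mapping, best

if __name__ == "__main__":
    mapping, best = hill_climb()
    print(f"best mapping: {mapping}, makespan: {best:.1f} ms")
```

Starting the search from the all-NPU mapping highlights the abstract's contrast with NPU-only schedulers: any accepted move that offloads an operation to a GPP is a gain found purely by exploiting the platform's heterogeneity.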