Paper title
Programming Bare-Metal Accelerators with Heterogeneous Threading Models: A Case Study of Matrix-3000
Paper authors
Paper abstract
As the hardware industry moves towards specialized heterogeneous many-cores to avoid the effects of the power wall, software developers are finding it hard to deal with the complexity of these systems. This article shares our experience in developing a programming model, together with its supporting compiler and libraries, for Matrix-3000, which is designed for next-generation exascale supercomputers but has a complex memory hierarchy and processor organization. To assist its software development, we developed a software stack from scratch that includes a low-level programming interface and a high-level OpenCL compiler. Our low-level programming model offers native programming support for the bare-metal accelerators of Matrix-3000, while the high-level model allows programmers to use the OpenCL programming standard. We detail our design choices and highlight the lessons learned from developing the systems software that enables the programming of bare-metal accelerators. Our programming models have been deployed to the production environment of an exascale prototype system.