Trash Talk: Accelerating Garbage Collection on Integrated GPUs is Worthless

Title

Trash Talk: Accelerating Garbage Collection on Integrated GPUs is Worthless

Authors

Dashti, Mohammad; Fedorova, Alexandra

Abstract

Systems integrating heterogeneous processors with unified memory provide seamless integration among these processors with minimal development complexity. These systems integrate accelerators such as GPUs on the same die as the CPU cores to accommodate parallel applications with varying levels of parallelism. Such integration is becoming very common on modern chip architectures, and it places a burden (or an opportunity) on application and system programmers to exploit the full potential of such integrated chips. In this paper we evaluate whether any performance benefit can be obtained by running garbage collection on integrated GPU systems, and discuss how difficult it would be for the programmer to realize such gains. The proliferation of garbage-collected languages on platforms ranging from handheld mobile devices to data centers makes garbage collection an interesting target to examine on such systems, and the results can offer valuable lessons for other applications. We present our analysis of running garbage collection on integrated systems and find that the current state of these systems does not provide an advantage for accelerating such a task. We build a framework that allows us to offload garbage collection tasks to the integrated GPU from within the JVM. We identify the dominant phases of garbage collection and study the viability of offloading them to the integrated GPU. We show that the performance advantages are limited, partly because an integrated GPU has only a limited memory-bandwidth advantage over the CPU, and partly because atomic operations are costly.
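The bandwidth argument in the abstract can be made concrete with a simple bound. As a sketch, assume a garbage collection phase that is purely memory-bandwidth bound: it moves D bytes, its compute cost is negligible, B_CPU and B_GPU are the bandwidths each processor actually attains, and T_off >= 0 is the cost of handing the work to the GPU. These symbols are illustrative and are not taken from the paper. The offload speedup S is then at most the bandwidth ratio:

```latex
S \;=\; \frac{T_{\mathrm{CPU}}}{T_{\mathrm{GPU}}}
  \;\approx\; \frac{D / B_{\mathrm{CPU}}}{D / B_{\mathrm{GPU}} + T_{\mathrm{off}}}
  \;\le\; \frac{D / B_{\mathrm{CPU}}}{D / B_{\mathrm{GPU}}}
  \;=\; \frac{B_{\mathrm{GPU}}}{B_{\mathrm{CPU}}}
```

Under these assumptions, if the integrated GPU's bandwidth advantage over the CPU is limited, the ratio B_GPU / B_CPU is not much above 1, and any offload overhead T_off only lowers the achievable speedup further.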
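The cost of atomics can be illustrated with one place where a parallel tracing collector commonly needs them: when several marking threads reach the same object, exactly one of them must claim it and push it onto its work list. The C11 sketch below assumes a hypothetical mark bitmap with one bit per object slot; `mark_bitmap`, `HEAP_SLOTS`, `try_mark` and `try_mark_with_precheck` are illustrative names, not code from the paper's framework. Every first-time claim is an atomic read-modify-write on shared memory, which is the kind of operation the abstract reports as costly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap size in object slots, and bitmap word width. */
#define HEAP_SLOTS 1024
#define WORD_BITS 64

/* Hypothetical mark bitmap: one bit per object slot, shared by all
   marking threads. Static storage, so every bit starts cleared. */
static _Atomic uint64_t mark_bitmap[HEAP_SLOTS / WORD_BITS];

/* Claim object `slot` for the calling thread. atomic_fetch_or returns the
   word's previous value, so exactly one caller sees the bit go from 0 to 1
   and gets true; that caller would push the object onto its work list.
   The read-modify-write is performed even if the bit was already set. */
bool try_mark(size_t slot) {
    assert(slot < HEAP_SLOTS);
    uint64_t bit = UINT64_C(1) << (slot % WORD_BITS);
    uint64_t old = atomic_fetch_or(&mark_bitmap[slot / WORD_BITS], bit);
    return (old & bit) == 0;
}

/* Same claim, but a plain atomic load is tried first so that the costlier
   read-modify-write is skipped when the object is already marked. */
bool try_mark_with_precheck(size_t slot) {
    assert(slot < HEAP_SLOTS);
    uint64_t bit = UINT64_C(1) << (slot % WORD_BITS);
    if (atomic_load(&mark_bitmap[slot / WORD_BITS]) & bit) {
        return false; /* already marked by another thread or an earlier call */
    }
    return try_mark(slot);
}
```

The pre-check is a common design choice: objects reached many times are usually already marked, so a plain load avoids most read-modify-write operations, although it cannot remove them for first-time marks.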
