Squeeze: Efficient Compact Fractals for Tensor Core GPUs

Paper title

Squeeze: Efficient Compact Fractals for Tensor Core GPUs

Authors

Quezada, Felipe A.; Navarro, Cristóbal A.; Hitschfeld, Nancy; Bustos, Benjamin

Abstract

This work presents Squeeze, an efficient compact fractal processing scheme for tensor core GPUs. By combining discrete-space transformations between compact and expanded forms, one can do data-parallel computation on a fractal with neighborhood access without needing to expand the fractal in memory. The space transformations are formulated as two GPU tensor-core accelerated thread maps, $\lambda(\omega)$ and $\nu(\omega)$, which act as compact-to-expanded and expanded-to-compact space functions, respectively. The cost of the maps is $\mathcal{O}(\log_2 \log_s(n))$ time, with $n$ being the side of an $n \times n$ embedding for the fractal in its expanded form, and $s$ the linear scaling factor. The proposed approach works for any fractal that belongs to the Non-overlapping-Bounding-Boxes (NBB) class of discrete fractals, and can be extended to three dimensions as well. Experimental results using a discrete Sierpinski Triangle as a case study show a speedup of up to $\sim 12\times$ and a memory reduction factor of up to $\sim 315\times$ with respect to a GPU-based expanded-space bounding box approach. These results show that the proposed compact approach will allow the scientific community to efficiently tackle problems that up to now could not fit into GPU memory.
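
The idea of moving between compact and expanded space can be illustrated with a minimal CPU sketch in Python. This is not the authors' tensor-core formulation: it uses a plain loop over levels, so each map costs $\mathcal{O}(\log_2 n)$ here rather than the $\mathcal{O}(\log_2 \log_s(n))$ stated in the abstract. Several things are assumptions made only for illustration: the function names `lam`, `nu` and `live_neighbors`, the use of a one-dimensional compact index, and the choice of the discrete Sierpinski triangle given by the cells with `x & y == 0` (scaling factor $s = 2$, three replicas per level).

```python
def lam(i, r):
    """Compact-to-expanded map (illustrative): compact index -> (x, y).

    Each base-3 digit of i selects one of the three sub-triangles at
    that level and contributes one bit to x or to y.
    """
    x = y = 0
    for level in range(r):
        i, digit = divmod(i, 3)
        if digit == 1:
            x |= 1 << level      # replica shifted along x
        elif digit == 2:
            y |= 1 << level      # replica shifted along y
        # digit == 0: replica at the origin of this level
    return x, y


def nu(x, y, r):
    """Expanded-to-compact map (illustrative): (x, y) -> compact index.

    Returns None when (x, y) lies outside the n x n embedding or in a
    hole of the fractal.
    """
    n = 1 << r
    if not (0 <= x < n and 0 <= y < n):
        return None
    if x & y:                    # a shared set bit means an empty cell
        return None
    i = 0
    for level in reversed(range(r)):
        digit = ((x >> level) & 1) + 2 * ((y >> level) & 1)
        i = i * 3 + digit
    return i


def live_neighbors(cells, i, r):
    """Sum the 4-neighborhood of compact cell i, reading compact storage only."""
    x, y = lam(i, r)
    total = 0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        j = nu(x + dx, y + dy, r)
        if j is not None:        # skip holes and out-of-bounds cells
            total += cells[j]
    return total


if __name__ == "__main__":
    r = 4                        # fractal level, n = 2**r
    cells = [1] * 3**r           # compact storage: 3**r cells instead of n*n
    print(lam(5, r), nu(2, 1, r), live_neighbors(cells, 0, r))
```

In this sketch the compact array holds $3^r$ cells while the expanded embedding holds $n^2 = 4^r$, so the memory ratio is $(4/3)^r$. For $r = 20$ that ratio is about 315, which is in line with the reduction factor quoted in the abstract, although the abstract does not state which level was used.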

Paper title