Paper Title

Transcoding Unicode Characters with AVX-512 Instructions

Authors

Robert Clausecker, Daniel Lemire

Abstract

Intel includes in its recent processors a powerful set of instructions capable of processing 512-bit registers with a single instruction (AVX-512). Some of these instructions have no equivalent in earlier instruction sets. We leverage these instructions to efficiently transcode strings between the most common formats: UTF-8 and UTF-16. With our novel algorithms, we are often twice as fast as the previous best solutions. For example, we transcode Chinese text from UTF-8 to UTF-16 at more than 5 GiB/s using fewer than 2 CPU instructions per character. To ensure reproducibility, we make our software freely available as an open source library. Our library is part of the popular Node.js JavaScript runtime.
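
To make concrete what "transcoding from UTF-8 to UTF-16" involves, here is a minimal scalar sketch in Python. It is not the authors' AVX-512 algorithm and not their library's API; the function name `utf8_to_utf16_units` is hypothetical. It only illustrates the per-character work (decode the code point, validate it, emit one or two 16-bit code units) that a vectorized implementation performs on many bytes at once.

```python
def utf8_to_utf16_units(data: bytes) -> list[int]:
    """Scalar reference: convert UTF-8 bytes to a list of UTF-16 code units.

    Hypothetical helper for illustration only; raises ValueError on invalid input.
    """
    units = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        # The lead byte determines the sequence length and the initial payload bits.
        if b0 < 0x80:
            cp, length = b0, 1
        elif 0xC2 <= b0 <= 0xDF:      # 0xC0 and 0xC1 would be overlong encodings
            cp, length = b0 & 0x1F, 2
        elif 0xE0 <= b0 <= 0xEF:
            cp, length = b0 & 0x0F, 3
        elif 0xF0 <= b0 <= 0xF4:
            cp, length = b0 & 0x07, 4
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        # Each continuation byte has the form 10xxxxxx and contributes 6 bits.
        for j in range(1, length):
            b = data[i + j]
            if b & 0xC0 != 0x80:
                raise ValueError(f"invalid continuation byte at offset {i + j}")
            cp = (cp << 6) | (b & 0x3F)
        # Reject overlong forms, surrogate code points, and values past U+10FFFF.
        if (length == 3 and cp < 0x800) or (length == 4 and cp < 0x10000):
            raise ValueError(f"overlong encoding at offset {i}")
        if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError(f"invalid code point at offset {i}")
        if cp < 0x10000:
            units.append(cp)          # one 16-bit code unit
        else:
            cp -= 0x10000             # supplementary plane: surrogate pair
            units.append(0xD800 | (cp >> 10))
            units.append(0xDC00 | (cp & 0x3FF))
        i += length
    return units
```

For instance, the Chinese character "中" (U+4E2D) occupies three bytes in UTF-8 (`E4 B8 AD`) and a single code unit in UTF-16 (`0x4E2D`), so `utf8_to_utf16_units("中".encode("utf-8"))` returns `[0x4E2D]`. A loop like this branches on every character, which is the cost that SIMD approaches aim to avoid.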
