Paper Title
A Unified Cryptoprocessor for Lattice-based Signature and Key-exchange
Paper Authors
Paper Abstract
We propose design methodologies for building a compact, unified, and programmable cryptoprocessor architecture that computes post-quantum key agreement and digital signatures. Synergies between the two types of cryptographic primitives are exploited to keep the cryptoprocessor compact. As a case study, the cryptoprocessor architecture has been optimized for the signature scheme 'CRYSTALS-Dilithium' and the key encapsulation mechanism (KEM) 'Saber', both finalists in NIST's post-quantum cryptography standardization project. The programmable cryptoprocessor executes key generation, encapsulation, decapsulation, signature generation, and signature verification for all security levels of Dilithium and Saber. On a Xilinx UltraScale+ FPGA, the proposed cryptoprocessor consumes 18,406 LUTs, 9,323 FFs, 4 DSPs, and 24 BRAMs. It achieves a 200 MHz clock frequency and completes the CCA-secure key-generation/encapsulation/decapsulation operations in 29.6/40.4/58.3 μs for LightSaber, 54.9/69.7/94.9 μs for Saber, and 87.6/108.0/139.4 μs for FireSaber. It completes the key-generation/sign/verify operations in 70.9/151.6/75.2 μs for Dilithium-2, 114.7/237/127.6 μs for Dilithium-3, and 194.2/342.1/228.9 μs for Dilithium-5, in the best-case scenario. Using the UMC 65 nm library for ASIC, the latency improves by a factor of two owing to a 2× higher clock frequency.
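One well-known synergy the abstract alludes to is that Saber and Dilithium both compute over polynomials of degree below 256 in the ring Z_q[x]/(x^256 + 1); they differ mainly in the modulus (Saber uses the power of two q = 2^13, Dilithium the prime q = 8380417). The minimal Python sketch below illustrates that a single negacyclic multiplication routine, parameterized only by q, serves both schemes. It is a plain schoolbook multiplier written for clarity, not the paper's hardware datapath; the function and constant names (`negacyclic_mul`, `SABER_Q`, `DILITHIUM_Q`) are hypothetical.

```python
# Hypothetical sketch: one polynomial multiplier shared by Saber and Dilithium.
# Both schemes compute in R_q = Z_q[x]/(x^256 + 1); only the modulus q differs.
N = 256
SABER_Q = 2**13        # Saber modulus q (a power of two)
DILITHIUM_Q = 8380417  # Dilithium modulus q = 2^23 - 2^13 + 1 (prime)


def negacyclic_mul(a, b, q, n=N):
    """Schoolbook product of a and b in Z_q[x]/(x^n + 1).

    a, b: coefficient lists of length n, lowest degree first.
    """
    if len(a) != n or len(b) != n:
        raise ValueError("operands must have exactly n coefficients")
    acc = [0] * n
    for i, ai in enumerate(a):
        if ai == 0:
            continue  # skip zero coefficients; does not change the result
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                acc[k] += ai * bj
            else:
                # x^n = -1 in this ring, so x^k = -x^(k - n)
                acc[k - n] -= ai * bj
    # Reduce once at the end; Python's % returns a value in [0, q).
    return [c % q for c in acc]


if __name__ == "__main__":
    x = [0] * N
    x[1] = 1                 # the polynomial x
    x255 = [0] * N
    x255[255] = 1            # the polynomial x^255
    # x^255 * x = x^256 = -1, i.e. q - 1 in the constant coefficient
    print(negacyclic_mul(x255, x, DILITHIUM_Q)[0])  # 8380416
    print(negacyclic_mul(x255, x, SABER_Q)[0])      # 8191
```

The wrap-around with a sign flip (x^256 = -1) is what makes the ring shared; a hardware design would replace the quadratic loop with a faster algorithm, but the ring and reduction structure stay the same for both schemes.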
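The reported latencies can be turned into clock-cycle counts with simple arithmetic: cycles = latency (μs) × clock frequency (MHz). The short Python sketch below does this for every figure in the abstract at the stated 200 MHz FPGA clock, and also rescales each latency by the 2× clock ratio quoted for the UMC 65 nm ASIC. The rescaling assumes the cycle count is unchanged between the two targets, which is how the factor-of-two claim reads; the helper names (`to_cycles`, `rescale_latency_us`) are hypothetical, and the derived cycle counts are computed here, not reported by the authors.

```python
# Latencies (in microseconds) reported in the abstract for the 200 MHz FPGA build.
FPGA_CLOCK_MHZ = 200

LATENCY_US = {
    # Saber family: key generation / encapsulation / decapsulation
    "LightSaber": (29.6, 40.4, 58.3),
    "Saber": (54.9, 69.7, 94.9),
    "FireSaber": (87.6, 108.0, 139.4),
    # Dilithium family: key generation / sign (best case) / verify
    "Dilithium-2": (70.9, 151.6, 75.2),
    "Dilithium-3": (114.7, 237.0, 127.6),
    "Dilithium-5": (194.2, 342.1, 228.9),
}


def to_cycles(latency_us, clock_mhz=FPGA_CLOCK_MHZ):
    """Clock cycles = latency (us) x clock frequency (MHz), rounded to an integer."""
    return round(latency_us * clock_mhz)


def rescale_latency_us(latency_us, clock_ratio):
    """Latency for the same cycle count on a clock that is clock_ratio times faster."""
    return latency_us / clock_ratio


if __name__ == "__main__":
    for scheme, ops in LATENCY_US.items():
        cycle_counts = [to_cycles(t) for t in ops]
        # Assumption: the ASIC runs the same cycle count at 2x the FPGA clock.
        asic_us = [round(rescale_latency_us(t, 2.0), 2) for t in ops]
        print(f"{scheme:12s} cycles={cycle_counts} ~ASIC us={asic_us}")
```

For example, LightSaber key generation at 29.6 μs corresponds to 29.6 × 200 = 5,920 cycles, and about 14.8 μs on a clock twice as fast.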