Article

Optimization of Advanced Encryption Standard on Graphics Processing Units

Journal

IEEE ACCESS
Volume 9, Pages 67315-67326

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2021.3077551

Keywords

Graphics processing units; Encryption; Optimization; Throughput; Instruction sets; Central Processing Unit; Registers; Cryptography; Cryptanalysis

This work optimizes the AES algorithm on GPUs. By removing bank conflicts in shared memory accesses, the authors reach 878.6 Gbps for AES-128 encryption on an RTX 2070 Super, more than 2.56x faster than the best GPU results in the literature and faster than CPUs using hardware AES instructions (AES-NI) and legacy FPGA-based clusters. Because even a low-end GPU sustains 60.0 Gbps for AES-256, the authors argue that moving from AES-128 to AES-256 on GPUs increases security with no visible performance loss.
Graphics processing units (GPUs) are specially designed for parallel applications and perform parallel operations much faster than central processing units (CPUs). In this work, we focus on the performance of the Advanced Encryption Standard (AES) on GPUs. We present optimizations that remove bank conflicts in shared memory accesses and provide 878.6 Gbps throughput for AES-128 encryption on an RTX 2070 Super, which is equivalent to 4.1 Gbps per Watt. Our optimizations provide more than a 2.56x speed-up over the best GPU results in the literature. Our optimized AES implementations on GPUs even outperform any CPU using the hardware-level AES New Instructions (AES-NI) and legacy FPGA-based cluster architectures like COPACOBANA and RIVYERA. Even on a low-end GPU like the MX 250, we obtained 60.0 Gbps throughput for AES-256, which is generally faster than the read/write speeds of solid-state disks. Thus, a transition from AES-128 to AES-256 when using GPUs would provide military-grade security with no visible performance loss. With this level of performance, GPUs can be used as cryptographic co-processors for file or full disk encryption to remove the performance loss coming from CPU encryption. With a single GPU as a co-processor, busy SSL servers can be freed from the burden of encryption and use their whole CPU power for other operations. Moreover, these optimizations can help GPUs to practically verify theoretically obtained cryptanalysis results or their reduced versions in reasonable time.
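
The abstract does not say how the bank conflicts are removed. As a rough illustration of the underlying problem only, the Python sketch below models shared memory as 32 banks of 4-byte words (assumed here; this is the usual layout on recent NVIDIA GPUs) and counts how many serialized passes a warp of 32 threads needs when each thread looks up a 256-entry table of 32-bit words at a random index. The replicated layout, in which each lane reads its own interleaved copy of the table, is one known way to make such lookups conflict-free. It is an assumption for illustration, not a method confirmed by this page, and all function and variable names are hypothetical.

```python
import random
from collections import defaultdict

NUM_BANKS = 32      # assumed: 32 banks, each 4 bytes wide
WARP_SIZE = 32      # threads that issue a shared-memory request together
TABLE_SIZE = 256    # entries in a byte-indexed lookup table of 32-bit words


def bank_of(word_address):
    """Bank that holds a given 32-bit word of shared memory."""
    return word_address % NUM_BANKS


def conflict_degree(word_addresses):
    """Number of serialized passes needed to serve one warp request.

    Threads reading the same word are served by a broadcast, so only
    distinct words that fall into the same bank count as a conflict.
    """
    words_per_bank = defaultdict(set)
    for address in word_addresses:
        words_per_bank[bank_of(address)].add(address)
    return max(len(words) for words in words_per_bank.values())


def single_table_addresses(indices):
    """One shared copy of the table: entry i lives at word i."""
    return list(indices)


def replicated_table_addresses(indices):
    """32 interleaved copies: lane t reads entry i at word i * 32 + t."""
    return [i * NUM_BANKS + lane for lane, i in enumerate(indices)]


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = [[rng.randrange(TABLE_SIZE) for _ in range(WARP_SIZE)]
          for _ in range(1000)]

naive = [conflict_degree(single_table_addresses(t)) for t in trials]
replicated = [conflict_degree(replicated_table_addresses(t)) for t in trials]

print("average passes, single table:    ", sum(naive) / len(naive))
print("average passes, replicated table:", sum(replicated) / len(replicated))
```

In this model the replicated layout always needs exactly one pass, because lane t only ever touches bank t. The price is memory: 256 entries × 4 bytes × 32 copies = 32 KiB of shared memory per table.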
