Abstract
In this article, the Author will talk about identifying computationally intensive operations within classifications of algorithms, such as symmetric-key ciphers. These operations require many instructions to implement when targeting a general-purpose processor. The concept of instruction set extensions will be introduced to accelerate these operations by off-loading them to custom hardware attached to the processor’s datapath that is accessed via newly defined instructions in the processor’s control logic.
The article is authored by Dr. Adam Elbirt a long time CRYPTOcrat and who is currently working as an Assistant Professor at University of Massachusetts Lowell.
You can find more information about Dr. Elbirt on his LI Profile.
Creating A Symmetric-Key Crypto-Processor
Most algorithms can be broken down into a finite number of core operations. When implementing an algorithm in software targeting a general-purpose processor, some core operations are easy to implement, requiring few instructions, while others are significantly more complex, requiring numerous instructions. An example of a core operation easily implemented in software is key addition, typically achieved by bit-wise XORing a round key with data. Examples of more complex core operations are bit-level permutations and long number arithmetic. Numerous instructions are required because the datapath of a general-purpose processor does not directly support the implementation of these operations due to limited processor word size, the requirement that data be operated upon in bytes or multiple of bytes instead of bits, the lack of a required ALU unit, etc.
When using a general-purpose processor to implement symmetric-key cryptographic algorithms such as block ciphers, even the fastest software implementations cannot satisfy the required bulk data encryption data rates for high-end applications such as ATM networks which require an encryption throughput of 622 Mbps. As a result, hardware implementations are necessary for block ciphers to achieve this required performance level. Although traditional hardware implementations lack flexibility, configurable hardware devices offer a promising alternative for the implementation of processors via the use of IP cores in Application Specific Integrated Circuit (ASIC) and Field Programmable Gate Array (FPGA) technology. To illustrate, Altera Corporation offers IP core implementations of the Intel 8051 microcontroller and the Motorola 68000 processor in addition to their own Nios®-II embedded processor. Similarly, Xilinx Inc. offers IP core implementations of the PowerPC processor in addition to their own MicroBlazeTM and PicoBlazeTM embedded processors. ASIC and FPGA technologies provide the opportunity to augment the existing datapath of a processor implemented via an IP core to add acceleration modules supported through newly defined instruction set extensions targeting performance-critical functions. Moreover, many licensable and extendible processor cores are also available for the same purpose.
The use of instruction set extensions follows the hardware/software co-design paradigm to achieve the performance and physical security associated with hardware implementations while providing the portability and flexibility traditionally associated with software implementations. Moreover, when considering alternative solutions, instruction set extensions result in significant performance improvements versus traditional software implementations with considerably reduced logic resource requirements versus hardware-only solutions such as co-processors. The idea is to “improve the wheel” rather than to “reinvent the wheel”.
Examples of instruction set extensions designed to improve the performance of cryptographic algorithms include those implemented to perform arithmetic over the Galois Field GF(2m), usually targeting elliptic curve cryptography (ECC) systems. Word-level polynomial multiplication was shown to be the time-critical operation when targeting an ARM processor and a special Galois Field multiplication instruction resulted in significant performance improvement. Instruction set extensions targeting a SPARC V8 processor core and a 16-bit RISC processor core were used to accelerate the multiplication of binary polynomials for arithmetic in GF(2m). An implementation targeting a MIPS32 architecture attempts to accelerate word-level polynomial multiplication through the use of Comba’s method of handling the inner loops of the multiplication operation. Numerous generalized Galois Field multipliers have also been proposed for use in elliptic curve cryptosystems. These implementations focus on accelerating exponentiation and inversion in Galois Fields GF(2m) where m ? 160-256.
Instruction set extensions designed to minimize the number of memory accesses and accelerate the performance of AES implementations have been proposed for a wide range of processors. Extensions targeting a general-purpose RISC architecture with multimedia instructions yield strategies to implement AES using multimedia instructions while specifically attempting to minimize the number of memory accesses. While the processor is datapath-scalable, the strategies do not map well to 32-bit architectures. Extensions designed to combine the SubBytes andMixColumns AES functions into one T table look-up operation to speed up algorithm execution have also been proposed. However, the functional unit requires a significant amount of hardware to implement and cannot be used for either the final AES round (where the MixColumns function is not used) or key expansion (where the SubBytes function is used without the MixColumns function), and T table performance is heavily dependent upon available cache size. Extensions targeting the Xtensa 32-bit processor improve the performance of AES encryption but worsen the performance of decryption. An implementation targeting a LEON2 processor core combines the SubBytes and ShiftRows AES functions through the use of an instruction set extension termed sbox. Special instructions are also provided to efficiently compute the MixColumns AES function through the use of ECC instruction set extensions.
Clearly, the use of instruction set extensions allows existing processor technologies to be leveraged in combination with custom functionality to vastly improve the performance of the targeted algorithms. However, even within classifications of algorithms, such as symmetric-key algorithms, a wide range of additional functionality may be required to accelerate the entire suite. A trade-off analysis of hardware resource requirements versus expected performance improvement is critical when evaluating which core elements of each algorithm to accelerate via added hardware units. Relevant references that review the discussed implementations are included below.
References
1. S. Bartolini, I. Branovic, R. Giorgi, and E. Martinelli, “A Performance Evaluation of ARM ISA Extension for Elliptic Curve Cryptography Over Binary Finite Fields,” in Proceedings of the Sixteenth Symposium on Computer Architecture and High Performance Computing - SBC-PAD 2004, Foz do Igua¸cu, Brazil, October 27-29 2004, pp. 238-245.
2. J. Großchädl and G.-A. Kamendje, “Instruction Set Extension for Fast Elliptic Curve Cryptography Over Binary Finite Fields GF(2m),” in Proceedings of the Fourteenth IEEE International Conference on Application- Specific Systems, Architectures and Processors - ASAP 2003, The Hague, The Netherlands, June 24-26 2003, pp. 455-468.
3. J. Großchädl and E. Savas, “Instruction Set Extensions for Fast Arithmetic in Finite Fields GF(p) and GF(2m),” in Workshop on Cryptographic Hardware and Embedded Systems - CHES 2004, M. Joye and J.-J. Quisquater, Eds., Cambridge, Massachusetts, USA, August 11-13 2004, vol. LNCS 3156, pp. 133-147, Springer-Verlag.
4. J. Irwin and D. Page, “Using Media Processors for Low-Memory AES Implementation,” in Proceedings of the Fourteenth IEEE International Conference on Application-Specific Systems, Architectures and Processors - ASAP 2003, The Hague, The Netherlands, June 24-26 2003, pp. 144-154.
5. K. Nadehara, M. Ikekawa, and I. Kuroda, “Extended Instructions for the AES Cryptography and Their Efficient Implementation,” in Proceedings of the Eighteenth IEEE Workshop on Signal Processing Systems - SIPS 2004, Austin, Texas, USA, October 13-15 2004, pp. 152-157.
6. S. O’Melia, Instruction Set Extensions for Enhancing the Performance of Symmetric Key Cryptographic Algorithms, MSEE Thesis, University of Massachusetts Lowell, 2005.
7. S. Ravi, A. Raghunathan, N. Potlapally, and M. Sankaradass, “System Design Methodologies for a Wireless Security Processing Platform,” in Proceedings of the 2002 Design Automation Conference - DAC 2002, New Orleans, Louisiana, USA, June 10-14 2002, pp. 777-782.
8. S. Tillich and J. Großchädl, “A Simple Architectural Enhancement for Fast and Flexible Elliptic Curve Cryptography Over Binary Finite Fields GF(2m),” in Proceedings of the Ninth Asia-Pacific Conference on Advances in Computer Systems Architecture - ACSAC 2004, Beijing, China, September 7-9 2004, vol. LNCS 3189, pp. 282-295, Springer-Verlag.
9. S. Tillich and J. Großchädl, “Accelerating AES Using Instruction Set Extensions for Elliptic Curve Cryptography,” in International Conference on Computational Science and Its Applications - ICCSA 2005, O. Gervasi, M. L. Gavrilova, V. Kumar, A. Laganà, H. P. Lee, Y. Mun, D. Taniar, and C. J. K. Tan, Eds., Singapore, May 9-12 2005, vol. LNCS 3481, pp. 665-675, Springer-Verlag.
10. S. Tillich and J. Großchädl, “Instruction Set Extensions for Efficient AES Implementation on 32-bit Processors,” in Workshop on Cryptographic Hardware and Embedded Systems - CHES 2006, L. Goubin and M. Matsui, Eds., Yokohama, Japan, October 10-13 2006, vol. LNCS 4249, pp. 270-284, Springer-Verlag.
11. S. Tillich and J. Großchädl and A. Szekely, “An Instruction Set Extension for Fast and Memory-Efficient AES Implementation,” in Proceedings of the Ninth International Conference on Communications and Multimedia Security - CMS 2005, J. Dittmann, S. Katzenbeisser, and A. Uhl, Eds., Salzburg, Austria, September 19-21 2005, vol. LNCS 3677, pp. 11-21, Springer-Verlag.
All the products mentioned herein which have trademarks and/or registered trademarks belong to their respective owners.
7 comments so far.