Instruction Set Extensions – Creating A Symmetric-Key Crypto-Processor

Posted by CRYPTOcrat | Encryption | Sunday 29 June 2008 6:29 pm


In this article, the author identifies computationally intensive operations within classes of algorithms, such as symmetric-key ciphers. These operations require many instructions to implement on a general-purpose processor. Instruction set extensions are introduced as a way to accelerate these operations by off-loading them to custom hardware that is attached to the processor’s datapath and accessed via newly defined instructions in the processor’s control logic.

The article is authored by Dr. Adam Elbirt, a long-time CRYPTOcrat who is currently an Assistant Professor at the University of Massachusetts Lowell.

You can find more information about Dr. Elbirt on his LinkedIn profile.

Creating A Symmetric-Key Crypto-Processor

Most algorithms can be broken down into a finite number of core operations. When an algorithm is implemented in software on a general-purpose processor, some core operations are easy to implement and require few instructions, while others are significantly more complex and require many. An example of a core operation that is easy to implement in software is key addition, typically achieved by bit-wise XORing a round key with the data. Examples of more complex core operations are bit-level permutations and long-number arithmetic. These require numerous instructions because the datapath of a general-purpose processor does not directly support them: the processor word size is limited, data must be operated upon in bytes or multiples of bytes rather than individual bits, the ALU lacks a required functional unit, and so on.
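The contrast between the two kinds of core operation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the 8-bit reversal table are hypothetical and do not come from any particular cipher. Key addition is a single XOR, while a generic bit-level permutation needs a shift, a mask, another shift, and an OR for every output bit.

```python
from typing import Sequence


def add_round_key(block: int, round_key: int) -> int:
    """Key addition: a single XOR of the data with the round key."""
    return block ^ round_key


def permute_bits(value: int, table: Sequence[int]) -> int:
    """Bit-level permutation: output bit i is taken from input bit table[i].

    Without a dedicated permutation unit, every output bit costs
    several ordinary instructions (shift, mask, shift, OR).
    """
    result = 0
    for out_pos, in_pos in enumerate(table):
        bit = (value >> in_pos) & 1   # extract one input bit
        result |= bit << out_pos      # place it at its new position
    return result


# Hypothetical 8-bit permutation used only for illustration: bit reversal.
REVERSE_8 = [7, 6, 5, 4, 3, 2, 1, 0]
```

For a 64-bit block the loop body runs 64 times, which is the kind of cost a single custom permutation instruction is meant to remove.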

When a general-purpose processor is used to implement symmetric-key cryptographic algorithms such as block ciphers, even the fastest software implementations cannot sustain the bulk encryption data rates required by high-end applications such as ATM networks, which require an encryption throughput of 622 Mbps. As a result, hardware implementations are necessary for block ciphers to achieve this performance level. Although traditional hardware implementations lack flexibility, configurable hardware devices offer a promising alternative for the implementation of processors via the use of IP cores in Application Specific Integrated Circuit (ASIC) and Field Programmable Gate Array (FPGA) technology. To illustrate, Altera Corporation offers IP core implementations of the Intel 8051 microcontroller and the Motorola 68000 processor in addition to its own Nios® II embedded processor. Similarly, Xilinx Inc. offers IP core implementations of the PowerPC processor in addition to its own MicroBlaze™ and PicoBlaze™ embedded processors. ASIC and FPGA technologies make it possible to augment the existing datapath of a processor implemented via an IP core with acceleration modules, supported through newly defined instruction set extensions that target performance-critical functions. Moreover, many licensable and extensible processor cores are also available for the same purpose.
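To put the 622 Mbps figure in perspective, the short calculation below converts a target throughput into a per-byte and per-block cycle budget. The 200 MHz clock and the 128-bit block size are assumptions chosen only for illustration; they are not figures from the article.

```python
def cycle_budget(clock_hz: float, throughput_bps: float, block_bits: int = 128):
    """Return (cycles per byte, cycles per block) available at a target rate."""
    bytes_per_second = throughput_bps / 8
    cycles_per_byte = clock_hz / bytes_per_second
    cycles_per_block = cycles_per_byte * block_bits / 8
    return cycles_per_byte, cycles_per_block


# Assumed 200 MHz embedded core against the 622 Mbps ATM requirement.
per_byte, per_block = cycle_budget(200e6, 622e6)
```

Under these assumptions the processor has fewer than three cycles per byte, or roughly 41 cycles for an entire 128-bit block, which leaves little room for a multi-round cipher implemented with ordinary instructions.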

The use of instruction set extensions follows the hardware/software co-design paradigm to achieve the performance and physical security associated with hardware implementations while providing the portability and flexibility traditionally associated with software implementations. Moreover, when considering alternative solutions, instruction set extensions result in significant performance improvements versus traditional software implementations with considerably reduced logic resource requirements versus hardware-only solutions such as co-processors.  The idea is to “improve the wheel” rather than to “reinvent the wheel”.

Examples of instruction set extensions designed to improve the performance of cryptographic algorithms include those implemented to perform arithmetic over the Galois Field GF(2^m), usually targeting elliptic curve cryptography (ECC) systems. Word-level polynomial multiplication was shown to be the time-critical operation when targeting an ARM processor and a special Galois Field multiplication instruction resulted in significant performance improvement. Instruction set extensions targeting a SPARC V8 processor core and a 16-bit RISC processor core were used to accelerate the multiplication of binary polynomials for arithmetic in GF(2^m). An implementation targeting a MIPS32 architecture attempts to accelerate word-level polynomial multiplication through the use of Comba’s method of handling the inner loops of the multiplication operation. Numerous generalized Galois Field multipliers have also been proposed for use in elliptic curve cryptosystems. These implementations focus on accelerating exponentiation and inversion in Galois Fields GF(2^m) where m ≈ 160-256.
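The polynomial multiplication that these extensions accelerate can be sketched in software as a shift-and-XOR loop. The code below is a minimal illustration, not any of the cited implementations: `clmul` and `gf2m_mul` are hypothetical names, and the small field GF(2^8) with the AES reduction polynomial 0x11B is used only because its results are easy to check by hand. ECC fields are far larger.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less multiplication: multiply two binary polynomials over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a   # addition of polynomials over GF(2) is XOR
        a <<= 1           # multiply a by x
        b >>= 1
    return result


def gf2m_mul(a: int, b: int, modulus: int, m: int) -> int:
    """Multiply in GF(2^m), reducing by an irreducible polynomial of degree m."""
    product = clmul(a, b)
    # Clear every bit of degree >= m, highest degree first.
    for shift in range(product.bit_length() - 1, m - 1, -1):
        if (product >> shift) & 1:
            product ^= modulus << (shift - m)
    return product
```

On a processor without a polynomial multiply instruction, the inner loop of `clmul` runs once per bit of the operand, which is why a word-level instruction for this operation pays off for fields with m in the hundreds.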

Instruction set extensions designed to minimize the number of memory accesses and accelerate the performance of AES implementations have been proposed for a wide range of processors. Extensions targeting a general-purpose RISC architecture with multimedia instructions yield strategies to implement AES using multimedia instructions while specifically attempting to minimize the number of memory accesses. While the processor is datapath-scalable, the strategies do not map well to 32-bit architectures. Extensions designed to combine the SubBytes and MixColumns AES functions into one T table look-up operation to speed up algorithm execution have also been proposed. However, the functional unit requires a significant amount of hardware to implement and cannot be used for either the final AES round (where the MixColumns function is not used) or key expansion (where the SubBytes function is used without the MixColumns function), and T table performance is heavily dependent upon available cache size. Extensions targeting the Xtensa 32-bit processor improve the performance of AES encryption but worsen the performance of decryption. An implementation targeting a LEON2 processor core combines the SubBytes and ShiftRows AES functions through the use of an instruction set extension termed sbox. Special instructions are also provided to efficiently compute the MixColumns AES function through the use of ECC instruction set extensions.
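The idea of folding SubBytes and MixColumns into one table look-up can be illustrated with a short sketch that builds the AES S-box and one T table from first principles. This is a software illustration only, not the hardware unit from the cited work. The helper names are hypothetical, and the byte order of each table word (2·s, s, s, 3·s from most to least significant byte) is an assumed convention.

```python
def xtime(b: int) -> int:
    """Multiply by x in GF(2^8) modulo the AES polynomial 0x11B."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8) as a^254; 0 maps to 0."""
    result = 1 if a else 0
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox_entry(a: int) -> int:
    """SubBytes: field inversion followed by the AES affine transform."""
    x = gf_inv(a)
    return x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63


SBOX = [sbox_entry(a) for a in range(256)]

# One T table: each 32-bit entry holds the S-box output already multiplied
# by the MixColumns coefficients 2, 1, 1, 3 (assumed byte order).
T0 = [
    (gf_mul(s, 2) << 24) | (s << 16) | (s << 8) | gf_mul(s, 3)
    for s in SBOX
]
```

The table has 256 entries of 32 bits each, so a full set of four such tables occupies 4 KB, which is one reason the performance of this approach depends on the available cache size. In the final round and in key expansion only `SBOX` is needed, which matches the limitation noted above.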

Clearly, the use of instruction set extensions allows existing processor technologies to be leveraged in combination with custom functionality to vastly improve the performance of the targeted algorithms.  However, even within classifications of algorithms, such as symmetric-key algorithms, a wide range of additional functionality may be required to accelerate the entire suite.  A trade-off analysis of hardware resource requirements versus expected performance improvement is critical when evaluating which core elements of each algorithm to accelerate via added hardware units.  Relevant references that review the discussed implementations are included below.


1. S. Bartolini, I. Branovic, R. Giorgi, and E. Martinelli, “A Performance Evaluation of ARM ISA Extension for Elliptic Curve Cryptography Over Binary Finite Fields,” in Proceedings of the Sixteenth Symposium on Computer Architecture and High Performance Computing – SBC-PAD 2004, Foz do Iguaçu, Brazil, October 27-29 2004, pp. 238-245.

2. J. Großchädl and G.-A. Kamendje, “Instruction Set Extension for Fast Elliptic Curve Cryptography Over Binary Finite Fields GF(2^m),” in Proceedings of the Fourteenth IEEE International Conference on Application-Specific Systems, Architectures and Processors – ASAP 2003, The Hague, The Netherlands, June 24-26 2003, pp. 455-468.

3. J. Großchädl and E. Savas, “Instruction Set Extensions for Fast Arithmetic in Finite Fields GF(p) and GF(2^m),” in Workshop on Cryptographic Hardware and Embedded Systems – CHES 2004, M. Joye and J.-J. Quisquater, Eds., Cambridge, Massachusetts, USA, August 11-13 2004, vol. LNCS 3156, pp. 133-147, Springer-Verlag.

4. J. Irwin and D. Page, “Using Media Processors for Low-Memory AES Implementation,” in Proceedings of the Fourteenth IEEE International Conference on Application-Specific Systems, Architectures and Processors – ASAP 2003, The Hague, The Netherlands, June 24-26 2003, pp. 144-154.

5. K. Nadehara, M. Ikekawa, and I. Kuroda, “Extended Instructions for the AES Cryptography and Their Efficient Implementation,” in Proceedings of the Eighteenth IEEE Workshop on Signal Processing Systems – SIPS 2004, Austin, Texas, USA, October 13-15 2004, pp. 152-157.

6. S. O’Melia, Instruction Set Extensions for Enhancing the Performance of Symmetric Key Cryptographic Algorithms, MSEE Thesis, University of Massachusetts Lowell, 2005.

7. S. Ravi, A. Raghunathan, N. Potlapally, and M. Sankaradass, “System Design Methodologies for a Wireless Security Processing Platform,” in Proceedings of the 2002 Design Automation Conference – DAC 2002, New Orleans, Louisiana, USA, June 10-14 2002, pp. 777-782.

8. S. Tillich and J. Großchädl, “A Simple Architectural Enhancement for Fast and Flexible Elliptic Curve Cryptography Over Binary Finite Fields GF(2^m),” in Proceedings of the Ninth Asia-Pacific Conference on Advances in Computer Systems Architecture – ACSAC 2004, Beijing, China, September 7-9 2004, vol. LNCS 3189, pp. 282-295, Springer-Verlag.

9. S. Tillich and J. Großchädl, “Accelerating AES Using Instruction Set Extensions for Elliptic Curve Cryptography,” in International Conference on Computational Science and Its Applications – ICCSA 2005, O. Gervasi, M. L. Gavrilova, V. Kumar, A. Laganà, H. P. Lee, Y. Mun, D. Taniar, and C. J. K. Tan, Eds., Singapore, May 9-12 2005, vol. LNCS 3481, pp. 665-675, Springer-Verlag.

10. S. Tillich and J. Großchädl, “Instruction Set Extensions for Efficient AES Implementation on 32-bit Processors,” in Workshop on Cryptographic Hardware and Embedded Systems – CHES 2006, L. Goubin and M. Matsui, Eds., Yokohama, Japan, October 10-13 2006, vol. LNCS 4249, pp. 270-284, Springer-Verlag.

11. S. Tillich, J. Großchädl, and A. Szekely, “An Instruction Set Extension for Fast and Memory-Efficient AES Implementation,” in Proceedings of the Ninth International Conference on Communications and Multimedia Security – CMS 2005, J. Dittmann, S. Katzenbeisser, and A. Uhl, Eds., Salzburg, Austria, September 19-21 2005, vol. LNCS 3677, pp. 11-21, Springer-Verlag.

All the products mentioned herein which have trademarks and/or registered trademarks belong to their respective owners.

Milestone for CRYPTOcrats: Now 100 members worldwide!

Posted by Amit | Launch | Thursday 26 June 2008 11:18 pm

NAC market picking up

Posted by Mayuresh | Security | Friday 20 June 2008 7:08 am

I am sure many of you who work or have worked with NAC vendors would love to hear this. After a lot of talk about the NAC market being dead, Infonetics has taken a fresh look at the NAC market and forecasts strong growth ahead. Ref: Reports of NAC’s death have been greatly exaggerated; market up 16% in 1Q08

According to the research report, the NAC market jumped 16% in 1Q08 to $62.7 million, which means $10 million more than in the previous quarter.

Though the NAC market is still dominated by out-of-band appliances, mainly from Cisco and Juniper, Infonetics predicts a shift towards Ethernet-switch-based NAC appliances and in-line (bump in the wire) products. It predicts that purpose-built products from Consentry Networks and Nevis Networks will make up 25% of the NAC market. Being an ex-Nevis employee, I am really happy to know this and hope that it happens!!

Comments welcome!!

Service providers to patrol the Internet?

Posted by Mayuresh | Security | Wednesday 18 June 2008 9:14 am

AT&T is planning to offer security services that inspect and stop malicious traffic before it enters the corporate network.

For all these years, service providers have been looked upon as just thick pipes that do their best to get every piece of data to customers. The fundamental question that arises here is whether they should be allowed to look into the traffic passing through their networks. One of the intended uses is to stop spam before it reaches corporate networks. According to AT&T, 80% of the email it delivers is spam. The argument looks very attractive, and it seems logical to stop all digital debris and malicious content at the backbone itself.

But what if ISPs misuse this power? There are endless possibilities. With so many sophisticated solutions available today, it is not hard to dig deeper and characterize the behavior of subscribers, which can then be used for targeted content or to keep particular traffic away. Remember that Comcast delaying BitTorrent traffic generated plenty of flames!! This needs to be studied deeply from a legal perspective.

Nevertheless, I feel this would open a lot of opportunities for security vendors who are struggling to sell in the crowded enterprise market.

Comments welcome!!

Attacking NFC Mobile Phones at EUSecWest

Posted by Amit | Security | Tuesday 10 June 2008 7:15 am

Near Field Communication (NFC) is the RFID-based standard being built into mobile phones to allow them greater interaction with the physical world. NFC-enabled handsets can be used to pay for bus or train journeys, replacing existing contactless cards. They can read tags embedded in (Smart) posters that trigger a URL to be loaded or a phone number to be called.

At the recently concluded EUSecWest Conference in London, Collin Mulliner demonstrated two very interesting hacks: replacing the NFC tag on a vending machine, and spoofing a URI in a Smart Poster to connect the user to somewhere other than where they intended to go.

Sean Comeau conducted this interview with Collin Mulliner. The complete interview is available on this link. I am copying a few interesting questions here.

Sean Comeau: What new threats exist against NFC services and phones?

Collin Mulliner: I’ve basically analyzed THE NFC phone available in Europe (the Nokia 6131 NFC) and found that it allows spoofing of RFID tag content. This is quite interesting since some of the European systems use exactly the part that is spoofable. I’ve also done some fuzzing on the Nokia 6131 NFC and found some smaller bugs.

I’ve also conducted a small survey of NFC systems that are in use in Germany and Austria. This should be quite interesting.

Sean Comeau: What kinds of things are possible when you can spoof tags?

Collin Mulliner: All of these attacks are based on the exploitation of the trust the user has in the RFID/NFC tags (e.g. because the user has used the system for some time and he knows what to expect – if everything looks ok he will believe it is ok).

So now if an attacker can tamper with these tags (there are multiple ways to do this – e.g. by placing a sticky tag on top of the original tag or by modifying the original tag), the user can be tricked into doing things that are bad for him.

There are multiple SMS-based services in the field. These can be attacked because we can spoof the phone number, so the SMS is sent to a different phone number than the user expects (e.g. a premium-rate number – other attacks are possible too :-)).


Sean Comeau: Have you been in contact with any members of the NFC member companies regarding these issues and if so what response have you received?

Collin Mulliner: I have extensive contact with Nokia. They already started fixing the spoofing issues. Nokia seems to care a lot about the issues I reported.

Our fellow CRYPTOcrat, Jan Brands, an expert in NFC security, has generously provided a few comments for this blog. Please find these comments in the “Comments” section below. Jan also sent us the link to the complete presentation about the experiment performed by Mulliner. It seems the experiment covers much more than the details given in the interview. You can download the presentation from this link.

Soon to be published book authored by a fellow CRYPTOcrat

Posted by Amit | Security | Monday 9 June 2008 9:16 am

Dr. Adam J. Elbirt will soon be unveiling his new book titled “Understanding and Applying Cryptography and Data Security” and has graciously provided us with a snippet of what the book is all about. Here is what Dr. Adam wrote to us.

There are numerous books available that present cryptography and data security concepts from a variety of perspectives. While useful as reference texts when examining specifics of cryptographic algorithm and protocol implementation, these texts tend to be written from a mathematics perspective versus engineering and computer science viewpoints. Even books such as Applied Cryptography, by Bruce Schneier, are not truly suited to classroom environments though they are written to be accessible to those with a less formal mathematics background. Moreover, mathematics-based books fail to provide real-world examples that span the implementation domains of hardware, software, and embedded systems. This book describes cryptography and data security from the “how do I implement the algorithms and protocols” point of view, with relevant examples and homework problems that will be coded in software languages, such as assembly and C, as well as hardware description languages, such as VHDL and Verilog, to evaluate implementation results. The goal of these implementation comparisons is to provide students with a feel for what they may encounter in actual job situations, examining tradeoffs between code size, hardware logic resource requirements, memory usage, speed and throughput, power consumption, etc.

I am sure this book will be useful to many of us. If you wish to pre-order the book, here is the link to its home on Amazon’s website.

Dr. Adam is a long-time CRYPTOcrat and an expert in NTRU Cryptosystems. He is currently serving as an Assistant Professor at the University of Massachusetts Lowell.

Here is the link to Dr. Adam’s profile on LinkedIn. We look forward to seeing this book published soon. All of us at CRYPTOcrats wish Dr. Adam great success with his new book and many more to come.

CRYPTOcrats statistics

Posted by Amit | Launch | Sunday 8 June 2008 2:04 pm

Updated statistics as of 26 June 2008 are available under the About Us section here.

Only 6 months ago, a few of us got together and decided to create a web presence for our CRYPTOcrats group. Since then, in Dec’07, our numbers have been growing, and I am happy to report some membership statistics now. It’s quite interesting to see that such a specialized group got this great response, and the credit goes to all of you members (CRYPTOcrats).

There are two distinct views of the statistics. One is based on the profiles of the members and shows the varied mix of people we have in the group. The other is based on their geographical locations, which are again quite varied, and it is really nice to see that we are known in so many parts of the world now.

1. CRYPTOcrats and their job function

[Figure: CRYPTOcrats & their Profiles]

There is a strong representation of Architects, Developers and Researchers in the group. It is also nice to see that quite a few CEOs & CTOs bring a business-leader perspective to the group.

2. CRYPTOcrats and their geographical locations

[Figure: CRYPTOcrats & their locations]

There is strong representation from India; however, this includes a few members who are presently working in other parts of the world. India is followed by strong representation from the US and France. Quite interestingly, many of the senior CRYPTOcrats come from the US & France and will soon be contributing to this website. We look forward to having more representation from Israel, Switzerland and the UK, where lots of good work in Security and Cryptography is taking place.


NVIDIA CUDA Competition

Posted by CRYPTOcrat | Crypto Application, Encryption | Tuesday 3 June 2008 7:55 pm

Contributed by Abhishek

This might be of interest to some of you. NVIDIA, the graphics card leader, is holding a contest that invites developers to experience the power of its GPUs by using NVIDIA CUDA™ Technology. CUDA is the world’s only C language environment that provides access to the processing power of NVIDIA GPUs. This enables developers to use NVIDIA GPUs to solve the most complex computation-intensive challenges, such as oil and gas exploration, financial risk management, product design, medical imaging, and scientific research.

The CUDA Contest welcomes all types of mainstream stand-alone applications or plug-ins running in Windows, Linux or MacOS environments, but is looking to reward innovative, useful apps that make the best use of the GPU processing power. Scientific applications are excluded. I thought this could be of interest to CRYPTOcrats who would like to test or verify their algorithms or even do cryptanalysis.

Find below the flyer of this competition. You can contact NVIDIA directly or use the comments section if you need more information.

NVIDIA CUDA Competition Flyer
