Constant-Time Bitsliced Rijndael-256 on ARM Cortex-M4: On the Limitations of Fixslicing Beyond AES-128

DOI:

https://doi.org/10.31436/iiumej.v27i2.4405

Keywords:

ARM Cortex-M4, bitslicing, constant-time implementation, fixslicing, Rijndael-256

Abstract

Wider-block ciphers are increasingly needed in high-volume applications, because 128-bit blocks in modes such as Galois/Counter Mode (GCM) limit each invocation to roughly 64 GiB of plaintext per key-nonce pair, forcing complex re-keying strategies. Rijndael-256, the 256-bit-block variant of Rijndael with a 256-bit key, has therefore attracted renewed interest as a natural wider-block companion to Advanced Encryption Standard (AES). At the same time, 32-bit ARM Cortex-M microcontrollers dominate the IoT and embedded landscape, yet, to the best of our knowledge, no constant-time software implementation of Rijndael-256 targeting this platform has been published. This paper addresses that gap. We present a constant-time bitsliced implementation of Rijndael-256 on the ARM Cortex-M4 and provide a systematic structural analysis explaining why fixslicing, the technique that achieves the best-known AES-128 performance on this platform, becomes suboptimal when applied to Rijndael-256. Specifically, the irregular ShiftRows offsets (0, 1, 3, 4) of Rijndael-256 break the uniform register rotation exploited by fixslicing, requiring eight distinct MixColumns compensation variants instead of four. We demonstrate that these compensation variants cost 3.00× as much as executing an explicit, in-place ShiftRows routing using ARM's bitfield instructions. Our macro-inlined assembly variant achieves 6,199 cycles (193.7 cycles/byte) at -O2, including packing and unpacking. We provide benchmarks across five compiler optimization levels, constant-time verification over  samples via DUDECT (maximum t-statistic well below the vulnerability threshold), and per-component cycle breakdowns, showing that the optimal bitslicing strategy is inherently cipher-specific and architecture-dependent.

ABSTRACT (translated from the Malay ABSTRAK): Wider-block ciphers are increasingly needed in high-volume applications because 128-bit blocks in modes such as Galois/Counter Mode (GCM) limit each invocation to roughly 64 GiB of plaintext per key-nonce pair, which in turn requires more complex re-keying strategies. Rijndael-256, the Rijndael variant with a 256-bit block and a 256-bit key, has attracted renewed attention as a complementary wider-block alternative to the Advanced Encryption Standard (AES). At the same time, 32-bit ARM Cortex-M microcontrollers dominate the Internet of Things (IoT) and embedded-systems landscape. However, to the best of our knowledge, no bitsliced, constant-time software implementation of Rijndael-256 targeting this platform has been published. This paper addresses that gap. We present a bitsliced, constant-time implementation of Rijndael-256 on the ARM Cortex-M4 and provide a systematic structural analysis explaining why fixslicing, the technique that achieves the best-known AES-128 performance on this platform, becomes suboptimal when applied to Rijndael-256. Specifically, the non-uniform ShiftRows offsets of Rijndael-256, namely (0, 1, 3, 4), disrupt the uniform register rotation exploited by fixslicing and therefore require eight distinct MixColumns compensation variants rather than only four. We show that these compensation variants cost 3.00× as much as an explicit, in-place ShiftRows routing using ARM's bitfield instructions. Our macro-inlined assembly variant achieves 6,199 cycles, equivalent to 193.7 cycles/byte, at optimization level -O2, including packing and unpacking. We also provide benchmarks across five compiler optimization levels, constant-time verification over 10? samples using DUDECT with a maximum t-statistic far below the vulnerability threshold, and per-component cycle breakdowns. These findings show that the optimal bitslicing strategy is cipher-specific and depends heavily on the hardware architecture.
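
To make the ShiftRows offsets mentioned in the abstract concrete, the following is a minimal byte-wise reference sketch in C. It is not the paper's bitsliced or assembly code. It assumes the usual Rijndael column-major state layout, where the byte in row r and column c is stored at index 4*c + r, and all function and table names are hypothetical. It only shows how the Rijndael-256 offsets (0, 1, 3, 4) differ from the regular AES offsets (0, 1, 2, 3).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Row shift offsets from the Rijndael specification.
 * Nb = 4 columns (AES, 128-bit block):          0, 1, 2, 3
 * Nb = 8 columns (Rijndael-256, 256-bit block): 0, 1, 3, 4 */
static const uint8_t SHIFT_NB4[4] = {0, 1, 2, 3};
static const uint8_t SHIFT_NB8[4] = {0, 1, 3, 4};

/* Byte-wise reference ShiftRows (illustrative, not bitsliced).
 * state holds 4*nb bytes in column-major order: the byte in row r,
 * column c lives at state[4*c + r]. Row r is rotated left by
 * offsets[r] columns. All indices depend only on public constants,
 * never on the data being processed. */
static void shift_rows(uint8_t *state, unsigned nb, const uint8_t offsets[4])
{
    uint8_t tmp[32];

    assert(nb == 4 || nb == 8);
    for (unsigned c = 0; c < nb; c++) {
        for (unsigned r = 0; r < 4; r++) {
            tmp[4 * c + r] = state[4 * ((c + offsets[r]) % nb) + r];
        }
    }
    memcpy(state, tmp, 4 * nb);
}

/* Hypothetical wrapper: ShiftRows for a 32-byte Rijndael-256 state. */
void rijndael256_shift_rows(uint8_t state[32])
{
    shift_rows(state, 8, SHIFT_NB8);
}

/* Hypothetical wrapper: ShiftRows for a 16-byte AES state. */
void aes_shift_rows(uint8_t state[16])
{
    shift_rows(state, 4, SHIFT_NB4);
}
```

With the state filled as state[i] = i, column 0 becomes (0, 5, 10, 15) under the AES offsets but (0, 5, 14, 19) under the Rijndael-256 offsets. The step from offset 1 to offset 3 is the irregularity that the abstract identifies as breaking the uniform register rotation exploited by fixslicing.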

Published

2026-05-10

How to Cite

Lestari, A. A., MT, S., Ramli, K., Gunawan, T. S., Agustina, E. R., & Windarta, S. (2026). Constant-Time Bitsliced Rijndael-256 on ARM Cortex-M4: On the Limitations of Fixslicing Beyond AES-128. IIUM Engineering Journal, 27(2), 340–362. https://doi.org/10.31436/iiumej.v27i2.4405

Section

Electrical, Computer and Communications Engineering
