Finally, Rijndael's internal round structure appears to have good potential to benefit from instruction-level parallelism.
5.3.3 Instruction-Level Parallelism. The Alpha 21164 processor, on which our experiments were run, is a superscalar machine that can execute up to four instructions per cycle, provided that various scheduling constraints are satisfied.
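To make the "up to four instructions per cycle" limit concrete, the following is a minimal sketch of a toy issue model. It is not a model of the Alpha 21164 itself: it assumes every instruction takes one cycle, that any ready instruction may issue, and that the only scheduling constraints are data dependencies and the issue width. The function name `schedule_cycles` and the instruction labels are hypothetical and chosen only for illustration.

```python
def schedule_cycles(deps, issue_width=4):
    """Count cycles needed to issue all instructions in a toy superscalar model.

    deps maps each instruction name to the list of instructions whose
    results it needs. Assumptions (illustrative only): unit latency, and an
    instruction may issue once all of its inputs were issued in an earlier cycle.
    """
    finished = set()          # instructions issued in earlier cycles
    remaining = list(deps)    # instructions not yet issued, in program order
    cycle = 0
    while remaining:
        cycle += 1
        issued = []
        for ins in remaining:
            if len(issued) == issue_width:
                break         # issue slots for this cycle are full
            if all(d in finished for d in deps[ins]):
                issued.append(ins)
        if not issued:
            raise ValueError("cyclic or unknown dependency")
        finished.update(issued)
        remaining = [i for i in remaining if i not in finished]
    return cycle


# Eight instructions forming one dependency chain: no parallelism available.
chain = {f"i{k}": ([f"i{k-1}"] if k else []) for k in range(8)}

# Eight mutually independent instructions: the issue width is the only limit.
independent = {f"i{k}": [] for k in range(8)}

print(schedule_cycles(chain))        # 8 cycles
print(schedule_cycles(independent))  # 2 cycles
```

Under these assumptions the dependent chain needs one cycle per instruction, while the independent instructions fill all four issue slots each cycle. Real processors add further constraints (functional-unit availability, multi-cycle latencies, in-order issue) that this sketch deliberately omits.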
A Chip-Multiprocessor (CMP) [4] is a static, highly distributed design that exploits moderate amounts of instruction-level parallelism (ILP) on a fixed number of threads.

