The combination of the high bandwidth demands from the processing cores and the high cost of cache misses makes this issue especially critical for future chip multiprocessors.
As the mean query generation time increases, the number of uplink requests decreases because there are fewer cache misses.
The AVR32 core was designed from the ground up as a low-clock-frequency, low-power CPU, with special emphasis on 1) maximizing the use of computational resources through a seven-stage pipeline and three parallel sub-pipelines that support automatic data forwarding and out-of-order execution, 2) single-cycle load/store instructions with pointer arithmetic that reduce the cycles required for loads and stores, 3) accurate branch prediction with zero-penalty branches, and 4) maximizing code density to reduce cache misses.
Direct cache-miss measurements indicate that the difference in performance is largely due to the difference in the number of level-2 cache misses that the two algorithms generate.
Then, in Section 3 we survey important research efforts for mitigating the cost of cache misses in these architectures.
The 500 MHz LX4580 implements Hardware Multi-Threading (HMT) in order to minimize idle time due to cache misses, achieving a 3x performance improvement compared to other 32-bit CPUs in High-Touch applications.
As the first RDIMM-compatible memory designed specifically to overcome the bottlenecks manifested in today's 64-bit processors, HyperCloud reduces the cache misses that are common when simulating large models.
Unfortunately, due to the limited size of the cache, three types of cache misses occur in a single-processor system: compulsory, capacity, and conflict.
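The three-category classification can be made concrete with a small trace-driven sketch. Everything below is an illustrative assumption rather than code from any of the systems cited here: a hypothetical `classify_misses` helper replays an address trace against a direct-mapped cache and, in parallel, against a fully associative LRU cache of the same capacity. A miss is compulsory on the first touch of a block, a conflict miss if full associativity would have hit, and a capacity miss otherwise.

```python
from collections import OrderedDict

def classify_misses(addresses, num_sets, block_size=1):
    """Classify misses of a direct-mapped cache with num_sets blocks.

    A fully associative LRU cache of identical capacity is simulated
    alongside it to separate conflict misses from capacity misses.
    """
    direct = {}           # set index -> block tag currently cached
    full = OrderedDict()  # fully associative LRU model, same capacity
    seen = set()          # blocks referenced at least once
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0, "hit": 0}

    for addr in addresses:
        block = addr // block_size
        index = block % num_sets
        hit_direct = direct.get(index) == block
        hit_full = block in full

        # keep the fully associative model up to date (LRU order)
        if hit_full:
            full.move_to_end(block)
        else:
            if len(full) >= num_sets:
                full.popitem(last=False)   # evict least recently used
            full[block] = True

        if hit_direct:
            counts["hit"] += 1
        elif block not in seen:
            counts["compulsory"] += 1      # first touch of this block
        elif hit_full:
            counts["conflict"] += 1        # only the mapping restriction missed
        else:
            counts["capacity"] += 1        # full associativity would also miss
        seen.add(block)
        direct[index] = block
    return counts
```

For example, with two cache blocks, the trace `[0, 2, 0]` produces two compulsory misses and one conflict miss (blocks 0 and 2 collide in set 0 even though both fit in the cache), while `[0, 1, 2, 0]` produces three compulsory misses and one capacity miss.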
The block index is sent to the cluster indicated by its high-order bit (which also indicates the cache way), and the two 512-entry tag tables are used in parallel for checking the way and detecting cache misses.
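The clustered lookup can be approximated in software. The sketch below is a rough analogue under stated assumptions (a 10-bit block index whose high-order bit selects one of two clusters/ways, with the low nine bits addressing each 512-entry tag table); the `probe` function and its return convention are hypothetical, not the actual hardware interface, and the hardware checks both tables simultaneously rather than in a loop.

```python
def probe(block_index, tag, tag_tables):
    """Software stand-in for the parallel tag-table probe.

    Assumes a 10-bit block index: the high-order bit picks the
    cluster (and nominal way), the low 9 bits index two 512-entry
    tag tables that are both checked for a match.
    Returns (cluster, way, miss).
    """
    cluster = (block_index >> 9) & 1     # high-order bit selects the cluster
    entry = block_index & 0x1FF          # low 9 bits address the 512 entries
    hits = [tag_tables[w][entry] == tag for w in (0, 1)]  # both tables probed
    if hits[0] or hits[1]:
        way = 0 if hits[0] else 1
        return cluster, way, False       # hit: report which way matched
    return cluster, cluster, True        # miss: fall back to the indicated way

# tiny demo: one valid tag in way 0, entry 5
tables = [[None] * 512 for _ in range(2)]
tables[0][5] = 42
```

A matching tag yields a hit in the corresponding way; a mismatch in both tables signals a cache miss for that block index.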
Since all RISC processors must access data from external sources, even a modest 1% external access rate due to cache misses can translate into more than a 20% performance difference between these two CPUs.
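The claim that a 1% external access rate costs over 20% in performance follows from simple miss-penalty arithmetic. The numbers below (base CPI, memory references per instruction, a roughly 20-cycle external access penalty) are hypothetical stand-ins, not figures from the source:

```python
def effective_cpi(base_cpi, mem_refs_per_instr, external_rate, external_penalty):
    """Average cycles per instruction once external accesses are counted."""
    return base_cpi + mem_refs_per_instr * external_rate * external_penalty

# Assumed workload: base CPI of 1.0, ~1.3 memory references per
# instruction, external accesses costing ~20 cycles each.
fast = effective_cpi(1.0, 1.3, 0.00, 20)   # all references served on chip
slow = effective_cpi(1.0, 1.3, 0.01, 20)   # 1% of references go off chip
slowdown = slow / fast - 1                 # fraction of extra cycles
```

Under these assumptions the effective CPI rises from 1.0 to 1.26, i.e. a slowdown of about 26%, consistent with the "more than 20%" figure.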
On-chip memory significantly reduces overall CPU time requirements by eliminating external bus cycles for cache misses.
Profile Viewer: The Profile Viewer window displays easy links for optimization, as well as performance metrics including cycle count, pipeline stalls, and cache misses.