Next generation branch prediction
technology, deeper buffers for more instruction parallelism, and increased clock speeds combine to help increase performance while maintaining overall TDP.
In recent years, research has explored more advanced branch prediction
techniques such as neural networks  and other forms of machine learning.
All these proposals rely on the use of confidence estimation, a mechanism that assesses the quality of branch predictions
by means of estimating the probability of a dynamic branch prediction
to be correct or incorrect.
In addition, productivity applications include code which is not especially ordered or predictable, which will necessarily limit gains in the chip's branch prediction
mechanisms, mechanisms that rely on predictable code to increase performance.
Keywords: Branch Prediction
, Two-Level Predictor, Gshare, Generalized Branch Predictor
Also addressed are programming fundamentals such as arithmetic instructions, memory accesses, control flow instructions, caching and performance, and intermediate and advanced microprocessor concepts, such as pipelining, superscalar execution, branch prediction
The AVR32 core was designed from the ground up as a low clock frequency, low-power CPU with special emphasis on 1) maximizing the use of computational resources with a 7 stage pipeline and three parallel sub-pipelines that supports automatic data forwarding and out-of-order execution, 2) single-cycle load/store instructions with pointer arithmetic that reduces cycles required for load/store, 3) accurate branch prediction
with zero-penalty branches and 4) maximizing code density to reduce cache misses.
The researchers propose an event-driven multithreaded dynamic optimization framework, a distributed control path architecture for VLIW processors, a simple divide and conquer approach for neural-class branch prediction
, and a hardware prefetching technique for chip multiprocessors.
A dynamic branch prediction
circuit eliminates idle cycles during execution of change-of-flow instructions, thereby accelerating new and existing StarCore programs by an average of 10%.
Other themes are coherence, applying compilers and debugging support, chip multiprocessor memory hierarchies, runahead and branch prediction
, inconnection networks, and load and store queues.
We statically partition the conditional branch prediction
tables by half and provide two independent units supporting a prediction rate of two conditional branches per cycle (one per cluster).
ARC was able to achieve clock speeds of 400MHz and above through a sophisticated pipeline structure that supports out-of-order completion, non-blocking access, 2-level hit-under-miss scheduling, and configurable dynamic branch prediction
for maximum throughput.