Reformulating the problem as the simultaneous processing of a data stream and a control stream could significantly reduce cache miss penalties.
As we stated in Section 1, private L2 cache organizations offer lower L1 cache miss latencies than shared L2 cache architectures, but at the expense of poor cache storage utilization.
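The poor storage utilization of private L2 caches comes largely from replication: a line read by several cores is stored once in each of their private caches. The sketch below is a hypothetical illustration, not a model from this work. It assumes each private cache holds a set of line addresses and measures what fraction of the total private capacity holds distinct lines.

```python
def storage_utilization(private_caches: list[set[int]], capacity_per_cache: int) -> float:
    """Fraction of total private L2 capacity that holds distinct lines.

    Hypothetical model: a line cached by k cores occupies k slots
    but counts only once toward the distinct lines stored.
    """
    if not private_caches or capacity_per_cache <= 0:
        raise ValueError("need at least one cache with positive capacity")
    total_slots = capacity_per_cache * len(private_caches)
    distinct_lines = set().union(*private_caches)
    return len(distinct_lines) / total_slots


# Four cores all caching the same four read-shared lines.
caches = [{0, 1, 2, 3} for _ in range(4)]
print(storage_utilization(caches, 4))  # 0.25
```

Here three quarters of the 16 private slots hold duplicates, whereas a shared L2 with the same 16 slots would store each of those lines only once.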
On a cache miss, the UIR strategy broadcasts the requested data items only after the next IR, whereas our strategy also broadcasts them after every UIR (as part of RR).
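The difference in miss latency can be sketched with a simple timing model. This is a minimal sketch under stated assumptions, not the authors' analysis: IRs are assumed to be broadcast every `ir_interval` time units with `m - 1` UIRs evenly spaced between consecutive IRs, and the requested item is assumed to arrive at the first report after the miss. The function names are hypothetical.

```python
import math


def next_report_time(t: float, interval: float) -> float:
    """First report time strictly after t, for reports every `interval`."""
    return (math.floor(t / interval) + 1) * interval


def miss_delay(t: float, ir_interval: float, m: int, serve_after_uir: bool) -> float:
    """Wait from a cache miss at time t until the requested item is broadcast.

    Assumes IRs at multiples of ir_interval and m - 1 evenly spaced UIRs
    between consecutive IRs, so some report goes out every ir_interval / m.
    serve_after_uir=False: item served only after the next IR.
    serve_after_uir=True:  item also served after every UIR.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    period = ir_interval / m if serve_after_uir else float(ir_interval)
    return next_report_time(t, period) - t


# Miss at t = 7 with IRs every 20 units and m = 4 (UIRs at 5, 10, 15).
print(miss_delay(7, 20, 4, serve_after_uir=False))  # 13.0
print(miss_delay(7, 20, 4, serve_after_uir=True))   # 3.0
```

Under this model, serving after every UIR bounds the wait by `ir_interval / m` instead of `ir_interval`.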
Suppose that during the processing of supernode j, the algorithm references a datum that is already in the cache, so no cache miss occurs.
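To make the hit/miss distinction concrete, here is a minimal cache simulator. It assumes a fully associative cache with LRU replacement and a capacity counted in data items; the actual cache model used in the analysis of supernode processing may differ, and the class name `LRUCache` is hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache that counts hits and misses (illustrative)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines = OrderedDict()  # keys ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Reference `addr`; return True on a hit, False on a miss."""
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used item
        self.lines[addr] = None
        return False


cache = LRUCache(capacity=2)
cache.access(1)          # miss: datum 1 loaded
cache.access(2)          # miss: datum 2 loaded
print(cache.access(1))   # True: datum 1 already cached, no miss
```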
This paper surveys some of the recent proposals that focus on two of these factors: the hardware overhead that directories entail, and the long cache miss latencies observed in these designs because of the indirection introduced by accessing the directory.
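The hardware overhead of directories can be quantified for the simplest organization. The sketch below assumes a full-map (bit-vector) directory, with one presence bit per core for every memory block plus an optional number of state bits; the proposals surveyed here may use other formats, and the function name is hypothetical. It covers only the storage factor, not the indirection latency.

```python
def full_map_overhead(num_cores: int, block_bytes: int, state_bits: int = 0) -> float:
    """Directory bits per block divided by the block's data bits.

    Assumes a full-map directory: one presence bit per core per block,
    plus `state_bits` of coherence state per block.
    """
    if num_cores <= 0 or block_bytes <= 0 or state_bits < 0:
        raise ValueError("invalid configuration")
    return (num_cores + state_bits) / (block_bytes * 8)


# 64-byte blocks: overhead grows linearly with the number of cores.
print(full_map_overhead(64, 64))    # 0.125
print(full_map_overhead(1024, 64))  # 2.0
```

With 1024 cores, the presence bits alone would take twice as much storage as the data they track, which is why reducing directory overhead is a central concern.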