An efficient cache design for scalable glueless shared-memory multiprocessors. Ros, A.; Acacio, M. E.; and García, J. M. Proceedings of the 3rd Conference on Computing Frontiers 2006, CF '06, 2006:321-330, 2006.
Traditionally, cache coherence in large-scale shared-memory multiprocessors has been ensured by means of a distributed directory structure stored in main memory. In this way, the access to main memory to recover the sharing status of the block is generally put in the critical path of every cache miss, increasing its latency. Considering the ever-increasing distance to memory, these cache coherence protocols are far from being optimal from the perspective of performance. On the other hand, shared-memory multiprocessors formed by connecting chips that integrate the processor, caches, coherence logic, switch and memory controller through a low-cost, low-latency point-to-point network (glueless shared-memory multiprocessors) are a reality. In this work, we propose a novel design for the L2 cache level, at which coherence has to be maintained, aimed at being used in glueless shared-memory multiprocessors. Our proposal splits the cache structure into two different parts: one for storing data and directory information for the blocks requested by the local processor, and another one for storing only directory information for blocks accessed by remote processors. Using this cache scheme we remove the directory from main memory. Besides saving memory space, our proposal brings very significant reductions in terms of latency of the cache misses (speed-ups of 3.0 on average), which translate into reductions in applications' execution time of 31% on average. Copyright 2006 ACM.
@article{Ros2006,
 title = {An efficient cache design for scalable glueless shared-memory multiprocessors},
 type = {article},
 year = {2006},
 keywords = {Cache coherence,Directory structure,Glueless shared-memory multiprocessors,L2 cache,Memory wall},
 pages = {321--330},
 volume = {2006},
 id = {ba0f5510-756d-354e-ba08-f2a83a4c6a0a},
 created = {2020-12-28T19:42:21.229Z},
 file_attached = {false},
 profile_id = {510a24b0-13c9-315d-ad34-c763f18f9d3e},
 group_id = {b2013bd2-d1ee-3382-aeb3-a063d2537a44},
 last_modified = {2020-12-28T19:42:21.229Z},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {true},
 hidden = {false},
 citation_key = {Ros2006},
 private_publication = {false},
 abstract = {Traditionally, cache coherence in large-scale shared-memory multiprocessors has been ensured by means of a distributed directory structure stored in main memory. In this way, the access to main memory to recover the sharing status of the block is generally put in the critical path of every cache miss, increasing its latency. Considering the ever-increasing distance to memory, these cache coherence protocols are far from being optimal from the perspective of performance. On the other hand, shared-memory multiprocessors formed by connecting chips that integrate the processor, caches, coherence logic, switch and memory controller through a low-cost, low-latency point-to-point network (glueless shared-memory multiprocessors) are a reality. In this work, we propose a novel design for the L2 cache level, at which coherence has to be maintained, aimed at being used in glueless shared-memory multiprocessors. Our proposal splits the cache structure into two different parts: one for storing data and directory information for the blocks requested by the local processor, and another one for storing only directory information for blocks accessed by remote processors. Using this cache scheme we remove the directory from main memory. Besides saving memory space, our proposal brings very significant reductions in terms of latency of the cache misses (speed-ups of 3.0 on average), which translate into reductions in applications' execution time of 31% on average. Copyright 2006 ACM.},
 bibtype = {article},
 author = {Ros, Alberto and Acacio, Manuel E. and García, José M.},
 journal = {Proceedings of the 3rd Conference on Computing Frontiers 2006, CF '06}
}