TC-Release++: an efficient timestamp-based coherence protocol for many-core architectures
Version 2 2024-06-03, 11:49
Version 1 2017-07-26, 15:35
journal contribution
posted on 2024-06-03, 11:49, authored by Y Yao, W Chen, T Mitra, Y Xiang
As we enter the many-core era, providing the shared memory abstraction through cache coherence has become progressively more difficult. Standard directory-based coherence does not scale well with increasing core count. Recently introduced timestamp-based hardware coherence protocols offer an attractive alternative. This paper proposes a timestamp-based coherence protocol, called TC-Release++, that efficiently supports cache coherence in large-scale systems. Our approach is inspired by TC-Weak, a recently proposed timestamp-based coherence protocol targeting GPU architectures. We first design TC-Release as a straightforward port of TC-Weak to general-purpose many-cores. However, repurposing TC-Weak for general-purpose many-core architectures is challenging because of significant differences in both the architecture and the programming model. Indeed, the performance of TC-Release turns out to be worse than that of conventional directory protocols. We overcome the limitations and overheads of TC-Release by exploiting simple hardware support to eliminate frequent memory stalls, and an optimized lifetime prediction mechanism to improve cache performance. The resulting optimized coherence protocol, TC-Release++, is highly scalable (storage scales logarithmically with core count) and achieves better performance (by 3.0%) and comparable network traffic (within 1.3%) relative to the baseline MESI directory protocol. We use Murphi to formally verify that TC-Release++ is error-free and that it imposes a small verification cost.
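To make the general idea of timestamp-based coherence more concrete, the following is a minimal Python sketch of lease-based self-invalidation. It is an illustration under stated assumptions, not the TC-Release++ design from the paper: the class names (`SharedCache`, `PrivateCache`), the fixed `lifetime` value, and the single logical clock passed in as `now` are all hypothetical. The sketch assumes that a private cache copy is valid only until its lease expires, that the shared level records the latest lease it has granted per address, and that a writer may not pass a release point until every outstanding lease on the written data has expired.

```python
from dataclasses import dataclass, field


@dataclass
class SharedCache:
    """Hypothetical shared level: data plus the latest lease granted per address."""
    data: dict = field(default_factory=dict)
    lease_end: dict = field(default_factory=dict)

    def read(self, addr, now, lifetime):
        # Grant a lease and remember the furthest expiry handed out for addr.
        expiry = now + lifetime
        self.lease_end[addr] = max(self.lease_end.get(addr, 0), expiry)
        return self.data.get(addr, 0), expiry

    def write(self, addr, value):
        # Update the data at once and report when all private copies
        # of addr will have expired on their own.
        self.data[addr] = value
        return self.lease_end.get(addr, 0)


@dataclass
class PrivateCache:
    """Hypothetical private cache that self-invalidates lines by timestamp."""
    shared: SharedCache
    lifetime: int = 10  # assumed fixed lease length; a predictor could tune it
    lines: dict = field(default_factory=dict)  # addr -> (value, expiry)
    pending_visible_at: int = 0

    def load(self, addr, now):
        hit = self.lines.get(addr)
        if hit is not None and now < hit[1]:
            return hit[0]  # lease still valid: local hit, no messages sent
        value, expiry = self.shared.read(addr, now, self.lifetime)
        self.lines[addr] = (value, expiry)
        return value

    def store(self, addr, value, now):
        # Write through; no invalidation messages are sent to other caches.
        visible_at = self.shared.write(addr, value)
        self.pending_visible_at = max(self.pending_visible_at, visible_at)
        self.lines.pop(addr, None)  # drop the local copy

    def release_ready(self, now):
        # A release may complete once all earlier writes are globally visible.
        return now >= self.pending_visible_at


shared = SharedCache()
reader = PrivateCache(shared)
writer = PrivateCache(shared)
reader.load(0x40, now=0)       # reader obtains a lease on address 0x40
writer.store(0x40, 7, now=3)   # writer updates the shared copy
print(reader.load(0x40, now=5), writer.release_ready(5))
print(reader.load(0x40, now=10), writer.release_ready(10))
```

In this sketch the reader can still observe the old value while its lease is live, which is why the writer's release point waits for the lease to expire instead of relying on a sharer list. The per-line cost is a timestamp rather than a per-core sharer vector, which is one intuition for why timestamp schemes can avoid storage that grows linearly with core count; the exact storage accounting for TC-Release++ is given in the paper, not here.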
Journal
IEEE Transactions on Parallel and Distributed Systems