Decoupled DIMM: Building High-Bandwidth Memory System Using Low-Speed DRAM Devices

Hongzhong Zheng, Jiang Lin, Zhao Zhang, and Zhichun Zhu.

In Proceedings of the 36th International Symposium on Computer Architecture (ISCA), Austin, TX, June 20-24, 2009.

Abstract

The widespread use of multicore processors has dramatically increased the demand for high bandwidth and large capacity in memory systems. In a conventional DDR2/DDR3 DRAM memory system, the memory bus and DRAM devices run at the same data rate. To improve memory bandwidth, we propose a new memory system design called decoupled DIMM that allows the memory bus to operate at a data rate much higher than that of the DRAM devices. In this design, a synchronization buffer is added to relay data between the slow DRAM devices and the fast memory bus, and memory access scheduling is revised to avoid access conflicts on memory ranks. The design not only improves memory bandwidth beyond what current memory devices can support, but also improves reliability, power efficiency, and cost effectiveness by using relatively slow memory devices. The idea of decoupling, or more precisely decoupling the bandwidth match between the memory bus and a single rank of devices, can also be applied to other types of memory systems, including FB-DIMM.
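The bandwidth mismatch that decoupling exploits can be illustrated with a small calculation. This is a minimal sketch, not taken from the paper: it assumes a 64-bit (8-byte) data bus per channel, and the function names are hypothetical. The bus-to-device ratio hints at how many ranks would have to transfer concurrently to keep the faster bus busy, which is an inference from the abstract rather than a figure it states.

```python
# Assumption (not from the abstract): a 64-bit (8-byte) data bus per channel.
BUS_WIDTH_BYTES = 8


def peak_bandwidth_gbs(data_rate_mts: float, width_bytes: int = BUS_WIDTH_BYTES) -> float:
    """Peak transfer bandwidth in GB/s for a data rate given in MT/s."""
    return data_rate_mts * 1e6 * width_bytes / 1e9


def bus_to_device_ratio(bus_mts: float, device_mts: float) -> float:
    """How many times faster the memory bus runs than a single rank of devices."""
    return bus_mts / device_mts


if __name__ == "__main__":
    # Data rates from the abstract's first configuration.
    bus_mts, device_mts = 2667, 1333
    print(f"device rank peak: {peak_bandwidth_gbs(device_mts):.2f} GB/s")
    print(f"memory bus peak:  {peak_bandwidth_gbs(bus_mts):.2f} GB/s")
    print(f"bus/device ratio: {bus_to_device_ratio(bus_mts, device_mts):.2f}")
```

Under these assumptions, one rank at 1333MT/s can supply only about half of what a 2667MT/s bus can carry, which is the gap the synchronization buffer and revised scheduling are meant to bridge.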

Our experimental results show that a decoupled DIMM system with a 2667MT/s bus data rate and a 1333MT/s device data rate improves the performance of memory-intensive workloads by 51% on average over a conventional memory system with a 1333MT/s data rate. Alternatively, a decoupled DIMM system with a 1600MT/s bus data rate and an 800MT/s device data rate incurs only an 8% performance loss compared with a conventional system with a 1600MT/s data rate, while reducing memory power consumption by 16% and memory energy by 9%.
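The power and energy figures in the second configuration can be related with a rough back-of-the-envelope check. The sketch below is not the paper's methodology: it assumes that energy is average power multiplied by runtime and that runtime scales as the inverse of the reported performance, which may not hold for the paper's actual performance metric. The function name is hypothetical.

```python
def relative_energy(power_ratio: float, performance_ratio: float) -> float:
    """Energy relative to the baseline.

    Assumes energy = power * runtime and runtime = 1 / performance,
    both normalized to the conventional system (assumptions, not from the paper).
    """
    runtime_ratio = 1.0 / performance_ratio
    return power_ratio * runtime_ratio


if __name__ == "__main__":
    # 16% lower memory power and 8% lower performance, as quoted in the abstract.
    energy = relative_energy(power_ratio=1 - 0.16, performance_ratio=1 - 0.08)
    print(f"relative energy: {energy:.3f}, saving: {1 - energy:.1%}")
```

Under these simplifying assumptions the estimated saving comes out near 9%, in line with the reported energy figure: the longer runtime gives back part of the 16% power reduction.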