Zhichun Zhu, Zhao Zhang, and Xiaodong Zhang, "Fine-grain Priority Scheduling on Multi-channel Memory Systems," In the Proceedings of the 8th International Symposium on High Performance Computer Architecture, Cambridge, MA, February 2-6, 2002, pp. 107-116.

Abstract

Configurations of contemporary DRAM memory systems have become increasingly complex.  A recent study shows that application performance is highly
sensitive to the choice of configuration, and suggests that tuning burst sizes and channel configurations is an effective way to optimize DRAM performance for a given memory-intensive workload.  However, this approach is workload dependent.  In this study we show that, by
using fine-grain priority access scheduling, we are able to find a workload-independent configuration that achieves optimal performance
on a multi-channel memory system.  Our approach makes good use of the high concurrency and high bandwidth available on such memory systems,
and effectively reduces the memory stall time of memory-intensive applications.  Using execution-driven simulation of a 4-way issue,
2 GHz processor, we show that optimized fine-grain priority scheduling improves the average performance of fifteen memory-intensive SPEC2000
programs by about 13% and 8% on 2-channel and 4-channel Direct Rambus DRAM memory systems, respectively, compared
with gang scheduling.  Compared with burst scheduling, the average performance improvement is 16% and 14% for the 2-channel and 4-channel
memory systems, respectively.