Cache-Optimal Methods for Bit-Reversals
Z. Zhang and X. Zhang
Proceedings of Supercomputing'99, November, 1999.
Bit-reversals are representative and important data reordering operations
in many scientific computations. Performance degradation is mainly
caused by cache conflict misses. Bit-reversals are often repeatedly
used as fundamental subroutines for many scientific programs. Thus, in
order to gain the best performance, cache-optimal methods and their
implementations should be carefully and precisely done at the programming
level. This type of performance programming for some special programs,
such as the data reorderings, may significantly outperform an optimization
from an automatic tool, such as a compiler.
In this paper, we examine different methods using techniques of
blocking, buffering, and padding for efficient implementations. We
evaluate the merits and limits of each technique and their application and
architecture-dependent conditions for developing cache-optimal methods.
We present two contributions in this paper: (1) Our integrated blocking
methods, which match cache associativity and TLB cache size and which
fully use the available registers are cache-optimal and fast. (2)
We show that our padding methods outperform other software oriented
methods, and believe they are the fastest in terms of minimizing both
CPU and memory access cycles. Since the padding methods are almost
independent of hardware, they could be widely used on many uniprocessor
workstations and SMP multiprocessors.