You should find a partner and work in a group. A group should consist of two people. You can work on a suggested topic, or any other topic in computer architecture that interests you (talk with the instructor in the latter case).
II. We suggest the following literature-search procedure (feel free to adapt it to a style you are comfortable with):
Now that you know the topic, you are ready to write the survey.
III. Here is the list of suggested topics. They will be reserved FCFS, in the order of request. Note: in general, a topic with a larger index is newer than one with a smaller index. For example, topics 8 and 9 cover the most recent work.
Value prediction was proposed to (speculatively) remove some RAW data dependences by predicting the output of load instructions. It exploits value locality, a program property whereby a load instruction in many cases returns the same value it returned before. Value prediction is not accurate for every application; however, when it works, it may improve performance significantly.
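To make value locality concrete, here is a minimal sketch of a last-value predictor in Python. It is only an illustration under stated assumptions: the class name, the table indexed by load PC, and the confidence threshold are hypothetical choices and are not taken from any particular paper.

```python
class LastValuePredictor:
    """Toy last-value predictor, indexed by the PC of a load (hypothetical design)."""

    def __init__(self, threshold=2, max_conf=3):
        self.table = {}             # load PC -> [last value, confidence counter]
        self.threshold = threshold  # predict only when confidence >= threshold
        self.max_conf = max_conf    # saturation limit of the counter

    def predict(self, pc):
        """Return the predicted load value, or None if not confident enough."""
        entry = self.table.get(pc)
        if entry is not None and entry[1] >= self.threshold:
            return entry[0]
        return None

    def update(self, pc, actual):
        """Train the predictor with the value the load actually returned."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = [actual, 0]
        elif entry[0] == actual:
            entry[1] = min(entry[1] + 1, self.max_conf)
        else:
            entry[0] = actual   # value changed: remember it, reset confidence
            entry[1] = 0


def run(trace):
    """Replay (pc, value) pairs; return (correct, wrong) prediction counts."""
    predictor = LastValuePredictor()
    correct = wrong = 0
    for pc, value in trace:
        guess = predictor.predict(pc)
        if guess is not None:
            if guess == value:
                correct += 1
            else:
                wrong += 1   # in hardware this would trigger a recovery
        predictor.update(pc, value)
    return correct, wrong
```

A load that keeps returning the same value is predicted correctly once the counter warms up; the first time the value changes, the predictor is wrong once and then stops predicting until confidence is rebuilt.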
Two major approaches to improving pipeline performance are to increase the number of pipeline stages and to increase the pipeline width. Most pipeline operations, e.g., instruction fetch, register renaming, register read, ALU operations and cache access, can be split into multiple pipeline stages. However, the issue logic is an obstacle to this trend: to issue dependent instructions back-to-back, the wakeup-select process in the issue logic must finish within one cycle. (Recall a drawback of the original Tomasulo algorithm: one cycle is wasted on CDB tag/data broadcast; modern pipelines use data forwarding to address this issue, but that requires issue logic with one-cycle latency.) Unfortunately, the latency of the issue logic increases more than linearly with the pipeline width. Alternative issue-logic designs that reduce this complexity have been proposed in recent years.
- J. Stark, M. D. Brown, and Y. N. Patt. On pipelining dynamic instruction scheduling logic. MICRO 2000.
- D. Ernst and T. Austin. Efficient dynamic scheduling through tag elimination. ISCA 2002.
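The wakeup-select loop described in this topic can be sketched as a small cycle-by-cycle model in Python. This is a simplified illustration, not the design from either paper: it assumes every instruction has one-cycle latency, that `srcs` lists only tags produced by instructions still in the window, and that selection is oldest-first. The names `Entry` and `schedule` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str        # label of the instruction, for printing
    dest: str        # tag this instruction broadcasts when it issues
    srcs: frozenset  # tags of in-window producers it still waits for


def schedule(entries, width):
    """Return, per cycle, the names of the instructions issued that cycle."""
    waiting = list(entries)   # kept in program order (oldest first)
    ready_tags = set()        # tags already broadcast by issued instructions
    cycles = []
    while waiting:
        # Select: choose up to `width` oldest entries whose sources are all ready.
        selected = [e for e in waiting if e.srcs <= ready_tags][:width]
        if not selected:
            raise ValueError("no instruction can issue (missing producer?)")
        # Wakeup: broadcast destination tags so that dependents become
        # selectable in the very next cycle (back-to-back issue).
        for e in selected:
            ready_tags.add(e.dest)
            waiting.remove(e)
        cycles.append([e.name for e in selected])
    return cycles


window = [
    Entry("i1", "r1", frozenset()),
    Entry("i2", "r2", frozenset({"r1"})),
    Entry("i3", "r3", frozenset()),
    Entry("i4", "r4", frozenset({"r2", "r3"})),
]
issue_order = schedule(window, width=2)
```

With a width of 2, `i1` and `i3` issue in the first cycle, `i2` in the second and `i4` in the third. The dependent chain `i1`, `i2`, `i4` issues back-to-back only because each loop iteration completes both select and wakeup, which is exactly the one-cycle constraint that makes the issue logic hard to pipeline.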
Wide-issue processors may execute multiple basic blocks in one cycle, raising two issues for high-bandwidth instruction delivery. First, the instructions fetched within one cycle (a fetch group) may contain more than one branch, so the branch predictor needs to make more than one prediction per cycle. Second, the instruction fetch unit may have to fetch instructions from non-contiguous locations in the cache, increasing the pressure on instruction cache bandwidth. The trace cache was proposed to address these issues, and has been used in the Intel Pentium 4 processor. In a trace cache, multiple basic blocks that are likely to execute in sequence are combined to form an instruction trace. Branch prediction then becomes predicting which trace will be used. The idea is simple, but the design requires in-depth study.
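As a rough illustration of the idea, the Python sketch below stores traces keyed by the starting address plus the outcomes of the branches inside the trace. It is a deliberately simplified model under assumptions of my own: traces are cut every `max_blocks` basic blocks, the structure is unbounded, and the names `TraceCache`, `fill` and `fetch` are hypothetical. Real designs limit traces by instruction count as well and use a set-associative array.

```python
class TraceCache:
    """Toy trace cache: (start PC, internal branch outcomes) -> block PCs."""

    def __init__(self, max_blocks=3):
        self.max_blocks = max_blocks
        self.traces = {}

    def fill(self, blocks):
        """Build traces from executed blocks given as (block_pc, branch_taken)."""
        for i in range(0, len(blocks), self.max_blocks):
            chunk = blocks[i:i + self.max_blocks]
            start_pc = chunk[0][0]
            # Only branches inside the trace select the path; the branch that
            # ends the last block decides what comes after the trace.
            outcomes = tuple(taken for _, taken in chunk[:-1])
            self.traces[(start_pc, outcomes)] = [pc for pc, _ in chunk]

    def fetch(self, start_pc, predicted_outcomes):
        """Return the whole trace on a hit, or None (fall back to the I-cache)."""
        return self.traces.get((start_pc, tuple(predicted_outcomes)))


tc = TraceCache(max_blocks=3)
tc.fill([(0x100, True), (0x200, False), (0x210, True)])
hit = tc.fetch(0x100, [True, False])    # predicted path matches the stored trace
miss = tc.fetch(0x100, [False, False])  # different predicted path: no trace
```

A hit delivers three non-contiguous basic blocks in one access, and the multiple-branch prediction is folded into the lookup key.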
As the processor-memory speed gap continues to widen and VLSI technology advances, it becomes very attractive to integrate DRAM main memory into the processor chip. The technological challenge was that, because of the changes in the manufacturing process, processors with integrated DRAM could not run as fast as conventional processors (in terms of clock frequency). Even so, the reduction in memory stall time is impressive. Many research and development efforts have been made in this direction.
Correlation-based prefetching recognizes correlations between memory addresses (either reference addresses or miss addresses) and then predicts future miss addresses for prefetching. The stream buffer is a well-known example. More complicated techniques have been developed to recognize complex memory access patterns.
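One simple way to picture miss-address correlation is a Markov-style table that remembers which miss address tends to follow which. The Python sketch below is an illustration only: the class and method names are hypothetical, the table is unbounded, and it predicts a single most frequent successor, whereas hardware proposals use small fixed-size tables and may issue several prefetches.

```python
from collections import Counter, defaultdict


class MarkovPrefetcher:
    """Toy miss-address correlation table (first-order Markov model)."""

    def __init__(self):
        self.successors = defaultdict(Counter)  # miss addr -> Counter of next misses
        self.last_miss = None

    def on_miss(self, addr):
        """Record a cache miss; return the address to prefetch, or None."""
        if self.last_miss is not None:
            # Learn the transition: last_miss was followed by addr.
            self.successors[self.last_miss][addr] += 1
        self.last_miss = addr
        followers = self.successors.get(addr)
        if not followers:
            return None   # this miss address has no recorded successor yet
        return followers.most_common(1)[0][0]


prefetcher = MarkovPrefetcher()
predictions = [prefetcher.on_miss(a) for a in [0xA0, 0xB0, 0xC0, 0xA0, 0xB0]]
```

The first pass over the addresses only trains the table. When `0xA0` misses again, the prefetcher predicts `0xB0`, and on the next miss it predicts `0xC0`.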
For many applications, correlation-based prefetching, such as stream buffers and Markov prefetching, is an effective and relatively simple approach for both instruction and data cache misses. However, this approach has a major drawback: it is not very accurate for applications with mixed access patterns. The low accuracy has at least two consequences: some cache misses cannot be removed by prefetching, and memory traffic increases significantly (which increases queueing delay at the system memory interface). Precomputation-based prefetching is an alternative for OOO processors with redundant resources. It treats the normal execution as the main thread, and runs one or more speculative precomputation threads for prefetching. In other words, the prefetch address comes from a highly confident speculative execution of the original code, so the prefetching accuracy is high. The approach is still an active research target. (Note: it works for data prefetching but not instruction prefetching.)
As the complexity of out-of-order superscalar processors increases, analyzing and predicting application performance becomes increasingly complicated. Application performance depends on the application's inherent ILP, branch prediction performance, cache performance, and other factors such as TLB miss rates, even if we ignore OS activities and I/O performance. Simulation can accurately report performance statistics; however, the computation cost is so high that only a limited set of applications and a limited portion of their execution can be studied.
Security is a big concern in the increasingly networked world. For example, if one submits a job to a remote machine, how can he or she trust that the remote machine will run the program faithfully? How can the program be prevented from being copied by others (a secondary consideration)? Here is some recent work:
Viruses and worms have caused substantial damage to the networked world. Most of those attacks exploit buffer overflows, in which an adversary injects attack code into the program's address space and makes the code run. Traditionally, compiler approaches have been the major defense against those attacks, but they incur a significant performance penalty. Recently a number of architectural approaches have been proposed.
- Crispin Cowan et al. "StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks." In Proceedings of the 7th USENIX Security Conference, 1998. (This is the most famous compiler approach; it is not an architectural work, but provides a good introduction to the problem.)
- C. Pyo and G. Lee. Encoding function pointers and memory arrangement checking against buffer overflow attack. In Proceedings of the 4th International Conference on Information and Communications Security, pages 25–36, 2002.
- J. Xu, R. K. Iyer, S. Patel, and Z. Kalbarczyk. Architecture support for defending against buffer overflow attacks. In Proceedings of the 2nd Workshop on Evaluating and Architecting System Dependability, 2002.
- Y.-J. Park and G. Lee. Repairing return address stack for buffer overflow protection. In Proceedings of the 1st Conference on Computing Frontiers, pages 335–342, Ischia, Italy, 2004.
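To get a feel for the problem before reading the papers, the Python sketch below models a stack frame as a buffer followed by a return address, and checks each return against a protected copy of that address. It is a generic illustration of a shadow return-address stack, not the specific mechanism of any paper listed above; the names `Machine`, `unchecked_copy` and `ReturnAddressMismatch` are hypothetical.

```python
class ReturnAddressMismatch(Exception):
    """Raised when the return address in memory differs from the protected copy."""


class Machine:
    def __init__(self):
        self.frames = []   # ordinary stack memory, writable by the program
        self.shadow = []   # protected copies of return addresses

    def call(self, return_addr, buf_len):
        # Frame layout: a local buffer followed by the return address.
        self.frames.append([0] * buf_len + [return_addr])
        self.shadow.append(return_addr)

    def unchecked_copy(self, data):
        """Copy into the local buffer with no bounds check (like strcpy)."""
        frame = self.frames[-1]
        for i, word in enumerate(data[:len(frame)]):
            frame[i] = word   # a long input runs into the return address

    def ret(self):
        frame = self.frames.pop()
        expected = self.shadow.pop()
        if frame[-1] != expected:
            raise ReturnAddressMismatch(hex(frame[-1]))
        return frame[-1]


m = Machine()
m.call(0x400, buf_len=4)
m.unchecked_copy([1, 2, 3, 4])         # fits in the buffer
safe_return = m.ret()                  # returns to 0x400 as intended

m.call(0x400, buf_len=4)
m.unchecked_copy([1, 2, 3, 4, 0xBAD])  # fifth word overwrites the return address
try:
    m.ret()
    detected = False
except ReturnAddressMismatch:
    detected = True
```

The second call shows the attack: the overflow replaces the return address with `0xBAD`, and the comparison at return time catches it before control is transferred.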