Survey Project

I. In the survey project, you will pick a topic and study the current state of research on it. The workload includes a presentation and a survey report. Here are the requirements:
  1. Presentation: You should deliver the big picture of the research, in a short time, to the rest of the class. A good presentation usually includes the problem description, representative designs, critical working details, evaluation methodology and results, and a brief discussion of other related work. The presentation is 35 minutes long, including questions and answers.
  2. Report: The report should serve the same purpose and have similar content, but it should be comprehensive in detail and coverage. (For example, the presentation may be sketchy about design details or skip some related work; the report cannot.) The report should be easy to understand. Its length should be 5000-6000 words, excluding references. (I strongly suggest that you use LaTeX instead of Microsoft Word.)

You should find a partner and work in a group. A group should consist of two people. You can work on a suggested topic, or any other topic in computer architecture that interests you (talk with the instructor in the latter case).

II. I suggest the following procedure for the literature search (feel free to adapt it to your own style):

  1. Read the suggested papers. Read the abstract thoroughly; reread it until you have a good sense of what the paper is about. Then read the introduction carefully and understand what the research is about (scope), what specific issue interested the researchers (problem), what the researchers wanted to do (motivation), how they would do it (approach), what they finally got (result), and how they compared their work with others (related work).
  2. Put those papers in a set S and mark them as read.
  3. Pick a marked paper P from S that you think is the most important so far. Look up the papers in the reference list of P. Get those papers and read their abstracts. If you feel a paper is significant, apply step 1 to that paper and put it into S as marked.
  4. Again pick a marked paper P from S that you think is the most important so far. Use the ACM Digital Library, the IEEE digital library, or CiteSeer.IST to search for the papers that cite P, i.e., papers that list P in their references. If you feel a paper is significant, apply step 1 to that paper and put it into S as marked.
  5. Periodically, you may want to reread some marked papers in set S. Repeat this process until you feel comfortable with the subject and the topic.

Now you know the topic, and you are ready to write the survey.
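
The search procedure above can be sketched as a worklist traversal of the citation graph. This is only an illustration: the dictionaries `references` and `cited_by`, the callback `is_significant` (a stand-in for your own judgment after reading an abstract), and the `max_papers` cutoff are all hypothetical, and the sketch explores papers in the order they were added rather than "most important first".

```python
def literature_search(seeds, references, cited_by, is_significant, max_papers=50):
    """Grow the set S of read papers, starting from the suggested papers.

    references[p]     -> papers listed in p's reference section (step 3)
    cited_by[p]       -> papers that cite p (step 4)
    is_significant(p) -> placeholder for your judgment after reading the abstract
    """
    read = list(seeds)        # S, in the order the papers were read (steps 1-2)
    seen = set(seeds)         # every paper whose abstract we have looked at
    frontier = list(seeds)    # papers whose neighbours are not yet explored
    while frontier and len(read) < max_papers:
        paper = frontier.pop(0)
        neighbours = references.get(paper, []) + cited_by.get(paper, [])
        for candidate in neighbours:
            if candidate in seen:
                continue
            seen.add(candidate)
            if is_significant(candidate) and len(read) < max_papers:
                read.append(candidate)        # apply step 1, then add to S
                frontier.append(candidate)
    return read


# Toy citation graph: paper "C" is judged insignificant, so "F" is never reached.
toy_references = {"A": ["B", "C"], "B": ["D"]}
toy_cited_by = {"A": ["E"], "C": ["F"]}
result = literature_search(["A"], toy_references, toy_cited_by, lambda p: p != "C")
print(result)
```

The cutoff plays the role of "until you feel comfortable": in practice you stop when new papers no longer change your picture of the topic, not at a fixed count.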

III. Here is the list of suggested topics. They will be reserved first come, first served (FCFS), in the order of request. Note: In general, a topic with a larger index is newer than a topic with a smaller index. For example, topics 8 and 9 cover the most recent work.

  1. Value Prediction

    Value prediction was proposed to speculatively remove some RAW data dependences by predicting the output of load instructions. It exploits value locality, the program property that a load instruction often returns the same value it returned before. Value prediction is not accurate for every application; however, when it works, it may improve performance significantly.
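
    As a minimal sketch of the idea, the simplest scheme is a last-value predictor: it predicts that a load returns whatever it returned the previous time. The class and function names below are hypothetical, the table is unbounded (real predictor tables are finite and usually add confidence counters), and the trace is made up for illustration.

```python
class LastValuePredictor:
    """Predicts that a load returns the same value it returned last time."""

    def __init__(self):
        self.table = {}  # load PC -> last value seen (unbounded in this sketch)

    def predict(self, pc):
        return self.table.get(pc)  # None means "no prediction available"

    def update(self, pc, actual):
        self.table[pc] = actual


def prediction_accuracy(trace):
    """trace: iterable of (load_pc, loaded_value) pairs in program order."""
    predictor = LastValuePredictor()
    correct = total = 0
    for pc, value in trace:
        if predictor.predict(pc) == value:
            correct += 1
        predictor.update(pc, value)  # train with the actual outcome
        total += 1
    return correct / total if total else 0.0


# Made-up trace: the load at 0x40 shows value locality, the load at 0x44 does not.
toy_trace = [(0x40, 7), (0x40, 7), (0x44, 1), (0x40, 7), (0x44, 2), (0x40, 8)]
print(prediction_accuracy(toy_trace))
```

    On this trace only two of the six loads are predicted correctly, which illustrates why the benefit depends heavily on the application.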

  2. Reducing the complexity of issue logic (assigned to David Lastine and Ganesh T. Subramanian)

    Two major approaches to improving pipeline performance are to increase the number of pipeline stages and to increase the pipeline width. Most pipeline operations, e.g., instruction fetch, register renaming, register read, ALU operations and cache access, can be split into multiple pipeline stages. However, the issue logic is an obstacle to this trend: to issue dependent instructions back-to-back, the wakeup-select process in the issue logic must finish within one cycle. (Recall a drawback of the original Tomasulo algorithm: one cycle is wasted on the CDB tag/data broadcast. Modern pipelines use data forwarding to address this issue, but that requires issue logic with one-cycle latency.) Unfortunately, the latency of the issue logic increases more than linearly with the pipeline width. Alternative designs that reduce the complexity of the issue logic have been proposed in recent years.
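
    The one-cycle requirement can be illustrated with a toy scheduler. This is a sketch under simplifying assumptions, not a model of any real design: every instruction executes in one cycle with full forwarding, destination tags are unique (already renamed), producers appear before their consumers, and the function name and parameters are hypothetical.

```python
def schedule(instructions, issue_width, wakeup_select_cycles=1):
    """Return {name: issue cycle} for a toy issue queue.

    instructions: list of (name, dest_tag, source_tags) in program order.
    wakeup_select_cycles: latency of the wakeup-select loop in cycles.
    """
    produced = {dest for _, dest, _ in instructions}
    ready_at = {}      # tag -> first cycle in which a consumer may issue
    issue_cycle = {}
    waiting = list(instructions)
    cycle = 0
    while waiting:
        # Wakeup: an instruction is ready once all of its source tags have been
        # broadcast. Tags with no producer in the window count as available.
        ready = [
            ins for ins in waiting
            if all(src not in produced or ready_at.get(src, float("inf")) <= cycle
                   for src in ins[2])
        ]
        # Select: oldest first, limited by the issue width.
        for ins in ready[:issue_width]:
            name, dest, _ = ins
            issue_cycle[name] = cycle
            ready_at[dest] = cycle + wakeup_select_cycles
            waiting.remove(ins)
        cycle += 1
    return issue_cycle


# Dependence chain a -> b -> c plus an independent instruction d.
toy_window = [("a", "p1", []), ("b", "p2", ["p1"]), ("c", "p3", ["p2"]), ("d", "p4", [])]
print(schedule(toy_window, issue_width=2, wakeup_select_cycles=1))
print(schedule(toy_window, issue_width=2, wakeup_select_cycles=2))
```

    With a one-cycle loop the chain issues in cycles 0, 1, 2; with a two-cycle loop it issues in cycles 0, 2, 4, so every dependent instruction pays a bubble.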

  3. Trace cache (assigned to Ka-Ming Keung and Swamy Ponpandi)

    Wide-issue processors may execute multiple basic blocks in one cycle, raising two issues for high-bandwidth instruction delivery. First, the instructions fetched within one cycle (a fetch group) may contain more than one branch, so the branch predictor needs to make more than one prediction per cycle. Second, the instruction fetch unit may have to fetch instructions from non-contiguous locations in the cache, increasing the pressure on instruction cache bandwidth. The trace cache was proposed to address these issues, and has been used in the Intel Pentium 4 processor. In a trace cache, multiple basic blocks that are likely to execute in sequence are combined to form an instruction trace. Branch prediction then becomes predicting which trace will be used. The idea is simple, but the design requires in-depth study.
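
    A heavily simplified sketch of the lookup follows. It assumes a trace is identified by its starting PC plus the outcomes of the branches inside it, stores only basic-block addresses instead of instructions, and ignores capacity, replacement, and partial matches. The class, its methods, and the three-block limit are hypothetical choices made for illustration.

```python
class TraceCache:
    """Toy trace cache keyed by (start PC, outcomes of the internal branches)."""

    def __init__(self, max_blocks=3):
        self.max_blocks = max_blocks
        self.traces = {}  # (start_pc, outcomes) -> list of basic-block start PCs

    def fill(self, executed_blocks):
        """executed_blocks: list of (block_pc, branch_taken) in dynamic order."""
        for i in range(0, len(executed_blocks), self.max_blocks):
            chunk = executed_blocks[i:i + self.max_blocks]
            start_pc = chunk[0][0]
            # The branch ending the last block selects the next trace,
            # so it is not part of this trace's identity.
            outcomes = tuple(taken for _, taken in chunk[:-1])
            self.traces[(start_pc, outcomes)] = [pc for pc, _ in chunk]

    def fetch(self, pc, predicted_outcomes):
        """Return the whole trace in one access, or None on a trace cache miss."""
        return self.traces.get((pc, tuple(predicted_outcomes)))


# One loop iteration: 0x100 (taken) -> 0x140 (not taken) -> 0x150 (taken back).
cache = TraceCache(max_blocks=3)
cache.fill([(0x100, True), (0x140, False), (0x150, True)])
hit = cache.fetch(0x100, [True, False])
miss = cache.fetch(0x100, [False, False])
print(hit, miss)
```

    A hit delivers three non-contiguous basic blocks with a single lookup; a different predicted path misses and would fall back to the conventional instruction cache.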

  4. Processor with integrated DRAM main memory (assigned to Sam Heng Xu and Ziyu Zhang)

    As the processor-memory speed gap continues to widen and VLSI technology advances, it becomes very attractive to integrate DRAM main memory into the processor chip. The technological challenge is that, because of the changes in the manufacturing process, processors with integrated DRAM cannot run as fast as conventional processors (in terms of clock frequency). Even so, the reduction in memory stall time is impressive. Many research and development efforts have been made in this direction.

  5. Correlation-based prefetching techniques

    Correlation-based prefetching recognizes correlations between memory addresses (either reference addresses or miss addresses) and then predicts future miss addresses for prefetching. The stream buffer is a well-known example. More complicated techniques have been developed to recognize complex memory access patterns.
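
    One way to picture address correlation is a Markov-style table that records which miss address tends to follow which. The sketch below is an illustration under simplifying assumptions: the table is unbounded, addresses are whole cache-block addresses, and the class name and the `degree` parameter are hypothetical.

```python
from collections import Counter, defaultdict


class MarkovPrefetcher:
    """Toy correlation table: miss address -> counts of the misses that followed it."""

    def __init__(self):
        self.successors = defaultdict(Counter)
        self.last_miss = None

    def on_miss(self, addr, degree=1):
        """Record the miss and return up to `degree` addresses to prefetch."""
        if self.last_miss is not None:
            self.successors[self.last_miss][addr] += 1  # learn last_miss -> addr
        self.last_miss = addr
        history = self.successors.get(addr, Counter())
        return [next_addr for next_addr, _ in history.most_common(degree)]


# A repeating miss sequence: the second pass is fully predicted.
prefetcher = MarkovPrefetcher()
miss_stream = [0x1000, 0x2040, 0x30C0, 0x1000, 0x2040, 0x30C0]
prefetches = [prefetcher.on_miss(addr) for addr in miss_stream]
print(prefetches)
```

    The first pass through the sequence only trains the table; from the fourth miss on, each miss triggers a prefetch of the address that followed it last time.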

     
  6. Precomputation-based prefetching schemes

    For many applications, correlation-based prefetching, such as stream buffers and Markov prefetching, is an effective and relatively simple approach for both instruction and data cache misses. However, this approach has a major drawback: it is not very accurate for applications with mixed access patterns. The low accuracy has at least two consequences: some cache misses cannot be removed by prefetching, and memory traffic increases significantly (which increases the queueing delay at the system memory interface). Precomputation-based prefetching has been proposed for out-of-order (OOO) processors with redundant resources. It treats the normal execution as the main thread and runs one or more speculative precomputation threads for prefetching. In other words, the prefetch addresses come from a highly confident speculative execution of the original code, so the prefetching accuracy is high. The approach is still an active research topic. (Note: it works for data prefetching but not instruction prefetching.)

  7. Predicting the performance of out-of-order superscalar processors (by Srinivas Neginhal and Anantharaman Kalyanaraman)

    As the complexity of out-of-order superscalar processors increases, analyzing and predicting application performance becomes increasingly complicated. Application performance depends on the application's inherent ILP, branch prediction performance, cache performance, and other factors such as TLB miss rates, even if we ignore OS activities and I/O performance. Simulation can report performance statistics accurately; however, its computation cost is so high that only a limited set of applications, and a limited portion of their execution, can be studied.

  8. Architecture Support for Secure Computing (assigned to Mikel Bezdek and Chun Yee Yu)

    Security is a big concern in the increasingly networked world. For example, if one submits a job to a remote machine, how can he or she trust that the remote machine will run the program faithfully? How can the program be protected from being copied by others (a secondary consideration)? Here is some recent work:

  9. Architectural Mechanisms against Buffer Overflow Attacks (Russ Graves and Steve Jawarski)

    Viruses and worms have done substantial damage to the networked world. Most of these attacks exploit buffer overflows, in which an adversary injects attack code into the program's address space and makes that code run. Traditionally, compiler-based approaches have been the major defense against these attacks, but they incur a significant performance penalty. Recently, a number of architectural approaches have been proposed.