Assignment #6: Processor Architecture, Datapaths, and Pipelining

Overview

Topics: Fundamentals of Datapath Organization, Sequential Datapath, Pipelining, Pipelined Datapath, Data Dependencies and Hazards, and Cache Organization
Related Reading: Chapter 4 (predominantly Sections 4.1 - 4.5), and class notes

Practice Problems

Practice problems from the textbook (answers are at the end of the chapter):

Practice Problem 4.13 on pp. 387-388.
Practice Problem 4.14 on p. 390.
Practice Problem 4.15 on p. 391.
Practice Problem 4.16 on p. 392.
Practice Problem 4.18 on p. 394.
Practice Problem 4.28 on p. 417.

Problems to be Submitted (25 points)

You are allowed to submit your assignment via email, but if you choose to do so, you must bring a hardcopy of your assignment along with a completed cover sheet to the instructor at the next class. (Note: Do not email the instructor any .zip file attachments, as SLU's email may not accept these emails; i.e. the instructor may not receive your email.)

(4 points)
First, complete Practice Problem 4.13 on pp. 387-388 (you do not need to submit an answer, since the answer is given at the end of the chapter, but be sure you understand it and can reproduce it).

Then, using the figure of the SEQ datapath (PPT), diagram the execution of popl %ecx by:
- highlighting (via a colored pen or highlighter) the active wires and hardware units in the datapath
- indicating (on the figure) the specific values on each of the active wires
For highlighting the datapath, assume the following initial values: PC = 0x5C4, %esp = 0x80C4
(5 points)
Repeat problem #1 for the instruction: rmmovl %edi, 12(%ebp)

Be sure to turn in both the highlighted datapath diagram and the specific datapath processing results (in terms of icode, ifun, rA, rB, valA, valB, valC, valE, valP, etc., akin to the table in the Aside on p. 388)

Assume the following initial values: PC = 0x7C2, %edi = -18, %ebp = 0x914A

Also assume that the initial value of each 4-byte word in memory is equal to half of its address (e.g. the 4-byte value at M[0x200] is 0x100).
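
The memory convention above can be sketched in a few lines of Python. The helper name `initial_mem` is hypothetical (it does not come from the textbook or notes), and the use of integer division is an assumption that only matters for odd addresses.

```python
def initial_mem(addr: int) -> int:
    """Initial 4-byte value at M[addr] under the assignment's convention:
    each value equals half of its address (integer division is an assumption)."""
    return addr // 2

# The assignment's own example: the value at M[0x200] is 0x100.
print(hex(initial_mem(0x200)))
```

Note that this convention only matters when an instruction reads memory before anything has been written to that address.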
(5 points)
Repeat problem #1 for the instruction: jg L5

Be sure to turn in both the highlighted datapath diagram and the specific datapath processing results (in terms of icode, ifun, rA, rB, valA, valB, valC, valE, valP, etc., akin to the table in the Aside on p. 388)

Assume the following initial values: PC = 0x4B13 and the address of L5 = 0x4B02

Also assume that the condition codes (set by cmpl in the previous instruction) evaluate to TRUE for jg.
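
As a rough sketch of what "evaluate to TRUE for jg" means, the Python below checks the jg condition ~(SF ^ OF) & ~ZF and then selects the next PC the way SEQ does for a jump (valC if taken, valP otherwise). The function names and the addresses in the usage lines are hypothetical and deliberately not the values from this problem.

```python
def jg_taken(zf: int, sf: int, of: int) -> bool:
    """jg is taken when the compared value was strictly greater:
    ~(SF ^ OF) & ~ZF."""
    return (sf ^ of) == 0 and zf == 0

def next_pc(taken: bool, val_c: int, val_p: int) -> int:
    """For a conditional jump, the new PC is valC if taken, else valP."""
    return val_c if taken else val_p

# Hypothetical values, not the ones from the assignment.
taken = jg_taken(zf=0, sf=0, of=0)
print(taken, hex(next_pc(taken, val_c=0x100, val_p=0x205)))
```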

(5 points)

A group of computer architects is designing a new hardware datapath implementation and has determined that the following circuit elements have these delays:

            instruction memory       180ps
            decode                    90ps
            register fetch           150ps
            ALU                      170ps
            data memory              230ps
            register writeback       150ps

Note: For each of the questions below, be sure to include the time for the 20ps register at the end of each clock cycle (such as is done in the notes).

How long is the clock cycle for a single-cycle (sequential) datapath implementation? What is the corresponding frequency of the processor?
How long would the fastest clock cycle be for a 5-stage pipelined datapath? What is the corresponding frequency of the processor? How much faster is the 5-stage pipelined datapath than the single-cycle datapath (based on the ratio of clock frequencies)?

Note: When combining circuits into stages, circuits MUST be combined in the given order (e.g. you can't combine decode with data memory), since each circuit must execute in the correct time order.
How long would the fastest clock cycle be for a 9-stage pipelined datapath? What is the corresponding frequency of the processor? How much faster is the 9-stage pipelined datapath than the single-cycle datapath (based on the ratio of clock frequencies)?

Note: You may assume any of the given circuit elements above may be split equally into halves, thirds, or quarters.
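
The general method for these timing questions can be sketched in Python as follows. The clock period is set by the slowest stage plus the 20ps register, and the frequency is the reciprocal of the period. The element delays below are made up for illustration and are not the delays given in this problem; the function names are likewise hypothetical.

```python
REGISTER_DELAY_PS = 20  # register at the end of each clock cycle, per the note above

def clock_period_ps(stage_delays_ps):
    """Clock period = slowest stage + register delay."""
    return max(stage_delays_ps) + REGISTER_DELAY_PS

def frequency_ghz(period_ps):
    """A period of 1000 ps corresponds to 1 GHz."""
    return 1000.0 / period_ps

# Hypothetical element delays (NOT the assignment's values), in order.
elements = [100, 60, 140]

single = clock_period_ps([sum(elements)])   # all elements in one stage
piped = clock_period_ps(elements)           # one element per stage
speedup = frequency_ghz(piped) / frequency_ghz(single)

print(single, piped, frequency_ghz(single), frequency_ghz(piped), speedup)
```

When a stage combines several adjacent elements, its delay is the sum of their delays, so each candidate grouping can be checked by passing the per-stage sums to `clock_period_ps`.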

(6 points)
For the following sequence of code:
```
         popl     %esi
         subl     %esi, %eax
         irmovl   $0x16, %edx
         mrmovl   8(%esp), %ebx
         addl     %ebx, %eax
         addl     %edx, %esi
         xorl     %eax, %edx
```
1. Identify all the true (a.k.a. read-after-write) data dependencies. Identify them by drawing circles around the dependent operands and drawing arrows between them (i.e. draw an arrow from each register write to the next read of that register... in a fashion similar to that used in class, and demonstrated on p. 419).
2. Using a high-level pipeline representation (such as used in Figures 4.44 or 4.54), for the sequence of code given above, show the state of the pipeline from cycles t through t + 16 (i.e. until you run out of room on the paper). Assume the popl instruction is in the Fetch stage in cycle t. You are welcome to use the following high-level pipeline diagram.
       Note: Assume that forwarding/bypassing is being used.
       Note: Use pipeline stalls and bubbling to ensure correctness.
       Hint: Be mindful of potential load-use hazards.
3. At the end of the t + 5 cycle of execution, which registers are being read and which are being written?
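
For checking a hand-drawn dependency diagram, the idea of a true (read-after-write) dependency can be sketched in Python as below. This is only an illustration under stated assumptions: the read and write sets are supplied by hand, the function name is hypothetical, and the short instruction sequence is made up rather than taken from this problem.

```python
def raw_dependencies(instrs):
    """instrs: list of (text, reads, writes) tuples, in program order.
    Returns (writer_index, reader_index, register) for each true dependency,
    linking every read to the most recent earlier write of that register."""
    last_writer = {}
    deps = []
    for i, (_text, reads, writes) in enumerate(instrs):
        for reg in sorted(reads):
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        for reg in writes:
            last_writer[reg] = i
    return deps

# Hypothetical sequence (NOT the assignment's code); read/write sets given by hand.
program = [
    ("irmovl $5, %ecx",   set(),            {"%ecx"}),
    ("irmovl $7, %ebx",   set(),            {"%ebx"}),
    ("addl %ecx, %ebx",   {"%ecx", "%ebx"}, {"%ebx"}),
    ("rrmovl %ebx, %edi", {"%ebx"},         {"%edi"}),
]

for writer, reader, reg in raw_dependencies(program):
    print(f"{program[writer][0]}  ->  {program[reader][0]}  via {reg}")
```

Keep in mind that some instructions read or write registers that do not appear as explicit operands (the stack instructions use %esp), so those registers need to be included in the hand-supplied sets.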
