Topics: Fundamentals of Datapath Organization, Sequential Datapath, Pipelining, Pipelined Datapath, Data Dependencies and Hazards, and Cache Organization
Related Reading: Chapter 4 (predominantly Sections 4.1 - 4.5), and class notes
Practice problems from the textbook (answers are at the end of the chapter):
When you turn in your assignment, you must include a signed cover sheet (PDF version); your assignment will not be graded without a completed cover sheet.
You are allowed to submit your assignment via email, but if you choose to do so, you must bring a hardcopy of your assignment along with a completed cover sheet to the instructor at the next class. (Note: Do not email the instructor any .zip file attachments, as SLU's email may not accept these emails; i.e. the instructor may not receive your email.)
For Practice Problem 4.13 on pp. 387-388, first complete the problem (you do not need to submit an answer, since the answer is given at the end of the chapter, but be sure you understand and can reproduce the same answer).
Then, using the figure of the SEQ datapath (PPT)
For highlighting the datapath, assume the following initial values: PC = 0x5C4, %esp = 0x80C4
Repeat problem #1 for the instruction: rmmovl %edi, 12(%ebp)
Be sure to turn in both the highlighted datapath diagram and the specific datapath processing results (in terms of icode, ifun, rA, rB, valA, valB, valC, valE, valP, etc., akin to the table in the Aside on p. 388)
Assume the following initial values: PC = 0x7C2, %edi = -18, %ebp = 0x914A
Also assume that the initial value of each 4-byte word in memory is equal to half of its address (e.g. the 4-byte value at M[0x200] is 0x100).
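The memory convention above can be written down in a couple of lines. This is only a minimal Python sketch: the helper name `initial_mem_value` is hypothetical, and the use of integer division for odd addresses is an assumption, since the problem gives only an even-address example.

```python
def initial_mem_value(addr: int) -> int:
    """Initial 4-byte value stored at addr under the assignment's convention.

    Assumption: integer (floor) division is used, so odd addresses round down.
    """
    return addr // 2


# The example from the problem statement: M[0x200] initially holds 0x100.
print(hex(initial_mem_value(0x200)))
```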
Repeat problem #1 for the instruction: jg L5
Be sure to turn in both the highlighted datapath diagram and the specific datapath processing results (in terms of icode, ifun, rA, rB, valA, valB, valC, valE, valP, etc., akin to the table in the Aside on p. 388)
Assume the following initial values: PC = 0x4B13 and the address of L5 = 0x4B02
Also assume that condition codes (computed via cmpl in the previous instruction) evaluate to TRUE for jg
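As a reminder of how the PC update stage treats a conditional jump, here is a minimal Python sketch. It assumes the Y86 encoding in which a jXX instruction occupies 5 bytes (one icode:ifun byte plus a 4-byte destination); the function name and the sample addresses are hypothetical and deliberately differ from the values in this problem.

```python
JXX_LENGTH = 5  # Y86 jXX: 1 byte for icode:ifun + 4 bytes for the destination


def next_pc_for_jump(pc: int, dest: int, cnd: bool) -> int:
    """Return the new PC after a conditional jump located at address pc."""
    val_p = pc + JXX_LENGTH  # address of the next sequential instruction
    val_c = dest             # jump target encoded in the instruction
    # Jump taken: continue at valC. Not taken: fall through to valP.
    return val_c if cnd else val_p


# Hypothetical values, not the ones given in this problem.
print(hex(next_pc_for_jump(0x100, 0x0F0, True)))   # taken
print(hex(next_pc_for_jump(0x100, 0x0F0, False)))  # not taken
```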
A group of computer architects is designing a new hardware datapath implementation and has determined that the following circuit elements have these delays:
instruction memory | 180ps |
decode | 90ps |
register fetch | 150ps |
ALU | 170ps |
data memory | 230ps |
register writeback | 150ps |
Note: For each of the questions below, be sure to include the time for the 20ps register at the end of each clock cycle (such as is done in the notes).
How long is the clock cycle for a single-cycle (sequential) datapath implementation? What is the corresponding frequency of the processor?
How long would the fastest clock cycle be for a 5-stage pipelined datapath? What is the corresponding frequency of the processor? How much faster is the 5-stage pipelined datapath than the single-cycle datapath (based on the ratio of clock frequencies)?
Note: When combining circuits into stages, circuits MUST be combined in the given order (e.g. you can't combine decode with data memory), since each circuit must execute in the correct time order.
How long would the fastest clock cycle be for a 9-stage pipelined datapath? What is the corresponding frequency of the processor? How much faster is the 9-stage pipelined datapath than the single-cycle datapath (based on the ratio of clock frequencies)?
Note: You may assume any of the given circuit elements above may be split equally into halves, thirds, or quarters.
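The timing relationships used in this problem can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the three element delays are made-up values rather than the ones in the table above. The clock period is taken to be the slowest stage's combinational delay plus the register delay, and the frequency is the reciprocal of the period.

```python
def cycle_time_ps(stage_delays_ps, register_ps=20):
    """Clock period: slowest stage's delay plus the register delay."""
    return max(stage_delays_ps) + register_ps


def frequency_ghz(period_ps):
    """A 1000 ps period corresponds to 1 GHz."""
    return 1000.0 / period_ps


# Hypothetical circuit delays in ps (NOT the values from the table).
elements = [100, 200, 150]

# Single-cycle: all elements form one stage.
single = cycle_time_ps([sum(elements)])  # 450 + 20
# Three-stage pipeline: one element per stage, kept in the given order.
piped = cycle_time_ps(elements)          # 200 + 20

speedup = frequency_ghz(piped) / frequency_ghz(single)
print(single, piped, round(speedup, 3))
```

Note that the speedup is less than the number of stages, both because the stages are unbalanced and because every stage pays the register delay.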
For the following sequence of code:
popl %esi
subl %esi, %eax
irmovl $0x16, %edx
mrmovl 8(%esp), %ebx
addl %ebx, %eax
addl %edx, %esi
xorl %eax, %edx
Identify all the true (a.k.a. read-after-write) data dependencies. Identify them by drawing circles around the dependent operands and drawing arrows between them (i.e. draw an arrow from each register write to the next read of that register... in a fashion similar to that used in class, and demonstrated on p. 419).
Using a high-level pipeline representation (such as that used in Figures 4.44 or 4.54), show the state of the pipeline for the sequence of code given above from cycle t through cycle t + 16 (i.e. until you run out of room on the paper). Assume the popl instruction is in the Fetch stage in cycle t. You are welcome to use the following high-level pipeline diagram.
Note: Assume that forwarding/bypassing is being used.
Hint: Be mindful of potential load-use hazards.
At the end of the t + 5 cycle of execution, which registers are being read and which are being written?
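The two checks this problem relies on, read-after-write dependencies and load-use hazards, can be sketched in Python as below. This is a rough illustration under stated assumptions, not the textbook's method: the tuple layout, the function names, and the three-instruction program are all hypothetical, only registers are tracked (condition codes are ignored), and a write is linked to every later read of that register until the register is overwritten.

```python
def raw_dependencies(program):
    """Return (writer_index, reader_index, register) for each RAW dependency."""
    deps = []
    for i, (_, _, writes, _) in enumerate(program):
        for reg in sorted(writes):
            for j in range(i + 1, len(program)):
                _, reads_j, writes_j, _ = program[j]
                if reg in reads_j:
                    deps.append((i, j, reg))
                if reg in writes_j:
                    break  # reg is overwritten; later reads depend on j
    return deps


def load_use_hazards(program):
    """Return (load_index, user_index, register) where the instruction right
    after a load reads the register being loaded from memory. Forwarding
    alone cannot cover this case, so the pipeline must stall."""
    hazards = []
    for i in range(len(program) - 1):
        loaded = program[i][3]
        if loaded is not None and loaded in program[i + 1][1]:
            hazards.append((i, i + 1, loaded))
    return hazards


# Each entry: (text, registers read, registers written, register loaded
# from memory or None). Hypothetical sequence, NOT the assignment's code.
program = [
    ("mrmovl 4(%ebp), %ecx", {"%ebp"}, {"%ecx"}, "%ecx"),
    ("addl %ecx, %ebx", {"%ecx", "%ebx"}, {"%ebx"}, None),
    ("rrmovl %ebx, %edi", {"%ebx"}, {"%edi"}, None),
]

print(raw_dependencies(program))
print(load_use_hazards(program))
```

For an instruction such as popl, which writes both %esp and a register loaded from memory, only the loaded register would go in the fourth field.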