Assignment #6: Processor Architecture, Datapaths, and Pipelining



Overview

Topics: Fundamentals of Datapath Organization, Sequential Datapath, Pipelining, Pipelined Datapath, Data Dependencies and Hazards, and Cache Organization
Related Reading: Chapter 4 (predominantly Sections 4.1 - 4.5), and class notes


Practice Problems

Practice problems from the textbook (answers are at the end of the chapter):


Problems to be Submitted (25 points)

You are allowed to submit your assignment via email, but if you do, you must also bring a hard copy of your assignment, along with a completed cover sheet, to the instructor at the next class. (Note: Do not email the instructor any .zip file attachments. SLU's email system may reject such messages, in which case the instructor will never receive your email.)

  1. (4 points)

    First, complete Practice Problem 4.13 on pp. 387-388. You do not need to submit an answer, since the answer is given at the end of the chapter, but be sure you understand it and can reproduce it.

    Then, using the figure of the SEQ datapath (PPT), diagram the execution of popl %ecx by:

    For highlighting the datapath, assume the following initial values:   PC = 0x5C4, %esp = 0x80C4
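    To check the values you highlight, the sketch below traces popl rA through the SEQ stages as the 32-bit Y86 SEQ design in the textbook describes them (valA and valB both read %esp, the ALU adds 4, memory is read at the old %esp, and the register file writes both %esp and rA). The function name seq_popl and its dict-based register and memory arguments are hypothetical. Because the problem does not say what is stored at the top of the stack, memory is passed in as a parameter, and the usage below deliberately uses made-up values rather than the assignment's.

```python
MASK32 = 0xFFFFFFFF
# Y86 register identifiers (used for the rA/rB fields of the instruction)
REG_ID = {"%eax": 0, "%ecx": 1, "%edx": 2, "%ebx": 3,
          "%esp": 4, "%ebp": 5, "%esi": 6, "%edi": 7}

def seq_popl(pc, regs, mem, ra):
    """Trace `popl rA` through the SEQ stages (hypothetical helper, 32-bit Y86).

    regs: dict register name -> value; mem: dict address -> 4-byte word.
    Returns (stage values, updated copy of the register file)."""
    v = {"icode": 0xB, "ifun": 0x0, "rA": REG_ID[ra], "rB": 0xF}
    v["valP"] = pc + 2                      # Fetch: popl is 2 bytes long
    v["valA"] = regs["%esp"]                # Decode: srcA = %esp
    v["valB"] = regs["%esp"]                # Decode: srcB = %esp
    v["valE"] = (v["valB"] + 4) & MASK32    # Execute: ALU computes valB + 4
    v["valM"] = mem[v["valA"]]              # Memory: read at the OLD %esp
    new_regs = dict(regs)
    new_regs["%esp"] = v["valE"]            # Write-back: dstE = %esp
    new_regs[ra] = v["valM"]                # Write-back: dstM = rA
    v["newPC"] = v["valP"]                  # PC update: next sequential PC
    return v, new_regs
```

    For example, with hypothetical values PC = 0x100, %esp = 0x200 and M[0x200] = 0x1234, popl %ecx yields valP = 0x102, valE = 0x204, valM = 0x1234, and leaves %esp = 0x204 and %ecx = 0x1234.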

  2. (5 points)

    Repeat problem #1 for the instruction:      rmmovl %edi, 12(%ebp)

    Be sure to turn in both the highlighted datapath diagram and the specific datapath processing results (in terms of icode, ifun, rA, rB, valA, valB, valC, valE, valP, etc., akin to the table in the Aside on p. 388)

    Assume the following initial values:   PC = 0x7C2, %edi = -18, %ebp = 0x914A

    Also assume that the initial value of each 4-byte word in memory is half of its address (e.g. the 4-byte value at M[0x200] is 0x100).
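    The sketch below shows the SEQ computations for rmmovl rA, D(rB) in 32-bit Y86: valC is the displacement, valP = PC + 6 (the instruction is 6 bytes), valE = valB + valC is the effective address, and valA is written to M[valE]. It also encodes the "half of its address" memory rule and shows how a negative register value appears as a 32-bit two's-complement pattern. The names initial_word, to_u32, seq_rmmovl and the old_word field are hypothetical, and the usage uses made-up values, not the assignment's.

```python
MASK32 = 0xFFFFFFFF

def initial_word(addr):
    """Assignment's rule: the 4-byte word at addr initially holds addr / 2."""
    return addr // 2

def to_u32(x):
    """32-bit two's-complement bit pattern of a (possibly negative) int."""
    return x & MASK32

def seq_rmmovl(pc, ra_val, rb_val, disp):
    """Trace `rmmovl rA, D(rB)` through SEQ (hypothetical helper, 32-bit Y86).

    icode:ifun = 4:0; the 6 bytes are opcode, register byte, 4-byte D."""
    v = {"icode": 0x4, "ifun": 0x0}
    v["valC"] = to_u32(disp)                      # Fetch: displacement D
    v["valP"] = pc + 6                            # Fetch: next sequential PC
    v["valA"] = to_u32(ra_val)                    # Decode: R[rA], value to store
    v["valB"] = to_u32(rb_val)                    # Decode: R[rB], base address
    v["valE"] = (v["valB"] + v["valC"]) & MASK32  # Execute: effective address
    v["old_word"] = initial_word(v["valE"])       # Memory: word replaced by valA
    v["newPC"] = v["valP"]                        # PC update
    return v
```

    For instance, seq_rmmovl(0x100, -2, 0x40, 8) gives valA = 0xFFFFFFFE, valE = 0x48 and valP = 0x106; the store overwrites the word at 0x48, whose initial value under the rule above is 0x24.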

  3. (5 points)

    Repeat problem #1 for the instruction:      jg L5

    Be sure to turn in both the highlighted datapath diagram and the specific datapath processing results (in terms of icode, ifun, rA, rB, valA, valB, valC, valE, valP, etc., akin to the table in the Aside on p. 388)

    Assume the following initial values:   PC = 0x4B13 and the address of L5 = 0x4B02

    Also assume that the condition codes (set by the cmpl in the previous instruction) cause the jg condition to evaluate to TRUE, i.e. the branch is taken.
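    For a jXX instruction, SEQ reads the 4-byte destination as valC, computes valP = PC + 5 (1 byte for icode:ifun plus the 4-byte destination), evaluates Cnd from the condition codes, and sets the new PC to valC if Cnd is true and to valP otherwise; rA, rB, valA, valB and valE are not used. The sketch below models that PC selection, the jg condition ~(SF ^ OF) & ~ZF, and the little-endian encoding of the destination. The helper names are hypothetical and the usage values are made up.

```python
def seq_jxx(pc, dest, cnd):
    """Trace a jXX instruction through SEQ (hypothetical helper, 32-bit Y86)."""
    valC = dest                       # Fetch: 4-byte destination after opcode
    valP = pc + 5                     # Fetch: jXX is 5 bytes long
    new_pc = valC if cnd else valP    # PC update: taken -> valC, else valP
    return {"valC": valC, "valP": valP, "Cnd": int(cnd), "newPC": new_pc}

def jg_taken(zf, sf, of):
    """jg (signed greater) is taken when ~(SF ^ OF) & ~ZF."""
    return not (sf ^ of) and not zf

def encode_jxx(ifun, dest):
    """Byte encoding: 0x7<ifun>, then dest in little-endian order (jg: ifun 6)."""
    return bytes([0x70 | ifun]) + dest.to_bytes(4, "little")
```

    With made-up values, seq_jxx(0x100, 0x80, True) selects newPC = 0x80, while a not-taken jump would continue at valP = 0x105.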

  4. (5 points)

    A group of computer architects is designing a new hardware datapath implementation and has determined that the following circuit elements have these delays:

                instruction memory     180ps
                decode                  90ps
                register fetch         150ps
                ALU                    170ps
                data memory            230ps
                register writeback     150ps

        Note: For each of the questions below, be sure to include the 20ps delay of the register at the end of each clock cycle (as is done in the notes).
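    The arithmetic behind all three questions can be sketched as follows, assuming the usual model from the notes: the clock period is the combinational delay of the slowest stage plus the 20ps register, the frequency is 1 / period (with the period in picoseconds, 1000 / period gives GHz), and the speedup is the ratio of the clock frequencies, which equals the inverse ratio of the periods. A single-cycle datapath is simply one stage containing every circuit element. The function names are hypothetical, and the example numbers are illustrative rather than the ones given above.

```python
def clock_period_ps(stage_delays_ps, register_ps=20):
    """Clock period = slowest stage's combinational delay + register delay."""
    return max(stage_delays_ps) + register_ps

def frequency_ghz(period_ps):
    """Frequency = 1 / period; 1000 / (period in ps) gives GHz."""
    return 1000 / period_ps

def speedup(seq_period_ps, pipe_period_ps):
    """Ratio of clock frequencies (pipelined / sequential)."""
    return seq_period_ps / pipe_period_ps
```

    For example, a hypothetical design whose logic totals 300ps in one stage has a 320ps clock (3.125 GHz); if pipelining brought the clock down to 160ps, the speedup would be 320 / 160 = 2.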

    1. How long is the clock cycle for a single-cycle (sequential) datapath implementation? What is the corresponding frequency of the processor?

    2. How long would the fastest clock cycle be for a 5-stage pipelined datapath? What is the corresponding frequency of the processor? How much faster is the 5-stage pipelined datapath than the single-cycle datapath (based on the ratio of clock frequencies)?

           Note: When combining circuits into stages, circuits MUST be combined in the given order (e.g. you can't combine decode with data memory), since each circuit must execute in the correct time order.

    3. How long would the fastest clock cycle be for a 9-stage pipelined datapath? What is the corresponding frequency of the processor? How much faster is the 9-stage pipelined datapath than the single-cycle datapath (based on the ratio of clock frequencies)?

           Note: You may assume that any of the circuit elements above can be split into equal halves, thirds, or quarters.
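    Because circuits may only be grouped in their given order, choosing the stages is a contiguous-partition problem: pick where to cut the ordered list of delays so that the slowest stage is as fast as possible. The brute-force sketch below tries every set of cut points with itertools.combinations; it is a hypothetical helper for checking a hand answer, not a prescribed method. For the 9-stage question, one way to use it is to split elements by hand first (e.g. pass a 230ps element as two 115ps pieces) and then search over the resulting list. The example delays are made up.

```python
from itertools import combinations

def best_contiguous_partition(delays, stages):
    """Cut the ordered list `delays` into `stages` contiguous groups so that
    the largest group sum (the slowest stage) is minimized.
    Returns (slowest_stage_delay, groups)."""
    n = len(delays)
    if not 1 <= stages <= n:
        raise ValueError("need 1 <= stages <= number of circuit elements")
    best = None
    # Each combination picks stages-1 cut positions between adjacent elements.
    for cuts in combinations(range(1, n), stages - 1):
        bounds = (0,) + cuts + (n,)
        groups = [delays[a:b] for a, b in zip(bounds, bounds[1:])]
        slowest = max(sum(g) for g in groups)
        if best is None or slowest < best[0]:
            best = (slowest, groups)
    return best
```

    For hypothetical delays [100, 50, 50, 100] split into 2 stages, the best grouping is [[100, 50], [50, 100]] with a 150ps slowest stage, giving a 170ps clock once the 20ps register is added.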

  5. (6 points)

    For the following sequence of code:

             popl     %esi
             subl     %esi, %eax
             irmovl   $0x16, %edx
             mrmovl   8(%esp), %ebx
             addl     %ebx, %eax
             addl     %edx, %esi
             xorl     %eax, %edx
    
    1. Identify all the true (a.k.a. read-after-write) data dependencies. Mark them by circling the dependent operands and drawing arrows between them: draw an arrow from each register write to every later read of that value (i.e. every read that occurs before the register is written again), in a fashion similar to that used in class and demonstrated on p. 419.

    2. Using a high-level pipeline representation (such as used in Figures 4.44 or 4.54), for the sequence of code given above, show the state of the pipeline from cycles t through t + 16 (i.e. until you run out of room on the paper). Assume the popl instruction is in the Fetch stage in cycle t. You are welcome to use the following high-level pipeline diagram.
           Note: Assume that forwarding/bypassing is being used.
           Note: Use pipeline stalls and bubbling to ensure correctness.
           Hint: Be mindful of potential load-use hazards.

    3. At the end of cycle t + 5, which registers are being read and which are being written?
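    A minimal sketch of the bookkeeping behind parts 1 and 2, under stated assumptions: each read is matched to the most recent earlier write of that register (a read-after-write dependency), and a load-use hazard is flagged when an instruction that loads a register from memory (mrmovl, or the rA destination of popl) is immediately followed by an instruction that reads that register. This mirrors the PIPE rule that, even with forwarding, such a pair needs one bubble; popl's %esp update comes from the ALU and can be forwarded, so it is not treated as a load. Only the Y86 forms used in this assignment are modeled, condition codes are ignored, and the function names and example program are hypothetical.

```python
import re

OPL = {"addl", "subl", "andl", "xorl"}

def reg_effects(line):
    """Return (registers read, dstE register, dstM register) for one Y86
    instruction. Only a subset of instructions is modeled; CCs are ignored."""
    parts = line.split(None, 1)
    op, rest = parts[0], (parts[1] if len(parts) > 1 else "")
    regs = re.findall(r"%e[a-z]{2}", rest)
    if op in OPL:                  # OPl rA, rB: rB <- rB op rA
        return [regs[0], regs[1]], regs[1], None
    if op == "rrmovl":             # rrmovl rA, rB
        return [regs[0]], regs[1], None
    if op == "irmovl":             # irmovl $V, rB
        return [], regs[0], None
    if op == "mrmovl":             # mrmovl D(rB), rA: rA loaded from memory
        return [regs[0]], None, regs[1]
    if op == "rmmovl":             # rmmovl rA, D(rB): no register written
        return [regs[0], regs[1]], None, None
    if op == "pushl":
        return [regs[0], "%esp"], "%esp", None
    if op == "popl":               # %esp via dstE (ALU), rA via dstM (memory)
        return ["%esp"], "%esp", regs[0]
    raise ValueError(f"instruction not modeled: {line!r}")

def raw_dependencies(program):
    """(writer index, reader index, register) for every read-after-write."""
    last_writer, deps = {}, []
    for i, line in enumerate(program):
        reads, dst_e, dst_m = reg_effects(line)
        for reg in dict.fromkeys(reads):       # de-duplicate, keep order
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        for reg in (dst_e, dst_m):             # writes happen after reads
            if reg is not None:
                last_writer[reg] = i
    return deps

def load_use_hazards(program):
    """(load index, next index, register) where a loaded value is read by the
    very next instruction; each such pair needs one bubble with forwarding."""
    hazards = []
    for i in range(len(program) - 1):
        _, _, dst_m = reg_effects(program[i])
        reads, _, _ = reg_effects(program[i + 1])
        if dst_m is not None and dst_m in reads:
            hazards.append((i, i + 1, dst_m))
    return hazards
```

    On a made-up sequence such as mrmovl 0(%ecx), %eax / addl %eax, %ebx / irmovl $4, %edx / subl %edx, %ebx, the sketch reports three dependencies (on %eax, %edx and %ebx) and one load-use hazard between the first two instructions.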