Assignment #7: ARM Assembly and Functions, and Virtual Memory and Memory Hierarchies

Assigned:   Wednesday, Apr. 24
Due:   Wednesday, May 1   by midnight

Contents:


Overview

Topic: The ARM v7 ISA, ARM Assembly Programming, Procedure Calls, and the Stack Frame
Related Reading: rpi-asm Chap 10, Chap 13, Chap. 5 of zyBooks, and class lecture notes


Problems to be Submitted (25 points)

When you turn in your assignment, you must include a signed cover sheet (PDF version) with your assignment (you're assignment will not be graded without a completed cover sheet).

You are allowed to submit your assignment via email, but if you choose to do so, you must bring a hardcopy of your assignment along with a completed cover sheet to the instructor at the next class. (Note: Do not email the instructor any .zip file attachments, as SLU's email may not accept these emails; i.e. the instructor may not receive your email.)

Note: Code for all the problems below can be downloaded along with a Makefile in hw7_arm_code.zip.

  1.   (6 points)

    Given the following ARM assembly code (sum_nums_asm.s) that sums the numbers from 1 to N, answer the questions below:

            mov     r1, #0                  // sum = 0
            mov     r3, #1                  // i = 1
    
        .loop:
            cmp     r3, #20
            bgt     .endloop
            add     r1, r1, r3
            add     r3, r3, #1
            b       .loop
    
        .endloop:
            ...
    

    Note:   Assume the variables of i and sum are stored in registers r3 and r1, respectively.

    Questions:

    1.   What one line would you need to add and/or change in order to sum the following number sequence?
             1 4 7 10 13 16 ...

    2.   What two lines (of the original code) would you need to add and/or change in order to compute the factorial of N?

    3.   What two lines (of the original code) would you need to add and/or change in order to compute the sum of squares from 1 to N?
             i.e. 1^2 + 2^2 + 3^2 + 4^2 + 5^2 + ...
                    = 1 + 4 + 9 + 16 + 25 + ...

    4.   Add a conditional to the loop that checks if the number is even or odd, and negates the value if it's odd. In essence, this will generate the sum:
             -1 + 2 - 3 + 4 - 5 + 6 - 7 + 8 - 9 + ...

      Hint: What one bit in a binary integer representation indicates whether the number is even or odd? And what bit-wise Boolean operation can you use to isolate that number?

  2.   (4 points)

    Given the following assembly program, some_func.s, determine the equivalent C code corresponding to the procedure some_func(), and fill in the blanks below:

    These blanks may include assignment statements, comparisons, and/or other conditional statements (if-then-else). Also, you may not need to use all the blanks.

    Many of the blanks will be of the form:   result = ...

    Note: This file can be downloaded along with a Makefile in hw7_arm_code.zip.

        int some_func (int x, int y)
        {
            int result = ________________________________________;
    
            if (( x ___________________________) || ( ___________________________))
            {
                _________________________________________________;
    
                _________________________________________________;
    
                _________________________________________________;
    
                _________________________________________________;
    
                _________________________________________________;
            }
            else
            {
                _________________________________________________;
    
                _________________________________________________;
    
                _________________________________________________;
    
                _________________________________________________;
    
                _________________________________________________;
            }
    
            return result;
        }
    

  3.   (5 points)

    Create a function in ARM assembly that computes the first n values of the Padovan sequence., where n is the input value to the funciton. The equation for the Padovan sequence is:

          P(k) = 1                   for k = 0, 1, 2
          P(k) = P(k-2) + P(k-3)     for k >= 3
    

    Here's the corresponding C code:

          int p[100];
    
          void pad_seq (int n)
          {
    
              p[0] = 1;				// P(0)
              p[1] = 1;				// P(1)
              p[2] = 1;				// P(2)
    
              for (k = 3; k < n; i++)
                  p[k] = p[k-2] + p[k-3];		// P(k)
          }
    

    Give the equivalent sequence of assembly instructions for computing the first n values of the Padovan sequence.

    Use the file pad_seq_asm.s as the starting point for your code, as it also declares and allocates the space for the p[] array, where you'll save the resulting values. Likewise, the Makefile expects to find this file, and compiles it in combination with pad_main.c to produce the pad_seq_asm executable, which will print out the Padovan values.

    The program will need a number of registers and/or variables to manage the values of k, n, P(k), P(k-2), and P(k-3) for the current k.

    You should turn in a .s file that can be compiled and run on a Pi.

  4.   (4 points)

    Assume a virtual memory organization with:

    Following is a sampling of a portion of the page table for this virtual memory organization:

    Given this table, convert the specified virtual address to the corresponding physical address for each of the following virtual addresses. If no valid mapping exists, indicate it's a page fault (i.e. the page is not in physical memory):

    1.   0x08C54A9
    2.   0x08C465C
    3.   0x08C236E
    4.   Finally, determine another valid virtual address (with a different VPN than in parts a - c) and give the corresponding physical address to which it maps.

      Note: For part d, you may choose whichever valid VPN and page offset you'd like (though I'd prefer you avoid really simple page offsets like 0x040).

  5.   (6 points)

    A given computer system has the following cache access times:

    1.   Program A has the following miss rates:

      • L1 cache:   5%
      • L2 cache:   30%
      • L3 cache:   60%
      • Main memory:   (assume it always hits in main memory; i.e. miss rate is 0%)

      1.   What are the hit rates for each level of cache?

      2.   What is the average memory access time (AMAT) for Program A?

    2.   Program B has the following hit rates:

      • L1 cache:   93%
      • L2 cache:   80%
      • L3 cache:   60%
      • Main memory:   (assume it always hits in main memory; i.e. hit rate is 100%)

      1.   What are the miss rates for each level of cache?

      2.   What is the average memory access time (AMAT) for Program B?

    3.   Oftentimes, the program with the lower L1 cache miss rate has the better (lower) average memory access time (AMAT)? Is that true in this case? If not, why?

    4.   For Program A, what speedup would be achieved if you were able to optimize the program to cut the miss rate of the L2 cache in half (i.e. to 15%) ?
           Note: "Speedup" refers to the ratio (fraction) of the original AMAT (with 30% miss rate) to the new AMAT (with 15% miss rate).

    5.   For Program B, if the L1 cycle access time was increased from 3 cycles to 4 cycles (during architecture design), what minimum hit rate would be needed in the L1 cache to achieve the same AMAT?