CSCI 3500: Studio 14
OpenMP Configuration
In this studio, you will:
Configure the way that OpenMP executes a parallel program
Please complete the required exercises below, as well as any optional
enrichment exercises that you wish to complete.
As you work through these exercises, please record your answers in a text
file. When finished, submit your work via the Git repository.
Make sure that the name of each person who worked on these exercises
is listed in the first answer, and make sure you number each of your responses
so it is easy to match your responses with each exercise.
Required Exercises
- As the answer to the first exercise, list the names of the people who
worked together on this studio.
- Create a parallel-for loop using the previous studio as a template. We are interested in how this loop is actually partitioned across different threads. To help with this, OpenMP assigns each of its threads a unique identifying number and provides functions to access it. For each iteration of your loop, print the loop index and the number of the currently executing thread in a single printf() statement. Use the function omp_get_thread_num() to access the currently executing thread's number. Copy and paste your program output. A sketch appears below.
Note: the OpenMP functions are not included in Linux's man pages, but you can access a reference sheet at the following link. Be sure to include omp.h and compile with -fopenmp.
http://www.openmp.org/mp-documents/OpenMP-4.5-1115-CPP-web.pdf
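A minimal sketch of what this program might look like; the iteration count of 16 is an arbitrary placeholder, not a value specified by the studio:

    #include <stdio.h>
    #include <omp.h>

    int main() {
        /* Each iteration reports its index and the thread that ran it */
        #pragma omp parallel for
        for (int i = 0; i < 16; i++) {
            printf("iteration %2d ran on thread %d\n", i, omp_get_thread_num());
        }
        return 0;
    }

Compile with something like gcc -fopenmp studio14.c -o studio14 (the file name is just an example).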
- You might wonder just how many threads OpenMP will make. You can query this as well: print out the maximum number of threads OpenMP will use with the function omp_get_max_threads(). Copy and paste your result.
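This query can be a single line placed outside the parallel region, for example:

    /* Report the size of the thread team OpenMP would use by default */
    printf("max threads: %d\n", omp_get_max_threads());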
- The maximum number of threads on Hopper is great for high performance, but that many threads will become confusing quickly. Set the maximum number of threads OpenMP should use to five (5) with the function omp_set_num_threads(), and then re-run the loop you wrote above. Copy and paste your results.
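The call must come before the parallel region begins, as in this sketch:

    omp_set_num_threads(5);   /* cap the thread team at five */

    #pragma omp parallel for
    for (int i = 0; i < 16; i++) {
        printf("iteration %2d ran on thread %d\n", i, omp_get_thread_num());
    }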
- You might wonder how fairly OpenMP schedules its work. Set your loop to have 25 iterations and re-run it. How many iterations does each thread handle?
- Think about a situation in which it would be undesirable or unfair for OpenMP to assign the same number of loop iterations to each thread. Why might an even split be a bad idea there?
- Let's simulate a bad situation for OpenMP by making the first five loop iterations take much longer than the others. Use the Linux sleep() function to cause each of the first five loop iterations to sleep for one second. That is, inside your loop insert the statement: if ( index <= 4 ) sleep(1);
What do you think will happen? If the five slow iterations are split evenly across five threads, then the program should take about one second. However, if the first five loop iterations are all allocated to a single thread, then the program will take about five seconds.
Run your program and confirm or deny your hypothesis. Time your program with the time command.
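A sketch of the modified loop, assuming the loop variable is named index; sleep() requires unistd.h:

    #include <unistd.h>   /* for sleep() */

    #pragma omp parallel for
    for (int index = 0; index < 25; index++) {
        if (index <= 4) sleep(1);   /* first five iterations are slow */
        printf("iteration %2d ran on thread %d\n", index, omp_get_thread_num());
    }

Then time a run from the shell, e.g. time ./studio14; the "real" line that time prints is the wall-clock duration to compare against your hypothesis.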
- The above behavior is caused by OpenMP's default scheduling policy, which statically assigns work to each thread. This means that the iterations are divided among the threads up front, before the loop runs, which can be very efficient, but the assignment cannot adapt to how long each iteration actually takes. However, OpenMP supports many different scheduling strategies. Configure your system to dynamically assign work by modifying your parallel-for loop declaration:
#pragma omp parallel for schedule( dynamic, 1 )
How long does your program take now?
- What else has changed about how your program is scheduled?
- The second parameter to the schedule clause is called the chunk size. Modify this value and observe the effects. What do you think the chunk size specifies?
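For example, a chunk size of five (an arbitrary value to experiment with) would look like:

    /* Threads now claim work five iterations at a time */
    #pragma omp parallel for schedule( dynamic, 5 )
    for (int index = 0; index < 25; index++) {
        if (index <= 4) sleep(1);
        printf("iteration %2d ran on thread %d\n", index, omp_get_thread_num());
    }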
Optional Enrichment Exercises
- No optional exercises