CSCI 3500 - Operating Systems

CSCI 3500: Studio 12

Atomic Instructions

It is common to use mutexes, semaphores, or other general-purpose locking methods to synchronize access to variables in parallel programs. However, when the data to be modified is small, it may be possible to use an atomic instruction instead. These are machine operations such as addition, subtraction, bitwise AND, bitwise OR, etc. that are guaranteed to happen atomically through cooperation between the compiler and the computer hardware. Note that the atomic functions used in this studio are not part of the C standard, but rather an extension provided by GCC, so this studio must be done with GCC. Standard support for atomic operations has since been added in C11 (under <stdatomic.h>) and in C++11 (under std::atomic).

In this studio, you will:

Use atomic instructions to synchronize access to a variable
Measure the overhead of using these atomic instructions

Please complete the required exercises below, as well as any optional enrichment exercises that you wish to complete.

As you work through these exercises, please record your answers in a text file. When finished, submit your work by sending your text file and source code to dferry_submit@slu.edu with the phrase Atomics in the subject line.

Make sure that the name of each person who worked on these exercises is listed in the first answer, and make sure you number each of your responses so it is easy to match your responses with each exercise.

Required Exercises

As the answer to the first exercise, list the names of the people who worked together on this studio.
The atomic instructions we are going to use today are not part of the C standard, so they are not documented as well as other language features that we have used. Go to the GCC reference page to see these functions and their usage. As the answer to this question, describe the effect of the function __sync_add_and_fetch() and this function's interface.
Create a new program to test the behavior of your atomic instructions. To do so, declare a variable with initial value zero, and then use the function __sync_add_and_fetch() to add to it. Print out the value of your variable before and after your atomic instruction to verify that your program behaves as expected. Copy and paste your program output, and attach your source file to your submission.
Now, make a copy of your code from the previous studio where we created a race condition. Previously we fixed this race condition by using a Pthreads mutex. Today we will fix the race condition with these atomic instructions.

Use the functions __sync_add_and_fetch() and __sync_sub_and_fetch() to make your adder and subtractor functions thread-safe. Run your program to verify that there is no race condition. Copy and paste your program output.
Now we want to quantify the overhead of using these atomic instructions. The special hardware support needed is not free, and we pay a price for each atomic instruction we use. As in the previous studio, modify your atomic instruction program (if necessary) so that it performs twenty million (20,000,000) atomic increments and decrements. Then time your program with the time command. Take at least three measurements and average them. Copy and paste your program results.
One question we can ask is whether the atomic instruction is more efficient than the mutex for this task, or whether it is the other way around. Go to your previous studio writeup and find how long it took your mutex program to run when you locked and unlocked the mutex for each individual increment and decrement. Alternatively, you can re-run this experiment. If you do so, make sure you are performing twenty million increments and decrements, and make sure that you are locking and unlocking the mutex for each individual increment and decrement.
Is the mutex locking strategy faster, or are the atomic instructions faster? By how much? Express the speed difference as a ratio (for example, you could say procedure A is four times faster than procedure B).
Go back to your last studio writeup where you timed how long your racy program takes (the one with incorrect results) and record this value here. Alternatively, you can re-run the experiment. If you do so, make sure you are doing twenty million increments and decrements.

Compared to the racy program, how much longer does it take to run the program with atomic instructions?
Given what you now know about atomic instructions and mutex locking, give one scenario where mutex locking is preferred, and give one scenario where atomic instructions are preferred.

Optional Enrichment Exercises

No optional exercises