CSE 522S - Advanced Operating Systems

CSE 522S: Studio 7

Process Family Tree

There were many Bagginses and Boffins, and also many Tooks and Brandybucks; there were various Grubbs (relations of Bilbo Baggins' grandmother), and various Chubbs (connexions of his Took grandfather); and a selection of Burrowses, Bolgers, Bracegirdles, Brockhouses, Goodbodies, Hornblowers and Proudfoots. Some of these were only very distantly connected with Bilbo, and some of them had hardly ever been in Hobbiton before, as they lived in remote corners of the Shire.

—The Fellowship of the Ring, Book 1, Chapter 1

Processes are one of the most critical abstractions in any operating system. They are the basis for scheduling, memory management, accounting, and more. Even the kernel itself runs as a process!

In this studio, you will:

Write simple userspace programs working with processes
Learn how to do simple kernel module I/O
Write a kernel module that explores the process data structure, task_struct

Please complete the required exercises below, as well as any optional enrichment exercises that you wish to complete.

As you work through these exercises, please record your answers, and when finished email your results to dferry@email.wustl.edu with the phrase Process Family Tree in the subject line.

Make sure that the name of each person who worked on these exercises is listed in the first answer, and make sure you number each of your responses so it is easy to match your responses with each exercise.

Required Exercises

As the answer to the first exercise, list the names of the people who worked together on this studio.
Write a short program called fork.c. Use the fork() function to spawn a child process from a parent process. The parent process should print a statement before the fork and print out the child's PID after the fork. The child process should print out a statement after it has been spawned.
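The fork() behavior described above can be sketched as a single helper. This is a minimal sketch, not a reference solution: the function name fork_once() and the message wording are illustrative assumptions, and the explicit fflush() calls are one way (among others) to keep buffered output from being duplicated in the child.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks once. The parent prints before and after the fork; the child
 * prints once and exits. Returns the child's exit status, or -1 on error.
 * fork_once() is a hypothetical name chosen for this sketch.
 */
static int fork_once(void)
{
    pid_t pid;
    int status;

    printf("parent %d: about to fork\n", (int)getpid());
    fflush(stdout); /* flush now so the child does not inherit buffered text */

    pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }

    if (pid == 0) {
        /* Child: fork() returned 0 */
        printf("child %d: spawned\n", (int)getpid());
        fflush(stdout);
        _exit(0);
    }

    /* Parent: fork() returned the child's PID */
    printf("parent %d: child has PID %d\n", (int)getpid(), (int)pid);

    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A fork.c built on this sketch would simply call fork_once() from main(). Waiting for the child is not required by the exercise, but it avoids leaving a zombie behind.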
Write a second program called tree.c that imitates a family tree. Your first process should spawn two children, and then each of those two children should spawn their own two children, and so forth. Take a single integer parameter from the command line that controls the number of generations spawned below the first process. Before exiting, all processes should sleep for 120 seconds. (Warning: Bear in mind that this will create 2^(n+1) - 1 processes, where n is the number you provide. For n=5 you will create 63 processes, for n=10 you will create 2047, etc. Large numbers will crash and/or starve your system!)
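The process count in the warning above follows from summing the generations: 1 + 2 + 4 + ... + 2^n = 2^(n+1) - 1. The sketch below checks that arithmetic and shows one possible recursive shape for the spawning logic. The names tree_size() and spawn_tree() are assumptions made for illustration, and the recursion is only one of several reasonable designs.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Total processes when n generations are spawned below the first process:
 * 1 + 2 + ... + 2^n = 2^(n+1) - 1. Only meaningful for small n (the shift
 * must stay within the width of unsigned long).
 */
static unsigned long tree_size(unsigned int n)
{
    return (1UL << (n + 1)) - 1;
}

/*
 * Spawns two children; each child recursively spawns its own subtree with
 * one generation fewer, sleeps, and exits. Returns the number of direct
 * children this process created (0 or 2 when every fork succeeds).
 */
static int spawn_tree(int generations_left, unsigned int sleep_secs)
{
    int children = 0;

    if (generations_left <= 0)
        return 0;

    for (int i = 0; i < 2; i++) {
        pid_t pid = fork();

        if (pid < 0) {
            perror("fork");
            break;
        }
        if (pid == 0) {
            /* Child: becomes the root of a shallower subtree */
            spawn_tree(generations_left - 1, sleep_secs);
            sleep(sleep_secs);
            _exit(0);
        }
        children++;
    }
    return children;
}
```

Under these assumptions, main() would parse the generation count from argv[1], call spawn_tree(n, 120), and then sleep for 120 seconds itself before returning.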
Now we're going to take a detour back into kernel module design. Find the file samples/kobject/kobject-example.c under the root of your kernel source tree and copy it into your current working directory. This is a kernel module that uses a feature called kobjects to provide an interface for exchanging data between the kernel and userspace. Each data item is called an attribute, and for each attribute you provide a show function and a store function, which are called when userspace reads and writes the value, respectively.

This particular module provides three attributes: foo, baz, and bar. Once loaded, you can find them in the sysfs filesystem under /sys/kernel/kobject_example/. Modify this file so that a system log message is printed when any of foo, baz, or bar is updated. This log message should include the old value as well as the new value.

You can read values out of these attributes with the command cat, and you can write values into them with the echo command, e.g. "echo 42 > foo" will write the value 42 into the attribute foo. Note that you must have a root terminal to write into these attributes: "sudo echo 42 > foo" doesn't work, because the redirection is performed by your unprivileged shell rather than by the echo running under sudo. You can get a root terminal with the command sudo bash.
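The store-side logging asked for above can be modeled in plain userspace C so the logic is easy to test outside the kernel. This is a sketch under stated assumptions: foo_store_model() and last_log are hypothetical names, strtol() stands in for the kstrtoint() call used by the kernel sample, and writing into a buffer stands in for a printk() or pr_info() call. The error codes returned here are a simplification.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

static int foo;            /* models the module's foo attribute */
static char last_log[128]; /* models the system log: most recent message */

/*
 * Userspace model of a store handler. In the kernel sample, foo_store()
 * also receives a struct kobject * and a struct kobj_attribute *; they are
 * omitted here. buf must be NUL-terminated, as sysfs guarantees.
 * Returns count on success or -EINVAL if buf is not a valid int.
 */
static ssize_t foo_store_model(const char *buf, size_t count)
{
    char *end;
    long val;
    int old = foo; /* remember the old value before overwriting it */

    errno = 0;
    val = strtol(buf, &end, 10);
    if (end == buf || errno == ERANGE || val < INT_MIN || val > INT_MAX)
        return -EINVAL;

    /* echo appends a newline; accept at most one, then require the end */
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -EINVAL;

    foo = (int)val;
    snprintf(last_log, sizeof(last_log),
             "foo updated: old=%d new=%d", old, foo);
    return (ssize_t)count;
}
```

The key design point is capturing the old value before parsing into the attribute, since the sample's handler parses directly into the variable and would otherwise lose it.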
Now we're going to write a kernel module that reads a PID through the sysfs interface and prints that process' family tree in the system log.

Make a new kernel module called family_reader.c that is based off of your modified kobject-example.c. This module should create a single system attribute under /sys/kernel/fam_reader/. When you write a PID to this attribute, your module should try to print out the PID's family tree. There are a few steps involved:

Side note: The modern Linux kernel makes a distinction between "real" and "virtual" PIDs for the benefit of migrating processes across different virtual hosts. The virtual PID is the PID that a process sees from userspace.
1. You will need to convert your integer input into a proper kernel PID. Use the function find_vpid(), which returns a pid*. This function can fail, returning NULL, so check the result before using it!
2. Next you can convert your pid* to a task_struct* with the function pid_task(). The task_struct is the primary record-keeping component of a process in Linux. This function can also fail and return NULL!
3. Once you have a task_struct*, you can access any of the data it stores. In particular, the real_parent field stores a task_struct* pointer to the process that cloned it, and the comm field is a string that gives the command name. Note: there is a separate field called parent, which is not what we want for this project. The parent is the logical parent that shares process group signals and allows for waiting between parent and child.
4. Work your way back up the family tree, printing out each task's PID and command name, all the way back to the init task with a PID of one.
When you pass a PID to this module, it should output that task's PID and command name, and then the PID and command name of each of its ancestors, all the way back to init. Note that the command name for PID one will actually be systemd. When you pass an invalid PID, your module should fail gracefully and print an appropriate error message.
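The lookup-then-walk steps above can be modeled in userspace to check the loop logic before moving it into a module. Everything here is a stand-in: struct model_task, lookup_task(), family_tree(), tree_log, and the PIDs 1000 and 1234 are hypothetical. In the real module, lookup_task() would correspond to find_vpid() followed by pid_task(), and each line would go to the system log rather than a buffer. The model assumes PID 0 cannot be looked up, which is something to verify for yourself on the real kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define COMM_LEN 16 /* same size as the kernel's TASK_COMM_LEN */

/* Hypothetical stand-in for the three task_struct fields the walk needs. */
struct model_task {
    int pid;
    char comm[COMM_LEN];
    struct model_task *real_parent;
};

/* Made-up process table: top <- bash <- systemd <- swapper (own parent). */
static struct model_task swapper = { 0,    "swapper/0", &swapper };
static struct model_task init_t  = { 1,    "systemd",   &swapper };
static struct model_task shell_t = { 1000, "bash",      &init_t  };
static struct model_task top_t   = { 1234, "top",       &shell_t };

/* swapper is deliberately absent: this model assumes PID 0 has no lookup. */
static struct model_task *table[] = { &init_t, &shell_t, &top_t };

static char tree_log[256]; /* models the system log */

/* Models find_vpid() + pid_task(): returns NULL when the lookup fails. */
static struct model_task *lookup_task(int pid)
{
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (table[i]->pid == pid)
            return table[i];
    return NULL;
}

/*
 * Writes one "pid comm" line per task into tree_log, from the given PID up
 * to PID 1. Returns the number of tasks logged, or -ESRCH for unknown PIDs.
 */
static int family_tree(int pid)
{
    const struct model_task *t = lookup_task(pid);
    size_t used = 0;
    int count = 0;

    tree_log[0] = '\0';
    if (!t)
        return -ESRCH; /* fail gracefully instead of dereferencing NULL */

    while (t) {
        int n = snprintf(tree_log + used, sizeof(tree_log) - used,
                         "%d %s\n", t->pid, t->comm);

        if (n < 0 || (size_t)n >= sizeof(tree_log) - used)
            break; /* log buffer full */
        used += (size_t)n;
        count++;

        /* Stop at init, or at any task that is its own parent. */
        if (t->pid == 1 || t->real_parent == t)
            break;
        t = t->real_parent;
    }
    return count;
}
```

The explicit stop condition matters: a loop that only tests for a NULL parent pointer would never terminate on a task whose parent pointer refers back to itself.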
Test your module. What happens when you pass a PID that doesn't exist? What happens when you pass zero?
Create a long-running process like top and pass its PID to your module. Copy and paste the results here.
Execute tree.c with an input of five generations. Pass one of the last PIDs to your module. Copy and paste the results here.

Things to turn in

The above exercises
fork.c
tree.c
Your modified kobject-example.c
family_reader.c

Optional Enrichment Exercises

The task struct includes a lot of interesting process data and process accounting. Try printing out some other fields.
The file sched.c includes a lot of facilities for working with tasks, including the ability to modify specific tasks or iterate over every task in the system with macros such as for_each_process(). See what you can dig up!