POSIX Process Management

All operating systems provide mechanisms for creating new processes, terminating existing processes, and performing related actions. The details vary from system to system. To provide a concrete example, I will present relevant features of the POSIX API, which is used by Linux and UNIX, including by Mac OS X.

In the POSIX approach, each process is identified by a process ID number, which is a positive integer. Each process (with one exception) comes into existence through the forking of a parent process. The exception is the first process created when the operating system starts running. A process forks off a new process whenever one of the threads running in the parent process calls the fork procedure. In the parent process, the call to fork returns the process ID number of the new child process. (If an error occurs, the procedure instead returns a negative number.) The process ID number may be important to the parent later, if it wants to exert some control over the child or find out when the child terminates.

Meanwhile, the child process can start running. The child process is in many regards a copy of the parent process. For protection purposes, it has the same credentials as the parent and the same capabilities for such purposes as access to files that have been opened for reading or writing. In addition, the child contains a copy of the parent's address space. That is, it has available to it all the same executable program code as the parent, and all of the same variables, which initially have the same values as in the parent. However, because the address space is copied instead of shared, the variables will start having different values in the two processes as soon as either performs any instructions that store into memory. (Special facilities do exist for sharing some memory; I am speaking here of the normal case.)

Because the child process is nearly identical to the parent, it starts off by performing the same action as the parent; the fork procedure returns to whatever code called it. However, application programmers generally don't want the child to continue executing all the same steps as the parent; there wouldn't be much point in having two processes if they behaved identically. Therefore, the fork procedure gives the child process an indication that it is the child so that it can behave differently. Namely, fork returns a value of 0 in the child. This contrasts with the return value in the parent process, which is the child's process ID number, as mentioned earlier.

The normal programming pattern is for any fork operation to be immediately followed by an if statement that checks the return value from fork. That way, the same program code can wind up following two different courses of action, one in the parent and one in the child, and can also handle the possibility of failure, which is signaled by a negative return value. The C program below shows an example of this; the parent and child processes are similar (both loop five times, printing five messages at one-second intervals), but they are different enough to print different messages, as shown in the sample output. Keep in mind that this output is only one possibility; not only can the ID number be different, but the interleaving of output from the parent and child can also vary from run to run.

#include <unistd.h>
#include <stdio.h>

int main() {
    int loopCount = 5; // each process will get its own loopCount
    printf("I am still only one process.\n");
    pid_t returnedValue = fork();
    if (returnedValue < 0){
        // still only one process
        perror("error forking"); // report the error
        return -1;
    } else if (returnedValue == 0){
        // this must be the child process
            while(loopCount > 0) {
            printf("I am the child process.\n");
            loopCount--; // decrement child’s counter only
            sleep(1); // wait a second before repeating
            }
    } 
    else {
        // this must be the parent process
        while (loopCount > 0){
            printf("I am the parent process; my child’s ID is %i\n", returnedValue);
            loopCount--; // decrement parent’s counter only
            sleep(1);
        }
    }
    return 0;
}
I am still only one process.
I am the child process.
I am the parent process; my child's ID is 23307
I am the parent process; my child's ID is 23307
I am the child process.
I am the parent process; my child's ID is 23307
I am the child process.
I am the parent process; my child's ID is 23307
I am the child process.
I am the parent process; my child's ID is 23307
I am the child process.

This example program also illustrates that the processes each get their own copy of the loopCount variable. Both start with the initial value, 5, which was established before the fork. However, when each process decrements the counter, only its own copy is affected.

In early versions of UNIX, only one thread ever ran in each process. As such, programs that involved concurrency needed to create multiple processes using fork. In situations such as that, it would be normal to see a program like the one above, which includes the full code for both parent and child. Today, however, concurrency within a program is normally done using a multithreaded process. This leaves only the other big use of fork: creating a child process to run an entirely different program. In this case, the child code in the forking program is only long enough to load in the new program and start it running. This happens, for example, every time you type a program's name at a shell prompt; the shell forks off a child process in which it runs the program. Although the program execution is distinct from the process forking, the two are used in combination. Therefore, I will turn next to how a thread running in a process can load a new program and start that program running.

The POSIX standard includes six different procedures, any one of which can be used to load in a new program and start it running. The six are all variants on a theme; because they have names starting with exec, they are commonly called the exec family. Each member of the exec family must be given enough information to find the new program stored in a file and to provide the program with any arguments and environment variables it needs. The family members differ in exactly how the calling program provides this information. Because the family members are so closely related, most systems define only the execve procedure in the kernel of the operating system itself; the others are library procedures written in terms of execve.

Because execl is one of the simpler members of the family, I will use it for an example. The program below prints out a line identifying itself, including its own process ID number, which it gets using the getpid procedure. Then it uses execl to run a program, named ps, which prints out information about running processes. After the call to execl comes a line that prints out an error message, saying that the execution failed. You may find it surprising that the error message seems to be issued unconditionally, without an if statement testing whether an error in fact occurred. The reason for this surprising situation is that members of the exec family return only if an error occurs; if all is well, the new program has started running, replacing the old program within the process, and so there is no possibility of returning in the old program.

#include <unistd.h>
#include <stdio.h>

int main(){
    printf("This is the process with ID %i, before the exec.\n", getpid());
    execl("/bin/ps", "ps", "axl", NULL);
    perror("error execing ps");
    return -1;
}

Looking in more detail at the example program's use of execl, you can see that it takes several arguments that are strings, followed by the special NULL pointer. The reason for the NULL is to mark the end of the list of strings; although this example had three strings, other uses of execl might have fewer or more. The first string specifies which file contains the program to run; here it is /bin/ps, that is, the ps program in the /bin directory, which generally contains fundamental programs. The remaining strings are the so-called "command-line arguments," which are made available to the program to control its behavior. Of these, the first is conventionally a repeat of the command's name; here, that is ps. The remaining argument, axl, contains both the letters ax indicating that all processes should be listed and the letter l indicating that more complete information should be listed for each process. As you can see from the sample output below, the exact same process ID that is mentioned in the initial message shows up again as the ID of the process running the ps axl command. The process ID remains the same because execl has changed what program the process is running without changing the process itself.

This is the process with ID 3849, before the exec.
  UID   PID  ...  COMMAND
.
.
. 
    0  3849  ...  ps axl
.
.
.

One inconvenience about execl is that to use it, you need to know the directory in which the program file is located. For example, the previous program will not work if ps happens to be installed somewhere other than /bin on your system. To avoid this problem, you can use execlp. You can give this variant a filename that does not include a directory, and it will search through a list of directories looking for the file, just like the shell does when you type in a command. This can be illustrated with an example program that combines fork with execlp:

#include <unistd.h>
#include <stdio.h>

int main() {
    pid_t returnedValue = fork();
    if (returnedValue < 0) {
        perror("error forking");
        return -1;
    }
    else if (returnedValue == 0) {
        execlp("xclock", "xclock", NULL);
        perror("error execing xclock");
        return -1;
    } 
    else {
        return 0;
    }
}

This example program assumes you are running the X Window System, as on most Linux or UNIX systems. It runs xclock, a program that displays a clock in a separate window. If you run the launcher program from a shell, you will see the clock window appear, and your shell will prompt you for the next command to execute while the clock keeps running. This is different than what happens if you type xclock directly to the shell. In that case, the shell waits for the xclock program to exit before prompting for another command. Instead, the example program is more similar to typing xclock & to the shell. The & character tells the shell not to wait for the program to exit; the program is said to run "in the background." The way the shell does this is exactly the same as the sample program: it forks off a child process, executes the program in the child process, and allows the parent process to go on its way. In the shell, the parent loops back around to prompt for another command.

When the shell is not given the & character, it still forks off a child process and runs the requested command in the child process, but now the parent does not continue to execute concurrently. Instead the parent waits for the child process to terminate before the parent continues. The same pattern of fork, execute, and wait would apply in any case where the forking of a child process is not to enable concurrency, but rather to provide a separate process context in which to run another program.

In order to wait for a child process, the parent process can invoke the waitpid procedure. This procedure takes three arguments; the first is the process ID of the child for which the parent should wait, and the other two can be zero if all you want the parent to do is to wait for termination. As an example of a process that waits for each of its child processes, the program below shows a very stripped-down shell.

#include <unistd.h>
#include <stdio.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <string.h>

int main() {
    char command[256]; // Array to hold the command

    while(1) { // Loop until return
        printf("Command (one word only)> ");
        fflush(stdout); // Make sure "Command" is printed immediately

        if (scanf("%255s", command) != 1) {
            fprintf(stderr, "Error reading command\n");
            return -1; // Exit if we fail to read a command
        }

        if (strcmp(command, "exit") == 0) {
            return 0; // Exit the loop if the command is "exit"
        } else {
            pid_t returnedValue = fork();
            if (returnedValue < 0) {
                perror("Error forking");
                return -1;
            } else if (returnedValue == 0) {
                execlp(command, command, NULL);
                perror(command); // If execlp returns, it must have failed
                return -1;
            } else {
                if (waitpid(returnedValue, NULL, 0) < 0) {
                    perror("Error waiting for child");
                    return -1;
                }
            }
        }
    }
}

This shell can be used to run the user's choice of commands, such as date, ls, and ps, as illustrated in the output below. A real shell would allow command line arguments, offer background execution as an option, and provide many other features. Nonetheless, you now understand the basics of how a shell runs programs

Command (one word only)> date
Thu Feb 12 09:33:26 CST 2024
Command (one word only)> ls
microshell microshell.c 
Command (one word only)> ps
  PID TTY       TIME     CMD
23498 pts/2     00:00:00 bash
24848 pts/2     00:00:00 microshell
24851 pts/2     00:00:00 ps
Command (one word only)> exit

Notice that a child process might terminate prior to the parent process invoking waitpid. As such, the waitpid procedure may not actually need to wait, contrary to what its name suggests. It may be able to immediately report the child process's termination. Even in this case, invoking the procedure is still commonly referred to as "waiting for" the child process. Whenever a child process terminates, if its parent is not already waiting for it, the operating system retains information about the terminated process until the parent waits for it. A terminated process that has not yet been waited for is known as a zombie. Waiting for a zombie makes its process ID number available for assignment to a new process; the memory used to store information about the process can also be reused. This is known as the zombie.

At this point, you have seen many of the key elements of the process life cycle. Perhaps the most important omission is that I haven't shown how processes can terminate, other than by returning from the main procedure. A process can terminate itself by using the exit procedure (in Java, System.exit), or it can terminate another process using the kill procedure (see the documentation for details).