Calling fork from a Multithreaded Environment

Source: Internet | Editor: 程序博客网 | Time: 2024/06/12 01:51

Threads and Process Management

On a Pthreads-compliant system, calls that manipulate processes, like fork and exec, still behave in the way they always have for nonthreaded programs. Let's see what happens when we make these calls from a multithreaded process.

Calling fork from a Thread

A process creates another process by issuing a fork call. The newly created child process has a new process ID but starts with the same memory image and state as its parent. At its birth it's an exact clone of its parent, starting execution at the point of its parent's fork call in the same program. Often, the new process immediately calls exec to replace its parent's program with a new program. It then sets out on its own business.
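The fork-then-exec pattern described above can be sketched roughly as follows. The helper name run_program is hypothetical (it does not come from the original text), and the sketch assumes an ordinary single-threaded caller.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child, exec `path` with `argv` in it, and
 * return the child's exit status, or -1 if something went wrong. */
int run_program(const char *path, char *const argv[])
{
    int status;

    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed; no child was created */

    if (pid == 0) {
        /* Child: starts as a clone of the parent, then replaces the
         * inherited program image with a new one. */
        execv(path, argv);
        _exit(127);             /* reached only if execv failed */
    }

    /* Parent: wait for the child and report how it ended. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_program("/bin/sh", argv) with argv set to { "sh", "-c", "exit 7", NULL } should return 7, since the child's exit status is passed back through waitpid.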

In a Pthreads-compliant implementation, the fork call always creates a new child process with a single thread, regardless of how many threads its parent may have had at the time of the call. Furthermore, the child's thread is a replica of the thread in the parent that called fork, including a process address space shared by all of its parent's threads and its parent thread's per-thread stack.
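One way to observe the single-threaded child is to let a background thread bump a counter in the parent and then check, after a fork, whether the counter still moves in the child. This is only a sketch: the names ticker, counter_is_frozen, and child_has_one_thread are made up for illustration, and the 50 ms pause is an arbitrary assumption.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static atomic_long ticks;   /* bumped continuously by the ticker thread */
static atomic_int stop;     /* tells the ticker thread to finish */

static void *ticker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop))
        atomic_fetch_add(&ticks, 1);
    return NULL;
}

/* Returns 1 if the counter did not change during a short pause. */
static int counter_is_frozen(void)
{
    struct timespec pause = { 0, 50 * 1000 * 1000 };   /* 50 ms */
    long before = atomic_load(&ticks);
    nanosleep(&pause, NULL);
    return atomic_load(&ticks) == before;
}

/* Returns 1 if the counter froze in the child but kept moving in the
 * parent, 0 otherwise, and -1 on error. */
int child_has_one_thread(void)
{
    pthread_t t;
    int status = 0;

    atomic_store(&ticks, 0);
    atomic_store(&stop, 0);
    int rc = pthread_create(&t, NULL, ticker, NULL);
    assert(rc == 0);
    while (atomic_load(&ticks) == 0)
        sched_yield();          /* wait until the ticker is running */

    pid_t pid = fork();
    if (pid == 0)               /* child: only the forking thread exists */
        _exit(counter_is_frozen() ? 0 : 1);

    int parent_frozen = counter_is_frozen();
    atomic_store(&stop, 1);
    pthread_join(t, NULL);
    if (pid < 0 || waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 && !parent_frozen;
}
```

The child inherits the counter's value at the moment of the fork, but nothing advances it there, because the ticker thread was not replicated.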

Consider the headaches:

∙ The new single-threaded child process could inherit held locks from threads in the parent that don't exist in the child. It may have no idea what these locks mean, let alone realize that it holds one of them. Confusion and deadlock are in the forecast.

∙ The child process could inherit heap areas that were allocated by threads in the parent that don't exist in the child. Here we see memory leaks, data loss, and bug reports.

The Pthreads standard defines the pthread_atfork call to help you manage these problems. The pthread_atfork function allows a parent process to specify preparation and cleanup routines that parent and child processes run as part of the fork operation. Using these routines a parent or child process can manage the release and reacquisition of locks and resources before and after the fork.

This is pretty complex stuff, so please bear with us.

Fork-handling stacks

To perform its magic, the pthread_atfork call pushes addresses of preparation and cleanup routines on any of three fork-handling stacks:

∙ Routines placed on the prepare stack are run in the parent before the fork.

∙ Routines placed on the parent stack are run in the parent after the fork.

∙ Routines placed on the child stack are run in the child after the fork.

A single call to pthread_atfork places a routine on one or more of these stacks. With multiple calls, routines accumulate on each stack: POSIX runs prepare routines in the reverse of the order in which they were registered (last in, first out) and runs parent and child routines in registration order. Because the fork-handling stacks are a processwide resource, any thread (not just the one that will call fork) can push routines on them.
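A small sketch can make the handler ordering concrete. It registers two sets of routines and has each one append a tag to a trace buffer; the child sends its trace back through a pipe. All names here (trace, note, fork_order_demo, and the handler functions) are hypothetical, and calling strncat from fork handlers is a simplification that is acceptable only in a toy program.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static char trace[128];     /* each handler appends one tag */

static void note(const char *tag)
{
    strncat(trace, tag, sizeof trace - strlen(trace) - 1);
}

static void prep1(void)   { note("prep1 "); }
static void parent1(void) { note("parent1 "); }
static void child1(void)  { note("child1 "); }
static void prep2(void)   { note("prep2 "); }
static void parent2(void) { note("parent2 "); }
static void child2(void)  { note("child2 "); }

static pthread_once_t once = PTHREAD_ONCE_INIT;

/* Registered once: handlers stay installed for the life of the process. */
static void install(void)
{
    int rc = pthread_atfork(prep1, parent1, child1);
    assert(rc == 0);
    rc = pthread_atfork(prep2, parent2, child2);
    assert(rc == 0);
}

/* Forks once and copies the handler traces seen by the parent and the
 * child into the caller's buffers (each `len` bytes). Returns 0 on success. */
int fork_order_demo(char *parent_out, char *child_out, size_t len)
{
    int fds[2];

    assert(len > 0);
    pthread_once(&once, install);
    trace[0] = '\0';
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: its copy of trace holds the prepare and child tags. */
        close(fds[0]);
        if (write(fds[1], trace, strlen(trace)) < 0)
            _exit(1);
        _exit(0);
    }

    close(fds[1]);
    ssize_t n = read(fds[0], child_out, len - 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (n < 0)
        return -1;
    child_out[n] = '\0';
    strncpy(parent_out, trace, len - 1);
    parent_out[len - 1] = '\0';
    return 0;
}
```

On a POSIX-conforming system the parent's trace should read "prep2 prep1 parent1 parent2 " and the child's "prep2 prep1 child1 child2 ": the prepare routines run newest first, the others oldest first. The child's copy includes the prepare tags because they were written before the address space was cloned.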

In those carefree times when we throw caution to the winds and decide to fork from the middle of a multithreaded program, we typically use pthread_atfork to push mutex-locking calls on the prepare fork-handling stack and mutex-unlocking calls on the parent and child stacks. We might also place routines that release resources and reset variables on the child stack.

Let's demonstrate what would happen if we did not use pthread_atfork's capabilities in one of those fork-crazy programs of ours. In Figure 5-1, we have two threads, a mutex (Lock L), and the data the mutex protects. Thread A acquires Lock L and starts to modify the data. Meanwhile, Thread B decides to fork. Now, the fork creates a child process that's a clone of its parent process, and this child shows a locked Lock L. The child process has a single thread, a replica of Thread B (the thread in the parent process that called fork). The assortment of clones and replicas that result from the fork has little effect on the threads in the parent process. However, things are not okay in the child. The locked Lock L is an utter mystery to the new Thread B in the child. If it tries to acquire Lock L, it will deadlock. (There's no Thread A in the child that will ever release Lock L in the child process's context.) If it tries to access the data without first obtaining Lock L, it may see the data in an inconsistent form. Life's never easy for our kids.

Figure 5-1: Results of a fork when pthread_atfork is not used
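The situation in Figure 5-1 can be reproduced in a few lines. In this sketch the main thread plays the role of Thread B, and the child uses pthread_mutex_trylock rather than pthread_mutex_lock so that it reports the problem instead of hanging. The names lock_l, thread_a, and child_sees_locked_mutex are illustrative assumptions, and calling mutex functions in the child of a multithreaded process is done here purely for demonstration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t lock_l = PTHREAD_MUTEX_INITIALIZER;  /* Lock L */
static atomic_int a_has_lock;     /* set once Thread A owns Lock L */
static atomic_int a_may_unlock;   /* set when Thread A may release it */

static void *thread_a(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_l);
    atomic_store(&a_has_lock, 1);
    while (!atomic_load(&a_may_unlock))
        sched_yield();            /* keep holding L across the fork */
    pthread_mutex_unlock(&lock_l);
    return NULL;
}

/* Returns 1 if the child found Lock L locked, 0 if not, -1 on error. */
int child_sees_locked_mutex(void)
{
    pthread_t a;
    int status = 0;

    atomic_store(&a_has_lock, 0);
    atomic_store(&a_may_unlock, 0);
    int rc = pthread_create(&a, NULL, thread_a, NULL);
    assert(rc == 0);
    while (!atomic_load(&a_has_lock))
        sched_yield();            /* wait until Thread A holds L */

    pid_t pid = fork();           /* "Thread B" forks while A holds L */
    if (pid == 0) {
        /* Child: no Thread A exists here, yet L is still marked locked.
         * A plain pthread_mutex_lock would block forever. */
        _exit(pthread_mutex_trylock(&lock_l) == EBUSY ? 1 : 0);
    }

    atomic_store(&a_may_unlock, 1);
    pthread_join(a, NULL);
    if (pid < 0 || waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A return value of 1 means the child inherited a locked mutex with no thread left to unlock it, which is the deadlock hazard described above.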

Now, let's use pthread_atfork to control Lock L's state at the time of the fork. The program we show in Figure 5-2 also has Threads A and B, Lock L, and scrupulously guarded data. However, we've added an initialization routine that pushes a routine that locks L on the prepare fork-handling stack, and a routine that unlocks L on the child and parent fork-handling stacks. We've taken care to do this in a routine that executes before any thread actually uses the lock.

Figure 5-2: Results of a fork when pthread_atfork is used

Sometime later, Thread A acquires the lock and starts to modify the data. When Thread B calls fork, the routine on the prepare stack runs in Thread B's context. This routine tries to obtain Lock L and will block; Lock L is still held by Thread A. Ultimately, the fork is delayed until Thread A releases Lock L. When this happens, the prepare routine succeeds, Thread B becomes the owner of the lock, and the fork proceeds. As expected, a child process is created that's a replica of its parent. However, in this case, the newly cloned Thread B in the child knows about the locked lock it finds in the child's context. At this point, the routine we placed on the child fork-handling stack runs and releases Lock L. The same routine runs from the parent fork-handling stack and releases the lock in the parent process. When the dust settles, the lock is unowned in both parent and child, and the data it protects is in a consistent state. Who could ask for more?
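The sequence just described might look like the following in code. This is a sketch under assumptions, not the book's own program: the main thread stands in for Thread B, the protected data is a pair of integers that must stay equal, and the names (lock_l, data_x, data_y, fork_with_atfork, and the handler functions) are invented for illustration. The 50 ms pause only widens the window in which the data is inconsistent.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t lock_l = PTHREAD_MUTEX_INITIALIZER;  /* Lock L */
static int data_x, data_y;        /* invariant under Lock L: x == y */
static atomic_int a_has_lock;     /* set once Thread A owns Lock L */

static void lock_before_fork(void)  { pthread_mutex_lock(&lock_l); }
static void unlock_after_fork(void) { pthread_mutex_unlock(&lock_l); }

static pthread_once_t once = PTHREAD_ONCE_INIT;

/* Runs once, before any thread uses the lock. */
static void install_handlers(void)
{
    int rc = pthread_atfork(lock_before_fork,    /* prepare stack */
                            unlock_after_fork,   /* parent stack  */
                            unlock_after_fork);  /* child stack   */
    assert(rc == 0);
}

static void *thread_a(void *arg)
{
    struct timespec pause = { 0, 50 * 1000 * 1000 };   /* 50 ms */
    (void)arg;
    pthread_mutex_lock(&lock_l);
    data_x++;                     /* data is inconsistent from here... */
    atomic_store(&a_has_lock, 1);
    nanosleep(&pause, NULL);
    data_y++;                     /* ...until here */
    pthread_mutex_unlock(&lock_l);
    return NULL;
}

/* Returns 0 if the child found Lock L free and the data consistent,
 * 1 if the data was inconsistent, 2 if L was still locked, -1 on error. */
int fork_with_atfork(void)
{
    pthread_t a;
    int status = 0;

    pthread_once(&once, install_handlers);
    atomic_store(&a_has_lock, 0);
    int rc = pthread_create(&a, NULL, thread_a, NULL);
    assert(rc == 0);
    while (!atomic_load(&a_has_lock))
        sched_yield();            /* fork only once A is mid-update */

    pid_t pid = fork();           /* prepare handler blocks until A unlocks */
    if (pid == 0) {
        /* Child: the child handler has already released Lock L. */
        if (pthread_mutex_trylock(&lock_l) != 0)
            _exit(2);
        int consistent = (data_x == data_y);
        pthread_mutex_unlock(&lock_l);
        _exit(consistent ? 0 : 1);
    }

    pthread_join(a, NULL);
    if (pid < 0 || waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The handlers are installed through pthread_once because they remain registered for the life of the process; registering the same lock-taking prepare routine twice on a non-recursive mutex would make the second invocation block on the first.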

Even given the capabilities of pthread_atfork, forking from a multithreaded program is no picnic. We kept our example simple. Imagine having to track every lock and every resource that may be held by every thread in your program and in every library call it makes! Before pursuing this course, you should consider these less complex alternatives:

∙ If possible, fork before you've created any threads.

∙ Instead of forking, create a new thread. If you are forking to exec a binary image, can you convert the image to a callable shared library to which you could simply link?

∙ Consider the surrogate parent model.

In the surrogate parent model, a program forks a child process at initialization time. The sole purpose of the child is to serve as a sort of "surrogate parent" for the original process should it ever need to fork another child. After initialization, the original parent can proceed to create its additional threads. When it wants to exec an image, it communicates this to its child (which has remained single-threaded). The child then performs the fork and exec on behalf of the original process.
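A rough sketch of the surrogate parent model follows. The text does not say how the original process talks to its surrogate, so everything about the protocol here is an assumption: a socket pair, a fixed-size request holding a shell command line, and a reply carrying the raw wait status. The names surrogate_start, surrogate_run, surrogate_loop, and read_full are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CMD_MAX 256               /* fixed-size request, for simplicity */

static int surrogate_fd = -1;     /* our end of the socket pair */
static pthread_mutex_t surrogate_lock = PTHREAD_MUTEX_INITIALIZER;

static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;            /* EOF or error */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Runs in the single-threaded surrogate: fork and exec on request. */
static void surrogate_loop(int fd)
{
    char cmd[CMD_MAX];

    while (read_full(fd, cmd, sizeof cmd) == 0) {
        int status = -1;
        cmd[CMD_MAX - 1] = '\0';
        pid_t pid = fork();
        if (pid == 0) {
            close(fd);
            execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
            _exit(127);           /* reached only if execl failed */
        }
        if (pid > 0 && waitpid(pid, &status, 0) < 0)
            status = -1;
        if (write(fd, &status, sizeof status) != (ssize_t)sizeof status)
            break;
    }
    _exit(0);                     /* original process went away */
}

/* Call once at initialization, before creating any threads. */
int surrogate_start(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(sv[0]);
        surrogate_loop(sv[1]);    /* never returns */
    }
    close(sv[1]);
    surrogate_fd = sv[0];
    return 0;
}

/* Callable from any thread: the surrogate forks, not this process.
 * Returns the command's exit status, or -1 on failure. */
int surrogate_run(const char *cmd)
{
    char req[CMD_MAX] = { 0 };
    int status = -1;

    assert(surrogate_fd >= 0);    /* surrogate_start must have succeeded */
    if (strlen(cmd) >= CMD_MAX)
        return -1;
    strcpy(req, cmd);

    pthread_mutex_lock(&surrogate_lock);   /* one request at a time */
    int ok = write(surrogate_fd, req, sizeof req) == (ssize_t)sizeof req
          && read_full(surrogate_fd, &status, sizeof status) == 0;
    pthread_mutex_unlock(&surrogate_lock);

    return (ok && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
}
```

After surrogate_start() succeeds, a call such as surrogate_run("exit 3") should return 3. The mutex only serializes requests among the original process's threads; the surrogate itself never creates threads, so its own fork calls avoid the hazards discussed earlier.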

Reference: http://maxim.int.ru/bookshelf/PthreadsProgram/htm/r_44.html#1137916