Why an execve()-like system() call is a must-have
How do you usually call an external program in Linux? Whichever language you use, you may fork() your process and call exec() afterwards; your runtime library surely supports these primitives. However, unless you're doing something crazy, you more likely use a system() function, which does the fork for you and automatically waits for the process to finish.
In Linux, runtime libraries for various languages usually support setting environment variables, so that all processes forked afterwards inherit the altered environment. In Ruby and Perl it's as easy as changing a global hash. Setting environment variables is a common means of interprocess communication in Linux.
However, the only system call the Linux kernel provides for this is execve() (see its man page). It both sets the environment variables for the program called and runs it, replacing the current program (you should fork beforehand, which is a separate syscall). There's no syscall for system(); it's usually a library function. Anyway, higher-level libraries provide richer functionality. They surely provide more convenient primitives for the aforementioned setting of env variables, and for a mere system() call. So it seems that the work of execve() could easily be replaced by a combination of these.
However, I recently discovered that it's not that obvious.
Why execve() is crucial
If you know what I usually write about, you have probably already guessed what the deal is. The execve(), as a single library call, is especially useful in multithreaded programs!
Let's assume that you only use environment-setting and a mere system(). How would you call an external program then? Here's an example in Ruby (but that actually doesn't matter).
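A minimal sketch of that pattern follows. The variable name MY_VARIABLE and the echo command are placeholders chosen for illustration, not taken from any real program:

```ruby
# Change the process-wide environment. ENV is global: every thread sees it.
ENV['MY_VARIABLE'] = 'value'

# Spawn a child through the shell; it inherits whatever ENV holds right now.
ok = system('echo "$MY_VARIABLE"')
```

Note that the two statements are separate steps, and nothing ties the value set in the first one to the child spawned in the second.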
Now assume that two threads are doing this. Yes, you can see the problem. Here's a possible execution:
- The first thread sets the environment variable to 'value'. A context switch happens.
- The second thread sets the environment variable to 'another_value'.
- The second thread calls system(), and the spawned process inherits the correct 'another_value'. Context is switched back.
- The first thread calls system(), and the spawned process inherits the same 'another_value', because the other thread has changed the global environment. However, the programmer's intent was to call it with 'value' instead.
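The interleaving above can be reproduced deterministically. This is only a sketch under a few assumptions: two queues stand in for the context switches, backticks replace system() so that the child's output can be captured, and MY_VARIABLE is a hypothetical name:

```ruby
require 'thread'

first_set   = Queue.new   # signals "the first thread has set the variable"
second_done = Queue.new   # signals "the second thread has run its child"
results     = {}

first = Thread.new do
  ENV['MY_VARIABLE'] = 'value'
  first_set << true                  # "context switch" to the second thread
  second_done.pop                    # wait until the second thread is done
  results[:first] = `echo "$MY_VARIABLE"`.chomp
end

second = Thread.new do
  first_set.pop                      # run only after the first thread's assignment
  ENV['MY_VARIABLE'] = 'another_value'
  results[:second] = `echo "$MY_VARIABLE"`.chomp
  second_done << true                # "context switch" back
end

[first, second].each(&:join)
p results
```

Both entries end up as 'another_value', although the first thread asked for 'value'.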
So how could we overcome this? A naïve approach is to use a mutex to synchronize the different threads, like this:
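Roughly, the mutex version might look like the sketch below. ENV_LOCK and run_with_env are hypothetical names introduced here for illustration:

```ruby
require 'thread'   # needed for Mutex on older Rubies, harmless on newer ones

# One lock shared by every thread that touches ENV before spawning a child.
ENV_LOCK = Mutex.new

def run_with_env(name, value, command)
  ENV_LOCK.synchronize do
    ENV[name] = value
    # The lock stays held until the child process exits.
    system(command)
  end
end

run_with_env('MY_VARIABLE', 'value', 'echo "$MY_VARIABLE"')
```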
But then the mutex won't be available until the first thread to acquire it finishes executing the whole external program. However, our intent was to invoke the programs safely, not to execute them in a mutually exclusive manner.
What it could look like
One solution is to use raw fork() and exec() calls, and to set the environment only after forking:
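A minimal sketch of this approach, assuming a platform where fork is available. The helper name system_with_env is made up for illustration:

```ruby
# Runs a command with extra environment variables and waits for it,
# without ever touching the parent's ENV.
def system_with_env(env, *command)
  pid = fork do
    # This block runs in the child only, so the changes to ENV stay there.
    env.each { |name, value| ENV[name] = value }
    exec(*command)   # replaces the child with the external program
  end
  _, status = Process.wait2(pid)   # wait for this particular child
  status.success?
end

system_with_env({ 'MY_VARIABLE' => 'value' }, 'sh', '-c', 'echo "$MY_VARIABLE"')
```

Since the assignment happens after the fork, other threads in the parent never observe it, and no lock is needed.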
It's more verbose, but it should work. But what if it's more complex? What if you're not doing a mere system(), but want to make an open3() instead? There are many ways to use a child process, including opening it with one or more pipes, or opening a pipeline of several processes. You have to re-implement these primitives if you just need to alter the environment, and you'll be lucky if their implementation is readily available and reusable...
The problem with open3 invocation with an altered environment is what actually made this article appear. I had to study and modify an implementation of open3 in Ruby to make this work. See my blog post on how to implement open3 for details.
What it should look like
A good implementation of a set of safe library functions resides in the Ruby 1.9 runtime library. The system() call and each popen()-like call can accept a hash of environment variables that the invoked child process inherits. So the calls look like this:
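A short sketch for Ruby 1.9 or later; the variable name and the echo command are again just placeholders:

```ruby
require 'open3'

env = { 'MY_VARIABLE' => 'value' }   # hypothetical variable name

# system() takes the environment as an optional first argument.
ok = system(env, 'echo "$MY_VARIABLE"')

# IO.popen accepts the same leading hash.
from_popen = IO.popen(env, 'echo "$MY_VARIABLE"') { |io| io.read.chomp }

# So does Open3.popen3.
from_open3 = Open3.popen3(env, 'echo "$MY_VARIABLE"') do |stdin, stdout, _stderr, _wait_thr|
  stdin.close
  stdout.read.chomp
end
```

In each call the hash applies to the child only; the parent's ENV is left alone, so other threads are not affected.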
Unfortunately, Ruby 1.8 doesn't have this functionality, and you can't use a single call to both set the environment and spawn a child process. That pisses me off, actually, since not everyone has migrated yet, and I had to reimplement open3 in Ruby just to impose environment variable assignment in a thread-safe manner.
Conclusion
I know that I should be careful when I alter global variables in multithreaded programs. However, when I access basic primitives, I tend to treat them as something special, not just like any usual data. This is not the safest approach. Use your forks with care, especially if you're operating several at the same time, and choose safe runtime libraries to work with.