Without further ado, the 14 kilobyte standalone Perl script that limits a program's CPU time and memory consumption on most Linux systems can be found at pshved/timeout GitHub project. The script is compatible with older versions of Perl, and it has extra features such as hangup detection and resource usage reporting. The sources are licensed under Apache-2.0. An older version can also be found here.

So here's how this script came about.

Why do we need to enforce limits?

Most programs we use daily are actually pretty good. Years of development result in stable versions, bugs found and fixed. Good software usually does its job in minimal and predictable time. And if the software is to be left running for a long period of time, then its memory consumption over time is usually well-managed... well, at least, the developers aspire to do well by your RAM :-)

However, in certain domains, software is not like that. Software that is not yet as mature, or is solving a narrow research problem, can consume too much RAM, too much CPU, or just hang without making progress.

It's understandable though. Such software usually attempts to do something really complex, like solving an NP-complete problems using heuristics (and we know it's a good thing). For many such algorithms, neither time nor memory consumption have known theoretical limits, let alone practical ones. Usually such algorithms work well for most cases, and in these cases the limits are reasonable. However, for certain cases, which can't be easily distinguished, they can easily go off of these limits.

This doesn't seem to be a big problem if your aim is just to do some research and throw your code away (it's sometimes even referred to as "throwaway code"). However, if you're trying to ship something of production quality, you can't accept such immaturity. If the program works with a particular case for too long, it may block providing service to the other customers. And if it consumes all the memory available, it may throw the whole machine to the swap hole until reaped by OOM killer.

Now imagine it's 2009, you're a junior developer, cgroups are hot off the presses and aren't widely known, and you need to run research software at scale.

Practical Resource Limiting

So I needed to enforce time and memory limits of some black box, non-adversarial program, terminating upon violation. These limits should be set manually (in order to enable tweaking, and allow setting memory limit lower than the available maximum). The most obvious command-line interface would look like this:

        timeout -t 600s -m 1000000k command arguments ...

The commans is executed with arguments, and is constantly monitored. As soon as one of the limits is violated, a relevant message is printed to STDERR, and the program is terminated.

The biggest problem with setting these limits is that the processes we run may be "black boxes". I.e. we should be able to limit the processes, to source code of which we don't have access. We can't install signal handlers to them, or affect their code in any other way.

Another hard problem is that these processes spawn children, which do considerable fraction of the overall work. Some of these children are mature and well-written programs, while the others are the very same "research soft", which can eat all the memory with as great appetite as its parents have.

This means that we should count the time and memory consumed by all processes spawned by the one we watch for.

"But what's the problem?" you might ask, "Linux provides enough standard utils to solve this problem!" However, not all of them fulfill all the requirements to a good limiting tool. I'll enumerate them after the explanation of how the time limiting script is implemented.

What works

The following solution to the problem is a result of several months of experimenting with more complicated technology.

It looks fairly dumb and simple. Just poll each process in the tree the process spawned for time and memory consumption several times per second, and kill them on violation. However, it was no simple task to make this actually work correctly.

Tracking memory consumption is very simple. At each moment of time, every process contains a valid information about its virtual memory. The total memory is no greater than the sum of current virtual memory values for all the controlled processes in the tree. If the sum of memory stamp sizes exceeds the pre-specified limit, just terminate the controlled processes in the tree, and print the relevant message.

However, it's not that simple for CPU consumption. Time consumption accounts for the history of the process tree evolution, as well as for the snapshot in a specific moment of time. By examining the CPU time for each process in the tree, we can learn for how long they had been running, but we don't know how much time was spent there in the processes that had already finished. The total time is greater than the sum of such snapshots. Even if we manually track which processes die, and store the sum of their times, we miss very short processes that are born and finished between two out polls (see Figure 1).

I found out that Linux kernel calculates, for each process, the cumulative time of its reaped children (like in man 2 times). If a process has been terminated then its total runtime is accounted in the cumulative children time of its parent; if the parent is also finished, then the time's added to its grandparent, et cetera. This way, the total CPU time of the already finished children of the processes we watch for is just a sum of their cumulative user times.

Thus, we may learn the CPU time of all the processes currently running, and of all the processes that have already finished. All we need is to sum these two metrics, and to compare the result with the specified time limit. If the limit is violated, the message is printed, and the controlled processes are terminated.

We may fetch these values accurately fetched via /proc even for the black-box processes.

Implementation — the timeout script

Of course, the bare text isn't very inspiring. It just has to be accompanied with a script. And here it is, 14kb of Perl code. The sources are licensed under Apache-2.0. They were a part of the project we develop at work, but we decided to fork this particular script as a separate project.

Being seemingly stupid, that's the only thing that works. At first glance, any person, who knows something about signals, children, wait calls and all this crunching, will say that the solution is quite dumb and inefficient. However, I couldn't find anything better, and it will be much appreciated if you suggest any improvements. Here's the list of the things I tried

What doesn't work

РЖД totally looks like PID!
We started speaking about Linux processes, and I suddenly reckoned that the logo of Russian Railways company (РЖД) totally looks like pid (Process IDentifier)! See it for yourself!

Ulimit

Ulimit (the console tool, not the system call) is the first thing a Linux user would think about. You launch it in the shell, set "soft" and "hard" limits for each resource, and children, which are invoked in this shell, are limited in memory and time.

Its limits are imposed like this:

  • Time limit is set by setrlimit system call. When the soft limit is met, a SIGCPU is sent (which kills by default). This signal can be handled by a process; for example, it can decide to limit its own resource consumption on receiving it. On hard limit violation, a simple SIGKILL does its dirty job without any possible impediments.
  • Memory limit is set via the same system call. Memory limit is enforced by returning ENOMEM from a violating malloc() syscall. This usually leads to an unhandled exception of sorts, and the program (or its interpreter) is forced to terminate.
However, this doesn't meet all requirements. The thing is that
  • setrlimit() call only limits the process, in which it's invoked. Children of the process inherit the limit, but their accounting "starts over," whereas we are interested here in limiting sum of child and parent resource consumption.
  • You can't determine if the reason to terminate the process was a limit violation. When a memory limit is violated, we should print a message about that, but for that we should know all possible outcomes of a no-memory exception. Shell, Perl, Ruby, Python, C and other languages print message about lack of free memory in different, non-portable ways, and we can't possibly enumerate them all.

    However, setrlimit notifies the process when it exhausts its CPU time limit by sending SIGCPU. This, theoretically, can be handled, if we have the sources of a controlled process available, which we don't have—all processes are "black boxes". We also can't create a wrapper around the program, and catch this signal in a wrapper: the wrapper's time would be limited then, not the target's.

Sleep && kill

Another "easiest" solution to track time consumption is to sleep for a time equal to the limit and kill the controlled process afterwards. This only limits real time, while it's more interesting to limit CPU+sys time.

This may sound an overkill, but believe me, you will have to measure CPU time. Hardware develops in such a way that a performance boost may only be achieved by employing parallelization. And parallelization involves competition for a shared CPU between multiple threads and programs, and when that happens, the "astronomical" time elapsed becomes much greater than the CPU time, the time the process had actually been running for.

Coroner's way

A special wait-like call, wait3, can harvest information about a deceased child. It can return CPU time spent by the child. This would present precise information about the runtime of deceased processes.

However, we can't reap grandchildren, and such information should be harvested by each process in the hierarchy then. Glibc has a capability of making program send SIGHUP to parent when it dies. However, this function should be invoked from a process that spawns children. But the processes we watch for can't be modified. So this option can't be the solution.

One more thing is that if a child process hangs up in an infinite loop, and never terminates, we can't wait for it, hence we can't detect it, and watching for such a limit will fail.

Thus, this way of tracking is not close to the solution for limiting time. Let alone that this way won't allow us to track memory consumption: it's an "instantaneous" property, not a cumulative one.

Using process groups to control children

One of the previous versions of the script used process groups to control the children, just like the shell does!. It put the spawned process into a separate process group, and its children were assigned to that group automatically. It was easy to kill them all with a single kill command, which is capable for sending a signal to the whole such group.

However, the issues arisen by that "just like the shell does" part were too severe. If your black-box process contains a shell script somewhere inside (it may be a system()-like call or a mere Bash script), it would unconditionally interfere with your process group setting, and all the processes spawned through such a shell would lurk out of our control.

The current version of the script tracks process trees by default (but you may switch to tracking process groups). OF course, it will miss the children explicitly detached from their parents (such as daemons spawned). But in an only such a case we noticed in our project (an ssh process spawning to serve a network mount), we did not actually want to track such children! So we switched to trees over process groups without hesitation.

What could be improved

Of course, the sources are not ideal. They contain several problems, which are already identified, but not fixed (due to YAGNI principle). Here's the list of them.

Terminating several processes in a group correctly

Let's first elaborate the case with process groups. The Linux command that sends signals, kill, is capable to sending signals to the whole process group. I.e., after such sending, each process in the group receives the specified signal.

However, the processes in the group we track may have internal dependencies. Assume that two processes are connected by a pipe, and the group receives SIGTERM. If there is an interval between delivery of these signals, then, after death of one process, the other may receive SIGPIPE. This signal may cause reporting an error that would look like there's an internal error in that black box. However, this problem was actually caused by the resource controlling script.

I asked a question on Stack Overflow about that, and an only solution close to the truth was to pause the process group, then send term signal, then continue execution of the processes in the group. I.e., here's the part of the script:

If there's a better solution, or it's proved that after kill syscall the next instruction in all processes would be the signal handler, please, let me know!

Terminating several processes in a tree correctly

Process trees are even more tricky to terminate. Here's how we do it:

  1. Get a list of all PIDs in the controlled tree
  2. Stop all these processes by PIDs
  3. Repeat 1 and 2 for the processes spawned between getting list of PIDs and sending signals
  4. Repeat 3 until no new processes is spawned
  5. Send TERM signal to all the processes in the tree
  6. Get list of PIDs in the tree
  7. Send "continue" signal to them

If this solution can be simplified, please, let me know.

Error due to insufficient frequency

The case when a grandchildren is born and finished between two measurements, was mitigated by taking cumulative children time into account. However, the controlled process is still free to go insane between two wakeups of the polling script. This is especially important on an overloaded system, where timeout script has to compete for its CPU share with other processes. Perhaps, making the controlled process more nice would help.

Perl quirks or lack of knowledge?

The actual source contains more tricks undisclosed here. For instance, Perl has issues with ALARM signal, and I had to work this around the way I found in the article linked.

However, I'm not sure if I worked this around correctly, especially in the part of terminating the processes after the script detects limit violations. It involves the usual sequence of SIGTERM-sleep-SIGKILL, but that sleep sometimes never ends, leading to hangups in the slave's SIGTERM handlers sometimes.

Conclusion

In this small article, I shared what I experienced during the development of a script that limits time and memory consumption of a "black-box" process and its children. The result was a "dumb" but a sophisticated solution implemented in Perl, and reasoning why there's not many options to choose from. Further directions, in which this script could be improved, are also listed. I also linked the GitHub project for the timeout script.

I hope, that will help you in your affairs. Apache license allows you to use the script in commercial products in return of just mentioning the usage of the script and its original copyright owners.

Comments imported from the old website

Pavel Shved on 03 October 2010 commented:

We recently encountered a problem that one of our Java programs wasn't waited for properly by waitpid, and its call returned -1, while it should not. This doesn't happen each call, only in 20-30% of invocations.

We're currently investigating this issue; I'll edit the post as soon as we'll figure out what's wrong with it.

Marian (id) on 27 January 2012 commented:

Could it be that line 171 of the time.pl sript print STDERR "${id_str}MEM $timeinfo->{total}\n"; should actually be print STDERR "${id_str}MEM $meminfo\n"; ? Otherwise you print out the execution time instead of memory.

Pavel Shved on 27 January 2012 commented:

Marian, thank you for the bug report! It has already been fixed, you may check the latest version here or at GitHub.