Recent blog updates

Shown here are only posts related to science. You can view all posts here.

Consistency Models Explained Briefly

Wikipedia does a ridiculously bad job at explaining different consistency conditions of multithreaded/distributed systems. Just take a look at the articles, and see how inconsistent the consistency model descriptions are. I'll try to get them together.

What is a Consistency Model?

When you're describing how your distributed system behaves, you usually specify something named a "consistency model". This "model" describes how distant the behavior of your system is in multi-threaded or multi-machine environment from the "ideal" behavior.

Assume that you're implementing a data structure (a set, for instance; say, you can add/remove elements, and check for presence). The specification (model) of its behavior usually implies that the operations are performed sequentially. For instance, a set indicates a presence of an element if ant only if it was added before the presence check, and not removed after that. The terms "before" and "after" imply that there is an order in operations over that structure. While it is straightforward to know the order of operations in a single thread, it gets unclear if there are several threads or machines:

  • Calls may overlap. If two threads call methods on the same structure at approximately the same time, the calls may overlap because they don't happen simultaneously in real world. In what order the calls took effect is not clear, however, because thread scheduling or network connection introduces arbitrary delays in processing.
  • Calls may not take effect immediately. When a method call returns, it is sometimes not clear when the operation actually "takes the effect." By diluting the notion of "taking the effect" one may achieve increase in speed. I.e. the elements added to a stack may not be fetched in the exactly same order, but the operations could be faster. The rationale behind this is that the "order" of events that happen in different parts of your cluster is not that well-defined and meaningful in the first place. I talked about why this is natural in my other post "Relativity of Simultaneity in Distributed Computing.

Object-Oriented Model and Executions

A consistency model usually assumes Object-Oriented approach to describing a system. Assume that we have something (say, a set), and its internal state can be observed and altered via a well-defined set of methods only (say, add(elem), remove(elem), and bool in(elem)). The OO approach assumes that internal state is concealed, and it has no meaning to discuss it per se, and we do not define consistency models in terms of object's state. Instead, we restrict the values these methods may return based on when other methods were called in different threads or machines.

In practice, method calls don't happen instantly, and our models totally acknowledge this. The set of method call times, the times it takes to complete each, their arguments, and their return values is named an execution. Executions are usually depicted as intervals plotted at lines. The parallel lines represent how time simultaneously passes in all threads. The intervals are the method calls, the beginning corresponding to the time the method was called and the end when the method returned. The call arguments and the return values as well as the method and object name are usually written near the method call. Here's a sample execution of a set object:

We will omit the actual call values further because consistency models do not operate in terms of actual arguments and values, but rather try to "map" concurrent executions to sequential ones. In our further diagrams we will also assume that all methods are called against the same object.

Since consistency model imposes restrictions on an execution, the latter may match or not match a consistency model. A data structure adheres to a consistency model iff every execution over its methods we may observe adheres to it. That's what's hidden behind such phrases as "sequentially consistent stack".

No Consistency

The most important consistency "model", is, of course, no consistency. Nothing is guaranteed. The methods called on the object may observe anything. We may add two different elements to an empty set, and observe that the size of set is three, and none of these elements is what we added.

All other consistency models are opposite to the No Consistency in the following sense. They all assume that the method calls may be re-arranged in a single-threaded sequence that is correct with respect to the object's specification. Let's call it arranged correctly in the rest of the post. For example, a set of pets that adheres to any consistency model described below can't tell us that it contains a cow, if only dogs were added to it.

"Multithreading" Consistency Models

What I discovered is that consistency models are divided into two categories: that lean to multithreading (shared memory multiprocessor systems), and to "multi-machine" (distributed systems). I find the former category more formal and mathematical; these models can be used to describe behavior precise enough to allow math-level, rock-solid reasoning about system properties. Some "distributed" consistency models are less precise in their definition, but are widely mentioned in documentation and description, and do allow users to make informed decisions.

Let's start with the "multithreading" ones.

Strong Consistency aka Linearizability

An execution is strongly consistent (linearizable) if the method calls can be correctly arranged retaining the mutual order of calls that do not overlap in time, regardless of what thread calls them. (See "No Consistency" section for definition of "correctly arranged").

In other words, if two calls overlap, then they may "take effect" in any order, but if a call returns, then you may be sure it already worked.

A more anecdotal definition is "before the method returns, it completely takes effect consistently with other threads." This is also usually called "thread safety" unless more precise definition of consistency model is specified. This is quite a stringent restriction, especially for distributed systems, where threads run on different machines.

Two linearizable executions combined also form a linearizable execution. This is an important property, since it allows us to combine linearizable components to get strongly consistent systems. This is not true for other models, for instance....

Sequential Consistency

An execution is sequentially consistent if the method calls can be correctly arranged retaining the mutual order of method calls in each thread.

The calls in different threads can be re-arranged however you wish, regardless of when they start and when they end.

An example of a sequentially consistent system is a naive notion of RAM in multi-core processor. When different threads read from the memory, they actually read from a cache or a register, which is synchronized with a common RAM at writes. While values read and written on different cores may not be immediately synchronized (cores may diverge in the notion of what value a cell contains), the values read and written on the same core are always consistent with each other. This is naive, though: even this guarantee is relaxed by modern compilers, and accesses to different variables within one thread can be reordered (as far as I recall, ARM platforms are guilty of this).

You might notice that the definition of sequential consistency does not directly prevents methods to return values from the future (i.e. to read what will be written later in a different thread). Yet I claim that RAM is sequentially consistent; how's that? The thing is that each execution should be sequentially consistent. If read and write do not overlap, and write follows the read, we may just stop the program between them. Since each execution should be consistent, the value read should be written before in some thread. Therefore, this "future prediction" is not possible in consistent data structures (not only those that are "sequentially" consistent) even that the definition doesn't directly prevent it.

However, if we combine two sequentially consistent executions, the result won't be sequentially consistent. But there is another model, weaker than linearizability, that makes it possible.

Quiescent Consistency

An execution is quiescently consistent if the method calls can be correctly arranged retaining the mutual order of calls separated by quiescence, a period of time where no method is being called in any thread.

This consistency condition is stronger than sequential consistency, but is still not strong enough that we usually expect from a multithreaded system. See my other post for a more in-depth example of a quiescently consistent stack. The reason to use this model is to attain a higher degree of concurrency and better response time in return of weakening consistency a bit.

Quiescent consistency is not as esoteric as you might think, especially since you barely heard about it. Here's an easy way to imagine a quiescently consistent system: assume that all writes are cached, and the cache is flushed to a persistent storage when there are no ongoing writes (or after it reaches certain size). There are special data structures that are quiescently consistent.

Quiescent consistency components can be combined to form quiescently consistent components again. Note also, that strong consistency implies quiescent consistency, hence you can combine these two kinds of systems, and the result will be quiescently consistent.

"Distributed" Consistency Models

Consistency is not Fault Tolerance

It may be tempting to mix the concepts of "consistency" and "fault tolerance". The former is how object reacts to method calls depending on the calling thread (or, in other words, how data are seen by multiple observers). Fault (or Partition) Tolerance describes how system behaves when system becomes partitioned, i.e. messages are lost, and/or parts of the system get disconnected from the network, or form several networks. These concepts are orthogonal to each other: an object can have one of them, both of them, or neither.

It will be, more likely, one of them. There is a scientific result known as CAP theorem that states that a system can't at the same time guarantee:

  • Linearizability
  • Availability (whether RPC calls send replies)
  • Fault Tolerance

In this post, however, we will focus on consistency rather than other properties of distributed systems. Note that multithreaded systems due to close proximity of their components, and less versatile medium that connects them, do not have to worry about fault tolerance...at least, at the current state of art.

The following models are usually seen in the context of distributed systems, i.e. those that reside on multiple machines, communication between which is slow enough. To bring Object-Oriented abstraction to distributed systems, programmers invented "Remote Procedure Call" (RPC) pattern. Method calls (object identifiers, method name, and arguments) are encoded into messages that are then sent across the network to specified destination, which may be the part of the encoding too. Method calls that do not expect a reply (say, unreliable "fire-and-forget" writes) may be treated as instantaneous.

Eventual Consistency

A system is eventually consistent if the method calls can be correctly arranged retaining the mutual order of calls separated by a sufficiently long period of time.

The eventual consistency definition—as well as its understanding—is specifically vague on how much time it would take, and whether this time depends on the call history. We can't really tell how soon an eventually consistent set will respond that an element X exists in it after it was added. But "eventually" it will.

This looks like a quite useless consistency model, since it doesn't actually restrict anything mathematically. Why is it used then? Exactly because of its lack of restriction. Users usually expect some consistency from a system, and if there's actually not much of it, something should be said about this. Eventual consistency is actually a limitation of liability; it's used in documentation to convey that the system can't guarantee that different threads become in sync immediately. However, the system is engineered in such a way that they eventually take effect. Here's an example (here's more on eventual vs. strong consistency)

Some authors, according to wikipedia, require that no method calls should be invoked during the "long period of time" for system to get in sync. This actually sounds like a stronger version of quiescent consistency, where a qualifying quiescence should be long enough.

Strict Consistency

An execution is strictly consistent if the method calls sorted by completion time are correctly ordered.

This is as simple and powerful as it's impossible to attain in most practical systems that have more than one thread. First, it order calls according to "global" time, which is a purely mathematical abstraction unachievable in real life, both according to Physics, and to the theory of distributed computing. Second, any implementation I might imagine would essentially disallow multithreading per se.

This is my rendition on the more "official" definition (1, 2), which defines a strictly consistent system as one where any read operation over a certain item returns the value supplied to the latest write operation according to global time counter. I don't like this definition because it needs to distinguish between "reads" and "writes", and introduces the concept of "data item"—i.e. it breaches the object-oriented approach. Still, I'm sure that if the system really consists of "data items," the definition at the beginning of the section would work as well.

Serializability

An execution is serializable if its methods in each thread are grouped in contiguous, non-overlapping transactions, and the methods can be correctly ordered in such a way that transactions are still contiguous.

In other words, the outcome of a sequence of method calls can be described as a serial execution of transactions, where methods of different transactions are not mingled, while the transactions actually overlap in time across different threads. The useful outcome here is that calls within overlapping transactions can be reordered, even if they don't overlap. However, "serializability" is not enough to describe the system: while it limits reordering of methods across transactions, it is unclear to what extent transactions can be reordered themselves. This is a job for a different consistency model. And serializability is usually used to "remind" about this important property of transaction-based systems.

Practice

Some systems provide several levels of consistency for operations (like the AppEngine's High-Reliability Datastore linked above). You should understand what tradeoffs each model has to make the decision best suitable for your specific application.

Most system you encounter will be not consistent at all ("thread-unsafe"), or linearizable (this is what usually understood by "thread-safe"). If the consistency model differs, it will be noted explicitly.

Strictly speaking, most systems that claim linearizability will be not consistent due to bugs, but these nasty little creatures aren't suitable for careful scientific discourse anyway.

Now that you know what consistency models are out there (and that they actually exist), you can make informed decisions based on what's specified in documentation. If you're a system developer, you should note that you are not bound to choose between linearizability and "thread-unsafety" when designing a library. The models presented above may be viewed as patterns that enumerate what behaviors in multithreaded context are considered universally useful. Just consider carefully what your users need, and do not forget to specify which model your system follows. It will also help to link to something that explains what lies behind the words "somethingly consistent" well--like this post, for instance.

Read on | Comments (1) | Make a comment >>


How Static Verification Tools Analyze Programs

One of my previous posts featured a plaque we got for scoring an achievement in the Competition on Software Verification'12. I promised to tell how the tool we improved analyzed programs to find bugs in them... without actually running them.

Please, note that most of what's described here was not devised by me. It was a state of art when I first started to study the field, and my contribution was kind of tiny—as usual when newbies do their (post)graduate research.

Program Verification is Impossible

I promised to tell you how we do it, but the thing is that... this is actually impossible. Not that I'm holding a secret vigilantly protected by the evil Russian Government, but it's just mathematically impossible. It's not just "hard", as in "NP-hard", but plain impossible. A proof of impossibility of a relevant problem, the halting problem, published in the famous Alan Turing's paper is acknowledged as one of the stepping stones at the beginning of Computer Science as a math discipline. Later it was found that halting is not an anly property you can't universally verify with a computer program: in fact, any nontrivial property can't be universally verified! This is known as Rice's theorem, and was known since 1953.

The undecidebility (one more term to denote the inability to build a program that solves a problem) of program verification only concerns Turing complete languages, i.e. the languages that are as powerful as Turing machine. Some languages can be correcly verified, such as a hypothetic language with no variables and no statements other than print (always terminates), or its extension with non-folded for...to and for downto Pascal-like loops, so that halting can be solved by comparing the numbers involved.

Most interesing real-world problems involve reasoning about constructively non-trivial languages though.

Absence of a program that can verify if any given program halts or bears some nontrivial property means that for any program that claims to be such a universal verifier, there exists a counterexample, a piece of source code that will confuse the self-proclaimed verifier or make it loop forever. But the very same verifier may provide useful results for a multitude of programs that at the same time. Considerable pieces of computer science and software engineering are devoted to studying, building, and putting such verifiers into production. A lot of program analysis, which is closely related to verification, happens in compilers, so you most likely unanimously encounter the results of this research on a daily basis.

In the next sections I'll explain one of the approaches to verifying programs, to which my graduate research was closely related.

Why Go Static?

So how would you verify that a program doesn't have certain bugs (say, never dereferences a NULL pointer ever) without having it run and take potentially harmful effect on customers and equipment?

If your initial idea was to try to interpret program in such an environment that its running wouldn't affect anything, that would not work. First, it hardly counts as "verifying a program without running" as it actually runs, but in isolation. Second, and more important is that the program may involve a user input. This may be an input from keyboard, from text file, or from network. How do you verify the program behavior if the way it behaves depends on what comex from external sources?

You could emulate the behavior of these external stimuli by writing some mock-ups (say, by reimplementing fread("%d",&x) as something that make x a zero), but this is effectively resorting back to testing. The whole idea of static verification—when the program is not run—is to extend the number of possible program execution scenarios beyond the those you can cherry-pick by testing. When you see a scientific paper on static verification techniques, it will surely list the possibility of checking all the possible scenarios as a justification of the increased resource consumption compared to testing in isolated environment—well, who tests in production anyway?

Of course, white-box static verification is not the only complicated testing technique. An interesting testing technology, UniTESK has been developed in ISPRAS, where I previously worked and studied.  One of its features is that it can automatically adjust the tests generated to cover all possible disjuncts in if statements of a method's postcondition. This means that not only each condition branch should be covered, but each of the operands in a disjunction that forms each condition should be true during the test execution. The disjuncts we're talking about here are those in the postcondition, the system under test itself being a black box.

And a justification it is! It's not possible to enumerate all the possible inputs and stimuli, because the number of their combination is infinite even for simple programs. And if you omit some input pattern, who can guarantee that something wrong doesn't happen exactly when users behave like this? You surely know a lot of cases where seemingly "dumb" users unveil behavior no sane person could ever predict when writing tests to software! In fact, any black-box test generation technique is inherently incomprehensive, and can't cover all cases possible. Static analysis "opens" the black box, sees and works with what is inside it, and it is more complicated than writing tests and measuring coverage.

Another case that is better left to automated comprehensive tests is security-related operations. Security-related bugs, such as buffer overflows, are hard to test. Ample testers for such bugs are called "hackers", and they are rare, expensive, and still are not reliable, and may miss something. One would like to ensure that no matter how malicious or damaged input from the network is, no matter what mischief the user clicking on the screen has in their mind, the program doesn't crash or expose certain security breach patters such as the aforementioned buffer overflow. Static verification is used in some very big companies, for instance, to make sure that a specific security bug occurs nowhere else in the system, so that the updated system would only laugh at the inspiration the descriptions of the problem might have caused.

Thus, static verification that reads the source code and may prove that the system is not exposed to certain kind of bugs and security violation patterns is also useful. Static verification is the only way to vouch for infinite number of inputs that they do not expose any bugs in the program. It allows us to consider all possible inputs at the same time as if a randomly-spanking monkey is operating the console rather than a sentient human being.

But, more on that later. It's still not clear how the multitude of possible patterns of security breaches can be studied generically. Do we have to study pointer dereferences, buffer overflows, memory allocation errors etc separately? Some tools try to, but there are fundamental approaches that address this problem of excessive diversity.

Reduction to Reachability

Reachability problem can be formulated like this. Given a program "state," find out if this "state" is reachable for some program execution. If program allows external input, this involves checking if there exists an input such that a program would reach the target state upon execution. We'll explain what the "state" is later.

This problem is as hard as the halting problem because the latter may be reduced to the former. If we know how to solve reachability, we know how to solve halting by checking if the exit state is reachable. It's not just very hard, as an NP-complete problem, it's just impossible to solve in a generic manner for all programs, as pointed out at the beginning of the post.

Reachability is important because a number of programming problems and bugs may be exposed as reaching "error state", a specific program line that gets executed when something undesirable happens. This could be a code that's responsible for printing "Segmentation fault", and checking reachability of this would give us a notion whether NULL pointer is dereferenced anywhere.

As this code is OS-specific, the programs being checked often are prepared to verification by inserting this code explicitely. So the code a = r/p, for instance, would decome if (p == 0) ERROR(); x = r/p;, where ERROR() terminates a program with error. Well, it doesn't actually have to terminate anything because we're not going to run this program as the verification is "static". Its purpose is to mark a specific place in program as undesirable. The process is sometimes called instrumentation. A special tool performs this operation; we developed a GCC-4.6-based one in ISPRAS during our Linux Driver Verification project.

Modeling Nondeterminism

So, how do we model the fread("%d") function? Let's get back to the original goal, which is to check if a program is free from runtime crashes and vulnerabilities. Thus, we can't assume anything on the user input, because a hacker may be operating the keyboard trying to enter a number specifically. What can we do then?

Static analysis techniques allow us to assume that the user just enters something—and soundly act upon that. The integer read may bear any value, literally! How does it play out then? The thing is that the integer entered is read once, and its value doesn't change arbitrarily after this, unless directly assigned to it, or to a pointer to its memory location. Then, such value is transferred to other variables, which make conditions over these variables either true or false, which affect how subsequent conditions are evaluated, and what code is executed. The ties that exist between values during the program execution are exposed, and it becomes possible to rule out some code as unreachable.

Note that if all ERROR() functions share the body, we may treat the body itself as an error location. This means, that any number of "error" locations is equivalent to one "error" location, so we sometimes go plural or singular for the sake of a better narrative.

We're lucky if we detect that all the ERROR() function calls are unreachable, which means that there's no bug. In other words, while we don't know the value of x, we know that any concrete value of x predetermines how the program executes from now on, until one more nondetermined variable is read from somewhere. Let's see on a simple example.

This uses a more simple model of reading an integer (to avoid discussion on pointer aliasing), which is read_int() function that returns an integer the user has entered. So, let's assume that it may return any integer. Can we still prove that a division by zero never happens, and the program never crashes at that statement? Sure. Whatever user returns, we can't get past the while loop until x is greater than zero. We may loop forever, or exit after several iterations, but if we're out of the loop, the x is surely equal to one or greater, i.e. x >= 1

Now, the crash happens if x == 0. But x == 0 and x >= 1 can't both be true. Moreover, whether it can be true or not, can be checked automatically with specialized software that is largely presented both in open-source and in proprietary forms. This demonstrates that we don't need to know exact value of each variable to determine if a program reaches an error location.

So, to model all possible ways the program works (including those ways that depend on the external, unknown input,) and to determine if any of these ways has a bug (i.e. ends in an "error" or a "crash" location) is the purpose of reachability solver's work. Let's see how is it possible.

Feasibility of Program Paths

The "equation" we used in the previous section to demonstrate how we could model nondeterminism is actually a powerful technique. It is generic enough to work in both nondeterministic and deterministic cases, and no distinction between them is even required! Each program where a certain line is marked as "error location" contains a number of "paths", sequences of statements that end in an error location, but for which it is not yet known if the program could actually follow this path. An example of such path is the above assignment x to zero, and having a check if it's greater than one passed. We've already shown that it's not feasible if we look closer at he path equation rather than just at program structure.

This usually requires some conversion. For instance, if we convert a fairly common assignment i=i+1 to an equation component i==i+1, it would render any path infeasible, since this can never be true for real numbers. In such cases, i and all its further occurrences are is renamed after each assignment (this is called Static Single Assignment (SSA) form).

Checking such equations requires using of advanced software such as SAT-solvers, theorem provers or customized built-in add-ons or reimplementations of them. These tasks are NP-complete problems, so the size of programs being verified matters. Modern software doesn't struggle with such "path equations" for programs as large as thousands to hundreds thousands of lines of code. Program size is not the only factor though. The size also depends on how precisely the formula describes the state of the variables and of the memory as a whole. For instance, pointer alias analysis makes the equation bigger for a given program.

Analyzing Program as a Whole

So far we've examined how tools like BLAST reason about individual paths. How does it turn into the whole-program analysis? The problem here is that, in any program that contains a loop, there may be an infinite amount of paths to analyze. Analyzing them one-by-one will never lead us to final solution provided we only check each single path. We need to somehow analyze paths in large quantities.

One of the possible patterns to perform such analysis is to split all paths into classes, such that we need to analyze one path, and somehow understand that the result of this analysis applies to a whole class of other paths, potentially infinite. This approach is applicable to many different tasks, and is not limited to the problem we are discussing.

At this point we notice that, wherever the program counter is, the behavior of program is completely predetermined by the values of all variables given the subsequent user input is also predetermined. For example, assume that at line, say, 15, in the middle of the loop, the values of all variables, and of the all readable memory cells are the same on the 2nd and the 3rd iterations of the loop. Immediately follows that the paths that "start" at each of these locations (given the equivalence of the memory) have a one-to-one match between these two "classes". Getting back to the original problem, if we prove that no path that only iterates two times has a bug on it, then the paths where the loop iterates three times or more also don't have bugs on them. This is how we can decrease the amounts of paths to analyze from infinity to several.

However, the loops where memory states are equal on subsequent iterations are rare. Such loops are either infinite (which is good for our analysis, since we don't encounter a crash if we loop forever doing nothing :-), or contain of user input only. The loop in the division by zero program does satisfy the condition, but what if it looked like this:

This way, attempt counter on each loop iteration is different. How do we tackle this one?

From State of the Whole Memory to Regions

While the memory state of the program in question is different at each loop iteration, it really does not matter. Indeed, whether the x equals zero does not depend on the attempt number. If user enters zero on second attempt, he or she could have entered it on the very first attempt, and our analysis algorithm accounts for that, as shown above in the "Modeling Nondeterminism" section. So all paths with two or more loop iterations are equivalent to the path with one iteration only--if we consider only the relevant memory cells that really affect whether the bug appears. How do we detect these memory cells?

Let's get back to the analysis of a single path. The path that only contains one loop iteration is converted to this equation:

attempt_old == 1, and
attempt = attempt_old + 1, and
x >= 1, and
x == 0

The SAT solver analyzes this, and says that "this equation has no solution." But what if we ask "why?" Why does this equation have no solution? "Well," we could answer, "because x can't be no less than one and equal to zero at the same time!" What is important in this answer is that it never mentions attempt variable. So we may discard this variable from the list of "important" variables, and compare paths with 1 and more loop iterations based on the value of x only, and from this point of view they are all equivalent.

Of course, if the value of attempt is later discovered as relevant (i.e. someone tries to divide by it), the equivalence of paths may be reconsidered. But sooner or later, (if we're lucky: remember that the problem is not solvable universally!) we run out of relevant variables.

Note that we eluded the question of how we realize that x is relevant, and attempt is not automatically, without asking any human (that's the whole point of automated large-scale program analysis). First, we can try to ask the SAT solver. Such solvers do not just run on magic, they execute established algorithms, that can sometimes give us notions why the answer is such and such. Second, we can try to play with the equation a bit by trying to remove the parts related to attempt variable and checking if the equation is still not solvable. Third, we can use the mathematical concept that formalizes the "why?" question, such as "Craig interpolation".

In most cases, such algorithms not only tell what variables are relevant, but also give us hints about when they are relevant (say, Craig interpolation may tell us that it's enough to check if x is greater than zero rather than see its actual value. This also solves another problem: tracking exact values of variables is suboptimal. If it is important that x is greater than 0, then every single positive integer value of x is equivalent, which would not be the case if we separated paths based on the exact value of x.

Such "conditions" that constrain the memory values are composed into "memory regions", (potentially infinite) sets of memory cell values that can be used instead of the previous execution history. Indeed, at each line we need to either know the complete history, what lines have been executed by now, or know the current memory state—or at least some constraints on it. We don't need to know both to predict how the program will behave after we continue execution.

The overall algorithm

To sum up the previous sections, here's how the anti-bug program analysis algorithm works:

  1. First, consider that memory is totally irrelevant.
  2. Try to find some path given that any memory cell may hold any value at any time (even if variables change magically between they're written to and read from) that ends up in the ERROR location
  3. If there's no such path, there's no bug!
  4. If there is such a path, convert it into the path equation, and try to solve it.
  5. If the equation has a solution, then the bug is here.
  6. Otherwise, ask the solver or other advanced tools "why" is the equation unsolvable (more generic question is, "why is this path infeasible?").
  7. Reevaluate the paths in the program, not blindly this time, but rather tracking how the memory regions we get on the previous step affect the equivalence of the paths (we leave other variables change magically though). Only follow one path among those equivalent to each other.
  8. Is there any path that ends up in the ERROR location under the new circumstances? Go to 3 or 4.

The algorithm is laid out this way to provide the explanation how it works. An explanation why it is correct, as well as the most convenient form to advance the state of art, requires us to think from the last poit to the first. One should first envision a novel representation of the multitude of program paths, and then devise how to compute it efficiently.

If we're lucky, the algorithm finishes. If we're even more lucky, the algorithm finishes in a sane amount of time. Depending on the task, the "sane" ranges from five minutes to several days. In LDV, we considered it unacceptable for an individual driver to take more than 15 minutes to get verified against a single bug, but Linux kernel contained 4500 of them. In a less massive setting the time constraint can be increased; some mid 2000-s studies considered three days for exhaustive verification of 2500-LOC programs.

This algorithm pattern is called CounterExample Guided Abstraction Refinement (CEGAR). It was devised less than 15 years ago. It was based on the 45-years-ago advancement by Cousot&Cousot that were first to propose to "abstract away" from the actual variable values (what we did when we allowed variables to change magically unless specifically tracked), and iteratively "refine" the "abstraction" by tracking more and more properties. This is not new stuff, but is not old either.

We omitted certain points in tese algorithms. For instance, even the early framework algorithms, even unimplemented, envisioned tracking whether classes "fold" into each other rather than whether they're equivalent only, leaving the exact implementation behind the "lattice" abstraction of program states. Second, the way how "memory regions" are affected by statements is also a complex theory, and it's surprisingly easy to create an algorithm that piggybacks on formula equation building and SAT solving, see this paper for details if you're curious. The paper also outlines a framework approach on alias analysis and memory modeling.

Common Limitations

The common limitations of this algorithm (compared to other static analysis algorithms) is that it relies on a complex equation solving and proving tools that are only well-defined when working with numbers and not very complicated formulae. Even C programs, however, are fairly complex, and consist of complicated concepts such as arrays, pointers, structures, bit shifts, bounded integers when adding two positive integers sometimes makes the result negative. Converting them to good ol' numbers introduces complications. For instance, it's increasingly more hard to answer the "why?" question as you introduce operators beyond addition, subtraction and multiplication: bit shifts, modulo calculations, square roots make it nearly impossible. The fact that pointer may point anywhere—or, at least, to several locations—makes the number of variants of program behavior explode. And if it's a pointer to function, things get ever worse.

Still, the approach works. We found several dozens of kernel bugs in Linux Kernel device drivers—and keep in mind that we did not have the actual devices! Slowly, scientists make it less fragile, more reliable, and some day it will also commercialize to products based on different static analysis approaches, such as Klokwork and Coverity.

Meanwhile, you may try to use such research-space tools as BLAST or CPAchecker, as well as check other SV-COMP participants.

Conclusion

So this is one of the approaches to program verification. There are others, but you may always notice the same pattern in them: abstract away from the concrete program, discard everything that seems impossible, and hope that you discard all bugs and bugs only. However magical it seems, it does work. If you run Windows, your PC is actually executing millilons of lines of statically verified tools. Microsoft has been carrying a lot of really interesting and relevant research and development in this area for the past decade, as I learned at one of the conferences hosted by MS Research.

I wanted to write this article for years, but never managed to find time for this. This was fun even though I stopped active participation in static analysis research or development. I hope I outlined one of the approaches to program verifications so that a non-scientist could understand. If not, mail me or post a comment, and I'll improve the article. Don't hesitate to ask about something in comments, and I'll try to answer your questions—or at least to direct you somewhere.

Read on | Comments (0) | Make a comment >>


BLAST at the Competition on Software Verification'12

At my previous job in a research laboratory, I was improving a Static Analysis tool named BLAST. The name abbreviates its origin, and is decomposed as "Berkeley Lazy Automation Software Verification Tool."

By the way, my former peers have prepared a nice comparison of Static versus Dynamic analysis for the Linux Kernel.

The tool solves a problem commonly described as unsolvable, the Reachability Problem. BLAST is capable of telling—sometimes—if a certain line of code is reachable. It doesn't just run the program with some input data; rather, it reads the program source code, and tries to devise if some input data exist that make the program enter a specific location, a specific line of code.

Of course, it can't actually solve it (nothing can). What it can is to prove that a line is never reachable for a certain fraction of programs. A much more useful purpose is to find bugs. For instance, if we want to check a null pointer dereferences, then we could insert checks like this:

And it does find such bugs. In ISPRAS, a several-years effort within the Linux Driver Verification program/project has led to finding several dozens of problems in Linux device drivers. I even committed some fixes to these bugs to the kernel sources.

Tool Competitions

Well, I got distracted again. I've been planning a big blog post about this amazing technology for years, and I'll make one, but not now. Anyway, it was not me who devised how a program that checks reachability of lines in other programs works (it were two Cousuot-s), and I only fixed and improved an already released open-source tool, BLAST. I did create some novel things, but they were quite minor to my taste (it was just a new zero-overhead structure aliasing algorithm).

So, my last project in the Institute for System Programming was to prepare the tool to the Competition on Software Verification held at TACAS'12, which, in turn, is a part of ETAPS'12. There is a lot of domains where people write tools to solve the unsolvable (like the reachability problem) or the NP-hard (I wrote that you should try to solve this anyway). Such problems can not be universally solved, but a tool may solve them "better" than another, and it's interesting and useful to know who's the best.

A natural way to determine who is "better" is to measure the results of different tools on a common set of representative problems. This leads to specialized competitions among the tools that solve a specific problem. SMT-COMP for Satisfiability Module Theories logical solvers, and Answer-Set Programming System Competition are only two I know about, you can also name a couple of Chess and other formalized game tool competitions. I hope that SV-COMP will become one of them.

Our contribution

The Cousuots' approach is about 30 years old, and the technology beinhd BLAST itself is about 10 years old. However, BLAST contained a lot of improvements that made it apply to many programs, not just artificially created small "research-like" programs. That's why we chose it for finding bugs in Linux Drivers, and worked hard to improve it. Our improvement efforts are stored here, and are available under free software licenses, just like the original tools are.

But, being a less developed technology, it was eventually overtaken by more recent tools, such as CPAchecker, which provided a better ground for research experiments, and had a more developed theory. Still, the competition results demonstrated that the tool was still capable (and I hope that our improvements played a significant role in that). We got 5th place with 231 points, the 4th place being occupied by SATabs with 236. The final table was divided into two groups (200+ and the rest), and we belonged to the top tier.

What was funny that we even got a plaque for achieving the best results in "DeviceDrivers64" category... naturally. Too bad I wasn't there to see the ceremony :-(. Here's a photo of our team who contributed to the improvement of BLAST since version 2.5:

By the way, I already featured my former peers in a fictional interview about a "collaboration tool" by Penn&Paper's.

(Left-to-right: Vadim Mutilin, Pavel Shved, Mikhail Mandrykin.)

I'm sure that this project will eventually be superseded by more advanced software, but I still hope that I'll have time to have some fun with it as well.

A more detailed post about the verification technology pending, I'd like to thank my former peers for the great time I spent in the ISPRAS working on what I wished to do since I was a 7th-grader who first heard about the Halting problem. And I wish all the best to the Linux Driver Verification project as well.

Read on | Comments (0) | Make a comment >>


Microsoft Office Needs Windows... a lot of them!

Microsoft Office

This is a photo of a Microsoft office (stolen from here), namely, the Roger Needham Building in the University of Cambridge, one of two buildings the Microsoft Research resides in. See how many Windows it features? It just couldn't be a coincidence...

Lately, I visited Microsoft Research PhD Summer School 2011 in Cambridge, Great Britain. The event has been organized by Microsoft for more than ten years. Researchers from all the institutes over the Europe that are partners of Microsoft Research (don't know what it means, but apparently the institute I work at is one of them) come there to learn something new about researching in general, to meet researches who work at Microsoft, and to get introduced to what Microsoft is interested in.

The event was mostly held in Roger Needham Building, where, as far as I know, most of the Microsoft Researchers in Cambridge nest. I haven't been given a chance to take a tour over, but it was obvious that this Microsoft Office really has a lot of Windows on purpose ;-)

Microsoft also blogged about the event, but here I'll share my personalized view.

Microsoft and Linux

My poster

That's how my poster looked like; check out the Tux at the upper-left corner! I know it's a total crap (there's just too much text), but this was my first experience, and the next time I'll do it right... or, at least, more right than this time. Yes, next time I will not use LaTeX for making a poster. (And here is the pdf).

As a part of the obligatory participation program, I made a poster presentation on the research I'm currently doing. Of course, I couldn't miss the chance to troll Microsoft with an image of penguin Tux, the mascot of Linux, and with a title that starts with "Linux" as well (see sidenote).

It turned out that Microsoft researchers are really open-minded; they didn't get xenophobic, they joked on this matter, and some of them even confessed that they sometimes even use Linux in their work.

Of course, the most of the school, though, was devoted to introductory "breadth-first" talks on what Microsoft is doing. The talks were made by practicing Microsoft researchers, and I'll briefly overview the ones that seemed the most insightful to me.

Research techniques introspection

Simon Peyton Jones who is known for his work on Glasgow Haskell Compiler, gave an excellent talk on how to make excellent talks at a research conference. Not about GHC, though, which, probably, could be more interesting.

One of his observations, which I also noticed a couple of conferences ago, was that it's easy to be just good enough because most talks on an average research conference are really boring.

He also made a talk on how to write a good research paper, but it went beyond that, and Simon shared his view on how to interweave presentation into the research. He advised to start writing a paper at a very early stage, starting from the "middle", the "core" of the paper. This would help you to shape your thoughts, and to understand what your research is actually about better, he said. However, some people reported that this recipe may be not universal, as it (luckily?) fits the way Microsoft evaluates their employees.

Another good insight was about about some techniques of how to present a research made. I learned that the approach to presentation of our scientific results we've been taught here, in Russia, is not the only one possible. The key difference with what Simon explained is the fundamental motivation why the research has been done. I have always been taught that in each and every paper I should justify my research with how bad, wrong and suboptimal the results previously discovered were, and here I come, with my research has the sole purpose to improve over them. Simon has introduced another approach. The key motivation for the research, in his opinion, was, "look, I've created something new! Isn't it cool? Look, how efficiently it works! And, by the way, foo and bar have also been discovered, and I sincerely thank their authors for they have inspired me".

Of course, it's hard to tell that one of them is the right way to go. However, I consider papers written the second way much more pleasant both to read and to write.

My Website is Closed for Christmas

Kenji Takeda making a talk on cloud computing

Kenji Takeda gave an overview of what Microsoft is doing in clouds. Aside from flying here and there on planes, Microsoft also builds datacenters, both at software and at hardware level. I learned that an interesting solution was developed for cooling datacenters. Instead of installing fans that just move the air away, simple physical observation could be used to improve over such an inefficient and sometimes failing solution as mechanical fans. The idea is to utilize the phenomenon that the hot air is lighter, and moves up. If the upper part of a large room is open, the hot air is removed from there, thus creating the draught that sucks the cold air in from the holes in the bottom.

After the lecture, we had a brief discussion of how to manage utilization of the cloud resources at peaks of the demand, when there's just no enough power to serve all the requests per minute, such as on Christmas. The answer was that it may be achieved by raising the price at certain periods. Indeed, if we are running a scientific computation (such as genome sequencing), it can be safely paused for Christmas, whereas large shops or entertainment information providers may take the resources instead. Raising price is the way to make it self-manage, driven by reducing expenses and maximizing income for all the parties involved.

As a side-effect, it may turn out that some non-commercial services that utilize cloud computing services, such as my website, may also get temporarily shut down—just as a ground shop would be closed on holiday. So "this website is closed for Christmas" is not a distant future for us.

Biology and Ecology, and... Microsoft?

Biological simulation in Sci-Fi

This is how computer simulation in biology is portrayed in the "Hollow Man" sci-fi movie by Paul Verhoeven. As far as I know, Paul used to be a researcher, and really tried to make it look real. But you may notice that the protagonist is running a simulation of a molecule he designed, and it's not model checking: the science was not that advanced at the time of shooting.

Turns out that Microsoft really carries out a lot of research. For instance, heard a talk about fighting cancer with model checking, which was created with help of Thomas Henzinger, a well-known scientist among software verification specialists. Model checking helps us to fight cancer by discovering and verifying models of internal chemical processes in human cell—without simulations or actual experiment. These models, shaped as finite state machines, are first verified by computers, and if the automaton found is proven sane, the real experiment is carried out to confirm the relevance of the model.

I've seen something like this in the "Hollow Man" sci-fi movie, but it turned out to be the reality of a Microsoft Research laboratory! And even more, while the Hollow Man ran simulations (see sidenote), model checking techniques do not require even a model of the experiment: the assumptions about the nature of the processes being studied are verified as a mathematical abstraction instead of as a physical object, thus providing more reliable result.

Another interesting domain is ecological research. Microsoft gathers and works on providing representation of a lot of ecology-related data. Drew, a researcher from Microsoft, also shared that humanity starts experiencing difficulties with water! He said, that today we literally have to mine water as deep as hundreds of meters. And some of the drink water sources are actually getting exhausted: you have to dig deeper and deeper to get to the actual water!.. However, I'm not an ecologist, so that was just an idle curiosity there.

Anthony Hoare and yet another philosophy lecture

Sir Anthony gave a boring talk about the history of computer science projected on the development of philosophy. Ever since the times when I was visiting philosophy lectures, which were obligatory for my postgraduate studies, dozing off became an instant reaction to the history of philosophy.

I couldn't miss the chance to ask a couple of questions to such a well-known person. I, actually, had one prepared. Several months before, I heard a talk on some applications of model checking. Irina Shevtsova advocated the use of models in software verification. Roughly speaking, she insisted that a program should come with additional information to verify its correctness. Ascribing this to Hoare, she explained that the auxiliary information should be a model, a finite-state machine to depict a programmer's—or a customer's—intent. This sounded too controversial to me, but the moderator of the seminar shot my attempts to start a discussion on that matter.

So, I asked Mr. Hoare whether it was really what he meant. It turned out that it indeed was not the model what Hoare was talking about, but mere assertions. Any assertion you put becomes a piece of meta-information on what the correct behavior of the program is. The non-violation of these assertions can be automatically, or, at least, manually, verified, and the stringent adherence of any execution to all the assertions is what makes a program complying to the developer's intent. Can't wait for another encounter with Irina, where I'll have a chance to explain (oh, let's be honest, to brag about) this misunderstanding.

Terminator

Byron Cook gave an awesome talk on termination checking techniques. This is not about the extinction of the humanity by machines (well, if it is, then just indirectly), but about automatically proving that a given program will terminate. I was aware about the techniques of proving that a program will never violate a given property while running (this is called "safety property"), and I even maintain a tool that performs such a verification. However, the details of proving termination (that the program will not get stuck in an infinite loop) somehow remained being not learned by me. Byron has filled this gap with his awesome talk. I won't try to explain it briefly, and will redirect you to the article in CACM he co-authored (or, judging by his talk, made the biggest contribution to).

Cambridge

King's Colledge

This is, perhaps, the most well-known view of Cambridge—shot on my own camera! At the time of making this picture, I was on a boad being punted, the even having been kindly organized by Microsoft Research.

The event location was Cambridge, a beautiful village an hour of a train ride away from London. We quickly got used to a mild smell of potassium the town greeted us with, and the remaining dominant flavor of science encouraged the dissemination of the knowledge at the school.

We couldn't help walking through all the Cambridge at our free time, including, but not limited to several pubs (authentic English ale is awesome!) But the details of the trip are really off-topic here; what I can safely share is that if you get to England, it's totally worth it to spend a day taking a tour over Cambridge.

Conclusion?

What was most valuable to me personally in this School is the breadth. I learned a few about a lot, and that's how you really enjoy getting new knowledge. I met one person who worked in a similar domain (and I missed a change to have a good discussion with another guy—jeez, I should really learn how to approach people!), and had an interesting discussion with him. I learned something new about how to present my research well, and... Well, I learned that Microsoft is not that evil! Isn't this what the organizers hoped for in the first place?..

Read on | Comments (0) | Make a comment >>


Availability of software development keeps increasing (SYRCoSE 2010 notes)

Recently I visited a conference for beginner scientists and engineers entitled Summer Young Researchers Colloquium on Software Engineering, and presented my paper On Reasoning about Finite Sets in Software Model Checking. My talk wasn't great--mainly because the research didn't yield the results it was expected to. However, the post is not about this.

Most of talks weren't very inspiring. However, some of them revealed interesting ideas that concern availability of programming.

Service-oriented development

The first talk, devoted to service-oriented approach to programming, was given by invited speaker Dr. Prof. Tiziana Margaria-Steffen from Potsdam University.

The main idea of her report was that software development is unnecessarily complex. Software Engineering, the discipline that studies the craft of creating software, mostly addresses issues of how it should be made, not what should appear at the end. However, if we switch focus to the services that software should provide, we can make the development process different.

jABC's business logic editor screenshot

She later passed on to the system that demonstrates the success of such a focus shift, jABC. Its main purpose is to separate business logic from implementation of the concrete components, and have a nice graphical editor of the former (see side note).

Of course you heard about such systems a lot of times. Of course, I couldn't stand trying to find out what's so special. According to Tiziana, that system has been developed for over 15 years, and still isn't dead. The reason is that the business logic was constrained to having absolutely no synchronization between concurrent processes; an only parallelism supported is fork-join parallelism: spawn a constant number of threads and wait for them to finish.

But of course, expressiveness isn't the main feature of such systems. The main one is that it allows non-programmers to construct programs. In my opinion, however, if a person is capable to create a runnable flowchart, he already has enough skills to do it without any "flowchart-driven development". Tiziana's answer to my question about that was that to create a program, a certain amount of skills is indeed required. But if you need to slightly modify the program already created by someone who has enough skills, you don't have to bear them as well.

After I poured upon that, I agreed that it's possible. Perhaps, the reason is that minor modifications require the skill of reading programs, not the skill of programming, which is necessary to creating them. And it is such "flowchart" systems that make this possible. And it means that more people will be programming.

Not-so-natural language

Another report that interested me was the one given by Andrey Klebanov from SpB ITMO. It presented yet another system that verifies automata-based programs, but it had a minor issue which is, in my opinion, more profound than the rest of the work.

The specification for the checker should be written in English language, constrained by a specific grammar. The specification language, which defined some kind of a temporal logic, had a grammar which contained natural-language sentences! So the requirements would look like this, literally: After the red button was pressed, in the finite amount of time coffee will be made. (instead of a temporal logic formula "Button -> ◊Coffee"). And what is important, these requirements will be processed by a machine.

Anyone who's familiar with maths, and who is a specialist in software engineering would argue whether it's convenient. Such enormous amount of unnecessary text in conjunctions and pronouns would anyway make specification writers auto-generate them from a more succinct format.

Well, that's what availability means. And I think that it is a boon.

But I later realized that this system has an important benefit. It increases the availability of specification writing, lowers the bound of education necessary to write a machine-readable spec. If a highly-qualified specialist is to write such a spec, it will convert it from a succinct notation--but such conversion would anyways have been made. Abstracting away from the natural language, writing temporal logic requirements in mathematical notation requires certain skills, which not everyone possesses. So you could hire a less qualified programmer to to the specification writing job. A "natural" language notation should make writing requirements in a formal way more available, more cheap, and increase demand for verification tools.

Are the programmers still needed?

Of course, the increase of availability of development arises sad questions. Will we, the programmers, be needed by mankind in the future? Will we be the new Luddites?

I think, we totally will. However, someone has to write and maintain programs that will replace us. Someone has to make a relevant research. Communications of the ACM May 2010 issue presents an article about the (still research) prototype of a genetic algorithm that fixes bugs in programs. Today it makes most of us smile. Soon the smile will change to grin, and then--to panic.

However, I will not worry about that. It's natural, at the first place. Also, the first parts of the process that will be overtaken by robots and by low-qualified workers are the most tedious aspects of programming. Fixing bugs by digging into stacktraces, writing integration code, writing tests, creating specifications for complex obsolete programs will be the first to automate and outsource. What will left is the pure art of programming, that small piece of our work we all love it for. There'll be enough of that for the rest of my life.

And my kids will be lawyers.

Read on | Comments (5) | Make a comment >>


More posts about science >>