Recent blog updates

Shown here are only posts related to philosophy. You can view all posts here.

Relativity of Simultaneity in Distributed Computing

About a year ago, I described an allusion to physical phenomena in computing in "Caching and Levers". This post is devoted to a more complex theory, namely the Special Theory of Relativity (STR), and mostly to one of its implications, the "relativity of simultaneity".

Relativity of Simultaneity

Special Theory of Relativity may be approached as a set of equations for coordinate translation that take their effect as the speeds of objects approach the speed of light, c.

STR is based on two postulates, one of which states that light is always propagated in empty space with a constant velocity c regardless of the speed of the emitting body. This may be used to introduce a tricky paradox.

Train car experiment

As seen by a passenger of the train. The events happen simultaneously.

As seen by a spectator on the station. The "back" event happens earlier than the "front" event.

(pictures from Wikipedia)

Assume that a train car is passing you when you're standing on a station platform. The train car has a lamp installed exactly at its middle. When the train passes you, a lamp flashes, and then you watch when the light reaches the front and the back of the car. It's interesting that those who sit in the car will notice the light hitting the two walls simultaneously, while those who stand at a platform will notice the back wall lit earlier than the front one. This is implied by the postulate described above: as seen by a spectator on the platform, the light propagates with equal speed at all directions, so it will hit the rear wall before the front one, as the former moves toward the light source, while the latter reproaches it. See wikipedia for more info.

This paradox, known as relativity of simultaneity may be formulated more generically: whether several events are simultaneous, depends on the location one watches them.

But what does it have to do with the computing? Before I answer this, I'd like to share what I've learned from a CACM magazine.

Quiescent consistency

In the multicore age, the classical data structures we all know about are about to change. The reason for that is the increasing demand on computation speed and volume that careful "sequential" CPU chip engineering is no longer capable to provide.

This challenge makes us rethink our approach to algorithm and data structure design. If we want data strctures to be efficient, we no longer may expect them to behave as good old ones in the sequential edge.

In distributed programming, there is an approach to describe data structure behavior known as quiescent consistency. There is a number of consistency conditions, sequential consistency, linearizability and others. These conditions describe how an object behaves when there are several threads calling its methods.

A data structure possesses quiescent consistency if it is consistent between its states of quiescence, i.e. when there are no methods currently in progress. As soon as a quiescently consistent structure has no operations pending (i.e. reaches the quiescence), we may be sure that the executions of methods before this state and after this state is never interpositioned.

Imagine a quiescently consistent stack. An implementation of it is described in this CACM paper "Data Structures in the Multicore Age", the one where I first encountered the quiescent consistency concept. Assume the following sequence of events happen to the q.c. stack:

  1. Three pushes x,y,z
  2. Quiescence (the pushes are processed)
  3. Three more pushes a,b,c
  4. Quiescence (the pushes are processed)
  5. Three pops
  6. Quiescence (the pushes are processed)
  7. Three more pops
  8. Quiescence

Note that q.c. doesn't mean that a data structure guarantees nothing except for this specific consistency. Consistency condition only maps data structure behavior in a concurrent setting to a behavior in a single-threaded environment, i.e. it only limits the number of a multitude of different sequences of method calls that may happen for a specific set of multithreaded events. All the properties a data structure exhibit in this sequential processing should still apply.

Quiescent consistency guarantees that the first three pops return x, y, and z (in an arbitrary order), and the next three pops return a, b, and c, somehow intermixed as well.

Note that, unlike linearizability, quiescent consistency doesn't impose any ordering on results of pops if there was no quiescence between the initial pushes. For instance, if processing of the first and the third pushes do not overlap, the linearizability ensures the pops will respect this order, while q.c. (in case that the second push overlaps with both of them) doesn't ensure that.

Having noted that, q.c. looks like a very weak and useless property. However, it still implies correctness, which, however, is enough in many circumstances. Imagine that you need a pool of unique numbers. It is enough to use a q.c. counter; the numbers it returns will be unique, which should fulfill our initial purpose.

Do we really need stronger consistency?

However, the reasons why we may be satisfied with weaker consistency conditions are not constrained with the specific examples. Let's try to prove the opposite. Assume we're implementing a stack that is accessed from a number of threads. Quiescent consistency may be not enough because if a push of A precedes the push of B, then the pops should be ordered in the specific way, and quiescent consistency may not guarantee that.

But wait... We say, "If one push precedes the other," but do we really know what "precedes" mean? If two threads in our distributed computational system invoke push call independently, how can we be sure that one of them precedes the other? And is there any sense in defining a measure for that?

Let's reckon the paradox we described at the beginning. In one reference frame, one event precedes the other, and in a different reference frame it's vice versa. So whether one push happens before the other, depends on the place we look from, and is not—and may not be—determined by a central, ubiquitous authority. That's a law of the universe, and we can't change this in computing.

Surprisingly, it makes quiescent consistency be a fairly appropriate constraint for a distributed data structure. If there's no strict sense of precedence between events that happen in different places of our network then why can't we use this to develop a more efficient data structure? The correct answer is, "indeed, why not?", and such data structures are well-defined nowadays.

OSI model

OSI model is an abstraction that aims to make the design of distributed computation systems easier. The chain of events that happen during a process of data transmission is separated to several separated layers: each layer has its own protocol, which is independent of the actual implementation of the underlying layers. You may learn more at the wikipedia.

STR vs computing: the difference

We have successfully used a metaphor from physics in distributed computing, but there is an inherent limitation of applying the concept further. In physics, if we have a car of a specific length (as measured by its riders), we may install the light source in such a way that the events at two sides of it happen predictably simultaneous in the reference frame of the spectator at the station.

In computing, we can't do that. OSI model prevents us from predicting how fast the events happen by masking out the capabilities of a system at lower layers. In physical reality, we know that as soon as the light "hits" the back and front covers of the car, the event "happens" (or, more precisely, a piece of reflected light is launched towards the spectator). In computing, however, we don't even know how long it takes for a signal to reach another node. Moreover, the OSI model makes this impossible to predict. All the layers guarantee correctness, but do not have any timing conditions.

Conclusion

The absence of timing in the OSI model suggests that may be no sense in saying, "this structure may not be used, as it processed the request in the different order than they're issued in". The "order they're issued in" is inherently unpredictable in our computing models.

This inherent unpredictability of timings of processes and events, as described in OSI model, is why we really can't apply the special theory of relativity to computing. However, we still may notice that simultaneity in the world around us, as confirmed by physical experimentation, is relative. And the nature still works well!

And if the nature accepts it, why can't we learn from it, and allow our distributed systems to be less predictable, and trade this predictability for speed?..

Read on | Comments (1) | Make a comment >>


Fail Early, Fail Fast

As you probably know (I'm sure I mentioned it somewhere in this blog, not just in my CV), about a half a year ago, I developed a set of components to make our Linux Driver Verification toolset run in a cloud, i.e. on a set of dynamically added and removed virtual machines.

Being low on the amount of man-hours allocated to this task, I needed to make the development as fast as possible. Our toolset was, in fact a simple client-server architecture: it had a large task that split on several independent smaller tasks, and then collected their results. While we did have making our toolset distributed in mind (I even had a post sketching our small debate on how to create an parallelization-friendly architecture), the data exchange had been implemented long before the infrastructure for distributed computing. The data exchange was based on a strict layout of files in the working directory: clients expected the input data in specific places, and server was to ensure the proper files reside there. Then, the server read the output from the specific files the clients created.

Linux has a support for Filesystems in USErspace (FUSE). This means that a non-root user may install and mount a "virtual" file system that would map system calls to the files and directories to whatever user specifies, such as an access to an archive, or to a file on a remote machine.

One of such file systems, an SSH-FS, which mirrors a filesystem on a remote machine via a Secure Shell tunnel, was featured in my previous posts about open3.

In a cloud, a client and the server are located on separate machines. How do we perform the data exchange? The simplest way to do was to use a transparent network file system, such that the data access code is untouched. Indeed, clients just read files from a filesystem, and a combination of Linux filesystem tools maps the read() calls across the network to reads from a server machine.

To achieve this, I used a combination of SSHFS and UnionFS, both being userspace file systems (see sidenote). UnionFS was used to make client write to a local machine and read files from the remote server while thinking it's just one simple filesystem.

It seemed a nice solution because the files the client expected to find in a filesystem were seamlessly transferred via sshfs. Its caching capabilities ensured that the latency happened only when the file was located, and simple buffered reads on that file were cached in advance. Since we need to locate only a relatively small amount of files, but read a lot from them, the latency shouldn't have been crucial.

Have you already guessed where the problem is?..

The trouble: C preprocessing

And there was one simple task in the whole infrastructure, which never got much traction, but is now responsible for all the latency happening in the data exchange via network. Our toolset analyzed C source code, therefore, it needed to preprocess it. And the trouble was in that plain old C preprocessing; namely, its #include directives.

How do #includes in C preprocessor work? A compiler has a prioritized array of folders to look for headers in, the user specifies its own prioritized set of folders, and each #include <file.h> enumerates the array starting from the folder of highest priority, and looks up file.h in each of them, until it founds a match. Then it inserts the contents of the file found into the source code, and keeps preprocessing. What is the incompliancy with our assertions about the performance of our filesystem stack?

During C preprocessing, a very large number of file lookups will fail!

It's the crucial point of preprocessing; it's been working like this for decades, and we shouldn't touch it (we don't want it, too, as using C preprocessor as a third-party tool is much easier to deploy). This is an example of a completely legitimate motivation for issuing a lot of failed requests to something (to file system, in our case). And it works comparatively well locally: while preprocessing of one file on a local disc would take less than a second, the same task over the network requires ten, or more! What could we do with our filesystem setup to avoid such performance problems?

Nothing. The calls that are doomed to a failure on the UnionFS level will be routed over the network to fail on the server, causing insane amounts of latency. Even copying all the headers from the remote machine to the local one wouldn't help: the successes would be executed faster indeed, but the failures would make their way to the server, and cause latencies anyway.

***

What could be learned from this? When you're trying to design a system, you should pay a closer attention to failures. It's tempting to only optimize your architecture for a fast handling of successful requests, while failure of a request may be as much of importance!

Of course, this varies with your domain: for instance, a web authentication system would specifically slow down failures as a protection against brute-force attacks. In my case, however, I'll have to make a patch to SSH-FS (or, to UnionFS) for it to cache failed file lookups. And I did it! See my blog post that describes how I patched it

But when I'll design another piece of software in my career, and it will have to handle legitimate unsuccessful requests, I'll do my best to make it fail early and fail fast.

Read on | Comments (2) | Make a comment >>


Programming as gold mining

This article was planned before the creation of this blog. I tried to understand what programming is as an approach to problem-solving, rather than just as a tool. And I found out that the approach of programming reminds what happens when you mine gold (but the analogy is not the most obvious one you probably thought about). The article describes the gist of this point of view to programming and depicts the connection to mining.

The difference between programmers and common herd

Unlike the IT guys, like me and you, people aren't that excited with programming. Connecting and arranging components of an upcoming system on a blackboard is not an inspiring activity for them. They don't cheer when they manage to write a function in a curiously small number of lines. They have no possibility to settle back, satisfied, and smile, when the program works outright after the first compilation without need to debug it.

But the world is somehow arranged in such a way that these insensitive people nevertheless are those who pays us money! What programming is valued for by them is that it's a good way to solve problems. These problems are usually beyond those a human can handle, and you usually need a horde of zombies to tackle them.

However, problem solving is approached differently by those who can code and those who can't. The management of a non-software company usually needs to do something concrete, like being able to fill forms M-186 and P-337b through the web browser. After they find out that such an automated system would help to run the business, they assign their IT guys to the task. They actually don't care how beautiful the code will be, how much satisfaction will the IT crowd have. They have a problem, and they need a solution.

Sorting ten cards

Assume you are given the task to sort 10 cards... You could write code like this...

But I am sure that there's no one who calls himself a programmer and does it this way. The code's too inefficient, impossible to maintain and can't be written without help of an external program.

So you end up writing something like this (that's a Bubble sort):

But let's reckon the original task. It was, let me remind, to sort ten cards. But it happens, that to sort ten cards you literally have to write a somewhat complex program, which sorts any number of cards. You can't write 10-element sorting without writing any-number-of-element sorting!

In fact, the fact stated in previous paragraph explains the crucial difference that prevents others from understanding that programming is hard. They think like "what is so difficult in sorting ten cards?" "You could just ... what's the word?.. Um, ah! Hardcode the form M-186 onto the webpage and collect results into a database!" They can't understand that you need to create a program, which is close to processing any form, in order to do it only for two forms.

This is the evidence of the idea that programming a solution to a particular problem is always nearly as hard as programming the solution for a general one.

Boon or bane?

The downside of a tendency of "programming" to solve problems more general is that people don't understand how sorting ten items may be so hard. Books are written about it (reckon Don Knuth's "Sorting and Searching" TAoCP book?). Research is held about it (new sorting methods are still developed to solve ever increasing number of edge cases and to handle specifically ordered arrays effectively). But, hell, any kid can arrange ten cards in the order of increasing value! He can't devise an algorithm of doing it for N cards, but we didn't ask for N, did we? Thinking like this, most people just can't understand what's so hard, and seem to value the art of sorting less than it really deserves. Let alone the other crafts of programming.

It's of no wonder that scaling makes a program more complex, and thus changes the algorithm in a way more than just increasing an array's boundary. But scaling and generalizing are nevertheless different problems: while the former is about looking for effective solutions for greater inputs, the latter is just about a theoretical capability of a developed algorithm to solve a larger problem.

On the bright side is the possibility to recycle the waste of programming. Once you've written a sorting function for array of 10 cards, it takes you just to increase the array size to make your program sort a lot of cards. Of course, if you wrote an Ω(n²) sort, like I did in the example above, it won't scale that well, but we assume you picked a better solution to the initial problem. So now you have a program that can sort any array.

Creating such residue is inevitable. We have a good analogy in the other fields: the one who extracts oil should also extract the accompanying gas, build facilities to handle it and make the site protected from accidents that involve gas. In the very same way, a programmer is doomed to produce programs that are more general, and he should be aware that it's just more hard.

By the way, printing invitations in an automated way is a good sign that you're dealing with a programmer. Even if a programmer never encountered a task of such sort, he will always use a programmable way to print invitations.

When I was at school, I was assigned a task to print invitations to the prom. I never did anything like that before, but, being a programmer I managed to find a way to do this. I scanned and recognized a list of persons into Microsoft Excel, imported it to Microsoft Access database, and printed invitations as a series of nicely formatted database records with external background image.

Completely lame, but what else could I do back then?

This property has even been a source of jokes. Alex Exler in his "Notes of a programmer's bride" (oh, I re-read that and fixed the description that follows) describes a situation that a programmer spent a day to write a program that prints invitations to his wedding. The bride bragged that her husband has such a good and a very useful profession. The man she bragged to questioned if it was easier to just fill them manually in a text editor, and after bride's reply about "generalization", he astonished her with, "Yeah? So how many times are you going to get married?"

Conclusion

Back to a programmer's residue. Brooks in his famous "Mythical man-month" claims, however, that producing a reusable component takes three times greater effort than producing the non-reusable one tied to a specific task. Most programs that don't have such a clear abstraction as array sorting, are far more tricky to be scaled to greater inputs without effort.

But that does not contradict the main idea that you can't evade producing residue. The raw "general" residue programming leaves after solving a particular problem is useless, just like in gold mining. To concentrate the ore, you must put effort into doing it, and to get some product from the residue, you should work more. And that's what happens when we write programs. So, isn't programming a gold mining of sorts?

Read on | Comments (0) | Make a comment >>


Programming as Controlling Mindless Zombies

The question that bothers many developers and, maybe, even people in related topics is "What is Programming?"

Indeed, many of us do it every day (more seasoned ones taking a break on weekends), but who really bothered himself with a philosophy of what he's doing? Since I'm fond of making up my own philosophy, I'll try to pore upon the subject in several blog posts. In this one I'd address one side of it, and you, having read the title, already know which.

Computer programming is indeed controlling a horde of mindless zombies. Let's check definition of a zombie on wikipedia:

A zombie is a creature that appears in books and popular culture typically as a reanimated dead or a mindless human being. Stories of zombies originated in the Afro-Caribbean spiritual belief system of Vodou, which told of the people being controlled as laborers by a powerful wizard

First of all, computers are dumb. They really are. Just compare the time you need to devise an idea, to think it out with the time you have to spend to explain it to a computer. To a machine that understands only ugly, primitive texts that do not make sense to any human, unless he's a programmer. How many times did you scold your computer for being not as smart as you? For not understanding that here he shouldn't do this, because it's obvious that he should do that instead?

Recently I had been writing an OCaml program. OCaml is a very smart language that can infer the types of identifiers by looking at operations applied to them. That's right, he's smart--as long as things go the way he used to perceiving. But as soon as you write "separate the list to a list concatenated with an element" instead of "separate the list to an element concatenated with a list", he becomes unbelievably dumb. If it was a russian bureaucrat, I would even think he's doing it on purpose to extort a bribe...

So, the computers are totally incapable of doing anything not told to them explicitly. That's quite a good definition of being mindless.

But these mindless machines happen to do some useful work. And the complexity of this work, usually inhandleable by any human, sometimes gives a false sense that these machines possesses intellect. But that's not intellect of their own. The intellect within a binary is that of the programmer who coded it, of the tester who checked that the dumb machine doesn't crush all around, of the manager who gathered them, and of millions of people who invested their mindpower into compilers and interpreters--the things that make one mindless machine teach another. I can't help quoting the famous Conway's law:

organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations

(quoting via Wikipedia). This law is a direct consequence of the fact that computers are dumb, and it's we, the humans, the organizations, that inculcate our own properties to them.

So the result of computer programming is a horde of creatures that carry the will of their masters. Well, sometimes not exactly what the masters meant, thus you see strange error messages here and there, where you really don't expect them. That's because the creatures are so dumb, that a separate programming engineering discipline is devoted to "talking to zombies" (it's also known as "coding"), and the discipline is not practiced well enough by some developers.

And the last thing that concludes the post. While it's always hard to instruct a zombie, it's sometimes also hard to make an already instructed zombie to fulfill your needs as user's. Therefore, users are sometimes required to posses the thing these creatures do not have: brains...

Mindless creatures that look for brains? That just can't sound more zomby! And it's precisely what we deal with while programming.

Read on | Comments (0) | Make a comment >>


More posts about philosophy >>