Shown here are only posts related to data-structures. You can view all posts here.

## Relativity of Simultaneity in Distributed Computing

About a year ago, I described an allusion to physical phenomena in computing in "Caching and Levers". This post is devoted to a more complex theory, namely the Special Theory of Relativity (STR), and mostly to one of its implications, the "relativity of simultaneity".

### Relativity of Simultaneity

Special Theory of Relativity may be approached as a set of equations for coordinate translation that take their effect as the speeds of objects approach the speed of light, c.

STR is based on two postulates, one of which states that light is always propagated in empty space with a constant velocity c regardless of the speed of the emitting body. This may be used to introduce a tricky paradox.

#### Train car experiment

As seen by a passenger of the train. The events happen simultaneously.

As seen by a spectator on the station. The "back" event happens earlier than the "front" event.

(pictures from Wikipedia)

Assume that a train car is passing you when you're standing on a station platform. The train car has a lamp installed exactly at its middle. When the train passes you, a lamp flashes, and then you watch when the light reaches the front and the back of the car. It's interesting that those who sit in the car will notice the light hitting the two walls simultaneously, while those who stand at a platform will notice the back wall lit earlier than the front one. This is implied by the postulate described above: as seen by a spectator on the platform, the light propagates with equal speed at all directions, so it will hit the rear wall before the front one, as the former moves toward the light source, while the latter reproaches it. See wikipedia for more info.

This paradox, known as relativity of simultaneity may be formulated more generically: whether several events are simultaneous, depends on the location one watches them.

But what does it have to do with the computing? Before I answer this, I'd like to share what I've learned from a CACM magazine.

### Quiescent consistency

In the multicore age, the classical data structures we all know about are about to change. The reason for that is the increasing demand on computation speed and volume that careful "sequential" CPU chip engineering is no longer capable to provide.

This challenge makes us rethink our approach to algorithm and data structure design. If we want data strctures to be efficient, we no longer may expect them to behave as good old ones in the sequential edge.

In distributed programming, there is an approach to describe data structure behavior known as quiescent consistency. There is a number of consistency conditions, sequential consistency, linearizability and others. These conditions describe how an object behaves when there are several threads calling its methods.

A data structure possesses quiescent consistency if it is consistent between its states of quiescence, i.e. when there are no methods currently in progress. As soon as a quiescently consistent structure has no operations pending (i.e. reaches the quiescence), we may be sure that the executions of methods before this state and after this state is never interpositioned.

Imagine a quiescently consistent stack. An implementation of it is described in this CACM paper "Data Structures in the Multicore Age", the one where I first encountered the quiescent consistency concept. Assume the following sequence of events happen to the q.c. stack:

1. Three pushes x,y,z
2. Quiescence (the pushes are processed)
3. Three more pushes a,b,c
4. Quiescence (the pushes are processed)
5. Three pops
6. Quiescence (the pushes are processed)
7. Three more pops
8. Quiescence

Note that q.c. doesn't mean that a data structure guarantees nothing except for this specific consistency. Consistency condition only maps data structure behavior in a concurrent setting to a behavior in a single-threaded environment, i.e. it only limits the number of a multitude of different sequences of method calls that may happen for a specific set of multithreaded events. All the properties a data structure exhibit in this sequential processing should still apply.

Quiescent consistency guarantees that the first three pops return x, y, and z (in an arbitrary order), and the next three pops return a, b, and c, somehow intermixed as well.

Note that, unlike linearizability, quiescent consistency doesn't impose any ordering on results of pops if there was no quiescence between the initial pushes. For instance, if processing of the first and the third pushes do not overlap, the linearizability ensures the pops will respect this order, while q.c. (in case that the second push overlaps with both of them) doesn't ensure that.

Having noted that, q.c. looks like a very weak and useless property. However, it still implies correctness, which, however, is enough in many circumstances. Imagine that you need a pool of unique numbers. It is enough to use a q.c. counter; the numbers it returns will be unique, which should fulfill our initial purpose.

### Do we really need stronger consistency?

However, the reasons why we may be satisfied with weaker consistency conditions are not constrained with the specific examples. Let's try to prove the opposite. Assume we're implementing a stack that is accessed from a number of threads. Quiescent consistency may be not enough because if a push of A precedes the push of B, then the pops should be ordered in the specific way, and quiescent consistency may not guarantee that.

But wait... We say, "If one push precedes the other," but do we really know what "precedes" mean? If two threads in our distributed computational system invoke push call independently, how can we be sure that one of them precedes the other? And is there any sense in defining a measure for that?

Let's reckon the paradox we described at the beginning. In one reference frame, one event precedes the other, and in a different reference frame it's vice versa. So whether one push happens before the other, depends on the place we look from, and is not—and may not be—determined by a central, ubiquitous authority. That's a law of the universe, and we can't change this in computing.

Surprisingly, it makes quiescent consistency be a fairly appropriate constraint for a distributed data structure. If there's no strict sense of precedence between events that happen in different places of our network then why can't we use this to develop a more efficient data structure? The correct answer is, "indeed, why not?", and such data structures are well-defined nowadays.

#### OSI model

OSI model is an abstraction that aims to make the design of distributed computation systems easier. The chain of events that happen during a process of data transmission is separated to several separated layers: each layer has its own protocol, which is independent of the actual implementation of the underlying layers. You may learn more at the wikipedia.

### STR vs computing: the difference

We have successfully used a metaphor from physics in distributed computing, but there is an inherent limitation of applying the concept further. In physics, if we have a car of a specific length (as measured by its riders), we may install the light source in such a way that the events at two sides of it happen predictably simultaneous in the reference frame of the spectator at the station.

In computing, we can't do that. OSI model prevents us from predicting how fast the events happen by masking out the capabilities of a system at lower layers. In physical reality, we know that as soon as the light "hits" the back and front covers of the car, the event "happens" (or, more precisely, a piece of reflected light is launched towards the spectator). In computing, however, we don't even know how long it takes for a signal to reach another node. Moreover, the OSI model makes this impossible to predict. All the layers guarantee correctness, but do not have any timing conditions.

### Conclusion

The absence of timing in the OSI model suggests that may be no sense in saying, "this structure may not be used, as it processed the request in the different order than they're issued in". The "order they're issued in" is inherently unpredictable in our computing models.

This inherent unpredictability of timings of processes and events, as described in OSI model, is why we really can't apply the special theory of relativity to computing. However, we still may notice that simultaneity in the world around us, as confirmed by physical experimentation, is relative. And the nature still works well!

And if the nature accepts it, why can't we learn from it, and allow our distributed systems to be less predictable, and trade this predictability for speed?..

## OCaml Hash Table Chronicles

While preparing the static verification tool I maintain, which is written in OCaml, to the Competition on Software Verification, I noticed a strange performance problem. Queries to a hash table structure seemed to take too much time. Of course, there were a lot of such queries, but a greater amount of queries to a much more complicated data structure, a Binary Decision Diagram, caused the CPU less trouble.

I've been troubleshooting the issue for several hours, and the spot where it hit me was, as a Dr. Watson would say after Sherlock Holmes explained him the case, elementary. Bet let's explicate the case from the very beginning.

### What an OCaml hash table looks like

The hash table implementation in OCaml, in addition to the expected capabilities, can also keep the history. So if you add several values for the same key, you may either query the latest value added (via find method), or query them all in the order of appearance (via find_all counterpart). Of course, you can also remove the latest binding of a key (via the remove method), so that the subsequent finds will return the previous value it was bound to.

"Who would want that?" you might ask. It appears, that this feature is very convenient. First, you get a standard library data structure for "a map from something to a list of something" with nearly the same interface as the plain one, without "list". Second, which is more important, when you wrote a multitude of code boilerplates, and only then you realize that not only you need the current value for a key, but, under certain circumstances, the history of its updates as well, you might get really upset. Especially if you use such a strict language as OCaml. Then, you'll praise the opportunity to keep all the code the same, querying the history just where needed.

The latter was what happened in our project. And in this history querying we started to encounter performance problems.

### Batteries not included

The first problem was that the standard OCaml hash table interface doesn't have remove_all function that clears the history of a given key! To do that, we had to call remove method as many times, as many values there were for the key. Our profiler demonstrated that it was here where nearly 25% of the runtime had been spent.

I integrated a hash table from "OCaml Batteries Included" project that provided an extension to the standard library. Its hash table did have the remove_all method, and I hoped it would help.

It didn't. It decreased the runtime of removing, but not dramatically; besides, the run times of finds and find_alls remained bigger than I felt they should've been.

### Digging deeper...

It was time to dig deeper into the Hash table implementation. But I didn't frown upon that. OCaml is a functional language, and I've been playing with it for quite a while. Sometimes, here and there, I notice strikingly nice implementations of well-known algorithms I learned in the imperative terms (the most nice would be depth-first traversal). No wonder I took a look into its intrinsics.

Unfortunately, hash table is too imperative a data structure for its implementation to look too uncommon. It was a plain array with lists attached to each of its elements.

What was surprising is that storing history appeared to be simpler than not storing it! When an value for a key is added to the table, it's just inserted at the beginning of the bucket's list, and the usual, simple search for a matching key in the list would always hit the latest value bound! removeing the key is as simple: you do the same search, and the first value found is removed, making the value added before it available for the next lookup, just in the historical order.

No wonder that our "remove a value for a key as many times as many bindings there were" didn't work fast: its runtime was O(n²), n being the size of the bucket, while we should do it in O(n). Of course, the same worst-case-linear runtime of bucket length couldn't go anywhere, but at least wasting the valuable time in a useless quadratic loop was avoided.

Still, it wasn't fast enough. The integral time to remove all values decreased, but didn't become negligible. Why could that be?

Perhaps, there were just too many values? I inserted some debug prints to the implementation of hash table, and tried to measure how many iterations the remove_all made at each operation. Yes, indeed too many. Hash table bucket size, given a good hash function, should be close to the number of elements divided by the size of hash table's array. And OCaml's standard universal hash function should definitely be good, and it even had a C implementation tuned for speed, apparently. So, the solution could be to increase hash table size.

When the hash table size increase also didn't help, I was quite surprised. How come? I was really lost.

...Until I added a print of hash function values. Yes, they were repeated quite often, and even the keys that were just slightly different had the same hashes—and good hash functions have to assign dramatically different values to slightly different inputs.

I took a closer look to the hash functions, and noticed some magical numbers (10 and 100). What were they?

```external hash_param : int -> int -> 'a -> int = "caml_hash_univ_param" "noalloc"

let hash x = hash_param 10 100 x
```

I referred to the documentation, and... uh, I wish it was what I started with!

```val hash : 'a -> int
```

Hashtbl.hash x associates a positive integer to any value of any type. It is guaranteed that if x = y or Pervasives.compare x y = 0, then hash x = hash y. Moreover, hash always terminates, even on cyclic structures.

```val hash_param : int -> int -> 'a -> int
```

Hashtbl.hash_param n m x computes a hash value for x, with the same properties as for hash. The two extra parameters n and m give more precise control over hashing. Hashing performs a depth-first, right-to-left traversal of the structure x, stopping after n meaningful nodes were encountered, or m nodes, meaningful or not, were encountered. Meaningful nodes are: integers; floating-point numbers; strings; characters; booleans; and constant constructors. Larger values of m and n means that more nodes are taken into account to compute the final hash value, and therefore collisions are less likely to happen. However, hashing takes longer. The parameters m and n govern the tradeoff between accuracy and speed.

Standard library documentation for hash table, see here.

The keys for the hash value were complex structures (think recursively folded C structures), and many of them shared large parts, which made hash values collide too often. I changed the hash function used for that table to Hashtbl.hash_param 100 100, and the whole table became as fast as a jaguar in a savanna.

### Lessons learned

The first thing to see as a morale is that you should read documentation before doing hard debugging. Of course, feeling like a file system hacker is invaluable, but, if you are after delivering something, keeping it slow pays you back in the long run.

When I was writing this post (and it's another cool thing about writing a blog), I suddenly remembered another issue I had with another hash table. The keys also were complex structures, but that table didn't use them directly. Instead, the complex structures were converted to strings, and only they were used as keys. I thought it was too redundant, but the straightforward way, to use the structures as keys, was, surprisingly, slower. I merely rolled back the changes, but now I realize the background behind this. So, perhaps, converting structures to strings (especially when there's a natural 1-to-1 match) helps, as the whole structure will be considered to distinguish.

I realized that keeping the history of hash table updates was not a nice feature at all. Rather, it appeared to be just a small bonus with lack of support in all the aspects (I had to integrate code from the other library to add such a natural functionality as remove_all). Or, it can be used as saving time in return of leaving older unneeded bindings unremoved.

And the last, but not least, I saw a good example how different languages teach you new concepts and new ways to look at the things you're already used to. Such experience, even at cost of half a day, was totally worth it.

## Treap: simple balanced search tree

### Balanced trees and programming contests

This is an example of treap: letters form an alphabetically-sorted search tree, while numbers are arranged as a heap.
Pic from Wikipedia

Picture A. We're going to insert yellow note into tree with blue root.

Picture B. We arranged some nodes but we have to split the subtree of the maroon node into two subtrees to fit dangling red pointers .

Picture C. If we could solve the task of insertion, we could have split the subtree of red node as if out new yellow node was inserted there instead of to a subtree of a blue node.

Picture D. After we've splitted the subtree of the red node, we can now assign to dangling pointers.

A balanced binary search tree is a binary search tree whose height is not big, around O(log N). This value is crucial, because search operation, the one these trees are named after, takes O(tree_height) time.

In a typical programming contest you're given 5 hours to write several programs, each of which, aside passing all jury's tests, meets specified time limit. One of keys to success is (or, at least, was) the ability to quickly code fast standard algorithms and data structures: disjoint sets, heaps, graphs, etc. And due to limited duration of competition not only should the algorithms be fast, but also easy to write in bug-free mode under an emotional pressure.

When it comes to the question how to implement a balanced binary search tree, most of programmers will outright shout "red-black tree!" (wiki). Indeed, it is balanced and is implemented in standard libraries of many languages, so it may seem a good choice (and even better as an advice) unless... Unless you're to code is yourself. It's implementation is way too complex, while the benefits of guaranteed indeterministic 2*log(N+1) height are sometimes not utilized.

### A Treap in my mailbox

Back to the times when I was participating in those contests, I got a mail from one of our trainers. Within was a description of a nice data structure that I later discovered to be called treap. And now I'd like to tell about it.

A treap is a conjunction of words "tree" and "heap" (or "tree" and "heap", for left-recursive readers). No wonder: it comprises properties of both tree and a heap. Or, more strict, it is a tree and a heap at the same time. Each node, aside from payload data, contains two keys; if we look at one key we will see a heap and if we look at another key we'll see a binary search tree.

Our main goal is nonetheless to build a balanced binary tree out of sequence of N key-value pairs that may come as an input in an unknown order. When you insert a node to a treap, you should pick up a uniformly distributed random number and use it as a heap-key. Of course, at the same time you should maintain search tree property for the tree-key.

### Not as complex

Hell, the insertion procedure will be complex, you think. The trick is that restrictions usually help rather than prevent. So, assume we need to insert an element [Eh,Et] (that's heap and tree keys) to a (sub)tree with a given root [Rh,Rt].

1. Eh < Rh - since we maintain heap, our new node should be added to one of the root's subtrees. To left or to right is decided based on tree-keys Rt <=> Et comparison, since we maintain tree as well. To insert element to a proper subtree we call the same insertion procedure recursively.
2. Eh > Rh - our new node can only be a new root then (due to heap keys). What will be its left and right subtrees? Let's assume that Et < Rt. Then the right subtree of our new node will be the former root (it can't be placed nowhere else due to heap-property), and the right subtree of the old root will remain there.

Now, we're left with two dangling pointers (see picture B) and a subtree to split between them. But we can note that if we inserted the new node to that dangling subtree (picture C), it would be the root and its left and right children would fit the dangling pointers amazingly (picture D)! So to solve the original problem we need to solve it on a smaller tree... Smells recursive!

This leads to a tiny split procedure that, given a treap and a Kt tree-key, splits the treap into two treaps that can be left and right subtrees of a node with tree-key Kt in a binary search tree.

If you want to delete a node from tree, you'll need merge operation that makes one treap out of something that were left and right subtrees of a node we've just erased. This merge is the opposite to split and is as simple to implement. Red-black tree deletions require twice more code than insertions.

### Implementation

The implementation of insertion stuff is here. Despite I haven't participated in contests for several years, I wrote it in 30 minutes from scratch (and benchmarking code took some additional time) and I didn't even have to debug the code. Can you repeat the same for a red-black tree insertions?

Unfortunately, it's hard to tie this small implementation into OOP concept, but in programming contests it was rarely needed.

If we run the script, we'll get fairly good height result on ascending sequence (on which unbalanced trees fail badly):

```pavel@lonely ~/social/blog/treap \$ perl -e 'for (1..10_000_000) { print "\$_ ";} print "\n"' >test_file
pavel@lonely ~/social/blog/treap \$ g++ --version | head -n 1
g++ (Gentoo 4.4.2 p1.0) 4.4.2
pavel@lonely ~/social/blog/treap \$ g++ treap.cpp -O2 -o treap
pavel@lonely ~/social/blog/treap \$ g++ treap.cpp -O2 -o tree_STL -DSTTREE
pavel@lonely ~/social/blog/treap \$ time ./treap &lt;test_file ; time ./treap &lt;test_file ; time ./tree_STL &lt;test_file
Height is 58 for RB it is 30.8499
real    0m6.545s
user    0m6.116s
sys     0m0.176s
Height is 56 for RB it is 30.8499
real    0m6.069s
user    0m5.908s
sys     0m0.160s
Height not supported :(

real    0m7.667s
user    0m7.472s
sys     0m0.188s
```

While height for RB-tree is about 30, height of a treap is about 50-60. You can run a series of experiments on your own and see it.

### Why is it balanced?

The estimation of a height of a treap is O(log N).

The means, due to which the treap balances itself as a search tree, are Complex Math and are described in the articles linked at its Wikipedia page.

In common words, for a tree to be non-balanced not only search keys should be inserted in mostly ascending/descending order, but also the randomly generated numbers should also be arranged in the relevant order. This is unlikely. And if a heap_key is "out of sync" with unbalancing pattern, it rotates the subtree, for which it becomes a new root, so it becomes capable to absorb more "unbalancing" nodes without height increase.

### Acknowledgements

Treaps were created by Cecilia R. Aragon and Raimund Seidel back in 1989.