Shown here are only posts related to algorithms. You can view all posts here.

Parallel merge sort

We live in a loosely connected world. There's an increasing demand on middlemen, which connect different entities together.

This seemingly unrelated introduction is important for this post in two different ways:

  • Parallelization of tasks to different processors also grows in demand, as it's like creating stuff that doesn't need middlemen;
  • If there were not for social networks and forums, you probably would never know about interesting things created at the other end of the world.

So, back to the topic. I considered merge sorting as not parallelizeable well, with the same runtime estimate. However, several months ago, having disclosed this ungrounded opinion, a dweller of a message board of my university disproved it. That post linked an article, which describes an algorithm of a parallel merge sort, although is not the original source of it. It contains a brief, practical explanation, and I liked the algorithm so much, that I wanted to retell it here.

What concurrency we're interested in

Note that I'll describe not just a multithreaded version of the algorithm, challenge of which is a clever use of locking. The problem described here is to deploy merge sorting into the system with distributed memory.

All intensive calculations in this framework should be performed locally. However, each processor has a limited amount of on-site random-access memory. So the challenge is to split the processing into small chunks, which will be sorted and merged locally (complex calculations), and will be just flushed byte-by-byte to external memory at the beginning and at the end of processing.

Why merge sorting?

The thing is that merge sorting is actually an external sort. It means that it can handle amounts of data larger than size of RAM you have the random access to. This should make us think that it's parallelizeable, but it nevertheless doesn't directly lead to an obvious way to parallelize it.

The usual merge sorting looks like this, N being equal to 2:

  1. Split the sequence to sort into N (nearly) equal chunks;
  2. Sort each chunk (recursively);
  3. Merge all chunks (at once of pair-by-pair). The challenge is to parallelize this step.

Each chunk out of these N could be processed individually and independently, and this is a huge achievement. But a challenge remains: how to parallelize the merging?

Parallelized merging: idea

So, how to parallelize merging? First, we assume that one processor can store 2t cells in its own RAM, so we're restricted to merging only not greater than 2t cells at once.

Assume we have two large sorted arrays, A and B, which, merged, or even individually, don't fit into RAM available to one processor. This means that we should parallelize it: spit the sequence into small chunks that can be merged locally.

Let's try the obvious way: divide each array into chunks of length t and try to merge them sequentially. We can merge two first chunks into the beginning of the resultant array easily. But what if the elements of the 2nd chunk should be merged inside the very same merged array?

This array of length of 2t already doesn't fit in the memory of one processor, so we cant merge it with the second chunk in the straightforward way.

So what we really want is to divide the arrays into mergable chunks. For example, what chunk in the corresponding B array should be A's sub-array (ai,aj) of length t merged with?

Obviously, the corresponding chunk (bI,bJ) should be such that all bs with index less than k are less than ai, and the similar holds for aj.

However, the size of (bI,bJ) chunk could be greater than t, so we couldn't fit it into RAM to merge with the other chunk. Here, let's highlight some of the items that split B into chunks of length of t, that are inside the segment (bI,bJ):

These new elements split B's chunk into smaller pieces, each of which is less than t. Let's find the corresponding elements in A array for them:

Now all chunks shown here have length less than t, and each corresponding pair could be merged.

But look what we've done here. The order, in which ai, aj, aQ and aP are arranged, is determined by merging them—or by merging [ai,aj] and [bq,bp] arrays. So if we merge these t-distant elements, we will determine the order of "splitters" that divide both arrays into mergable, ordered chunks.

Parallelized merging: the algorithm

The thoughts in the previous section lead to the merging algorithm. First, divide both arrays into (nearly) equal chunks of size t.

Then merge the two arrays that selected elements form. Use the same merging algorithm recursively. The result of this would be an order, in which the splitter elements should be arranged (marked with a red line):

This performed merging allows us to determine the exact chunk of length t, into which should fit each splitting element from the other array, by simple calculations. (We should "switch" in what array we search for the counterpart at the same time as the red line switches sides.) Then you should use binary search to determine the actual place of it in the chunk, this way we'll generate twice more splitters for each array:

By construction, these arrows will not cross. Then, the pairs of chunks (the chunks to merge are depicted as sections colored in an alternating manner) will lay sequentially in the merged array. Merge each pair of chunks locally, and place the result into the larger array:

Each chunk to be merged consists of two segments of length less than t, which makes it possible to merge chunks locally on one processor. This completes the algorithm.

Parallelized merge: runtime estimation

Merging two arrays of n elements requires merging two arrays of n/t elements, performing n/t binary searches in arrays of length t, and, finally, merging the chunks we divided our arrays into, which takes O(N) operations. This leads to equation:

T(n) ≥ T(n/t) + n/t*O(log t) + n

I'm too lazy to solve it, but T(n)=O(n) fits it (we consider t as a constant). This is the same estimate as in a usual merge, so parallel merge sorting is O(n⋅log n) as well.

You may note also note that parallel merge algorithm requires familiar pattern of divide-and-conquer iterations. And it's one of the neat facts about this algorithm: merging the chunks we sorted requires the very same processors that sorted the chunks!

***

This is an algorithm I was looking forward to talk about for a long time. I'm glad that I finally did this, and I hope that you didn't know it, and have spent the time learning new fun facts.

Read on | Comments (0) | Make a comment >>


"NP-complete!" as a lame excuse

Some time ago I bumped into a usual Stackoverflow question. A guy asked for a C# algorithm that could pick elements from array so that their sum is equal to a specified number. A usual NP-complete knapsack problem. But the answer made me think about an interesting matter. Let me screenshot the answer completely:

At a first glance, the answer contains everything an ideal answer should contain: a correct information, a certain bit of succinctness, and, a key to success, an XKCD comic. No wonder it was so highly upvoted.

However it was totally unhelpful.

Complexity classes

In Computer Science a concept of complexity class is used to define a class of problems, for which there exists an algorithm that runs on specified kind of abstract computing machine and uses specified amount of machine-specific resource. For example, famous NP class is defined as "set of problems that can be (a) solved on non-deterministic Turing machine (b) with use of polynomial number of steps with respect to the length of input". P class is the same but its abstract machine is a deterministic Turing one. The famous question of whether each NP problem is also P is still open.

There is a lot of more tricky classes (PSPACE, for example, requires polynomial "memory"--maximal length of line of a Turing machine), which can even be parametrized (PCP(n,m) (probabilistically checkable proof), for example). The relationship between various classes is being studied, and aids the relevant research; here's a picture with some of them:

complexity classes relationship diagram

(pic from «Эффективные алгоритмы и сложность вычислений» book by Nikolai Kuzyurin and Stanislav Fomin (pdf, site; published under OPL license))

It clearly shows that the knowledge about complexity classes made its way into the minds of casual programmers. But instead of a challenge, it became a lame excuse.

What a casual programmer thinks when he encounters an NP-hard problem is, "Oh, it's NP-hard, so no known polynomial solution exists!". Some programmers even try to quickly make an algorithm that might solve something in specific cases, which they don't even realize. While what should be thought of is, "Oh, it's NP-hard, so no known polynomial solution exists! Let me derive constraints, and search for an approximate solution that fits it best!" Complexity theory statements and theorems should aid the solution, not substitute it!

Funny approximations for NP-hard problems

Honestly, I was among the programmers of the first type, as defined above. I used to being proud that I know that much of maths and was glad that I can blow the hopes of solving their problem. However a year ago I've taken an advanced complexity course (by guys from one of research groups in ISP RAS -- "group of Discrete Optimization Algorithms at Department of Mathematical Methods" (in Rus)) that actually revealed to me several amazing facts. Some complex NP-complete problems appear to have good approximations! Let me list several of them.

  • Edge-dominating set (wiki) -- assume we want to penetrate our adversary's network (which is an undirected graph). We should pick several channels (edges) to insert a virus into, and the hacking software will capture all information that passes through any of each channel's ends. We should cover the whole network with such "hacked" channels, so that each unhacked one is plugged into a node with a hacked channel. But at the same time, we should do it with minimum cost, i.e. minimum number of channels to hack. See figure A at the side for examples.

    Figure A

    Examples of edge-dominating sets from wiki page. Red edges are infiltrated channels. You may see that each non-hacked edge is touched by at least one hacked edge at joint router.

    The task of picking such a minimum set is NP-hard. Brute-force? No, there is a better idea. We don't necessarily need minimum (although it would be nice), but instead we can hack some extra channels--the budget is not fixed anyway. We could use some clever algorithm to pick such edges, but instead... let's just hack arbitrary channels that aren't adjacent to already hacked ones!

    What amazed me is that such a "solution"... will never exceed the real minimum number multiplied by two! "That's still too much", might the Greedy Manager have said, and he would be right. But the thing is that we even didn't try to solve anything: the dumbest algorithm yields a good, constant-factor approximation for a problem with yet unknown efficient exact solution. Hell, it's even easier to implement than a brute-force!

  • Knapsack problem (wiki) -- assume a burglar infiltrated a house. Unfortunately, he has a relatively small knapsack, into which he can't put everything of value. Possessing a discriminating eye, he can immediately determine how much the fence will pay for a thing. His aim is to maximize the cost of the stuff carried away, the overall weight being less than his knapsack may bear. But unfortunately he's also limited on time and has to do it quick.

    Complexity theory proves that such task is NP-complete. So the burglar might pack every combination of items into his sack until he finds out the best. He would have been caught if he did that, because the owner would have returned by the time he completes the enumeration. So he chooses another solution. He uses a "greedy algorithm" and pick first a thing with best price/weight ratio, among those that still fit his sack.

    A fool! He didn't know that it was the house of computer scientist, who knew that thieves are dumb and greedy, and who intentionally bought a piece of jewelery with best price/weight ratio in the house, but that piece doesn't allow any other thing to be put into the sack due to overweight. For arbitrary N, this trick can make the burglar take away N times less stuff than he could at most!

    But a simple adjustment to its algorithm can be made to limit the possible loss to "at least 1/2 of maximum value". The burglar should do the same, but at the end compare the value of picked best price/weight items with the value of the most expensive item he can carry away. If it turns out to yield more profit take this thing and run away.

    This is an example that a small hack can guarantee a good approximation.

  • Longest path problem (wiki) -- assume we want to make our competitor lose money, and thus we try to delay shipment of a valuable package. We can make a janitor agent into the dispatcher's room, so he can see the map of the area (an undirected graph) and use the special phone to guide the courier by a longest path between source and destination (without cycles, of course, since the courier will suspect something and report it). He must do it quickly, but, unfortunately, picking such a longest path is an NP-hard problem.

    We've seen approximations by constant factors before, but this task, unfortunately, can't have an approximating polynomial algorithm for each constant factor. Of course, that's true unless P = NP. But the fact is that its polynomial inapproximability by constant factor is proven unless P = NP. So we'll have to teach the janitor to yield worse approximations if we want him to succeed. And conclude that sometimes NP-hard problems just do not have simple hacks that solve them well.

There are more facts that concern randomized algorithms, which have a certain probability to hit a good solution; facts about more tricky, non-constant approximations, but they're far less fun and are more like boring math crunching, so I won't mention it here.

Conclusion

Albeit whether P=NP is still an open question, complexity theory contains many valuable achievements on solving everyday problems.

Do not make complexity theory just a source of lame excuses of not solving the problem or writing brute-force algorithms and garage-made heuristics that can guarantee noting. Some programmers call it all "clever cheats", but many hand-made "cheats" are of no help in approximations (as shown above with the first, most obvious "greedy" algorithm for the burglar). Instead of cheating, do not hesitate to perform a quick googling or consult experts in this area if you encounter an NP-complete/hard problem--a fast solutions with proven and limited weakness, and suitable for your specific task may exist, and may have been studied well.

Or, if you're curious enough, and possess certain bit of math education, you can study it on your own. English-speaking readers may check online pages of their favorite universities for book recommendations and, maybe, online lectures. I can't recommend anything in English since I haven't read such books. To a Russian-speaking reader I can recommend the book «Эффективные алгоритмы и сложность вычислений» by Nikolai Kuzyurin and Stanislav Fomin (pdf, site; published under OPL license), which we used during our studies at the university.

Read on | Comments (3) | Make a comment >>


Treap: simple balanced search tree

Balanced trees and programming contests


This is an example of treap: letters form an alphabetically-sorted search tree, while numbers are arranged as a heap.
Pic from Wikipedia

Picture A. We're going to insert yellow note into tree with blue root.

Picture B. We arranged some nodes but we have to split the subtree of the maroon node into two subtrees to fit dangling red pointers .

Picture C. If we could solve the task of insertion, we could have split the subtree of red node as if out new yellow node was inserted there instead of to a subtree of a blue node.

Picture D. After we've splitted the subtree of the red node, we can now assign to dangling pointers.

A balanced binary search tree is a binary search tree whose height is not big, around O(log N). This value is crucial, because search operation, the one these trees are named after, takes O(tree_height) time.

In a typical programming contest you're given 5 hours to write several programs, each of which, aside passing all jury's tests, meets specified time limit. One of keys to success is (or, at least, was) the ability to quickly code fast standard algorithms and data structures: disjoint sets, heaps, graphs, etc. And due to limited duration of competition not only should the algorithms be fast, but also easy to write in bug-free mode under an emotional pressure.

When it comes to the question how to implement a balanced binary search tree, most of programmers will outright shout "red-black tree!" (wiki). Indeed, it is balanced and is implemented in standard libraries of many languages, so it may seem a good choice (and even better as an advice) unless... Unless you're to code is yourself. It's implementation is way too complex, while the benefits of guaranteed indeterministic 2*log(N+1) height are sometimes not utilized.

A Treap in my mailbox

Back to the times when I was participating in those contests, I got a mail from one of our trainers. Within was a description of a nice data structure that I later discovered to be called treap. And now I'd like to tell about it.

A treap is a conjunction of words "tree" and "heap" (or "tree" and "heap", for left-recursive readers). No wonder: it comprises properties of both tree and a heap. Or, more strict, it is a tree and a heap at the same time. Each node, aside from payload data, contains two keys; if we look at one key we will see a heap and if we look at another key we'll see a binary search tree.

Our main goal is nonetheless to build a balanced binary tree out of sequence of N key-value pairs that may come as an input in an unknown order. When you insert a node to a treap, you should pick up a uniformly distributed random number and use it as a heap-key. Of course, at the same time you should maintain search tree property for the tree-key.

Not as complex

Hell, the insertion procedure will be complex, you think. The trick is that restrictions usually help rather than prevent. So, assume we need to insert an element [Eh,Et] (that's heap and tree keys) to a (sub)tree with a given root [Rh,Rt].

  1. Eh < Rh - since we maintain heap, our new node should be added to one of the root's subtrees. To left or to right is decided based on tree-keys Rt <=> Et comparison, since we maintain tree as well. To insert element to a proper subtree we call the same insertion procedure recursively.
  2. Eh > Rh - our new node can only be a new root then (due to heap keys). What will be its left and right subtrees? Let's assume that Et < Rt. Then the right subtree of our new node will be the former root (it can't be placed nowhere else due to heap-property), and the right subtree of the old root will remain there.

    Now, we're left with two dangling pointers (see picture B) and a subtree to split between them. But we can note that if we inserted the new node to that dangling subtree (picture C), it would be the root and its left and right children would fit the dangling pointers amazingly (picture D)! So to solve the original problem we need to solve it on a smaller tree... Smells recursive!

    This leads to a tiny split procedure that, given a treap and a Kt tree-key, splits the treap into two treaps that can be left and right subtrees of a node with tree-key Kt in a binary search tree.

If you want to delete a node from tree, you'll need merge operation that makes one treap out of something that were left and right subtrees of a node we've just erased. This merge is the opposite to split and is as simple to implement. Red-black tree deletions require twice more code than insertions.

Implementation

The implementation of insertion stuff is here. Despite I haven't participated in contests for several years, I wrote it in 30 minutes from scratch (and benchmarking code took some additional time) and I didn't even have to debug the code. Can you repeat the same for a red-black tree insertions?

Unfortunately, it's hard to tie this small implementation into OOP concept, but in programming contests it was rarely needed.

If we run the script, we'll get fairly good height result on ascending sequence (on which unbalanced trees fail badly):

pavel@lonely ~/social/blog/treap $ perl -e 'for (1..10_000_000) { print "$_ ";} print "\n"' >test_file
pavel@lonely ~/social/blog/treap $ g++ --version | head -n 1
g++ (Gentoo 4.4.2 p1.0) 4.4.2
pavel@lonely ~/social/blog/treap $ g++ treap.cpp -O2 -o treap
pavel@lonely ~/social/blog/treap $ g++ treap.cpp -O2 -o tree_STL -DSTTREE
pavel@lonely ~/social/blog/treap $ time ./treap &lt;test_file ; time ./treap &lt;test_file ; time ./tree_STL &lt;test_file
Height is 58 for RB it is 30.8499
real    0m6.545s
user    0m6.116s
sys     0m0.176s
Height is 56 for RB it is 30.8499
real    0m6.069s
user    0m5.908s
sys     0m0.160s
Height not supported :(

real    0m7.667s
user    0m7.472s
sys     0m0.188s

While height for RB-tree is about 30, height of a treap is about 50-60. You can run a series of experiments on your own and see it.

Why is it balanced?

The estimation of a height of a treap is O(log N).

The means, due to which the treap balances itself as a search tree, are Complex Math and are described in the articles linked at its Wikipedia page.

In common words, for a tree to be non-balanced not only search keys should be inserted in mostly ascending/descending order, but also the randomly generated numbers should also be arranged in the relevant order. This is unlikely. And if a heap_key is "out of sync" with unbalancing pattern, it rotates the subtree, for which it becomes a new root, so it becomes capable to absorb more "unbalancing" nodes without height increase.

Acknowledgements

Treaps were created by Cecilia R. Aragon and Raimund Seidel back in 1989.

Read on | Comments (0) | Make a comment >>


More posts about algorithms >>