A Dying Tux on an Airbus

September 24, 2012 929 word 5 min read

So I am on a trans-Atlantic flight onboard an Airbus A330. It is a pretty large plane; each seat is accompanied with a computer referred to as "the Entertainment System" (big deal at the time). The system offered games, movies, inflight information (the plane location plotted onto the world map), and flight progress details, which is kind of reassuring as the flight lasts for 10+ hours.) A "killer feature" of this system was its USB port to charge your mobile phone (or to type this story, for instance.) Times were definitely different.

Not only did I learn that day what operating system that computer ran, but I also found out yet another example of a machine demonstrating such inherently human feelings as courage and hope.

Nothing prompted this at first. I casually opened inflight information and promptly fell asleep. I woke up; there was a couple of movies worth watching, so I tried to launch them. It didn't work.

"I am a programmer, aren't I," I thought, and started to click all the buttons on the remote control with purpose and determination. None of that worked. I tried to click the on-screen buttons; they worked properly, while their remote-control counterparts did not. I concluded that the driver for the remote died somehow, and asked the stewardess to reboot my terminal.

As it started to reboot, I noticed a lot of familiar words. It used a bootloader named RedBoot copyrighted by RedHat, and it obviously ran Linux kernel 1.59 or something.

However, right after "decompressing image," it showed, "decompression error: incorrect data check," and rebooted again. The system did not boot far enough to start the GUI.

It also displayed a reboot counter. After twenty five reboots something should have happened, and I didn't know what. Curiosity took me over, and I stared at the screen, amused by that I know what the words on the screen meant.

After 25 reboots, the computer realized that there was something wrong with it, and ran the firmware update program. It did not help. He entered the same circle of pain again...and again.

As I watched the guy reboot over and over, I noticed that my curiosity is being replaced by different feelings. I felt sympathy and compassion for that guy. It kept trying and trying, and it kept failing and failing--as if it didn't even know what would happen next.

There are theories that as computer programs start to gain cognition, they would become acutely aware of their own limitations. "I envy you humans for your unknowable boundaries," says a reddit bot that automatically handles pictures posted to reddit in "his" answer during the open interview (aka "AMA"). I don't know if these theories will have anything to do with reality; but they do not sound incoherent. And this little computer clearly had no awareness of his incapability to boot, and it kept crucifying himself.

Compassion was replace by sorrow. Having watched a hundred reboots and firmware updates, I lost all hope. "Why would we keep him alive?" I thought. "He'll just keep doing this being unaware that he's broken. Why can't we just shut him down and end the series of reboots? Do we humans even realize how painful each reboot could be," I thought, "maybe he wants to die already?" I tried to recall what people do when euthanasia is not an option. Morphine? Even morphine can't relieve a computer's pain—though, none is allowed on an aircraft anyway.

I asked the flight attendant to shut it down at my seat to ease his pain. "Oh, unfortunately it's impossible," she replied. "but we can move you to another place if you want though: we have some extra chairs available."

I suddenly realized I can't leave him. To the other passengers and crew, it was just a malfunctioning computer to be rebooted, replaced, or ignored. To me, however, it was a dying Tux who fought for his survival against its own limitations. I recalled how friends and relatives stay at the bed of a dying loved one, and imagined how they'll soon gather near the charger of a family robot who's about to leave for his silicon heaven after a long, deadly malfunctioning.

I sat by his side for the rest of the flight.

And he finally succeeded! After six hours of effortless reboots, his RAM was no longer corrupt; his OS landed a working kernel and booted up! The usual routine went on, the first-boot symlinks had been set up, the X11 system started, showing its tapestry with the well-known black-and-white embroidery. I could launch one of the movies I wanted several hours ago, but I gave the Tux some rest instead. He deserved it.

So I experienced this enlightening encounter with a machine who fought for his life, never lost hope, ignored what was believed to be his limitations, and did finally succeed in this struggle. Not only did he demonstrate the determination only humans seemed to possess: the dying Tux also happened to have more faith in his own victory than the human beside him.

Comments imported from the old website

http://vasyafon.livejournal.com/ on 26 September 2012 commented:

Oh my god! You hacked the airplane!!!!

Pavel Shved on 10 October 2012 commented:

Oh, I wish I could hack him to perform the surgery necessary to bring him back to life. I tried some shortcuts Like holding Ctrl+C when booting, but then I realized that it was just silly. It was as fruitless as applying acupuncture to a patient in coma thinking that this could bring him back to normal.

A Tie for a Python Programmer

September 10, 2012 869 word 5 min read

A couple of weeks ago I started a new project in ROSA. The project is a dynamic system that tracks health of our Linux distribution repository in real-time by watching the builds and repository status, and running external analyzers. This is a relatively small system that should end up with less than 50k LoC, so I could choose any language I wanted. I would choose Ruby, of course, but this time I had to raise the significance of social issues, and chose Python. I never wrote a single Python program before, but all my peers didn't want to learn Ruby, so I took this chance to learn a new language.

Python logo

This is the logo of Python the programming language. Definitely, these aren't snakes, sure.

Python is named after Monty Python's flying Circus, as per wikipedia entry rather than after a non-venomous snake family. Snakes, however, are featured at the Python's logo, and Monty Pythons are not well known in Russia, so it's easier to associate snakes with Python language when it comes to computing.

To aid me in learning a new language, and to make it more fun, I needed an artifact that helps me. Due to my unhealthy devotion to shirts/ties, I thought of buying a tie with snakes painted on them. I didn't find one although I searched all shops in my district, including those that sell clothes for kids. I couldn't even find a tie made out of snake skin; while they're referenced in literature, they're either mythological or require the social connections I don't have to obtain one.

It wasn't until my colleague linked me this website when I found the way to make a snake out of my tire. Indeed, if I can't buy one, why can't I make one?

Ingredients

I started with a black tie I no longer liked. Time to revive it by turning it into a programmer-friendly snake! Here's a step-by-step guide, inspired by the one linked above. You'll need

50 grams of cotton wool;
a piece of red cloth (polyethylene should go as well)
a sheet of A4 paper to experiment with the eyes, a pen to paint the pupils; and
something to attach the two latter things to tie (glue, pins, and/or sticky tape, see below).

Making the Snake Tie

Start with tying your favorite knot, and marking the place where it ends. Starting from that place and below, the two-dimensional tie will turn into a three-dimensional snake. Open the tie up making sure it won't fall apart (I had to re-knot the threads that were running through the yet-to-become Python manually—I didn't have any GIL to assist me!) I had to also open the bottom end of the tie, because it was detached from the upper part:

Start filling the tie with the cotton wool. To make the wool through the tie, you might need something long and stiff, I used a toothbrush.

As you fill the tie, it may no longer keep the original shape. I used a pin to strengthen the lower end. To make the pin completely hidden, stick it into the inner half outwards, and pin the outer one onto it, hiding the pin itself in the fold.

Do not fill it all the way to the knot point! Tie the knot first. Here's the Windsor knot I tied; look how you still have the access to the hole you made at the very beginning. Stuff the tie with more cotton wool.

Now you can attach a tongue. A nice red split tongue is what makes your former tie a mighty snake rather than a boring worm. Cut it from the red cloth you prepared. I had to cut it several times before I managed to make a tongue good enough. Attach it with a small pin, sticking it as close as possible to the tip of the tie. The pin will be concealed when you look from the front side.

Now make eyes for your snake. I just painted with a pen on small pieces of paper. This is an easy part; you may experiment as much as you like; I found that the snake looks most crazy if the eyes are placed close to each other. I also felt that the color of the pupils should fit the color of the tie itself, so I used black.

After you found the proper shape and place for the eyes, you should attach them. Sticking a nail into your future friend's eye is disgusting (and the result wouldn't look real enough), so you'd rather use glue. I didn't have any glue, so I used one-sided sticky tape a body patch was made of, and rolled it up to make it "two-sided".

Your new friend is ready. Make sure it stays away from mice!

If you're brave enough, you may even go to work wearing it.

I hope I won't get fired because of that. This would be programming language discrimination anyway!

***

So that's how you—or, more likely, your kid—can assemble a Python tie from the old useless things you find at home. I hope that programming Python itself will be different than putting the same old stuff together—and more fun.

How Static Verification Tools Analyze Programs

August 28, 2012 4057 word 20 min read

One of my previous posts featured a plaque we got for scoring an achievement in the Competition on Software Verification'12. I promised to tell how the tool we improved analyzed programs to find bugs in them... without actually running them.

Please, note that most of what's described here was not devised by me. It was a state of art when I first started to study the field, and my contribution was kind of tiny—as usual when newbies do their (post)graduate research.

Program Verification is Impossible

I promised to tell you how we do it, but the thing is that... this is actually impossible. Not that I'm holding a secret vigilantly protected by the evil Russian Government, but it's just mathematically impossible. It's not just "hard", as in "NP-hard", but plain impossible. A proof of impossibility of a relevant problem, the halting problem, published in the famous Alan Turing's paper is acknowledged as one of the stepping stones at the beginning of Computer Science as a math discipline. Later it was found that halting is not an anly property you can't universally verify with a computer program: in fact, any nontrivial property can't be universally verified! This is known as Rice's theorem, and was known since 1953.

The undecidebility (one more term to denote the inability to build a program that solves a problem) of program verification only concerns Turing complete languages, i.e. the languages that are as powerful as Turing machine. Some languages can be correcly verified, such as a hypothetic language with no variables and no statements other than print (always terminates), or its extension with non-folded for...to and for downto Pascal-like loops, so that halting can be solved by comparing the numbers involved.

Most interesing real-world problems involve reasoning about constructively non-trivial languages though.

Absence of a program that can verify if any given program halts or bears some nontrivial property means that for any program that claims to be such a universal verifier, there exists a counterexample, a piece of source code that will confuse the self-proclaimed verifier or make it loop forever. But the very same verifier may provide useful results for a multitude of programs that at the same time. Considerable pieces of computer science and software engineering are devoted to studying, building, and putting such verifiers into production. A lot of program analysis, which is closely related to verification, happens in compilers, so you most likely unanimously encounter the results of this research on a daily basis.

In the next sections I'll explain one of the approaches to verifying programs, to which my graduate research was closely related.

Why Go Static?

So how would you verify that a program doesn't have certain bugs (say, never dereferences a NULL pointer ever) without having it run and take potentially harmful effect on customers and equipment?

If your initial idea was to try to interpret program in such an environment that its running wouldn't affect anything, that would not work. First, it hardly counts as "verifying a program without running" as it actually runs, but in isolation. Second, and more important is that the program may involve a user input. This may be an input from keyboard, from text file, or from network. How do you verify the program behavior if the way it behaves depends on what comex from external sources?

You could emulate the behavior of these external stimuli by writing some mock-ups (say, by reimplementing fread("%d",&x) as something that make x a zero), but this is effectively resorting back to testing. The whole idea of static verification—when the program is not run—is to extend the number of possible program execution scenarios beyond the those you can cherry-pick by testing. When you see a scientific paper on static verification techniques, it will surely list the possibility of checking all the possible scenarios as a justification of the increased resource consumption compared to testing in isolated environment—well, who tests in production anyway?

Of course, white-box static verification is not the only complicated testing technique. An interesting testing technology, UniTESK has been developed in ISPRAS, where I previously worked and studied. One of its features is that it can automatically adjust the tests generated to cover all possible disjuncts in if statements of a method's postcondition. This means that not only each condition branch should be covered, but each of the operands in a disjunction that forms each condition should be true during the test execution. The disjuncts we're talking about here are those in the postcondition, the system under test itself being a black box.

And a justification it is! It's not possible to enumerate all the possible inputs and stimuli, because the number of their combination is infinite even for simple programs. And if you omit some input pattern, who can guarantee that something wrong doesn't happen exactly when users behave like this? You surely know a lot of cases where seemingly "dumb" users unveil behavior no sane person could ever predict when writing tests to software! In fact, any black-box test generation technique is inherently incomprehensive, and can't cover all cases possible. Static analysis "opens" the black box, sees and works with what is inside it, and it is more complicated than writing tests and measuring coverage.

Another case that is better left to automated comprehensive tests is security-related operations. Security-related bugs, such as buffer overflows, are hard to test. Ample testers for such bugs are called "hackers", and they are rare, expensive, and still are not reliable, and may miss something. One would like to ensure that no matter how malicious or damaged input from the network is, no matter what mischief the user clicking on the screen has in their mind, the program doesn't crash or expose certain security breach patters such as the aforementioned buffer overflow. Static verification is used in some very big companies, for instance, to make sure that a specific security bug occurs nowhere else in the system, so that the updated system would only laugh at the inspiration the descriptions of the problem might have caused.

Thus, static verification that reads the source code and may prove that the system is not exposed to certain kind of bugs and security violation patterns is also useful. Static verification is the only way to vouch for infinite number of inputs that they do not expose any bugs in the program. It allows us to consider all possible inputs at the same time as if a randomly-spanking monkey is operating the console rather than a sentient human being.

But, more on that later. It's still not clear how the multitude of possible patterns of security breaches can be studied generically. Do we have to study pointer dereferences, buffer overflows, memory allocation errors etc separately? Some tools try to, but there are fundamental approaches that address this problem of excessive diversity.

Reduction to Reachability

Reachability problem can be formulated like this. Given a program "state," find out if this "state" is reachable for some program execution. If program allows external input, this involves checking if there exists an input such that a program would reach the target state upon execution. We'll explain what the "state" is later.

This problem is as hard as the halting problem because the latter may be reduced to the former. If we know how to solve reachability, we know how to solve halting by checking if the exit state is reachable. It's not just very hard, as an NP-complete problem, it's just impossible to solve in a generic manner for all programs, as pointed out at the beginning of the post.

Reachability is important because a number of programming problems and bugs may be exposed as reaching "error state", a specific program line that gets executed when something undesirable happens. This could be a code that's responsible for printing "Segmentation fault", and checking reachability of this would give us a notion whether NULL pointer is dereferenced anywhere.

As this code is OS-specific, the programs being checked often are prepared to verification by inserting this code explicitely. So the code a = r/p, for instance, would decome if (p == 0) ERROR(); x = r/p;, where ERROR() terminates a program with error. Well, it doesn't actually have to terminate anything because we're not going to run this program as the verification is "static". Its purpose is to mark a specific place in program as undesirable. The process is sometimes called instrumentation. A special tool performs this operation; we developed a GCC-4.6-based one in ISPRAS during our Linux Driver Verification project.

Modeling Nondeterminism

So, how do we model the fread("%d") function? Let's get back to the original goal, which is to check if a program is free from runtime crashes and vulnerabilities. Thus, we can't assume anything on the user input, because a hacker may be operating the keyboard trying to enter a number specifically. What can we do then?

Static analysis techniques allow us to assume that the user just enters something—and soundly act upon that. The integer read may bear any value, literally! How does it play out then? The thing is that the integer entered is read once, and its value doesn't change arbitrarily after this, unless directly assigned to it, or to a pointer to its memory location. Then, such value is transferred to other variables, which make conditions over these variables either true or false, which affect how subsequent conditions are evaluated, and what code is executed. The ties that exist between values during the program execution are exposed, and it becomes possible to rule out some code as unreachable.

Note that if all ERROR() functions share the body, we may treat the body itself as an error location. This means, that any number of "error" locations is equivalent to one "error" location, so we sometimes go plural or singular for the sake of a better narrative.

We're lucky if we detect that all the ERROR() function calls are unreachable, which means that there's no bug. In other words, while we don't know the value of x, we know that any concrete value of x predetermines how the program executes from now on, until one more nondetermined variable is read from somewhere. Let's see on a simple example.

This uses a more simple model of reading an integer (to avoid discussion on pointer aliasing), which is read_int() function that returns an integer the user has entered. So, let's assume that it may return any integer. Can we still prove that a division by zero never happens, and the program never crashes at that statement? Sure. Whatever user returns, we can't get past the while loop until x is greater than zero. We may loop forever, or exit after several iterations, but if we're out of the loop, the x is surely equal to one or greater, i.e. x >= 1

Now, the crash happens if x == 0. But x == 0 and x >= 1 can't both be true. Moreover, whether it can be true or not, can be checked automatically with specialized software that is largely presented both in open-source and in proprietary forms. This demonstrates that we don't need to know exact value of each variable to determine if a program reaches an error location.

So, to model all possible ways the program works (including those ways that depend on the external, unknown input,) and to determine if any of these ways has a bug (i.e. ends in an "error" or a "crash" location) is the purpose of reachability solver's work. Let's see how is it possible.

Feasibility of Program Paths

The "equation" we used in the previous section to demonstrate how we could model nondeterminism is actually a powerful technique. It is generic enough to work in both nondeterministic and deterministic cases, and no distinction between them is even required! Each program where a certain line is marked as "error location" contains a number of "paths", sequences of statements that end in an error location, but for which it is not yet known if the program could actually follow this path. An example of such path is the above assignment x to zero, and having a check if it's greater than one passed. We've already shown that it's not feasible if we look closer at he path equation rather than just at program structure.

This usually requires some conversion. For instance, if we convert a fairly common assignment i=i+1 to an equation component i==i+1, it would render any path infeasible, since this can never be true for real numbers. In such cases, i and all its further occurrences are is renamed after each assignment (this is called Static Single Assignment (SSA) form).

Checking such equations requires using of advanced software such as SAT-solvers, theorem provers or customized built-in add-ons or reimplementations of them. These tasks are NP-complete problems, so the size of programs being verified matters. Modern software doesn't struggle with such "path equations" for programs as large as thousands to hundreds thousands of lines of code. Program size is not the only factor though. The size also depends on how precisely the formula describes the state of the variables and of the memory as a whole. For instance, pointer alias analysis makes the equation bigger for a given program.

Analyzing Program as a Whole

So far we've examined how tools like BLAST reason about individual paths. How does it turn into the whole-program analysis? The problem here is that, in any program that contains a loop, there may be an infinite amount of paths to analyze. Analyzing them one-by-one will never lead us to final solution provided we only check each single path. We need to somehow analyze paths in large quantities.

One of the possible patterns to perform such analysis is to split all paths into classes, such that we need to analyze one path, and somehow understand that the result of this analysis applies to a whole class of other paths, potentially infinite. This approach is applicable to many different tasks, and is not limited to the problem we are discussing.

At this point we notice that, wherever the program counter is, the behavior of program is completely predetermined by the values of all variables given the subsequent user input is also predetermined. For example, assume that at line, say, 15, in the middle of the loop, the values of all variables, and of the all readable memory cells are the same on the 2nd and the 3rd iterations of the loop. Immediately follows that the paths that "start" at each of these locations (given the equivalence of the memory) have a one-to-one match between these two "classes". Getting back to the original problem, if we prove that no path that only iterates two times has a bug on it, then the paths where the loop iterates three times or more also don't have bugs on them. This is how we can decrease the amounts of paths to analyze from infinity to several.

However, the loops where memory states are equal on subsequent iterations are rare. Such loops are either infinite (which is good for our analysis, since we don't encounter a crash if we loop forever doing nothing :-), or contain of user input only. The loop in the division by zero program does satisfy the condition, but what if it looked like this:

This way, attempt counter on each loop iteration is different. How do we tackle this one?

From State of the Whole Memory to Regions

While the memory state of the program in question is different at each loop iteration, it really does not matter. Indeed, whether the x equals zero does not depend on the attempt number. If user enters zero on second attempt, he or she could have entered it on the very first attempt, and our analysis algorithm accounts for that, as shown above in the "Modeling Nondeterminism" section. So all paths with two or more loop iterations are equivalent to the path with one iteration only--if we consider only the relevant memory cells that really affect whether the bug appears. How do we detect these memory cells?

Let's get back to the analysis of a single path. The path that only contains one loop iteration is converted to this equation:

attempt_old == 1, and
attempt = attempt_old + 1, and
x >= 1, and
x == 0

The SAT solver analyzes this, and says that "this equation has no solution." But what if we ask "why?" Why does this equation have no solution? "Well," we could answer, "because x can't be no less than one and equal to zero at the same time!" What is important in this answer is that it never mentions attempt variable. So we may discard this variable from the list of "important" variables, and compare paths with 1 and more loop iterations based on the value of x only, and from this point of view they are all equivalent.

Of course, if the value of attempt is later discovered as relevant (i.e. someone tries to divide by it), the equivalence of paths may be reconsidered. But sooner or later, (if we're lucky: remember that the problem is not solvable universally!) we run out of relevant variables.

Note that we eluded the question of how we realize that x is relevant, and attempt is not automatically, without asking any human (that's the whole point of automated large-scale program analysis). First, we can try to ask the SAT solver. Such solvers do not just run on magic, they execute established algorithms, that can sometimes give us notions why the answer is such and such. Second, we can try to play with the equation a bit by trying to remove the parts related to attempt variable and checking if the equation is still not solvable. Third, we can use the mathematical concept that formalizes the "why?" question, such as "Craig interpolation".

In most cases, such algorithms not only tell what variables are relevant, but also give us hints about when they are relevant (say, Craig interpolation may tell us that it's enough to check if x is greater than zero rather than see its actual value. This also solves another problem: tracking exact values of variables is suboptimal. If it is important that x is greater than 0, then every single positive integer value of x is equivalent, which would not be the case if we separated paths based on the exact value of x.

Such "conditions" that constrain the memory values are composed into "memory regions", (potentially infinite) sets of memory cell values that can be used instead of the previous execution history. Indeed, at each line we need to either know the complete history, what lines have been executed by now, or know the current memory state—or at least some constraints on it. We don't need to know both to predict how the program will behave after we continue execution.

The overall algorithm

To sum up the previous sections, here's how the anti-bug program analysis algorithm works:

First, consider that memory is totally irrelevant.
Try to find some path given that any memory cell may hold any value at any time (even if variables change magically between they're written to and read from) that ends up in the ERROR location
If there's no such path, there's no bug!
If there is such a path, convert it into the path equation, and try to solve it.
If the equation has a solution, then the bug is here.
Otherwise, ask the solver or other advanced tools "why" is the equation unsolvable (more generic question is, "why is this path infeasible?").
Reevaluate the paths in the program, not blindly this time, but rather tracking how the memory regions we get on the previous step affect the equivalence of the paths (we leave other variables change magically though). Only follow one path among those equivalent to each other.
Is there any path that ends up in the ERROR location under the new circumstances? Go to 3 or 4.

The algorithm is laid out this way to provide the explanation how it works. An explanation why it is correct, as well as the most convenient form to advance the state of art, requires us to think from the last poit to the first. One should first envision a novel representation of the multitude of program paths, and then devise how to compute it efficiently.

If we're lucky, the algorithm finishes. If we're even more lucky, the algorithm finishes in a sane amount of time. Depending on the task, the "sane" ranges from five minutes to several days. In LDV, we considered it unacceptable for an individual driver to take more than 15 minutes to get verified against a single bug, but Linux kernel contained 4500 of them. In a less massive setting the time constraint can be increased; some mid 2000-s studies considered three days for exhaustive verification of 2500-LOC programs.

This algorithm pattern is called CounterExample Guided Abstraction Refinement (CEGAR). It was devised less than 15 years ago. It was based on the 45-years-ago advancement by Cousot&Cousot that were first to propose to "abstract away" from the actual variable values (what we did when we allowed variables to change magically unless specifically tracked), and iteratively "refine" the "abstraction" by tracking more and more properties. This is not new stuff, but is not old either.

We omitted certain points in tese algorithms. For instance, even the early framework algorithms, even unimplemented, envisioned tracking whether classes "fold" into each other rather than whether they're equivalent only, leaving the exact implementation behind the "lattice" abstraction of program states. Second, the way how "memory regions" are affected by statements is also a complex theory, and it's surprisingly easy to create an algorithm that piggybacks on formula equation building and SAT solving, see this paper for details if you're curious. The paper also outlines a framework approach on alias analysis and memory modeling.

Common Limitations

The common limitations of this algorithm (compared to other static analysis algorithms) is that it relies on a complex equation solving and proving tools that are only well-defined when working with numbers and not very complicated formulae. Even C programs, however, are fairly complex, and consist of complicated concepts such as arrays, pointers, structures, bit shifts, bounded integers when adding two positive integers sometimes makes the result negative. Converting them to good ol' numbers introduces complications. For instance, it's increasingly more hard to answer the "why?" question as you introduce operators beyond addition, subtraction and multiplication: bit shifts, modulo calculations, square roots make it nearly impossible. The fact that pointer may point anywhere—or, at least, to several locations—makes the number of variants of program behavior explode. And if it's a pointer to function, things get ever worse.

Still, the approach works. We found several dozens of kernel bugs in Linux Kernel device drivers—and keep in mind that we did not have the actual devices! Slowly, scientists make it less fragile, more reliable, and some day it will also commercialize to products based on different static analysis approaches, such as Klokwork and Coverity.

Meanwhile, you may try to use such research-space tools as BLAST or CPAchecker, as well as check other SV-COMP participants.

Conclusion

So this is one of the approaches to program verification. There are others, but you may always notice the same pattern in them: abstract away from the concrete program, discard everything that seems impossible, and hope that you discard all bugs and bugs only. However magical it seems, it does work. If you run Windows, your PC is actually executing millilons of lines of statically verified tools. Microsoft has been carrying a lot of really interesting and relevant research and development in this area for the past decade, as I learned at one of the conferences hosted by MS Research.

I wanted to write this article for years, but never managed to find time for this. This was fun even though I stopped active participation in static analysis research or development. I hope I outlined one of the approaches to program verifications so that a non-scientist could understand. If not, mail me or post a comment, and I'll improve the article. Don't hesitate to ask about something in comments, and I'll try to answer your questions—or at least to direct you somewhere.

Time-Based Protests

July 2, 2012 1017 word 5 min read

Sample time-based Protest

(photo by Ivan Afanasiev). The largest "Strategy 31" time-based protest I attended (January 2011). A protester features a sign with a protest logo.

A month ago I bumped into a protest manifestation in Moscow. It was one of a series of events "Different Russia" movement has been organizing since 2009. The interesting thing in this event is that it happens on 31st of any month consisting of enough days. This day was selected because the 31st clause in the Russian Constitution stated the freedom of assembly the movement believed was violated. Russian protest movement considered innovative to have a massive manifestation at a a specific date regardless of whether it was approved. For software engineers, however, doing such thing is not an epiphany of any kind. We call it "time-based releases", so I call the way of arranging such protests "time-based protests".

Time-Based Releases

Time-based release is a pattern in software engineering which have taken the open-source world's mind over the latest five years. It means that the product team promises that each new release will be completed on a preliminarily established schedule no matter what new functionality or bug fixes it is about to contain. The release schedules are then arranged to ensure that a complete release cycle with such stages as feature proposal and selection, development, testing, and distribution completes by a given date. Release cycles often span from several months to years.

Many open-source projects switched from "when it's done" to time-based release algorithms. The most notable examples are Linux distributions named Fedora and Ubuntu, which followed GNOME project. Our ROSA Desktop Linux distribution does this as well. It's usually commented that time-based releases help to improve stability, to maintain a sustainable product delivery, and to strengthen the collaboration between developers and testers and users.

The key result here, however is that time-based release model involves a trade of innovation to an established process. No matter how many new, potentially innovative features we develop, we should release at this point, over. In the aforementioned Ubuntu this leads to criticism.

Sometimes I feel that just making release schedules is enough, and it is why people use time-based release as their planning paradigm in the first place. The thing is that you can't make a time-based release without a schedule. This makes me wonder if some projects opt for time-based release model just to finally make themselves do release schedules.

Why It Should Work Then

The most important reason to choose time-based release over "when it's done" is that this makes you think less and make less decisions. How is this important?

There are theories (I heard it at a conference, but failed to find any evidence of this at all, so let's consider it a fairy tale that is to teach us something useful :-) ) that the amount of decisions a person is to make per period of time is an expendable resource. Making each decision supposedly makes our body to spend some specific amino-acid; its supply renovates with time when you rest, or have a meal, but it never hurts to save some.

So, making decisions is hard, and making as few decisions as possible is a good thing rather than bad. Any open-source project has a lot to decide upon, and making one less decision is a small relief of sort anyway. That's why agreeing on a period to make a release in lets the project save some strength for other things.

A funny story happened there at the first meeting the protest committee had managed to get approved by the police. The committee split into two groups, one of which, led by Limonov, tried to manifest in a place not approved by the police.

Their aim was to protest against a collaboration with police forces, the necessity of such collaboration these manifestations was initially to protest against. At the meeting an interesting case of oppression happened: the police lifted the leaders of the "unapproved" protest group, and forcibly moved them into the approved location, saying, "This is where you should protest [against us]."

What "Strategy 31" has become

(photo by RBC). This is a typical representation of a latest manifestation on the 31st. A lot of wandering with cameras, some police, and only few protesters.

Do you want your software project to become like this?

...And Why It May Not

However, one should never forget the ultimate mission, and never let keeping the schedule be put "in front of the cart". I feel that tying innovations into a time-based release schedule is much harder than doing the same into the release that has more freedom in picking the final date. It's not impossible, no, but is just harder. Mark Shuttleworth, the leader of Ubuntu, speaks about the same in his article on this matter.

I can't back this up these claims with statistical data about any software projects. What I did see, however, is how the energy of these "time-based protests" decayed over time. Many political commenters observed that the regularity became the only thing these protest could deliver. At first, it was enough, because the meetings were oppressed with an unusual degree of violence with no lawful reasons. After the government had relinquished the oppression directed at these particular meetings, and there even were two approved by government, the participants realized that the assemblies lost the nerve, and carried no message whatsoever anymore.

This is an important lesson for software engineers as well, especially for those that are involved in open-source projects , which consist of the volunteers just like public assemblies. Establishing a regular schedule is an important thing, but this is only the first step. If you commit to it too much, and sacrifice the nerve and the sense of the project to make it, you will en up like the time-based protests the "Different Russia" was arranging. At the manifestation a month ago, I saw dozens of police officers, a hundred of loiters who came to take a look at the meeting and photograph it on their iPads, and only 1 (one) actual protester.

A Different View on the Strategy Pattern

June 9, 2012 2309 word 11 min read

If you read a short description of a patern in a usual pattern book or in a Wikipedia article, you will most likely not realize its full power. A true realization comes when you derive the pattern from the programming tasks you are to accomplish on your own, bottom-up.

A month ago I was chatting with a peer who was about to finish his Object-Oriented Programming course. One of the patterns he mentioned was Strategy. As I didn't recognize what it was about at once, I thought that it was fairly useless. Having gotten home and refreshed my memory with this article, I realized that this heuristic, of inferring the lack of use from if I remember what it's about, led me to the right result. The pattern was about encapsulation of a family of algorithms behind an interface. The lack of sense was confirmed by a detailed, but not assuring description of how the Strategy was different from the Bridge pattern.

"Well, just a yet another useless pattern," I had thought... until I found myself writing a bunch of strategies in my script three days later.

The Strategy pattern

First, lets refresh what the strategy pattern is. This is yet another behavioral pattern that supplies the programmer with encapsulation of the behavior. Strategy encapsulates several methods with a common interface, and allows a programmer to switch between strategies seamlessly and reuse them in different environments.

This functionality is not unique. The Interface pattern, Virtual Function pattern, and even simple Function Pointers provide more or less the same. Other unimportant details are only interesting to OOP aficionados. Why bother with Strategies at all then?

There is one case, however, when thinking in terms of Strategies is not artificially induced by OOP courses, but rather is quite a natural way to think about your algorithms. This us when you play a game--or write a computer program that plays one.

Strategy games

There is a whole genre of video games named "Strategies". Though gaming is not the main focus of this post, a lot of what I describe below applies to these games as well. Sadly, in these days of video game consoles, Strategy games do not get as much traction as earlier.

"Still narrow", you might think, "I'm not a game AI programmer, so I'm not going to need this". This argument is relevant indeed, but it.misses the fact that some activities may be modelled as games. This doesn't necessarily refer to gamification, a popular concept in our times of ubiquitous crowdsourcing and Web2.0.

The games I'm talking about are closer to mathematical game concepts, when the environment of your application is modeled as "adversary" who doesn't really tend to respond to your actions in the nicest way possible. An example of that is the script I was implementing. I wanted to create a crawler for a web forum, that should download new posts as soon as possible, given that posts may be deleted by evil moderators, which should not confuse the script. I'll expand on how this was modeled as a "game" later; let's talk about strategies for now.

Let's take a less abstract example of a game to explain what the strategies are about. You a played Battleship, a game where you are to destroy ships on the adversary's board with as few moves as possible. in this game you make a move by shooting to a particular cell of the battlefield, and get a response whether you missed, or hit the ship, or destroyed it completely if it's the case. In fact, you have a "Strategy" that reacts with moves to the messages from the battlefield. (We intentionally leave the art of ship positioning aside).

Implementing a strategy for Battleship game was the very first task I was given as an intern at my first job at ISPRAS. We were tow implement a strategy for the game, and our algorithms competed with each other afterwards. They played thousands of matches, each being a comparison of who was faster to shoot out a randomly generated battlefield. I came up with two strategies: one hacked into the random number generator by srand()-ing it to a predefined value (that won all the matches), and the other was built via a series of experiments with combinations of the strategies implemented following the pattern described in this post.

The programs we wrote back then revealed a lot about our consequent professional lives. One of my peers pre-analyzed the probabilities of a cell being occupied by a ship, and built his strategy on top of this; he's finishing his Ph.D now. The algorithm of the team supervisor was the simplest, the fastest, and had the largest number of victories (naturally). And my algorithm was over-engineered, slower than the others, but flexible and efficient enough: it scored 3rd in the competition.

Most strategies look like this: Shoot the battlefield along diagonals, starting from the middle, until you encounter a hit. As soon as you do, finish th ship you hit by shooting nearby cells, and keep shooting across the diagonals.

Of course, a multitude of other, more complex strategies exist. Each of the strategies, however, fits the common interface: given a result of the previous move, make the next one. The strategy should, thus implement the following interface:

Decomposition of a complex, stateful strategy

A strategy to play even such a simple game may be quite sophisticated. The implementation of next_move will embody a number of states we're currently at (such as "shooting along diagonals", or "trying ti finish a ship".), and the strategy will also implement a complex logic to transition among the states based on the shot result. This pattern, known as "State Machine" is viable, but the whole logic may be simplified.

We can use the rethinked strategy pattern to decompose the strategy into smaller ones. The "global" strategy described above in italics may be turned into a local one as follows. Instead of approaching the way we play a game as one "big" strategy, we instead consider that our strategy changes each move as well. In response to a message from the field, the rethinked next_movefunction now returns the strategy we are now using as well as the next move.

This way, we begin with "shoot diagonals" strategy; it changes to the "kill ship" strategy as soon as the response is "hit", which, in turn, changes back to the shoot diagonals strategy as soon as the response is "ship dead".

A clean way to implement it is to change the interface of a strategy slightly. Instead of a single next_move function that both returns the next move, and changes the internal state, we transform the interface as follows:

The peek function returns the next move you should make according to this strategy, and move function returns the next strategy.

This offers a simpler way to switch strategies. When the strategy feels that a strategy should be switched, it returns NULL from peek, this makes the caller detect that the strategy is going to switch, and doesn't pass the move to the adversary. Instead, it peeks the next strategy. The overall algorithm now looks like this (memory allocation is intentionally put aside):

You may do this in a more complex but more encapsulated manner. In the peek function, you make construct the next strategy, and peek its next move; save the strategy, and then route the move function to it, returning the strategy built.

The implementatinon for the Bttleship game stereos would look like this. Shot diagonals strategy is initialized with a battlefield state, and this should be enough for choosing the next move. As soon as a hit is encountered, the peek function returns nil, and the move func returns "finish the ship" strategy. The "finish ship" strategy is initialized with the last hit, and explores the area around the hit. As soon as the result is "killed", it initializes the diagonal strategy with the current battlefield, and returns it as the next strategy.

Now the implementation looks much more clean and decomposed. You can combine and chain together various strategies, discovering the most efficient through experimenting.

Note that this algorithm doesn't have any protection against infinite loops, so you have to ensure the termination manually. On the other hand, no other way to implement a strategy for a game has any protection against this.

A functional way to encode the strategy change

The decomposition may be continued. We see that the implementation of the "kill ship" strategy has the switch to a specific strategy hardcoded. This limits the reuseability of the class, because we would like to switch to killing a ship and back from many different strategies, rather than just from the diagonal one only. This hardcoding may be solved via subclassing or via using functors, i.e. passing a function to the strategy constructor (I prefer the latter).

When a specialized strategy exhausts the moves it can—or was ought to—make, it calls the function supplied to the constructor (or a virtual method, if you preferred subclassing), and this function returns a new strategy. Most likely, the function is taught to return the previous strategy via a closure. I.e. the "shoot diagonals" strategy's move function may contain such (pseudo)code (I tried to use C++0x lambdas here):

Concurrent execution of strategies

An established interface allows us to define combinations of strategies. Assume we would like to shot along the diagonals in different regions of a battlefield. Instead of implementing a strategy that shoots along two, three, etc regions, we may implement a strategy composition.

A simple composition takes an array of strategies, and asks them for the next move in a round-robin way. In some cases, it's a sufficient composition. In battlefield, however, we don't want, for instance, shoot into the othee regions, if we've hit a ship; what we want is to pay our full attention to that ship. In this case, we would use a priority queue instead to store strategies instead od an array.

What about forum downloading?

Now assume we are to download all messages from a forum, given

each message has a unique id that automatically increments;
there's no way to get the current maximum id without posting a new message;
Moderators may erase messages, and there's no way to distinguish if a message with a particular ID was erased, or it never existed (aside from encountering a message with a greater id);
we should be capable of both getting new messages and downloading the old messages at the same time

We can model the work of such a crawler as a battlefield-like game. Our "move" consists of an ID of a message to download, and of a pause to make before doing so. Our response is the message contents or a notification the it doesn't exist.

Our "strategy" would be a round-robin combination of two strategies. The first goes down from a specified id down to the message number one. The second waits for the new mesages in the following way.

The strategy goes up message by message until it encounters a nonexistent message. Then the strategy switches to a "random walk" strategy that finds a message with a greater that exists, making appropriate pauses. No matter how many messages a moderator would delete between our attempts we should be capable to finding a message that is not deleted. After we found one, we may be sure that all the nonexistent messages between the last one we downloaded successfully and the one we found during the random walk either exist or are removed forever. This part is implemented by another strategy, which then switches to the "random walk", looping the execution.

I implemented a slightly different downloader, given that we always know the ID of the last message, regardless of whether some latest messages were deleted. You may check its sources here (ahe downloader as at the time of writing), and here is the latest version of the script

Other applications and limitations

The most obvious limitation is that all the strategy classes you implement depend severely on what you're move is. Each strategy class you implement is going to construct move objects. If the interface in the constructor used changes, or its meaning becomes different then you have to alter all the classes you implemented. The same applies to the response objects.

Limited applicability

One of the oldest turn-based strategy games, chess, is one of the examples where the approach described is not the best one possible. A more effective way to play chess may be something like Alpha-Beta-pruning rather that structural exploration.

Trying to analyze the possible applications of this strategy framework, I came to a conclusion that it's quite limited. It is a nice way to model a strategy when your game is to explore "terra incognita", to reveal what is initially hidden. Indeed, in Battleship you do not react to intelligent moves of your adversary. Instead, you explore a carefully prepared field, initially covered by the fog of war. The forum downloader is no different: the actions of the evil moderator are in no way affected by the "moves" the program makes.

The framework supports parallelization via clever strategy combination classes, like the one I described earlier. Note that this abstraction moves the parallelization code away from the strategies themselves.

I don't think that this strategy framework it's useful for games where you are to make intelligent moves, such as Chess or Go. Moreover, even in "exploration" games (such as Battleship), it's ability to combine and decouple may just lose to a better tailored but a less decoupled counterpart. However, this framework is invaluable when you need too explore a complex and yet uncovered environment with limited resources, and want too experiment throughout the process.

BLAST at the Competition on Software Verification'12

May 22, 2012 785 word 4 min read

At my previous job in a research laboratory, I was improving a Static Analysis tool named BLAST. The name abbreviates its origin, and is decomposed as "Berkeley Lazy Automation Software Verification Tool."

By the way, my former peers have prepared a nice comparison of Static versus Dynamic analysis for the Linux Kernel.

The tool solves a problem commonly described as unsolvable, the Reachability Problem. BLAST is capable of telling—sometimes—if a certain line of code is reachable. It doesn't just run the program with some input data; rather, it reads the program source code, and tries to devise if some input data exist that make the program enter a specific location, a specific line of code.

Of course, it can't actually solve it (nothing can). What it can is to prove that a line is never reachable for a certain fraction of programs. A much more useful purpose is to find bugs. For instance, if we want to check a null pointer dereferences, then we could insert checks like this:

And it does find such bugs. In ISPRAS, a several-years effort within the Linux Driver Verification program/project has led to finding several dozens of problems in Linux device drivers. I even committed some fixes to these bugs to the kernel sources.

Tool Competitions

Well, I got distracted again. I've been planning a big blog post about this amazing technology for years, and I'll make one, but not now. Anyway, it was not me who devised how a program that checks reachability of lines in other programs works (it were two Cousuot-s), and I only fixed and improved an already released open-source tool, BLAST. I did create some novel things, but they were quite minor to my taste (it was just a new zero-overhead structure aliasing algorithm).

So, my last project in the Institute for System Programming was to prepare the tool to the Competition on Software Verification held at TACAS'12, which, in turn, is a part of ETAPS'12. There is a lot of domains where people write tools to solve the unsolvable (like the reachability problem) or the NP-hard (I wrote that you should try to solve this anyway). Such problems can not be universally solved, but a tool may solve them "better" than another, and it's interesting and useful to know who's the best.

A natural way to determine who is "better" is to measure the results of different tools on a common set of representative problems. This leads to specialized competitions among the tools that solve a specific problem. SMT-COMP for Satisfiability Module Theories logical solvers, and Answer-Set Programming System Competition are only two I know about, you can also name a couple of Chess and other formalized game tool competitions. I hope that SV-COMP will become one of them.

Our contribution

The Cousuots' approach is about 30 years old, and the technology beinhd BLAST itself is about 10 years old. However, BLAST contained a lot of improvements that made it apply to many programs, not just artificially created small "research-like" programs. That's why we chose it for finding bugs in Linux Drivers, and worked hard to improve it. Our improvement efforts are stored here, and are available under free software licenses, just like the original tools are.

But, being a less developed technology, it was eventually overtaken by more recent tools, such as CPAchecker, which provided a better ground for research experiments, and had a more developed theory. Still, the competition results demonstrated that the tool was still capable (and I hope that our improvements played a significant role in that). We got 5th place with 231 points, the 4th place being occupied by SATabs with 236. The final table was divided into two groups (200+ and the rest), and we belonged to the top tier.

What was funny that we even got a plaque for achieving the best results in "DeviceDrivers64" category... naturally. Too bad I wasn't there to see the ceremony :-(. Here's a photo of our team who contributed to the improvement of BLAST since version 2.5:

By the way, I already featured my former peers in a fictional interview about a "collaboration tool" by Penn&Paper's.

(Left-to-right: Vadim Mutilin, Pavel Shved, Mikhail Mandrykin.)

I'm sure that this project will eventually be superseded by more advanced software, but I still hope that I'll have time to have some fun with it as well.

A more detailed post about the verification technology pending, I'd like to thank my former peers for the great time I spent in the ISPRAS working on what I wished to do since I was a 7th-grader who first heard about the Halting problem. And I wish all the best to the Linux Driver Verification project as well.

Logging in Expressive Languages

May 12, 2012 809 word 4 min read

I am not aware of any strict mathematical definition of expressiveness of a language. Informally, a language is more expressive if it allows you to achieve the same with less keystrokes. From this point of view, for instance, Perl or Python are more expressive than C. Of course, you can't always achieve with Python most things that C is capable of at low level, and therefore one may question whether Python is really more capable than C. But capability has little to do with expressiveness. In certain domains, one language may be more expressive than the other, and in other it could be just vice versa. (For instance, Python is more expressive in string operations while C is more expressive in low-level raw data representation.)

One may notice, however, that the majority of languages are Turing complete, and you basically may express the same with all of them. What is the difference in expressiveness then, as seen by a Practicioner rather than a Computer Scientist?

I usually use the following criterion. If you write one non-splittable expression instead of a chain of statements, the language is more expressive. Compare the following two programs that compute a sum of an array in C:

...and in Ruby:

..and, in C, we also should find a place to deallocate the array when it's no longer needed.

You may notice that Ruby can achieve with a single expression what C needs several operators and variable declarations for. Three Ruby features contribute to this, anonymous lambda functions, syntax sugar to pass them to functions (such as inject), and to avoid the really useless variable declarations. Isn't it always that beautiful?

In some cases it's not.

What does it take, to log?

Now let's try to add some logging to the programs above. Assume we actually want to hook into the loop, and make it print the square result it has calculated to a logging stream.

Let's take a look at the code again.

Ruby:

Ewww! Why does our Ruby code looks like C? Where is our one-liner? Where is the expressiveness? How come we're explicitly introducing a temporary variable just like in C?

I spotted such patterns, "compute-log-return" in a lot of places in my practice. For i instance, the BLAST codebase I maintain, contains these here and there, because it's written in an expressive functional language (OCaml), and uses logging as a primary debugging mechanism. Because of it, the code that should look beautiful looks as an old prostitute trying to mask their weary skin with tons of makeup, and I'm not happy with this.

Trying to get asleep on a plane, I suddenly realized that this logging could be achieved with a better code. Since "assign-debugprint-return" is a recurring pattern, let's define a helper function

Then, the resultant Ruby code will look like this:

Now our Ruby code looks like Ruby again.

Of course, this is not limited to Ruby. Though less versatile, the sane function may be implemented for the C code as well.

And this brings us back to the original comparison with 1 line in Ruby and five--in C.

Discussion

while you may write such function for a less expressive language (as seen above), it will "stand out" in a large C code sheet and look like ugly duckling. What may benefit most are languages that mix functional and imperative coding styles, such as the aforementioned Ruby and OCaml. The reason why this helper might be useful is that it's relevant to the functional languages in two different ways.

First, functional language allow the programmer to define anonymous function inline, which makes them be frequently passed as parameters to other functions. In C, defining a separate function is, inmost cases, uglier than just adding a convenient logging shortcut.

Second, logging an intermediate result contradicts the functional level paradigm. In functional languages you define a computation as a whole rather than specify concrete sequence of actions. In case of logging, you have to violate a chain of functions with a side-effect operation. Of course, you can try to make it look nice, as we did, but the very reason we need the kludge is that we're doing something a bit wrong.

But it's not limited to functional languages. Most expressive languages that do not require from user a strict specification of type of every identifier can. For example, we might want to debug Perl's regular expression substitution with using a similar function:

(we couldn't use log because it conflicted with the mathematical logarithm function that bears the same name).

***

In any case, the "compute-log-return" function that encapsulated the debug print is a widely useful pattern. And it's especially useful when you're writing a program in an expressive language. Do not let the programming cruft conceal the actual idea of your code, use loggers that hide all the details.

Counting Visits Efficiently with "Wide Caching"

April 5, 2012 3373 word 16 min read

One of the features of the web forum engine, a clone of which I develop in Ruby on Rails, is that it displays some site usage statistics.

There are two kins of statistics it displays. First, it tracks and shows how many times a particular message was read (i.e. clicked), tracking if a user doesn't self-click his or her message. Second, it shows the generic site usage activity, unique visitors, and page views in the top-right corner.

In this post I'll tell how I solved the problem of calculation the page view and visitors statistics. Impatient developers may jump directly to the solution section, while others are welcome to read about the path I walked from the most naïve activity tracking to the Wide Cache.

Naïve solution

The naïve solution was to store all site accesses in a table in the Application MySQL database. Each line in this table would represent one access to a site, keeping a user's hostname, and the access time. At each access, a corresponding line is inserted into the table. Then, if we are to display the information gathered, we run two simple db queries to get the pageview information (the time is the current time minus 10 minutes):

SELECT COUNT(distinct host) FROM `activities` ;
SELECT COUNT(*) FROM `activities` WHERE (created_at < '2012-04-01 18:33:25');

The writes to and reads from the activities table happened at each request. This was achieved via a before_filter in the root controller which is usually named ApplicationController, as hinted in this StackOverflow question:

It spawned another process that handled all the activity business, and didn't prevent the main process from serving pages timely.

As you might expect, the performance measurements were disastrous: the application could serve only (four) requests per second, while with all activity stuff turned off the throughput was about 40 rps.

Storing the table in memcached

The reason why the naive solution was not working as expected was lack of throughput. At first, I thought that since we could defer the database writes, and perform them after the request was served, we wont have any problems. However, this only addressed one of the performance metrics, response time, while what solution lacked was throughput.

I also didn't want to introduce other technologies specifically for activity tracking. What I had at my disposal was memcached. Other components of my application used memcached to cache database reads and whole pages. We could, I thought, cache database writes with it as well.

Rails storage with memcached backend supports two operations. The first is to write something under a string key, perhaps with a limited lifetime. the second is to read from a specified storage entry if anything has been written to it before, and if its lifetime is not over yet. That's basically all.

We could try to use memcached to store the SQL table. itself. Indeed, that table may be viewed as an array of rows, so we could just read the table, append a row, and write the result back. However, for the sake of performance, memcached doesn't support any locking, so we can't just store the table described in the naive approach in a cache bucket. Two threads may read the array, then both append the next row, and both write, one row being discarded. This is a classical race condition. And my aim was to serve 30-40 requests per second, which means that race conditions that appear if this approach is used were inevitable.

Besides, even if we could use locking (for instance, via Linux named locks (flocks)), it could even perform worse than the database, because it wouldn't be able to handle enough requests sequentially. We need a certain degree of concurrency here.

Wide caching

To mitigate the race conditions that happen during database writes, I used a technique I named "wide caching". I certainly reinvented the wheel here, but for the sake of clarity, I'll use this name in the rest of the post.

The race condition from the previous section could be mitigated by distributing the big "table" into smaller, independent tables stored in different memcached chunks. Writes to each of the chunks would not be protected by a mutex, so a race condition would be possible. We could try to mitigate it by using a round-robin chunk allocation, i.e. a new thread writes to the chunk next to the one allocated to the previous thread.

This solution, however, could be improved.

First, round-robin allocation requires a central, synchronized authority to return proper chunks. This brings the same level of technological complexity as a single bucket with a lock.

Second, round-robin doesn't guarantee the lack of race conditions either. A thread may stall long enough for the round-robin to make a whole "round", and to consider the chunk currently in process again. To avoid this, the allocator should be even more complex.

Third, let's realize that we may trade precision for speed. The result will not be absolutely accurate. People look at the site usage statistics out of curiosity, and the figures may not be precise. We'll try to make them fast first.

This all suggests a simple idea: get rid of the allocator! Just use a random bucket in each thread.

This will not prevent us from race conditions either, but it will make them predictable. If the allocation is completely random, we can carry out experiments, and extend their results in time being sure that they will be reproducible.

Decreasing effect of race conditions

The previous paragraph consisted of reasons and measures that addressed the amount of potential accesses to the same cache bucket within a period of time. What also matters is how long this cache access is. The shorter it is, the more writes to a hash bucket per second we are going to handle correctly.

Memcache request structure

Rails provides a transient interface to various caches. The interface allows to store serialized Ruby structures in the cache. This picture shows the steps that happen during an update of a table in hash, the scale roughly representing the time each of the steps take.

A typical hash bucket access consists of these steps:

reading raw data from cache;
deserializing, converting to Ruby format from a character format (the opposite to serializing;
appending a row to the deserialized table;
serializing to the character format (the opposite to deserializing);
writing raw data to the cache.

We can see that steps 1, 2, 4, and 5 depend on the amount of records in the array we store in the hash bucket. The more information we record, the more time they take, and when the data are large enough, the time becomes proportional to the size. And if we just store all the activity data in cache, the size of the arrays being (de)serialized only grows in time!

How could we decrease the size of buckets? The wide cache already decreases the expected size by a factor of width, could we do better?

It turns out we can. Activity is a time-based data. The older the data are, the less interesting they become. In our use-case, we were only interested in the data for the latest 10 minutes. Of course, the routine cleanup of tables that drops all records with old enough timestamps is not sufficient: the activity data for 10 minutes are large enough (with 50 rps, and 50 being the cache width, the mean size would be 600).

Time reminds us about another idea. Why do we load and then write the old data for the table? They don't change anyway! Since we're using a generic cache solution, and can't lurk into its read/write algorithm, we can't do it on a per-record basis (i.e. do not load the table, just append the record being inserted. What can we do instead?

We may utilize another highly-accessible resource, which usage is however totally discouraged in building a reliable distributed algorithm. We have the clock that are easy and fast to access. The clock provides us with a natural discriminator: each, say, three seconds, abandon the old hash bucket, and start a new that will be responsible for storing the requests for the next three seconds. Given current time, we can always find the "layer" of the cache we should randomly select the bucket from.

Reading the data back

The discussion above was devoted to how to record the activity data. However, the data should be displayed in a timely manner. How timely? Let's consider that if the data are updated at least once per 30 seconds, it's good enough.

Then, each 30 seconds we should reconcile the activity data written into the memcached, and "compress" it to an easily readable format. Since I already had MySQL implementation before, I piggybacked on this, and merely inserted the activity information to the SQL table, so the reading procedures do not have change at all!

To get the statistics, we'll have to iterate all hash buckets the writing algorithm could fill, because gathering the ids of the buckets filled will require additional synchronization (additional writes), and, since the majority of them will be filled under high load, we'd better just collect them all. Theoretically, we should collect all the buckets that might have been filled during the last 10 minutes, the period we show the activity for. However, if we run the collecting algorithm each 30 seconds, we can only collect the data for the latest 30 seconds as well, since all the previous data should have already been collected.

We'll have another race condition here. A worker may get the current time, generate a hash bucket id X, and then stall for several seconds. During that stall, the writing worker collects the commits all the data to the MySQL table, including the piece stored in X. The first worker wakes up, and writes the visit information to X, from which it will never be read, so the request is lost.

To mitigate this race condition, we may collect and commit the data only from the layers that are deep enough. This won't help to avoid it completely, but will decrease its probability to the degree at which we can ignore it.

The final Wide Caching solution

If we assemble all of the above, we'll get the final solution that looks like this:

When a page request arrives, it asks for the current site usage statistics as one of the steps during the page printing. The statistic data are requested from the `activity` table in MySQL database, and are cached for a medium amount of time (30 seconds), because more frequent updates do not improve user experience.

After the page has been served, the request notifies the wide cache about the access to the site. First, it determines the "row" of the cache by rounding the current time in seconds since the Epoch down to the number divisible by three. Then, it determines the "column" by choosing a random number from 1 to 15. These numbers are appended; they form a unique identifier of a memcache entry. The website then reads the table from that entry, appends a row, and updates the same entry.

Dumping the collected accesses to the DB is performs like this. After notification, the request also checks if there is a live memcached entry with a special name, and a lifetime equal to 30 seconds. If there is such entry, it means that the information in the DB is obsolete, so the algorithm starts to commit the information into the MySQL DB.

While the information is being committed, there may appear other requests that would also check the lifetime of the memcached entry, and start the commit process. This is why the order, in which the memcached entries are being read is randomized, so that the amount of cases when the same information is committed twice is minimized.

Here's the code. It really looks much shorter than this post that describes it.

The latest version of the source code may be found here.

Experiments

I mentioned that the setup contains race conditions that will lead to losing some requests. With the cache width of 15, and bucket height of 3 seconds, the payload of 35 requests per second made the tracker lose 350 out of 6800 requests. This is approximately 5% of total number of requests, which is acceptable. Because we randomized request queries, we may conclude that 5%—given these figures for requests per seconds, cache width, and the equipment—will be an average visit loss factor.

Spawning

In previous sections, I claimed that spawn-ing threads using spawn Rails gem, and writing to the database in the spawned processes/threads is slow. Indeed, spawn reinitializes the connection in the spawned thread, and this is already a considerable burden on the server if several dozens of threads are spawned each second. Here are the experimental details (see bug #67 where I first posted them info):

Method	Reqs. per sec.
No activity	40.2
With activity; no spawning	27.4
With activity and spawning	13.0
With wide cache	36.7

This shows that (a) you should not use spawning for short requests, and (b) wide cache is really fast enough.

Background periodic job or per-request?

In the algorithm described above, activity data collection was started when a request arrives, and the application finds out that the currently cached activity stats are out of date. We were careful to make this case as painless as possible, but it has other implications that are not related to race conditions, and experiments show that the loss of throughput is acceptable if we don't try to spawn a thread for each this activity.

Surprisingly, this method starts performing badly when the site activity is low rather than high. On a low-activity site, we can't rely on requests coming each second. Rather, they may arrive even less frequently that activity cache times. So, to support the low-activity case, we have to collect the information for caches that might have been filled in the latest 10 minutes (the maximum value a visit information could still be relevant for), not 30 seconds (the lowest possible time since the previous data collection). Otherwise, if users arrive less frequently than 30 seconds, the engine would always show zero activity, which would make users visit the site even less.

This wouldn't be important, unless I used HAML template engine, which—and this is a known, sad bug—doesn't flush the page until the request is complete. Even putting the code to after_filter doesn't help. Experiments demonstrate that activity data reconciliation may take up to 1 second, so some requests will be served with an excessive 1-second delay, and when the rate of requests is low, this will constitute a considerable fraction of them. And I surely don't want my users to experience frequent delays during their infrequent visits.

Spawn instead of after_filter? We have already seen that spawn-ing will make our server choke at the other extreme, when the load is high.

Luckily, we have a solution that suits both cases equally well. It is to periodically invoke the activity data collection procedure, without tying it to requests. However, the periodic invocation of a task is not straightforward in Rails. I'll get back to it in another post.

Future work

The current implementation of the activity tracker is parametrized with cache attributes (cache lifetime, and width). However, this Wide Cache could be parametrized further, with the procedures that are executed, namely:

Cache bucket updating
Commit procedure
Reading what we previously committed

I think that I'm going to need the same cache that doesn't always touch the database for the first kind of the activity, for visits of individual posts. This parametrization will help me to keep the code DRY, and to re-use the Wide Cache.

This will require refactoring of visits and hits to a single function that calls a lambda function passed in the constructor.

Other approaches

Parse apache/nginx/rails logs.

Indeed, the topmost web serving layer already collects the activity data: it carefully logs each visit, with the URL being visited, a user's IP address and user-agent. Instead of spending time on activity in a comparatively slow Rails application, we could use the logs of a "fast" web server.

I have seen production systems that display activity data based on parsing nginx logs, and it may be integrated into Rails in such a way that it doesn't look like a kludge. Perhaps, free log parsers are already available on github... but this solution just doesn't look cool enough to me.

Do no track activity

Does anyone care about the visits at all? Maybe we just shouldn't track them?

First, from my observation, everybody tries to stick a visit tracker into a website. A bigger activity also means that you're visiting something valuable. A Russian clone of the Facebook social network even used to display a fake registered user number counter at their homepage. Such fraud could only be justified by a huge benefit from displaying it, which means that the statistics is considered valuable by at least some very popular sites.

Second, in my case, there was an easier answer. I should reimplement everything the engine I'm trying to clone contains by politics-related reasons: unless I reimplement everything, and manage to keep the site fast at the same time, my approach to the development will be considered a failure.

Use a specialized activity tracking solution

Too late. I have already implemented my own. But if you want some activity data on your website, do not hesitate to use one. It is hard, see above.

Conclusion

In one of the discussions on the very forum I'm reimplementing, someone wondered, why the view count on Youtube is not update in realtime for popular videos. Having gotten through a series of trial-and-failure experiments with visit tracking, I realize that the developers of Youtube surely had their reasons. My activity information is also already delayed, while the traffic volume still insignificant compared to even a single Lady Gaga video.

It seems that during the development of this tracking I have reinvented a lot of wheels. This is how, I guess, the most primitive database and file systems caches look like. Their algorithms are, of course, more sophisticated, and are not built on top of the generic cache and serialization solutions, using their own, custom ones instead.

But as any other re-invention of a wheel, it was fun, and I surely got a lot of understanding about higher-load web serving while trying to optimize this component of the forum engine.

Ruby on Rails Errors and Solutions

March 10, 2012 2236 word 11 min read

When you develop something in Ruby on Rails, you will most likely encounter strange errors here and there. These errors may look like language or framework bugs, but they are not. Most likely, they are programming bugs you were not aware of before you encountered them.

The reasons for that are twofold. First, Ruby is a dynamically-typed language, and this alone introduces subtleties you should know about. Second, the web framework values convention over documentation. It means that methods and functions should just work as expected, without making the user read the whole documentation page.

And when something just doesn't work, you'll have troubles finding the cause in the API docs. You'll have to google the exception message, and hope to find the cause.

This post is another contribution to this Exception-Driven Development, and lists some errors a Ruby/Rails programmer may make. Some of the errors were found via web search (I tried to provide a proper attribution where possible), and some of them were solved by my own. I'll try to keep this list up-to-date, and add more errors here as I write more Rails programs.

Generic Ruby Issues

Calling setter methods

You are setting a value of an ActiveRecord field from inside a method of your model, but it does nothing? You are doing it wrong. Perhaps, you write something like this:

But the value is reset after this method. What's happening? The field = ... line actually creates a new local variable, rather than calling the object's field=(value) method ActiveRecord has created for you. Just use self to distinguish:

Note that you don't have to do the same when you read from the field—unless you have created a local variable with the same name! This is why the second line of the method doesn't have to contain self in this case.

Sometimes this strikes you from a different angle. For instance, if you have a quality attribute in your model, and you want to set conversion quality with RMagick, you have to write:

because you can't easily access the outside's variable quality from inside the block, as it's routed to the RMagick object instead.

Other gotchas

You might have specified :false instead of false when you pass a hash attributes to Ruby. I devoted a separate post to this.

Surprising Bugs

Error to clean your cookies

If you have this error:

ActionController::InvalidAuthenticityToken in SessionsController#create 
ActionController::InvalidAuthenticityToken

RAILS_ROOT: /home/rails/app/
Application Trace | Framework Trace | Full Trace 
/home/rails/app/vendor/rails/actionpack/lib/action_controller/request_forgery_protection.rb:79:in `verify_authenticity_token'
/home/rails/app/vendor/rails/activesupport/lib/active_support/callbacks.rb:178:in `send'

you should clear your cookies. Perhaps, you have deployed a different site under the same hostname. If this alone does not help, check if you have set up constant DOMAIN or SESSION_DOMAIN in your config (I had them in config/environments/production.rb), adjust them, and restart your server.

Strange remote_ip error

This error:

undefined method `remote_ip' for nil:NilClass

means that you accidentally named one of your actions "request". And you can't name your actions "request", as there's some clash when the engine wants to get a current request, but calls your controller method instead. Rename your action.

Spawn plugin and Rails 3

Your spawn plugin is throwing this kind of error:

spawn> Exception in child[26055] - NameError: uninitialized class variable @@connection_handler in ActiveRecord::Base

That means that it's too old for your current version of Rails. Upgrade it—to the latest commit it has (this is not on the master, but on its edge branch), or use it as a gem. If you use Rails3, here's the line for your Gemfile:

gem 'spawn', :git => 'git://github.com/tra/spawn', :branch => 'edge'

Commit 0e1ab99b worked for me, the corresponding gem version is 1.1). Source.

Unicode problems

UTF-8 regular expressions do not work by default? Add #coding: utf-8 as the first line of your ruby file. If it's an ERB or HAML template, add this at the beginning anyway.

If you want to make your application completely Unicode, from the database to the tips of your views, you should do all of the following.

Make sure that all of your source code files are in UTF-8;
Make sure you added # coding: utf-8 at the beginning of any source .rb file (views, initializers, library files) that actually contains non-ASCII characters;
Make you database Unicode:
- make sure you set utf-8 character set for the database as a whole;
- set utf-8 character set for each table;
- set utf-8 character set for each text or string column in the table;
This is especially important if you dump your database and load if from the dump on another server. In this case you may also need to cast @SET names='utf-8' in your mysql console;
the "default" rails gem doesn't support Unicode. Use mysql2 gem instead as your MySQL driver.

If you miss one of the steps, you may encounter problems. For instance, if you do everything except for the mysql2 gem, you may see these errors for your views:

  Showing app/views/layouts/application.html.erb where line #48  raised:
  incompatible character encodings: ASCII-8BIT and UTF-8

If you do not set up the character set for the database, migrations may create new tables with latin1 encoding, and this may result in ?????????? instead of non-ascii lines in the data user enters to your DB.

Models, Databases, and Validations

Ever-failing presence validation

Do NOT add empty? method to your objects unless you really mean it! This can strike you from you back when you don't expect.

Assume you want to validate that a "belongs_to" association presents at save. You might use validates_presence_of for this. However, as specified in the documentation, it calls Rails' Object.blank? to check presence. It, in turn, calls empty? method (documentation) if it exists.

In my forum engine, I used empty? method to convey if the post's body is empty. Therefore, you couldn't answer to posts with empty bodies, because validation thought that such posts are nonexistent.

Instead of using a complex validation, I suggest renaming the empty? method. When the author of the next model forgets—or won't even happen to know—about this bug, he/she will still try to use validates_presence_of. It's better to make it work, following the principle of least surprise.

Model name clashes

You can't name your models like Ruby classes, i.e. Thread (conflicts with internal class for threading) or Post (conflicts with HTTP library).

It's interesting, that the default scaffolding automatically names your model "Threads" when you instruct it to create a "Thread" model to avoid conflict. However, you still need to specify the proper class name in associations:

Check the scaffolded controller for other declarative methods that may also need adjustment.

Using several databases in the application

If you want to specify several databases for your app (on a per-model basis), you can use a built-in ActiveRecord capabilities. Just add a section in your config/database.yml with a distinguished name (i.e. second_db instead of production), and add this to your model:

See the docs. Found here.

Troubles with making a form with validation but without database

Rails3 has a built-in solution for creating a form that use ActiveRecord-style validations, but are not backed up by a DB table. Indeed, not reusing this functionality would be a waste. But just including ActiveModel::Validations makes your form yeild errors like this:

undefined method `to_key' for #<Loginpost:0x000000038de700>.

Listen to a short RailsCast on ActiveModel, and learn how to add some magic to your model. Here's how it should look like:

Validation of non-database attributes

Want to validate the range of numbers in your temporary attributes? You write

but get a kind of "undefined method something_before_type_cast" error:

undefined method `quality_before_type_cast' for #<Image:0x7f579f5fe940>
/usr/lib64/ruby/gems/1.8/gems/activerecord-2.3.14/lib/active_record/attribute_methods.rb:255:in `method_missing'
/usr/lib64/ruby/gems/1.8/gems/activerecord-2.3.14/lib/active_record/validations.rb:1030:in `send'
/usr/lib64/ruby/gems/1.8/gems/activerecord-2.3.14/lib/active_record/validations.rb:1030:in `validates_numericality_of'
/usr/lib64/ruby/gems/1.8/gems/activerecord-2.3.14/lib/active_record/validations.rb:479:in `validates_each'
/usr/lib64/ruby/gems/1.8/gems/activerecord-2.3.14/lib/active_record/validations.rb:476:in `each'

The error trace makes a hint about the solution:

For ActiveRecord attributes, these methods are automatically generated. However, for our attributes that are not backed by DB columns, we should create on our own.

Uniqueness validation column name error

I created a model, added an `validates_uniqueness_on`—uniqueness validation—and tried to create a valid record. However, I was getting the NoMethodError: undefined method `text?' for nil:NilClass error no matter how hard I tried:

NoMethodError: undefined method `text?' for nil:NilClass
        from /activerecord-3.2.2/lib/active_record/validations/uniqueness.rb:57:in `build_relation'
        from /activerecord-3.2.2/lib/active_record/validations/uniqueness.rb:25:in `validate_each'
        from /activemodel-3.2.2/lib/active_model/validator.rb:153:in `block in validate'
        from /activemodel-3.2.2/lib/active_model/validator.rb:150:in `each'
...

I checked the code, and realized that my uniqueness validation referred to an association name, instead of the database column name:

It seems that you must specify the exact database column name in the uniqueness validations since they are closely integrated with the database.

Connecting to a local database via TCP sockets

If you want to avoid using file sockets for connection to a local database, it's not enough to just remove the socket line. Rails will replace it with the default socket configuration if you use localhost as your host. To connect via a network socket, you have to change "localhost" to something else by adding an entry to /etc/hosts, for instance.

Housekeeping

Accessing methods from console

Contrary to the popular belief, there is a way to easily access helpers and other useful application methods from console. Here is a couple of examples:

>> app.project_path(Project.first)
=> "/projects/130349783-with-attachments"
>> helper.link_to "Home", app.root_path
=> "Home"

You may read the complete guide here.

Get list of Rake/Capistrano tasks

When you use rake tasks, or a similar program (such as Capistrano), it's easy to forget what actions it provides. Googling for the list doesn't help either. Where can you find them?

They're right there:

These commands print the list of all tasks. With cap -e task:name, you can get a more extended help.

Redo specific migration

$ rake db:migrate:redo VERSION=20111122143422

Rake can also go several steps back/forward in migration application; you now know how to see the docs (rake -T).

Using Ruby 1.9 on Ubuntu

Ruby 1.9's default virtual machine is superior to that of 1.8. Besides, version 1.9 has

If you are using Ubuntu, you have "alternatives" mechanism for setting ruby version. But. please, do not forget to update alternatives both for ruby, and for gem command to 1.9! Otherwise your gems (such as passenger) may end up in an improper place, and won't run successfully.

MySQL gem and Mac

If you install mysql onto a MacOS with MySQL 5.5, you may encounter problems that show up as a strange exception:

rake aborted!
Constant MysqlCompat::MysqlRes from mysql_compat/mysql_res.rb not found
Constant MysqlRes from mysql_res.rb not found

Tasks: TOP => db:migrate
(See full trace by running task with --trace)

It turns out that the problem here is a wrong mysql gem version. Uninstall the existing gem, and install 2.7 version, fixing some errors afterwards with a symlink:

$ gem uninstall mysql
$ gem install mysql -v 2.7 -- --with-mysql-config=/usr/local/mysql/bin/mysql_config
$ sudo ln -s /usr/local/mysql/lib/libmysqlclient.18.dylib /usr/lib/libmysqlclient.18.dylib

Source: Sergey Ler told me about it.

Using Ruby 1.9 on Ubuntu

Ruby 1.9's default virtual machine is superior to that of 1.8. Besides, version 1.9 supports OS-level threading, and some multithreading bugs fixed.

JavaScript in Rails

jQuery ajax events do not work

I used Unobtrusive jQuery JavaScript gem for Rails. I read in its Installaion instructions that jQuery is among its "requirements". I downloaded jQuery, and put it into my assets folder, and then included the //= lines into the application.js.

As a result, my jQuery didn't work. It sometimes worked, but its AJAX events did not fire although the server reported that queries had been made. The mistake was that I should had not downloaded and placed the jQuery js file into the assets folder. I should only had added the "require" lines into the application.js.

Thanks to StackOverflow... as usual.

Talking to Web APIs

Specifying POST and GET PARAMS at the same time

If you are to POST some HTTP data to a URL that has GET parameters, documentation used to elide this case. The correct way to do this is to create URI object, and supply it as a whole, not just uri.path.

res = Net::HTTP.post_form(URI('http://example.com/webservice.php?token=' + TOKEN + '&function=api&id=50'), {:post_param => post_value})

Alternatively, you may supply a URL instead of URI, but I recall we had issues with this, though the exact reasons are now forgotten.

Converting data structures to POST form data

It's strange that Rails doesn't have such a method (I couldn't find it in version 2.13). When you invoke API of a web service, you sometimes need to convert arrays, hashes and other structures to a result that looks like a HTML form that Rails itself uses. For instance, this structure:

should be "flattened" to a "form field name"-"form value" set like this:

The "solutions" found in the internet seem to have been written by programmers who are not aware of recursion. Here's the correct one I wrote:

Russian Support

Here's a gem that adds Russian support to your Rails messages in default form validation and helpers. Russian morphology differs from that of English. In this plugin, the issues with date and proper clause of plural nouns are solved, though the difference in nouns on form create/update nouns is not.

Optimizing Massive Data Views in Ruby on Rails

February 19, 2012 3365 word 16 min read

Ruby on Rails is a web application development framework most known for its flexibility and heavy use of metaprogramming. Rails programs, as one of my peers described, "look like they explain what to do rather than how". That's true. This declarative nature of Rails, however, comes at a cost. At a cost of performance, of course.

A couple of days ago, I was developing a tree-shaped web board engine (some of you have probably forgot what these are; here's the engine I'm creating a clone of). I designed a data model for users, authorization, threads, messages, message revision history, etc, and implemented a comprehensive set of operations with them, such as creating, displaying, editing, and so forth. The implementations looked nice; it consisted of carefully engineered bits of declarative programming.

This is how the index page of the forum should look like:

You may notice that all messages from all threads are displayed However, when I tried to display several thousands of messages on a single web page, which is an average figure for an board like this, it appeared deadly slow. Nobody would wait 32 seconds for the index page to show. Caching would help, but as several messages are posted each minute, most cache accesses would be misses.

In the rest of the post, I'll show you how the speed of page displaying and rendering may be increased by a factor of 50 (fifty).

Data model

The key source of why the original speed was so surprisingly slow is that I designed a database layout that fits a relational database requirements, in which the data span across several tables that may be join-ed. The query to display the whole information should thus not be slow to achieve. Here's how it looked like:

The image above was generated with "MySQL Workbench" tool. If you supply a database with properly set up foreign keys, which I did manually, as Rails doesn't set them up for compatibility reasons, it will show you a nice entity-relationship diagram.

The messages to display consisted of the posts made to the newest threads, which were created within the last day. It's obvious that you could get all the necessary information from the database with a single SQL query with a lot of joins though, but no unions, the query being executed negligibly fast by even the "default" MySQL server. Why was it so slow?

How Rails make you program slower

There are two main reasons why the naïve way of displaying these posts was slow. By the "naïve way" I mean a simple view that renders each post and its parameters by querying its association attributes, like this:

You saw the tree that should appear as a result of this above.

How the controller would look like for this view? Due to database design, it should select a thread, and render a tree of its root (head) post. So the controller would look like:

So, what happens when a view is rendered? For a single thread, all messages are fetched by a single SQL query, and are converted to an array. The ActiveRecord has_many association within a thread object automatically does this; this operation is relatively fast. Then, we assemble a tree structure of the posts (it works very fast, and is not shown here; the result is returned by the build_subtree method), and render these posts in the proper order.

Trying to render a post, Rails encounters its title function that queries the latest revision of a post's title. This query is resolved as querying the TextContainer, and, subsequently, the TextItem with number = 1, and revision matching the latest_revision of the TextContainer. Then the same is repeated when Rails renders the post's user name: it queries user object for each of the posts separately. Therefore, for each post, Rails performs three separate queries to the database, each involving the whole ActiveRecord's overhead for initiating a query, finalizing it, and packing the individual objects.

These multiple and excessive database requests, followed by the subsequent conversion from SQL to "heavyweight" ActiveRecord models, contributed to approximately 25 out of 32 seconds of inefficiency at each call. Let's try to optimize them.

Optimizing ActiveRecord queries

One might infer from the reasoning in the previous paragraphs that ActiveRecord is just a bad framework for ORM. This is not true in two ways. First, the main objective of ActiveRecord is to provide an ORM that works with little configuration, or even without configuration at all. Second, ActiveRecord is mainly used for representing single-table models; in our case, however, we deal with data representation that spans several database tables.

This suggests the way to fix performance here. Indeed, we already noted that all the data required to represent the tree of posts, may be fetched with a single SELECT query. We can create an ActiveRecord model designed specifically for this query (you may call it a "view" if you like) reaping the benefit of having no configuration alongside. In the view, we alter the methods that query the information, so that they match the fields

We don't even have to alter the way the properties are queried in the view by introducing fake "proxy" objects to our new model, like this. table_tow.user returns a simple object that has a login method initialized with a value from the query. In my case that would be an overkill, but if you have a lot of views to fix, you may explore such an option.

In the case of the view presented above, we'll need the following fields, id, title, created_at, empty_body (instead of body.empty?, see sidenote), marks (which is a serialized array automatically maintained by Rails' serialize class method), unreg_name, user_login (instead of user.login, see sidenote), host, clicks, hidden. Not shown directly in the view, but used to build the posts tree is parent_id field. We'll call the model to comprise these fields FasterPost, and start with an empty model stored in app/models/faster_post.rb:

These posts will be fetched via Thread class. We already had a has_many association "Thread has many Posts", and we modify it using a standard :finder_sql attribute:

The settings_for is a global helper that denotes the current user, we'll return to it later.

While :finder_sql was originally used for slightly different purpose, we piggyback on its ability to execute arbitrary SQL query (ActiveRecord has only two such ways, the second is find_by_sql).

Let's try to fetch data bia this association, and we'll see an exception:

Mysql2::Error: Table 'database.faster_posts' doesn't exist: SHOW FIELDS FROM `faster_posts`

The original post model has a plural name (Posts) because the ruby standard library contains the singluar Post

Indeed, ActiveRecord doesn't know what columns there should be in the resulting table. We need to override a couple of methods; as of Rails 3.1.3, these are columns, and columns_hash. These methods are not really used, and we need to supply some ActiveRecord's internal structures for that. Instead of supplying the actual column information for our raw SQL, we simply copy the column information from Posts model (and cache it, as ActiveRecord itself does):

When we try to serialize this model for debugging purposes via the inspect method, we'll see that for our model is not really useful: it doesn't show all the fields. We may find the original Object's inspect method useful here:

That's all what's needed to make this model work (see sidenote for one more subtle issue). However, we may find out (with the profiler) that we still spend a lot of time inside some ActiveRecord methods. When a field method is called, it still gets through some ActiveRecord internal checks and conversions, which take a lot of time, and are, basically, useless for our lightweight model. Even that we have to enter method_missing to route the method to the actual field slows us down.

I found that ActiveRecord has a method that is close enough to the data fetched from the table, but it doesn't perform some conversions (for instance, boolean values are not converted, and are fetched as strings instead). It is read_attribute_before_type_cast (note, that its attribute should be a String, not a Symbol!). So, each fast field method should directly call it as "short cut". We can generate these methods for all the query fields with Ruby metaprogramming capabilities:

Do not forget to add overrides, which should be declared after this call, for methods that require additional processing (such as serialization):

or return Boolean or Integer values:

After these fixes are put in place, and the view is altered accordingly, all the database queries will not take any significant time. Moreover, this solution also complies to Model-View-Controller pattern, as the data are fetched in the controller, before the view is rendered. This is, probably, one of the rare cases when compliance to the Design Patterns does boost performance.

This, originally, decreased the original runtime by approx. 20 seconds out of 32. But there still are 12 seconds to optimize away. How could this be done?

Optimizing the view

ActiveRecord optimization is not enough, though. A lot of time is spent in the view itself. I had to use a profiler to do that, and made several wondrous discoveries. Using profiler is quite simple in Rails 3. All you have to do is to start a server, and invoke in the console

$ rails profiler -r 5 'get("/")'

and open tmp/performance/ProfilerTest#test_get_process_time_graph.html in your browser. An annoying downside is that it prints its log to the server's console (but you're not trying to do anything while profiling, are you?) Here are several points to optimize I found.

Link Generation with link_to

One of the greatest offenders was the code that generates links. The usual link_to and autogenerated path methods, such as post_path(post) are very slow, compared to the rest of HTML-generating code. However, if we try to generate URLs in our view on our own, it may no longer be enough to edit config/routes.rb to change all the methods in the application. We'll lose the flexibility of Rails we love it so much for!

Instead, I used a kludge for link generations. I assumed that links to all posts look the same, and the difference between them is only the post's ID. Therefore, if we generate a link for one post the Right Way (with link_to and ..._path functions), dismantle it, and create other links as simple string concatenations from the resultant parts, we'll avoid this overhead. Here's the helper I created:

It creates a link generator that generates the link in the fast way, provided all the links are the same. Calling this method from the view is as clean as calling the link_to:

The method returns a proc object (function), and brackets are used to call it with post parameter.

Authorization

Declarative Authorization is a very nice plugin to manage authorization. It's declarative, and is very convenient to use once you carefully read its manual, and start naming your before_filters by its rules. However, I found out that it's terribly slow when you are to query authorization rights for thousands of posts.

I merely got rid of authorization at the index page, and moved it to a page of individual posts. But you could add more JOINs to your SQL, fetch user roles, and write some lightweight authorization methods instead, if you really need it.

Another way to optimize authorization could be the same kludge that was used in Links optimization. In most cases, if you are authorized to delete a post, you are also authorized to delete all posts in this threads (or in the whole forum). You can try to piggyback on these assumptions.

User-specific settings

How could we optimize user-specific settings on a large scale? Here's the excerpt from the view code I'm referring to:

This excerpt implements user's functionality to show/hide posts in its own view, not affecting what other users see. In the big SQL statement above you could notice that we join the hidden_posts_users to fetch the user-specific settings. Here we also encounter the same problem that with the links, but the "simple" link generator should be slightly more complicated:

but its usage will become even simpler:

Note that wile you supply the block that translates the strings thousands of times to the fast_hide function, it will be called only once, at the very beginning.

Translations

I won't dive into here too much, since it's obvious that you should not call t method that translates a string according to the current locale thousands of times for the same string. Save the result of a single call into a variable, and re-use it. It won't change as you render parts of your view anyway.

However, this doesn't save you really a lot of CPU cycles. I found out that calling t("Show subtree") (see the example above) approximately 2500 times only takes 0.05 seconds. But if you multiply that with the number of strings you have to translate, this may grow large.

Caching serialized attributes

The default (de)serialization procedure for ActiveRecord's serialize generator does not cache its results. Caching may not be useful in all cases, of course. However, in my case, there was a total of four possible values for the serialized column ("the post has/has not a picture/a link inside"), and thousands of rows. To use caching (and short cut some ActiveRecord's default useless code), you should just pass your own serializer to the method:

I'm not sure that this behavior is documented (I extracted it from the code, and can't find anything about it in the help), but it works.

Avoid excessive render calls.

This may not be the case for you, but if you call render :partial thousands of times, this just can't be fast. Instead, use a single partial template that renders many items at once. My measurements show that this may save up to 20% of ~ 11 seconds runtime.

If your template is recursive, and you user HAML (not ERB), you'll probably have to move your code in some cases to a helper, and print raw HTML in order to keep the markup (for instance, if you use recursively-folded lists).

Raw HTML instead of Templates

If it's still not fast enough, you may print raw HTML. I don't have any measurements of the effect of this operation alone, but you may try that. This will clean up your profile, and you'll see the offenders more clearly

Make sure to use String Buffer pattern (as I encountered it in many languages, I can call it a pattern). Instead or using + operator to concatenate strings, allocate a buffer string, and append other strings to it. In some languages, such as Java or OCaml, it's not enough, and you should use a separate StringBuffer objects, but, in Ruby, Strings are mutable, and any string may serve as a buffer. By the way, HAML templates are already aware of this pattern (and, I guess, ERB templates are, too), so this only should be used if you already chose to print raw HTML on your own by other reasons (such as described in the previous section).

Other speed increase advice

As you might have noticed, my blog is very slow. It takes about 2 seconds to render any page. The blog was my first Ruby on Rails application ever, so it surely had some design mistakes. On the other hand, three years ago, Rails (version 2 at that time) was much less mature than today, and many speed improvements, such as monadic database querying, was only introduced in version 3.

I figured out the reasons why the rendering is slow, but today I just don't have enough time to fix the code. I'll just describe what mistakes I made and how to avoid it.

Touch your parents when you change!

A good advice in the real life as well, touching parents is also a good practice in Rails. Assume that you need to sort threads by the update time of a thread (I'll use the forum data model as an example). An update time is a time of creation of a post in a thread, or an update time of a post in the thread. When we click "edit" on a post, and complete updating it with save call, the default behavior is that the updated_at field of the post's database record is updated to the current time.

The updated_at record of the thread, however, is not updated automatically. Unless you do this, you will have either to write SQL queries, or just to load all threads, and check all posts for each of them. Rails' belongs_to association has :touch option, that will make it automatically "update" parents when you save children. This will solve the problem of fetching updated threads with little effort.

If you have an abstract "content" type, so that all text items on your site are stored and edited in a uniform way, you will find that :touch neatly integrates with polymorphic associations.

Unfortunately, :touch is only included in Rails 3.

Use monadic database queries

I used to fetch the last blog post in my engine by blog.posts_ordered.first, where posts_ordered is a non-standard ordering of posts. It first fetched all posts, reordered them, and selected the first, which was overly inefficient.

The solution to that would be monadic queries (see find method documentation). Rails 3 allow you to "chain" database queries joining the conditions without executing the query. After you chained all the queries, you may "bind" the query object, and execute the query. It will look like this:

Only one post will be fetched and converted into the ActiveRecord object this way.

Unfortunately, this is only included in Rails 3. In Rails 2 I had to write my own "monadic" query wrappers.

Slow and Loving It

For many cases, however, the performance of Rails is good enough. It can't compete with pure C++ web applications, of course. Rather, Rail's stronger side is describing complex behavior with little effort, with just a few lines of code. A lot of problems I encountered were due to bad performance of overly complicated data models, or with showing thousands of records on a single page. Rails is not unique in such poor performance; for instance, it takes Bugzilla 3 (written in Perl), like, 20 seconds to show a bug with seven hundred comments.

The cool thing in Rails, however, that it's a flexible framework. If you really need to display a thousand of records fast, you can do it. Nothing in the framework prevents you from creating shortcuts to its low-level functionality (Ruby language plays an important role in this), and nothing prevents you from swapping its parts with your own fast code pieces, where it's really necessary. In the forum engine I used here as an example, only a couple of views should display thousands of records. Optimized models and views may be only assigned to these "hot parts" of the application. The rest of functionality, at the same time, takes the whole benefit of the easily configurable framework, of declarative coding, and of rapid feature development, everything I love Rails for.