Recent blog updates

Shown here are only posts related to ruby. You can view all posts here.

SF Muni LED Sign at Home with Raspberry Pi

"SF Muni Sign at Home" series

Several post on how I was building a San Francisco Muni sign that shows train arrival times at home.

Something like this, except for the girl.

My android phone wakes me up with its stock alarm clock into a cold San Francisco summer morning. I lie around a bit, but eventually get up to enjoy yet another day. An hour later, I'm running in my dress shoes, tie waving in the wind like that of an anime superhero schoolboy, towards a light rail train stop. Twenty seconds before I get there, the train leaves, right before an unanticipated 20-minute break in the service. I eat a bagel I don't want in a cafe nearby to sit there and work while waiting; my day, having barely started, is already ruined.

Could I have made it? Of course, I could. I could have not re-tied my tie in order to drape it half inch shorter; I could have not procrastinated on reddit for five minutes while sipping tea, I could have postponed paying my bill to the evening. On the other hand, I don't want to finish too early and wait at home instead of in a cafe. I need something to let me know, at all times, when the next train is.


San Francisco "Muni" public transportation vehicles, like those of many other modern big cities, are equipped with real-time GPS trackers, and publish location data on the internet. My smartphone has QuickMuni app installed, so I could easily be checking it. I would pick up my phone, unlock it, wait for all the lags Adnroid greets me with, wait while the app downloads the upcoming train times, and repeat this every five minutes. This would be implausible. If only I had something like a wall clock so that it constantly displays upcoming train time predictions...

One morning I located the item I wanted in the wild. Actually, I've been seeing it every day, but was not realizing that it would solve my problem.

That is it! I can mount a similar sign on my wall, drive it with something like Raspberry Pi or even my desktop PC, and write some software to download, process, and display the train arrival times! I can even teach it to show weather forecast to quickly see if I need to take an umbrella.

The Result

The sign I used is probably not the only one with Perl API: here's something else on CPAN that is probably about these guys, but I didn't try them so I can't vouch that they work.

Well, let's go for it. I purchased Raspberry Pi from Amazon, and purchased this sign from I was looking for at least 16 rows of LEDs (to squeeze two lines of text,) and for a simple interface to draw pictures on it, as I didn't want to spend time to low-level programming. I got the sign for $89 sale price.

After several nights of hacking, I finally had it: my own, always-on customized dashboard with relevant morning information.

This sign shows that the next 38-Geary buses will come to a specific, preset stop in 2, 7, 14, 27, and 40 minutes; ellipsis means a long wait, the next line shows fewer arrivals for "backup" 5-Fulton line, and it also shows that it's 55 degrees outside right now, and it will be 56 at 8pm.

I also implemented a rendition of a real Muni sign as a byproduct of this work. It has some bugs, and it differs from a real sign, but I didn't want to spend much time on that since I was not going to use it anyway. It looks good for demonstration purposes, though. It's animated, here is a video!

The rest of the post describes what components I assembled this thing from, how I configured the Raspberry Pi, what software I used to program the dashboard and to utilize external dependencies, and what problems and bugs I encountered on the way.

Source Materials

Here's the complete list of electronic components I used:

  • Programmable LED sign - programmable with a simple API, monochrome, at least 16 pixels high, bought from this vendor the 16x96 one, with Perl API and USB interface.
  • Raspberry PI - The famous $25 energy-efficient computer I managed to buy on Amazon for $40;
  • USB Wi-fi dongle - found a uselss one, origially bought on Amazon
  • SD-card 8 Gb - Raspberry Pi uses SD-cards as "hard" disks, to which it installs an operating system. I found one in my closet.
  • Case for Raspberry Pi - bought on Amazon too; not required, but you are less afraid of moving a working piece around;
  • Micro-USB power cord - found unused in my closet
  • Mouse, Keyboard, HDMI cable (or a less digital RCA cable), and USB hub - found in my closet; used for initial set-up only, and not required to program or run software

Total cost: around $200.

Source Code

As usual, I'll begin with the source code, which is on github. Here's a local mirror, just in case The source code of a patched Next Muni gem is here.

How to Run

There's a small documentation in the README file, and sample command lines are in the daemon startup script for this spec. Basically, you select a stop on to find out your stop ID, and run client/client.rb --stopId 12345. Alternatively, you can run a morning dashboard by specifying up to two routes and stops in text, and add an URL to retrieve weather from

PI configuration notes

During the first run, connect mouse, keyboard, and Wi-Fi card through the USB-hub. Follow up the Getting Started guide to download initial image onto SD card. Boot it up, install the default Raspbian Debian-based distro.

Do select "automatically boot desktop environment on startup" when asked. While you won't need it, it's the easiest way to configure Wi-Fi. Once at the desktop, select "Wi-Fi configuration" under one of the menus accessed via Start button, and configure the wi-fi card you inserted to connect to the router. Make sure you also configure it to start up wi-fi automatically at boot.

Configure SSH server, and make sure it starts up at boot. Either configure hostname, or write down IP address. Reboot and try to SSH from another machine. If this works, disconnect mouse, keyboard, and HDMI: you don't need them anymore.

Make sure git is installed via sudo apt-get install git, download the sources, and follow the README instructions to install language runtimes. Play with script's command line options to figure out what you want to run, and configure a daemon by installing the initscript. Once confident enough in resilience of the setup, mount this onto your wall.


So the solution helped: I stopped running like a Japanese schoolboy, and started to control my morning procrastination, too. As a bonus, I had a ballpark estimation of how much it should cost taxpayers to program this sign, and, of course, I had a lot of fun, too. Finally, I now have a geeky item in my living room... although it looks to others as a stream of stock prices rather than of Muni arrival times.


The rest are less "hot" sections, detailing my experience with the tools I used for the job.


Raspberry Pi's default Linux, as a Debian derivative, contain decent support for Ruby, including 1.9. which is a bit old already, though. It takes Pi several seconds to load the interpreter and a dozen of gems I needed for this sign, but it's OK since the program only starts up once when the system is powered up.

LED Sign

So the sign I used provided a Perl API, so it would be natural to write the client program in Perl, too... not really. While I have a lot of experience programming in Perl, I would much rather prefer writing in a more interesting language, such as Python or Ruby. That was one of the reasons why I wrote a small wrapper over the Perl API, so that I could exec this simple program from any other language.

Another reason why I wrote this API was that the sign does not support two-line text in its built-in text renderer. The API allows you to supply a line of text, and the sign will happily render it, and even add some effects, but I wanted two lines in a smaller font instead. API doesn't have a "render image" function, but the sign allows you to create your own font glyphs, and insert them into text. The obvious idea of printing one 16x96 "symbol" worked. The wrapper does exactly this.

The sign can also loop through several images--just like the real Muni sign. API makes this very easy to program: this happens automatically when you send several messages. Integration with Pi went smoothly: just use $sign->send(device => "/dev/ttyUSB0"); in your program.

There were several glitches, though. First, each glyph is automatically centered, unless it's greater than about 80 pixels wide, in which case it's aligned to the left. Looping through differently aligned messages looks badly, so I force all images I send to the sign as 16x96, padding them with black pixels.

Second, there is no way to power off the sign programmatically. To power it off, I need to physically remove the USB connection, and turn it off because it starts using the battery otherwise. The sign manufacturer's intent probably was that you don't re-program the sign often, so you would program it once and then turn on and off manually. But we're programmers too; isn't working around things our bread and butter? Displaying an all-black picture did the trick of turning the sign "off".

Third, I refresh the sign each 30 seconds to ensure timely prediction, but it takes 1-2 seconds to update the sign. I suppose this is because the sign contains its own CPU and programming, or because I didn't try to play with send function arguments.

Although its internal plumbing probably affects the update speed, and I saw examples of offloading all the synchronization and flashing of the lamps to Raspberry Pi completely with different, more low-level signs, that would be too time-consuming for me to bother with. I also don't know how long the sign will last when it's re-programmed twice a minute. So far, I'm quite happy with the Bright LED sign I used and its API.

Font Rendering

I expected to spend zero time on font rendering, but it took much, much longer than I expected. Well, I was planning to write my own simple font rendering engine, but several hidden rocks were waiting for me beneath the thin sand.

First, I couldn't find a "raster" font that is just a set of bitmaps. I ended up downloading "pixelated" TTF fonts and rendering them with FontConfig library. I didn't want the sign to have a lot of dependencies, so I pre-rendeded the glyphs on my Pi in a ready-to-use manner with this script, and wrote a small library to operate them. The most fun part is that I could easily add new glyphs, such as rainfall indicators.

We also take a lot of font rendering for granted, such as aligning to center, word wrapping, kerning. I implemented some of them in a very simple manner (in Ruby!) but I feel that a couple more features would make me ragequit in favor of a real font rendering library instead.

Accessing Muni Predictions

Another reason why I chose Ruby was the Muni library in Ruby, ready-to-use for getting predictions from the NextBus system. Or so I thought: the library was missing some functionality, such as refining routes or getting predictions for all routes of a stop (this is what a Muni sign displays). It also pulled in the whole Rail stack only to display nicely formatted time string! (like, convert interval in seconds to something fancy as "over 1 hour").

I fixed some of these things, and bundled the gem with the original source for your convenience because it was easier than forking it and adding it into an official repository.

Specify routeConfig!

Apparently, official Muni monitors operate in the same terms: routeConfig is one of the API commands of NextBus service.

Anyway, NextBus system has a nice API, which is seemingly free to use if the use is limited or distributed among clients. A spawn of this site, is devoted to SF Muni.

The only thing I didn't find was the actual names of the

Weather Forecast Retrieval

Oh, well, here comes the easy part," I thought when I finished with everything else, and decided to throw in weather forecasts. I was wrong in a very interesting way.

Turns out, weather prediction APIs are usually not free. The websites I found at the top of Google search all required signups and including your personal key to all API calls. Many had free limited versions but required payment for hourly forecast--and I was mainly interested in conditions at a specific place at a specific hour of the day (San Francisco weather isn't very stable). Not that I'm opposed to paid services per se, or to parsing web pages with regexps, but I don't need that much from them to pay, do I?

I proceeded with a free government service, which is actually kind of cool despite the web design of the last century. The service is limited to the U.S., but San Francisco Muni predictions are not very useful outside of the country either. To use it, you need to provide a link to XML table for your city/area, which would look like this:

I didn't use their API, because it's a full-blown SOAP, and ain't nobody got time for that.

Raspberry Pi

Surprisingly, I don't have much to say about this. It just works! I didn't have to use it, and I could have used a desktop PC. The best part about Pi is that it works backwards, too: I successfully replaced desktop PC with it without doing anything differently. The device's operating system is simply a variation of Debian Linux, as found on desktops. I didn't have to debug through COM-port nor use specialized software to upload programs: I downloaded compilers and interpreters directly from the internet onto the device!

And now it's neatly powered from a single micro-usb port, and doesn't hike my electric bill despite being always on.

Future work

That's all I have to say for now. I need to add a couple of things, such as Muni Messages that remind users about non-working elevators and fare increases, to make the Muni experience complete. I also need automatic sign power-off when unused. I heard, there are motion detectors for Raspberry Pi, but a simple schedule would be nice, too.

Read on | Comments (1) | Make a comment >>

The Pipe-Pipe-Equals

It is hard to come up with a google-friendly name for the ||= construct you see in some programming languages quite often, "pipe-pipe-equals" being the is the closest (other names include "double-pipe equals," "or-equal," or "double-or equals"). Why should we name this monster in the first place? While cryptic at first sight, it is a very convenient shorthand that, I believe, allows us to write cleaner code. Let's see how it works on this example:

Essentially this ||= construct is equivalent to other convenience operators, such as +=. x += 2; means x = x + 2. The "or" operator || usually denotes a weak logical "or" that doesn't try to compute the left-hand side if the right-hand side is true. Usually, meaningful values evaluate to true, while the only evaluating to false are NULL pointers/empty values, zeroes, boolean Falses, and, sometimes, empty strings.

Alternatively, you could set a default hash value, but ||= is used more widely, and also takes exactly one line... doesn't it? ;-)

The statement is used mostly to set a variable to a "default" value if it's "unset" at some point. This gives the most visible advantage when you initialized dictionary elements in a loop. Here's a piece that saves counts of each array element into a dictionary (Ruby):

In some languages (Perl, Javascript), you could not even bother with this, as += 1 on an unset value would result in its assignment to 1.

If you don't initialize output[x], you'll get a runtime error for trying to increment NULL. The advantage of ||= against other ways is that you don't repeat anything. You could have written the same piece as

Oh yes... we forgot about the most natural use of ||=, for booleans. Here's how we'd check if an array contains zeroes if we need to iterate over its elements for something else.

Pipe-pipe-equals is rarely used this "natural" way, though.

But here you have to type output[x] twice and add two more lines of code, which is a complete waste of screen space and, possibly, computing resources if the interpreter doesn't optimize the duplicated computation out. Let's take a look how pipe-pipe-equals works in different languages (we've already seen it in action in Ruby).


Perl was the first interpreted language I learned, and the first place I saw the pipe-pipe-equals operator in. It works as expected, but is used less often than the direct version of the weak logical "or" to specify default parameters of a function. Here's how you'd put a default to a hash bucket:

This works in a strict mode, too. Note that, in Perl, many things evaluate to false, including empty strings (I once even tried to emulate it in Ruby). To restrict this action to undefs only, use //= in Perl6 instead, which sinks to its right like the Pisa tower, and look as if you're trying to put a division while drunk.


Python has no equivalent of this operator. You have to type the very[long(expression)] twice, and it will be computed twice. You have several different ways to do this:

The engineers more experienced in Python programming, though, assure me that this is not a big deal since you have a different way of supplying default arguments for a function, which seemingly covers half of the use cases for pipe-pipe-equals (this doesn't prevent Ruby from having both, though). Another half is covered by dictionary's dict.get(key, []) method, so that the code piece #1 can be written in a succinct manner. But I still miss it.

Bash (Linux shell)

Bash? Its language is so simplistic, and it looks creepy; how come it would have an equivalent of such a beautiful shortcut? Here it is:

This assigns y to variable if the former is unset or null (as per Bash manual). I mostly used it to set default parameters for environment variables user may or may not set before invoking the script.


While |= is a syntactically correct expression (C++ does not have the "short-circuit" version of this expression), it doesn't do what we discussed here.

C++ is statically typed, so the result of the standard logical "or" is boolean. Retaining the nice semantics we find in a dynamically typed language would require it to be "either the type of the left-hand side or the type of the right-hand side". This is hard to pull in a pass-by-value statically typed language.

Pass-by-value semantics also means that not everything can be assigned a NULL, and not everything can be converted to boolean value. C++ has default arguments as well as Python, so the same reasoning could apply here. You'll have to be more verbose in C++. That's probably why only |= expression is available, which is only useful if its left-hand-side is bool (see sidebar above for similar usage.)


Everything said above about C++ applies to OCaml as well. Moreover, OCaml, as a functional language, doesn't have a flawless support for mutation, and pipe-pipe-equals statement its inherently mutational. However, its matching operator would require us to use the very_long_variable twice. However, OCaml and other functional languages have a very interesting construct called "option". If something has "X option" type, it may contain either "nothing" or a value of x. Then, this value may be "unpacked" trough pattern matching:

let very_long_variable = match very_long_variable with None -> y | Some t -> t

here, t is not an shorthand for another very long expression; instead, it's just an identifier, written as is. The match...with allows us to "unpack" values of structured (algebraic) types with shorthands like this. Since this was too long, OCaml has made this a library function Option#default:

let very_long_variable = Option.default very_long_variable y

Anyway, OCaml programs are even more explicit than those in C++ and Python, so trying to tie pipe-pipe-equals into them is quite pointless.


"Erm, we saw how it is in Ruby at the beginning," you might think. Well, I lied to you a bit. The thing is that, in Ruby, it is not strictly equivalent to an assignment to a result of logical "or". Which do you think x ||= y is equivalent to?

In Ruby, and only for ||= and &&=, it's the second. If you assign to something other than a local variables, what looks like an assignment is actually a method call (think properties,) and, if so, this assignment does not happen at all if the left-hand side of ||= is false. Which makes sense, but looks like a special case. Read more here.

Why You Might not Need This

Some argue that this operator is mostly useless, especially if their favourite language doesn't have it. Here are some arguments they list.


Indeed, this expression is redundant. You can do the same in a multiple different ways, All the examples I demonstrated above showed how to write essentially a simple if statement in a very short-hand form. The number of characters spared is probably not worth it to include support for this feature to a language designer's must-have checklist.

The statement discussed decreases the redundancy in code in return of broader language definition; each language seeks a balance between these, and often leaves the pipe-pipe-equals aside.

Default Function Arguments

The ability to specify default function argument (C++, Python, Ruby, but not Bash or Perl) covers many use-cases for ||=. However, this doesn't help fancy loops that fill complex structures, one of which we showed above. Nor helps it when you have to specify the default parameter anyway but the concrete value is not known at the time of coding, and is an optional field user may or may not fill.


It is confusing. The pipe-pipe-equals requires explanations how it works. It becomes a special snowflake, different from its "mathematical" counterparts +=, if a complex language wants to make its very useful (see Ruby section above). While "confusion" as an excuse of not doing something is my pet peeve (and I tried to explain why), it indeed requires some effort to understand the mechanics. I'm sure that you'll love it once you understand it, and the way I learned about this operator is not by reading a textbook or "The Most Confusing and Obscure Programming Language Features Possible" newsletter, but by reading someone's code.

To Use or Not To Use?

When it comes to deciding whether or not to use pipe-pipe-equals in the "creepy" way we discussed throughout the post, the criticism somehow fades out. If the language supports this, it will be used. I haven't encountered any coding style document that bans this feature. In Ruby and Perl, it is considered as good of an idiom as many. So the answer on the question of whether or not you sohuld use pipe-pipe-equals (||=) in your programs is, definitely, "yes".

Read on | Comments (0) | Make a comment >>

Logging in Expressive Languages

I am not aware of any strict mathematical definition of expressiveness of a language. Informally, a language is more expressive if it allows you to achieve the same with less keystrokes. From this point of view, for instance, Perl or Python are more expressive than C. Of course, you can't always achieve with Python most things that C is capable of at low level, and therefore one may question whether Python is really more capable than C. But capability has little to do with expressiveness. In certain domains, one language may be more expressive than the other, and in other it could be just vice versa. (For instance, Python is more expressive in string operations while C is more expressive in low-level raw data representation.)

One may notice, however, that the majority of languages are Turing complete, and you basically may express the same with all of them. What is the difference in expressiveness then, as seen by a Practicioner rather than a Computer Scientist?

I usually use the following criterion. If you write one non-splittable expression instead of a chain of statements, the language is more expressive. Compare the following two programs that compute a sum of an array in C:

...and in Ruby:

..and, in C, we also should find a place to deallocate the array when it's no longer needed.

You may notice that Ruby can achieve with a single expression what C needs several operators and variable declarations for. Three Ruby features contribute to this, anonymous lambda functions, syntax sugar to pass them to functions (such as inject), and to avoid the really useless variable declarations. Isn't it always that beautiful?

In some cases it's not.

What does it take, to log?

Now let's try to add some logging to the programs above. Assume we actually want to hook into the loop, and make it print the square result it has calculated to a logging stream.

Let's take a look at the code again.



Ewww! Why does our Ruby code looks like C? Where is our one-liner? Where is the expressiveness? How come we're explicitly introducing a temporary variable just like in C?

I spotted such patterns, "compute-log-return" in a lot of places in my practice. For i instance, the BLAST codebase I maintain, contains these here and there, because it's written in an expressive functional language (OCaml), and uses logging as a primary debugging mechanism. Because of it, the code that should look beautiful looks as an old prostitute trying to mask their weary skin with tons of makeup, and I'm not happy with this.

Trying to get asleep on a plane, I suddenly realized that this logging could be achieved with a better code. Since "assign-debugprint-return" is a recurring pattern, let's define a helper function

Then, the resultant Ruby code will look like this:

Now our Ruby code looks like Ruby again.

Of course, this is not limited to Ruby. Though less versatile, the sane function may be implemented for the C code as well.

And this brings us back to the original comparison with 1 line in Ruby and five--in C.


while you may write such function for a less expressive language (as seen above), it will "stand out" in a large C code sheet and look like ugly duckling. What may benefit most are languages that mix functional and imperative coding styles, such as the aforementioned Ruby and OCaml. The reason why this helper might be useful is that it's relevant to the functional languages in two different ways.

First, functional language allow the programmer to define anonymous function inline, which makes them be frequently passed as parameters to other functions. In C, defining a separate function is, inmost cases, uglier than just adding a convenient logging shortcut.

Second, logging an intermediate result contradicts the functional level paradigm. In functional languages you define a computation as a whole rather than specify concrete sequence of actions. In case of logging, you have to violate a chain of functions with a side-effect operation. Of course, you can try to make it look nice, as we did, but the very reason we need the kludge is that we're doing something a bit wrong.

But it's not limited to functional languages. Most expressive languages that do not require from user a strict specification of type of every identifier can. For example, we might want to debug Perl's regular expression substitution with using a similar function:

(we couldn't use log because it conflicted with the mathematical logarithm function that bears the same name).


In any case, the "compute-log-return" function that encapsulated the debug print is a widely useful pattern. And it's especially useful when you're writing a program in an expressive language. Do not let the programming cruft conceal the actual idea of your code, use loggers that hide all the details.

Read on | Comments (0) | Make a comment >>

Counting Visits Efficiently with "Wide Caching"

One of the features of the web forum engine, a clone of which I develop in Ruby on Rails, is that it displays some site usage statistics.

There are two kins of statistics it displays. First, it tracks and shows how many times a particular message was read (i.e. clicked), tracking if a user doesn't self-click his or her message. Second, it shows the generic site usage activity, unique visitors, and page views in the top-right corner.

In this post I'll tell how I solved the problem of calculation the page view and visitors statistics. Impatient developers may jump directly to the solution section, while others are welcome to read about the path I walked from the most naïve activity tracking to the Wide Cache.

Naïve solution

The naïve solution was to store all site accesses in a table in the Application MySQL database. Each line in this table would represent one access to a site, keeping a user's hostname, and the access time. At each access, a corresponding line is inserted into the table. Then, if we are to display the information gathered, we run two simple db queries to get the pageview information (the time is the current time minus 10 minutes):

SELECT COUNT(distinct host) FROM `activities` ;
SELECT COUNT(*) FROM `activities` WHERE (created_at < '2012-04-01 18:33:25');

The writes to and reads from the activities table happened at each request. This was achieved via a before_filter in the root controller which is usually named ApplicationController, as hinted in this StackOverflow question:

It spawned another process that handled all the activity business, and didn't prevent the main process from serving pages timely.

As you might expect, the performance measurements were disastrous: the application could serve only (four) requests per second, while with all activity stuff turned off the throughput was about 40 rps.

Storing the table in memcached

The reason why the naive solution was not working as expected was lack of throughput. At first, I thought that since we could defer the database writes, and perform them after the request was served, we wont have any problems. However, this only addressed one of the performance metrics, response time, while what solution lacked was throughput.

I also didn't want to introduce other technologies specifically for activity tracking. What I had at my disposal was memcached. Other components of my application used memcached to cache database reads and whole pages. We could, I thought, cache database writes with it as well.

Rails storage with memcached backend supports two operations. The first is to write something under a string key, perhaps with a limited lifetime. the second is to read from a specified storage entry if anything has been written to it before, and if its lifetime is not over yet. That's basically all.

We could try to use memcached to store the SQL table. itself. Indeed, that table may be viewed as an array of rows, so we could just read the table, append a row, and write the result back. However, for the sake of performance, memcached doesn't support any locking, so we can't just store the table described in the naive approach in a cache bucket. Two threads may read the array, then both append the next row, and both write, one row being discarded. This is a classical race condition. And my aim was to serve 30-40 requests per second, which means that race conditions that appear if this approach is used were inevitable.

Besides, even if we could use locking (for instance, via Linux named locks (flocks)), it could even perform worse than the database, because it wouldn't be able to handle enough requests sequentially. We need a certain degree of concurrency here.

Wide caching

To mitigate the race conditions that happen during database writes, I used a technique I named "wide caching". I certainly reinvented the wheel here, but for the sake of clarity, I'll use this name in the rest of the post.

The race condition from the previous section could be mitigated by distributing the big "table" into smaller, independent tables stored in different memcached chunks. Writes to each of the chunks would not be protected by a mutex, so a race condition would be possible. We could try to mitigate it by using a round-robin chunk allocation, i.e. a new thread writes to the chunk next to the one allocated to the previous thread.

This solution, however, could be improved.

First, round-robin allocation requires a central, synchronized authority to return proper chunks. This brings the same level of technological complexity as a single bucket with a lock.

Second, round-robin doesn't guarantee the lack of race conditions either. A thread may stall long enough for the round-robin to make a whole "round", and to consider the chunk currently in process again. To avoid this, the allocator should be even more complex.

Third, let's realize that we may trade precision for speed. The result will not be absolutely accurate. People look at the site usage statistics out of curiosity, and the figures may not be precise. We'll try to make them fast first.

This all suggests a simple idea: get rid of the allocator! Just use a random bucket in each thread.

This will not prevent us from race conditions either, but it will make them predictable. If the allocation is completely random, we can carry out experiments, and extend their results in time being sure that they will be reproducible.

Decreasing effect of race conditions

The previous paragraph consisted of reasons and measures that addressed the amount of potential accesses to the same cache bucket within a period of time. What also matters is how long this cache access is. The shorter it is, the more writes to a hash bucket per second we are going to handle correctly.

Memcache request structure

Rails provides a transient interface to various caches. The interface allows to store serialized Ruby structures in the cache. This picture shows the steps that happen during an update of a table in hash, the scale roughly representing the time each of the steps take.

A typical hash bucket access consists of these steps:

  1. reading raw data from cache;
  2. deserializing, converting to Ruby format from a character format (the opposite to serializing;
  3. appending a row to the deserialized table;
  4. serializing to the character format (the opposite to deserializing);
  5. writing raw data to the cache.

We can see that steps 1, 2, 4, and 5 depend on the amount of records in the array we store in the hash bucket. The more information we record, the more time they take, and when the data are large enough, the time becomes proportional to the size. And if we just store all the activity data in cache, the size of the arrays being (de)serialized only grows in time!

How could we decrease the size of buckets? The wide cache already decreases the expected size by a factor of width, could we do better?

It turns out we can. Activity is a time-based data. The older the data are, the less interesting they become. In our use-case, we were only interested in the data for the latest 10 minutes. Of course, the routine cleanup of tables that drops all records with old enough timestamps is not sufficient: the activity data for 10 minutes are large enough (with 50 rps, and 50 being the cache width, the mean size would be 600).

Time reminds us about another idea. Why do we load and then write the old data for the table? They don't change anyway! Since we're using a generic cache solution, and can't lurk into its read/write algorithm, we can't do it on a per-record basis (i.e. do not load the table, just append the record being inserted. What can we do instead?

We may utilize another highly-accessible resource, which usage is however totally discouraged in building a reliable distributed algorithm. We have the clock that are easy and fast to access. The clock provides us with a natural discriminator: each, say, three seconds, abandon the old hash bucket, and start a new that will be responsible for storing the requests for the next three seconds. Given current time, we can always find the "layer" of the cache we should randomly select the bucket from.

Reading the data back

The discussion above was devoted to how to record the activity data. However, the data should be displayed in a timely manner. How timely? Let's consider that if the data are updated at least once per 30 seconds, it's good enough.

Then, each 30 seconds we should reconcile the activity data written into the memcached, and "compress" it to an easily readable format. Since I already had MySQL implementation before, I piggybacked on this, and merely inserted the activity information to the SQL table, so the reading procedures do not have change at all!

To get the statistics, we'll have to iterate all hash buckets the writing algorithm could fill, because gathering the ids of the buckets filled will require additional synchronization (additional writes), and, since the majority of them will be filled under high load, we'd better just collect them all. Theoretically, we should collect all the buckets that might have been filled during the last 10 minutes, the period we show the activity for. However, if we run the collecting algorithm each 30 seconds, we can only collect the data for the latest 30 seconds as well, since all the previous data should have already been collected.

We'll have another race condition here. A worker may get the current time, generate a hash bucket id X, and then stall for several seconds. During that stall, the writing worker collects the commits all the data to the MySQL table, including the piece stored in X. The first worker wakes up, and writes the visit information to X, from which it will never be read, so the request is lost.

To mitigate this race condition, we may collect and commit the data only from the layers that are deep enough. This won't help to avoid it completely, but will decrease its probability to the degree at which we can ignore it.

The final Wide Caching solution

If we assemble all of the above, we'll get the final solution that looks like this:

When a page request arrives, it asks for the current site usage statistics as one of the steps during the page printing. The statistic data are requested from the `activity` table in MySQL database, and are cached for a medium amount of time (30 seconds), because more frequent updates do not improve user experience.

After the page has been served, the request notifies the wide cache about the access to the site. First, it determines the "row" of the cache by rounding the current time in seconds since the Epoch down to the number divisible by three. Then, it determines the "column" by choosing a random number from 1 to 15. These numbers are appended; they form a unique identifier of a memcache entry. The website then reads the table from that entry, appends a row, and updates the same entry.

Dumping the collected accesses to the DB is performs like this. After notification, the request also checks if there is a live memcached entry with a special name, and a lifetime equal to 30 seconds. If there is such entry, it means that the information in the DB is obsolete, so the algorithm starts to commit the information into the MySQL DB.

While the information is being committed, there may appear other requests that would also check the lifetime of the memcached entry, and start the commit process. This is why the order, in which the memcached entries are being read is randomized, so that the amount of cases when the same information is committed twice is minimized.

Here's the code. It really looks much shorter than this post that describes it.

The latest version of the source code may be found here.


I mentioned that the setup contains race conditions that will lead to losing some requests. With the cache width of 15, and bucket height of 3 seconds, the payload of 35 requests per second made the tracker lose 350 out of 6800 requests. This is approximately 5% of total number of requests, which is acceptable. Because we randomized request queries, we may conclude that 5%—given these figures for requests per seconds, cache width, and the equipment—will be an average visit loss factor.


In previous sections, I claimed that spawn-ing threads using spawn Rails gem, and writing to the database in the spawned processes/threads is slow. Indeed, spawn reinitializes the connection in the spawned thread, and this is already a considerable burden on the server if several dozens of threads are spawned each second. Here are the experimental details (see bug #67 where I first posted them info):

MethodReqs. per sec.
No activity40.2
With activity; no spawning27.4
With activity and spawning13.0
With wide cache36.7

This shows that (a) you should not use spawning for short requests, and (b) wide cache is really fast enough.

Background periodic job or per-request?

In the algorithm described above, activity data collection was started when a request arrives, and the application finds out that the currently cached activity stats are out of date. We were careful to make this case as painless as possible, but it has other implications that are not related to race conditions, and experiments show that the loss of throughput is acceptable if we don't try to spawn a thread for each this activity.

Surprisingly, this method starts performing badly when the site activity is low rather than high. On a low-activity site, we can't rely on requests coming each second. Rather, they may arrive even less frequently that activity cache times. So, to support the low-activity case, we have to collect the information for caches that might have been filled in the latest 10 minutes (the maximum value a visit information could still be relevant for), not 30 seconds (the lowest possible time since the previous data collection). Otherwise, if users arrive less frequently than 30 seconds, the engine would always show zero activity, which would make users visit the site even less.

This wouldn't be important, unless I used HAML template engine, which—and this is a known, sad bug—doesn't flush the page until the request is complete. Even putting the code to after_filter doesn't help. Experiments demonstrate that activity data reconciliation may take up to 1 second, so some requests will be served with an excessive 1-second delay, and when the rate of requests is low, this will constitute a considerable fraction of them. And I surely don't want my users to experience frequent delays during their infrequent visits.

Spawn instead of after_filter? We have already seen that spawn-ing will make our server choke at the other extreme, when the load is high.

Luckily, we have a solution that suits both cases equally well. It is to periodically invoke the activity data collection procedure, without tying it to requests. However, the periodic invocation of a task is not straightforward in Rails. I'll get back to it in another post.

Future work

The current implementation of the activity tracker is parametrized with cache attributes (cache lifetime, and width). However, this Wide Cache could be parametrized further, with the procedures that are executed, namely:

  1. Cache bucket updating
  2. Commit procedure
  3. Reading what we previously committed

I think that I'm going to need the same cache that doesn't always touch the database for the first kind of the activity, for visits of individual posts. This parametrization will help me to keep the code DRY, and to re-use the Wide Cache.

This will require refactoring of visits and hits to a single function that calls a lambda function passed in the constructor.

Other approaches

Parse apache/nginx/rails logs.

Indeed, the topmost web serving layer already collects the activity data: it carefully logs each visit, with the URL being visited, a user's IP address and user-agent. Instead of spending time on activity in a comparatively slow Rails application, we could use the logs of a "fast" web server.

I have seen production systems that display activity data based on parsing nginx logs, and it may be integrated into Rails in such a way that it doesn't look like a kludge. Perhaps, free log parsers are already available on github... but this solution just doesn't look cool enough to me.

Do no track activity

Does anyone care about the visits at all? Maybe we just shouldn't track them?

First, from my observation, everybody tries to stick a visit tracker into a website. A bigger activity also means that you're visiting something valuable. A Russian clone of the Facebook social network even used to display a fake registered user number counter at their homepage. Such fraud could only be justified by a huge benefit from displaying it, which means that the statistics is considered valuable by at least some very popular sites.

Second, in my case, there was an easier answer. I should reimplement everything the engine I'm trying to clone contains by politics-related reasons: unless I reimplement everything, and manage to keep the site fast at the same time, my approach to the development will be considered a failure.

Use a specialized activity tracking solution

Too late. I have already implemented my own. But if you want some activity data on your website, do not hesitate to use one. It is hard, see above.


In one of the discussions on the very forum I'm reimplementing, someone wondered, why the view count on Youtube is not update in realtime for popular videos. Having gotten through a series of trial-and-failure experiments with visit tracking, I realize that the developers of Youtube surely had their reasons. My activity information is also already delayed, while the traffic volume still insignificant compared to even a single Lady Gaga video.

It seems that during the development of this tracking I have reinvented a lot of wheels. This is how, I guess, the most primitive database and file systems caches look like. Their algorithms are, of course, more sophisticated, and are not built on top of the generic cache and serialization solutions, using their own, custom ones instead.

But as any other re-invention of a wheel, it was fun, and I surely got a lot of understanding about higher-load web serving while trying to optimize this component of the forum engine.

Read on | Comments (0) | Make a comment >>

Ruby on Rails Errors and Solutions

When you develop something in Ruby on Rails, you will most likely encounter strange errors here and there. These errors may look like language or framework bugs, but they are not. Most likely, they are programming bugs you were not aware of before you encountered them.

The reasons for that are twofold. First, Ruby is a dynamically-typed language, and this alone introduces subtleties you should know about. Second, the web framework values convention over documentation. It means that methods and functions should just work as expected, without making the user read the whole documentation page.

And when something just doesn't work, you'll have troubles finding the cause in the API docs. You'll have to google the exception message, and hope to find the cause.

This post is another contribution to this Exception-Driven Development, and lists some errors a Ruby/Rails programmer may make. Some of the errors were found via web search (I tried to provide a proper attribution where possible), and some of them were solved by my own. I'll try to keep this list up-to-date, and add more errors here as I write more Rails programs.

Generic Ruby Issues

Calling setter methods

You are setting a value of an ActiveRecord field from inside a method of your model, but it does nothing? You are doing it wrong. Perhaps, you write something like this:

But the value is reset after this method. What's happening? The field = ... line actually creates a new local variable, rather than calling the object's field=(value) method ActiveRecord has created for you. Just use self to distinguish:

Note that you don't have to do the same when you read from the field—unless you have created a local variable with the same name! This is why the second line of the method doesn't have to contain self in this case.

Sometimes this strikes you from a different angle. For instance, if you have a quality attribute in your model, and you want to set conversion quality with RMagick, you have to write:

because you can't easily access the outside's variable quality from inside the block, as it's routed to the RMagick object instead.

Other gotchas

You might have specified :false instead of false when you pass a hash attributes to Ruby. I devoted a separate post to this.

Surprising Bugs

Error to clean your cookies

If you have this error:

ActionController::InvalidAuthenticityToken in SessionsController#create 

RAILS_ROOT: /home/rails/app/
Application Trace | Framework Trace | Full Trace 
/home/rails/app/vendor/rails/actionpack/lib/action_controller/request_forgery_protection.rb:79:in `verify_authenticity_token'
/home/rails/app/vendor/rails/activesupport/lib/active_support/callbacks.rb:178:in `send'

you should clear your cookies. Perhaps, you have deployed a different site under the same hostname. If this alone does not help, check if you have set up constant DOMAIN or SESSION_DOMAIN in your config (I had them in config/environments/production.rb), adjust them, and restart your server.

Strange remote_ip error

This error:

undefined method `remote_ip' for nil:NilClass

means that you accidentally named one of your actions "request". And you can't name your actions "request", as there's some clash when the engine wants to get a current request, but calls your controller method instead. Rename your action.

Spawn plugin and Rails 3

Your spawn plugin is throwing this kind of error:

spawn> Exception in child[26055] - NameError: uninitialized class variable @@connection_handler in ActiveRecord::Base

That means that it's too old for your current version of Rails. Upgrade it—to the latest commit it has (this is not on the master, but on its edge branch), or use it as a gem. If you use Rails3, here's the line for your Gemfile:

gem 'spawn', :git => 'git://', :branch => 'edge'

Commit 0e1ab99b worked for me, the corresponding gem version is 1.1). Source.

Unicode problems

UTF-8 regular expressions do not work by default? Add #coding: utf-8 as the first line of your ruby file. If it's an ERB or HAML template, add this at the beginning anyway.

If you want to make your application completely Unicode, from the database to the tips of your views, you should do all of the following.

  • Make sure that all of your source code files are in UTF-8;
  • Make sure you added # coding: utf-8 at the beginning of any source .rb file (views, initializers, library files) that actually contains non-ASCII characters;
  • Make you database Unicode:
    • make sure you set utf-8 character set for the database as a whole;
    • set utf-8 character set for each table;
    • set utf-8 character set for each text or string column in the table;
    This is especially important if you dump your database and load if from the dump on another server. In this case you may also need to cast @SET names='utf-8' in your mysql console;
  • the "default" rails gem doesn't support Unicode. Use mysql2 gem instead as your MySQL driver.

If you miss one of the steps, you may encounter problems. For instance, if you do everything except for the mysql2 gem, you may see these errors for your views:

  Showing app/views/layouts/application.html.erb where line #48  raised:
  incompatible character encodings: ASCII-8BIT and UTF-8

If you do not set up the character set for the database, migrations may create new tables with latin1 encoding, and this may result in ?????????? instead of non-ascii lines in the data user enters to your DB.

Models, Databases, and Validations

Ever-failing presence validation

Do NOT add empty? method to your objects unless you really mean it! This can strike you from you back when you don't expect.

Assume you want to validate that a "belongs_to" association presents at save. You might use validates_presence_of for this. However, as specified in the documentation, it calls Rails' Object.blank? to check presence. It, in turn, calls empty? method (documentation) if it exists.

In my forum engine, I used empty? method to convey if the post's body is empty. Therefore, you couldn't answer to posts with empty bodies, because validation thought that such posts are nonexistent.

Instead of using a complex validation, I suggest renaming the empty? method. When the author of the next model forgets—or won't even happen to know—about this bug, he/she will still try to use validates_presence_of. It's better to make it work, following the principle of least surprise.

Model name clashes

You can't name your models like Ruby classes, i.e. Thread (conflicts with internal class for threading) or Post (conflicts with HTTP library).

It's interesting, that the default scaffolding automatically names your model "Threads" when you instruct it to create a "Thread" model to avoid conflict. However, you still need to specify the proper class name in associations:

Check the scaffolded controller for other declarative methods that may also need adjustment.

Using several databases in the application

If you want to specify several databases for your app (on a per-model basis), you can use a built-in ActiveRecord capabilities. Just add a section in your config/database.yml with a distinguished name (i.e. second_db instead of production), and add this to your model:

See the docs. Found here.

Troubles with making a form with validation but without database

Rails3 has a built-in solution for creating a form that use ActiveRecord-style validations, but are not backed up by a DB table. Indeed, not reusing this functionality would be a waste. But just including ActiveModel::Validations makes your form yeild errors like this:

undefined method `to_key' for #<Loginpost:0x000000038de700>.

Listen to a short RailsCast on ActiveModel, and learn how to add some magic to your model. Here's how it should look like:

Validation of non-database attributes

Want to validate the range of numbers in your temporary attributes? You write

but get a kind of "undefined method something_before_type_cast" error:

undefined method `quality_before_type_cast' for #<Image:0x7f579f5fe940>
/usr/lib64/ruby/gems/1.8/gems/activerecord-2.3.14/lib/active_record/attribute_methods.rb:255:in `method_missing'
/usr/lib64/ruby/gems/1.8/gems/activerecord-2.3.14/lib/active_record/validations.rb:1030:in `send'
/usr/lib64/ruby/gems/1.8/gems/activerecord-2.3.14/lib/active_record/validations.rb:1030:in `validates_numericality_of'
/usr/lib64/ruby/gems/1.8/gems/activerecord-2.3.14/lib/active_record/validations.rb:479:in `validates_each'
/usr/lib64/ruby/gems/1.8/gems/activerecord-2.3.14/lib/active_record/validations.rb:476:in `each'

The error trace makes a hint about the solution:

For ActiveRecord attributes, these methods are automatically generated. However, for our attributes that are not backed by DB columns, we should create on our own.

Uniqueness validation column name error

I created a model, added an `validates_uniqueness_on`—uniqueness validation—and tried to create a valid record. However, I was getting the NoMethodError: undefined method `text?' for nil:NilClass error no matter how hard I tried:

NoMethodError: undefined method `text?' for nil:NilClass
        from /activerecord-3.2.2/lib/active_record/validations/uniqueness.rb:57:in `build_relation'
        from /activerecord-3.2.2/lib/active_record/validations/uniqueness.rb:25:in `validate_each'
        from /activemodel-3.2.2/lib/active_model/validator.rb:153:in `block in validate'
        from /activemodel-3.2.2/lib/active_model/validator.rb:150:in `each'

I checked the code, and realized that my uniqueness validation referred to an association name, instead of the database column name:

It seems that you must specify the exact database column name in the uniqueness validations since they are closely integrated with the database.

Connecting to a local database via TCP sockets

If you want to avoid using file sockets for connection to a local database, it's not enough to just remove the socket line. Rails will replace it with the default socket configuration if you use localhost as your host. To connect via a network socket, you have to change "localhost" to something else by adding an entry to /etc/hosts, for instance.


Accessing methods from console

Contrary to the popular belief, there is a way to easily access helpers and other useful application methods from console. Here is a couple of examples:

>> app.project_path(Project.first)
=> "/projects/130349783-with-attachments"
>> helper.link_to "Home", app.root_path
=> "Home"

You may read the complete guide here.

Get list of Rake/Capistrano tasks

When you use rake tasks, or a similar program (such as Capistrano), it's easy to forget what actions it provides. Googling for the list doesn't help either. Where can you find them?

They're right there:

These commands print the list of all tasks. With cap -e task:name, you can get a more extended help.

Redo specific migration

$ rake db:migrate:redo VERSION=20111122143422

Rake can also go several steps back/forward in migration application; you now know how to see the docs (rake -T).

Using Ruby 1.9 on Ubuntu

Ruby 1.9's default virtual machine is superior to that of 1.8. Besides, version 1.9 has

If you are using Ubuntu, you have "alternatives" mechanism for setting ruby version. But. please, do not forget to update alternatives both for ruby, and for gem command to 1.9! Otherwise your gems (such as passenger) may end up in an improper place, and won't run successfully.

MySQL gem and Mac

If you install mysql onto a MacOS with MySQL 5.5, you may encounter problems that show up as a strange exception:

rake aborted!
Constant MysqlCompat::MysqlRes from mysql_compat/mysql_res.rb not found
Constant MysqlRes from mysql_res.rb not found

Tasks: TOP => db:migrate
(See full trace by running task with --trace)

It turns out that the problem here is a wrong mysql gem version. Uninstall the existing gem, and install 2.7 version, fixing some errors afterwards with a symlink:

$ gem uninstall mysql
$ gem install mysql -v 2.7 -- --with-mysql-config=/usr/local/mysql/bin/mysql_config
$ sudo ln -s /usr/local/mysql/lib/libmysqlclient.18.dylib /usr/lib/libmysqlclient.18.dylib

Source: Sergey Ler told me about it.

Using Ruby 1.9 on Ubuntu

Ruby 1.9's default virtual machine is superior to that of 1.8. Besides, version 1.9 supports OS-level threading, and some multithreading bugs fixed.

If you are using Ubuntu, you have "alternatives" mechanism for setting ruby version. But. please, do not forget to update alternatives both for ruby, and for gem command to 1.9! Otherwise your gems (such as passenger) may end up in an improper place, and won't run successfully.

JavaScript in Rails

jQuery ajax events do not work

I used Unobtrusive jQuery JavaScript gem for Rails. I read in its Installaion instructions that jQuery is among its "requirements". I downloaded jQuery, and put it into my assets folder, and then included the //= lines into the application.js.

As a result, my jQuery didn't work. It sometimes worked, but its AJAX events did not fire although the server reported that queries had been made. The mistake was that I should had not downloaded and placed the jQuery js file into the assets folder. I should only had added the "require" lines into the application.js.

Thanks to StackOverflow... as usual.

Talking to Web APIs

Specifying POST and GET PARAMS at the same time

If you are to POST some HTTP data to a URL that has GET parameters, documentation used to elide this case. The correct way to do this is to create URI object, and supply it as a whole, not just uri.path.

res = Net::HTTP.post_form(URI('' + TOKEN + '&function=api&id=50'), {:post_param => post_value})

Alternatively, you may supply a URL instead of URI, but I recall we had issues with this, though the exact reasons are now forgotten.

Converting data structures to POST form data

It's strange that Rails doesn't have such a method (I couldn't find it in version 2.13). When you invoke API of a web service, you sometimes need to convert arrays, hashes and other structures to a result that looks like a HTML form that Rails itself uses. For instance, this structure:

should be "flattened" to a "form field name"-"form value" set like this:

The "solutions" found in the internet seem to have been written by programmers who are not aware of recursion. Here's the correct one I wrote:

Russian Support

Here's a gem that adds Russian support to your Rails messages in default form validation and helpers. Russian morphology differs from that of English. In this plugin, the issues with date and proper clause of plural nouns are solved, though the difference in nouns on form create/update nouns is not.

Read on | Comments (0) | Make a comment >>

Optimizing Massive Data Views in Ruby on Rails

Ruby on Rails is a web application development framework most known for its flexibility and heavy use of metaprogramming. Rails programs, as one of my peers described, "look like they explain what to do rather than how". That's true. This declarative nature of Rails, however, comes at a cost. At a cost of performance, of course.

A couple of days ago, I was developing a tree-shaped web board engine (some of you have probably forgot what these are; here's the engine I'm creating a clone of). I designed a data model for users, authorization, threads, messages, message revision history, etc, and implemented a comprehensive set of operations with them, such as creating, displaying, editing, and so forth. The implementations looked nice; it consisted of carefully engineered bits of declarative programming.

This is how the index page of the forum should look like:

You may notice that all messages from all threads are displayed However, when I tried to display several thousands of messages on a single web page, which is an average figure for an board like this, it appeared deadly slow. Nobody would wait 32 seconds for the index page to show. Caching would help, but as several messages are posted each minute, most cache accesses would be misses.

In the rest of the post, I'll show you how the speed of page displaying and rendering may be increased by a factor of 50 (fifty).

Data model

The key source of why the original speed was so surprisingly slow is that I designed a database layout that fits a relational database requirements, in which the data span across several tables that may be join-ed. The query to display the whole information should thus not be slow to achieve. Here's how it looked like:

The image above was generated with "MySQL Workbench" tool. If you supply a database with properly set up foreign keys, which I did manually, as Rails doesn't set them up for compatibility reasons, it will show you a nice entity-relationship diagram.

The messages to display consisted of the posts made to the newest threads, which were created within the last day. It's obvious that you could get all the necessary information from the database with a single SQL query with a lot of joins though, but no unions, the query being executed negligibly fast by even the "default" MySQL server. Why was it so slow?

How Rails make you program slower

There are two main reasons why the naïve way of displaying these posts was slow. By the "naïve way" I mean a simple view that renders each post and its parameters by querying its association attributes, like this:

You saw the tree that should appear as a result of this above.

How the controller would look like for this view? Due to database design, it should select a thread, and render a tree of its root (head) post. So the controller would look like:

So, what happens when a view is rendered? For a single thread, all messages are fetched by a single SQL query, and are converted to an array. The ActiveRecord has_many association within a thread object automatically does this; this operation is relatively fast. Then, we assemble a tree structure of the posts (it works very fast, and is not shown here; the result is returned by the build_subtree method), and render these posts in the proper order.

Trying to render a post, Rails encounters its title function that queries the latest revision of a post's title. This query is resolved as querying the TextContainer, and, subsequently, the TextItem with number = 1, and revision matching the latest_revision of the TextContainer. Then the same is repeated when Rails renders the post's user name: it queries user object for each of the posts separately. Therefore, for each post, Rails performs three separate queries to the database, each involving the whole ActiveRecord's overhead for initiating a query, finalizing it, and packing the individual objects.

These multiple and excessive database requests, followed by the subsequent conversion from SQL to "heavyweight" ActiveRecord models, contributed to approximately 25 out of 32 seconds of inefficiency at each call. Let's try to optimize them.

Optimizing ActiveRecord queries

One might infer from the reasoning in the previous paragraphs that ActiveRecord is just a bad framework for ORM. This is not true in two ways. First, the main objective of ActiveRecord is to provide an ORM that works with little configuration, or even without configuration at all. Second, ActiveRecord is mainly used for representing single-table models; in our case, however, we deal with data representation that spans several database tables.

This suggests the way to fix performance here. Indeed, we already noted that all the data required to represent the tree of posts, may be fetched with a single SELECT query. We can create an ActiveRecord model designed specifically for this query (you may call it a "view" if you like) reaping the benefit of having no configuration alongside. In the view, we alter the methods that query the information, so that they match the fields

We don't even have to alter the way the properties are queried in the view by introducing fake "proxy" objects to our new model, like this. table_tow.user returns a simple object that has a login method initialized with a value from the query. In my case that would be an overkill, but if you have a lot of views to fix, you may explore such an option.

In the case of the view presented above, we'll need the following fields, id, title, created_at, empty_body (instead of body.empty?, see sidenote), marks (which is a serialized array automatically maintained by Rails' serialize class method), unreg_name, user_login (instead of user.login, see sidenote), host, clicks, hidden. Not shown directly in the view, but used to build the posts tree is parent_id field. We'll call the model to comprise these fields FasterPost, and start with an empty model stored in app/models/faster_post.rb:

These posts will be fetched via Thread class. We already had a has_many association "Thread has many Posts", and we modify it using a standard :finder_sql attribute:

The settings_for is a global helper that denotes the current user, we'll return to it later.

While :finder_sql was originally used for slightly different purpose, we piggyback on its ability to execute arbitrary SQL query (ActiveRecord has only two such ways, the second is find_by_sql).

Let's try to fetch data bia this association, and we'll see an exception:

Mysql2::Error: Table 'database.faster_posts' doesn't exist: SHOW FIELDS FROM `faster_posts`

The original post model has a plural name (Posts) because the ruby standard library contains the singluar Post

Indeed, ActiveRecord doesn't know what columns there should be in the resulting table. We need to override a couple of methods; as of Rails 3.1.3, these are columns, and columns_hash. These methods are not really used, and we need to supply some ActiveRecord's internal structures for that. Instead of supplying the actual column information for our raw SQL, we simply copy the column information from Posts model (and cache it, as ActiveRecord itself does):

When we try to serialize this model for debugging purposes via the inspect method, we'll see that for our model is not really useful: it doesn't show all the fields. We may find the original Object's inspect method useful here:

That's all what's needed to make this model work (see sidenote for one more subtle issue). However, we may find out (with the profiler) that we still spend a lot of time inside some ActiveRecord methods. When a field method is called, it still gets through some ActiveRecord internal checks and conversions, which take a lot of time, and are, basically, useless for our lightweight model. Even that we have to enter method_missing to route the method to the actual field slows us down.

I found that ActiveRecord has a method that is close enough to the data fetched from the table, but it doesn't perform some conversions (for instance, boolean values are not converted, and are fetched as strings instead). It is read_attribute_before_type_cast (note, that its attribute should be a String, not a Symbol!). So, each fast field method should directly call it as "short cut". We can generate these methods for all the query fields with Ruby metaprogramming capabilities:

Do not forget to add overrides, which should be declared after this call, for methods that require additional processing (such as serialization):

or return Boolean or Integer values:

After these fixes are put in place, and the view is altered accordingly, all the database queries will not take any significant time. Moreover, this solution also complies to Model-View-Controller pattern, as the data are fetched in the controller, before the view is rendered. This is, probably, one of the rare cases when compliance to the Design Patterns does boost performance.

This, originally, decreased the original runtime by approx. 20 seconds out of 32. But there still are 12 seconds to optimize away. How could this be done?

Optimizing the view

ActiveRecord optimization is not enough, though. A lot of time is spent in the view itself. I had to use a profiler to do that, and made several wondrous discoveries. Using profiler is quite simple in Rails 3. All you have to do is to start a server, and invoke in the console

$ rails profiler -r 5 'get("/")'

and open tmp/performance/ProfilerTest#test_get_process_time_graph.html in your browser. An annoying downside is that it prints its log to the server's console (but you're not trying to do anything while profiling, are you?) Here are several points to optimize I found.

Link Generation with link_to

One of the greatest offenders was the code that generates links. The usual link_to and autogenerated path methods, such as post_path(post) are very slow, compared to the rest of HTML-generating code. However, if we try to generate URLs in our view on our own, it may no longer be enough to edit config/routes.rb to change all the methods in the application. We'll lose the flexibility of Rails we love it so much for!

Instead, I used a kludge for link generations. I assumed that links to all posts look the same, and the difference between them is only the post's ID. Therefore, if we generate a link for one post the Right Way (with link_to and ..._path functions), dismantle it, and create other links as simple string concatenations from the resultant parts, we'll avoid this overhead. Here's the helper I created:

It creates a link generator that generates the link in the fast way, provided all the links are the same. Calling this method from the view is as clean as calling the link_to:

The method returns a proc object (function), and brackets are used to call it with post parameter.


Declarative Authorization is a very nice plugin to manage authorization. It's declarative, and is very convenient to use once you carefully read its manual, and start naming your before_filters by its rules. However, I found out that it's terribly slow when you are to query authorization rights for thousands of posts.

I merely got rid of authorization at the index page, and moved it to a page of individual posts. But you could add more JOINs to your SQL, fetch user roles, and write some lightweight authorization methods instead, if you really need it.

Another way to optimize authorization could be the same kludge that was used in Links optimization. In most cases, if you are authorized to delete a post, you are also authorized to delete all posts in this threads (or in the whole forum). You can try to piggyback on these assumptions.

User-specific settings

How could we optimize user-specific settings on a large scale? Here's the excerpt from the view code I'm referring to:

This excerpt implements user's functionality to show/hide posts in its own view, not affecting what other users see. In the big SQL statement above you could notice that we join the hidden_posts_users to fetch the user-specific settings. Here we also encounter the same problem that with the links, but the "simple" link generator should be slightly more complicated:

but its usage will become even simpler:

Note that wile you supply the block that translates the strings thousands of times to the fast_hide function, it will be called only once, at the very beginning.


I won't dive into here too much, since it's obvious that you should not call t method that translates a string according to the current locale thousands of times for the same string. Save the result of a single call into a variable, and re-use it. It won't change as you render parts of your view anyway.

However, this doesn't save you really a lot of CPU cycles. I found out that calling t("Show subtree") (see the example above) approximately 2500 times only takes 0.05 seconds. But if you multiply that with the number of strings you have to translate, this may grow large.

Caching serialized attributes

The default (de)serialization procedure for ActiveRecord's serialize generator does not cache its results. Caching may not be useful in all cases, of course. However, in my case, there was a total of four possible values for the serialized column ("the post has/has not a picture/a link inside"), and thousands of rows. To use caching (and short cut some ActiveRecord's default useless code), you should just pass your own serializer to the method:

I'm not sure that this behavior is documented (I extracted it from the code, and can't find anything about it in the help), but it works.

Avoid excessive render calls.

This may not be the case for you, but if you call render :partial thousands of times, this just can't be fast. Instead, use a single partial template that renders many items at once. My measurements show that this may save up to 20% of ~ 11 seconds runtime.

If your template is recursive, and you user HAML (not ERB), you'll probably have to move your code in some cases to a helper, and print raw HTML in order to keep the markup (for instance, if you use recursively-folded lists).

Raw HTML instead of Templates

If it's still not fast enough, you may print raw HTML. I don't have any measurements of the effect of this operation alone, but you may try that. This will clean up your profile, and you'll see the offenders more clearly

Make sure to use String Buffer pattern (as I encountered it in many languages, I can call it a pattern). Instead or using + operator to concatenate strings, allocate a buffer string, and append other strings to it. In some languages, such as Java or OCaml, it's not enough, and you should use a separate StringBuffer objects, but, in Ruby, Strings are mutable, and any string may serve as a buffer. By the way, HAML templates are already aware of this pattern (and, I guess, ERB templates are, too), so this only should be used if you already chose to print raw HTML on your own by other reasons (such as described in the previous section).

Other speed increase advice

As you might have noticed, my blog is very slow. It takes about 2 seconds to render any page. The blog was my first Ruby on Rails application ever, so it surely had some design mistakes. On the other hand, three years ago, Rails (version 2 at that time) was much less mature than today, and many speed improvements, such as monadic database querying, was only introduced in version 3.

I figured out the reasons why the rendering is slow, but today I just don't have enough time to fix the code. I'll just describe what mistakes I made and how to avoid it.

Touch your parents when you change!

A good advice in the real life as well, touching parents is also a good practice in Rails. Assume that you need to sort threads by the update time of a thread (I'll use the forum data model as an example). An update time is a time of creation of a post in a thread, or an update time of a post in the thread. When we click "edit" on a post, and complete updating it with save call, the default behavior is that the updated_at field of the post's database record is updated to the current time.

The updated_at record of the thread, however, is not updated automatically. Unless you do this, you will have either to write SQL queries, or just to load all threads, and check all posts for each of them. Rails' belongs_to association has :touch option, that will make it automatically "update" parents when you save children. This will solve the problem of fetching updated threads with little effort.

If you have an abstract "content" type, so that all text items on your site are stored and edited in a uniform way, you will find that :touch neatly integrates with polymorphic associations.

Unfortunately, :touch is only included in Rails 3.

Use monadic database queries

I used to fetch the last blog post in my engine by blog.posts_ordered.first, where posts_ordered is a non-standard ordering of posts. It first fetched all posts, reordered them, and selected the first, which was overly inefficient.

The solution to that would be monadic queries (see find method documentation). Rails 3 allow you to "chain" database queries joining the conditions without executing the query. After you chained all the queries, you may "bind" the query object, and execute the query. It will look like this:

Only one post will be fetched and converted into the ActiveRecord object this way.

Unfortunately, this is only included in Rails 3. In Rails 2 I had to write my own "monadic" query wrappers.

Slow and Loving It

For many cases, however, the performance of Rails is good enough. It can't compete with pure C++ web applications, of course. Rather, Rail's stronger side is describing complex behavior with little effort, with just a few lines of code. A lot of problems I encountered were due to bad performance of overly complicated data models, or with showing thousands of records on a single page. Rails is not unique in such poor performance; for instance, it takes Bugzilla 3 (written in Perl), like, 20 seconds to show a bug with seven hundred comments.

The cool thing in Rails, however, that it's a flexible framework. If you really need to display a thousand of records fast, you can do it. Nothing in the framework prevents you from creating shortcuts to its low-level functionality (Ruby language plays an important role in this), and nothing prevents you from swapping its parts with your own fast code pieces, where it's really necessary. In the forum engine I used here as an example, only a couple of views should display thousands of records. Optimized models and views may be only assigned to these "hot parts" of the application. The rest of functionality, at the same time, takes the whole benefit of the easily configurable framework, of declarative coding, and of rapid feature development, everything I love Rails for.

Read on | Comments (0) | Make a comment >>

The Most Stupid Mistake You Can Make with Ruby

Programming languages have strict and mostly very compressed syntax. The origins of that are twofold.

First, actively used programming languages have a considerable history, and were started decades ago. Back then, screen space was limited, and, before that, there was little storage available for source code. Second, programmers have technically-oriented minds and like to endow abstract symbols with complex abstractions instead of using words to describe them. That's why, I guess, many programming languages heavily depend on correct usage of individual characters. Even as few characters as one may make a difference in program behavior as well as in whether program compiles.

Such caveats in different languages include:

  1. C++ templates. You can't write map<int,vector<int>> when you define a mapping from integers to arrays of integers, because C++ parsers thinks that >> as a bit-shift operator in an improper place. A correct program differs by a single space between the < signs.
  2. Makefile tabs. A rule body in the Unix Makefiles should be indented with a Tab. Spaces do not work. Being a software with more than 30 years of history, make had it fixed only a year ago.
  3. CSS delimiters. When you define a cascading style sheet, amd want to define the same block of style attributes for a certain class inside a certain tag, you write the selector as tag .class. It's just a space away from tag.class that defines the styles only for tag elements of class class.
  4. C-style equality in conditional statements. If you're a seasoned professional, you should have already forgotten about this. In many languages, including C, Ruby, Python, Javascript, C# and others, if (a = 1) is always true, since it's an assignment of 1 to a, followed by checking the a's value for truthfulness. The correct version of that has one extra equality sign: if (a == 1). More confusion is created by languages, where the first version is the legitimate way to compare values, such as Pascal.

Imagine how surprised I was when I realized that I made a variation of mistake #4 in my Ruby code today! Here's what it was about.

Ruby hash niceness

Named parameter is a way to specify function arguments at call by name (rather than by order, as in the standard function call notation).

Here you may find how the Named Parameter is implemented in various programming languages.

To emulate a Named Parameter Idiom, Ruby uses hashes and some syntax sugar. The last parameter of a function may be a hash, which maps from parameter names to values. Syntax sugar allows a programmer to write

This sugar is not unique to Ruby; Perl also supports it.

instead of

The :name notation specifies a symbolic constant, which effectively is an immutable string that is defined by only one ancillary character.

Such hashes and symbols seem as a very useful feature. It allows you to emulate DSL-s; here's an example of Ruby on Rails web framework routing configuration:

Until one day, after an hour of debugging you find yourself having written something like this:

See what's wrong here? Indeed, here's how the code of the function called might look like:

So, options[:attribute_check] should evaluate to false boolean value, but... :false is a totally different thing; it's an immutable string of five characters that evaluates to true instead! Just one colon that lurked into the code, and it made it behaving very wrong way.

Just like in a C-style typo, some expressions that are evaluated as true in boolean context look like those that are evaluated as false, and you should be careful with the borderline.

New named attribute definition style in Ruby

New named attribute passing style was not designed to address this problem. However, the abundance of colons in such a code makes it look worrying in case there is a mistake like the above:

You see that the mistake is easy to spot, because the conjunction between the name and the parameter value is so ugly, that it immediately draws attention. However, if you actually need to specify symbol as a value, then you'll have to look ugly with this style.

Moreover, you can't erase the space between the parameter name and value, because for this code:

Ruby parser will think that it's false method in an attribute_check object, as :: is a scope resolution operator, just like in C++. Space matters again, as in typo #1 desccribed above.

People say that this style resembles that of C# or JSON. So, maybe, it is a good idea to migrate to it. Only two things prevent me from doing this so far: it's not portable to previous version of Ruby, 1.8 (though it slowly becomes obsolete), and I find the old style look much more cute :-)


This was yet another typo that makes our programs behave differently than we expect them to. And again, it was just one character that breaks the expected behavior of if statements that implicitly convert the condition to boolean type. Unfortunately, while the new Ruby named parameter syntactic sugar could help, it sometimes looks even worse to me.

I hope this post will help you avoid the similar mistake if you code in Ruby.

I would like to end with a joke about a mythical C++ programmer leaving hist last commit after having been fired:

Happy debugging to you, indeed!

Read on | Comments (0) | Make a comment >>


Recently I ran a software update from Ruby on Rails version 2.3.5 to 2.3.14. You might have noticed the "maintenance" message all pages were redirected to last Sunday. Of course, after the update completed, I clicked on a bookmark entry "My Blog" in the browser window, and, having acknowledged the correct loading, prematurely considered the update completed well.

I was wrong this time. But first,...

How I perform updates on my webserver

To automate the update process, I use a couple of nonstandard scripts. During the update, the web server becomes nearly unusable, so, instead of providing poor experience to users, I decided to show a "maintenance" page while the update is in progress.

In my virtual host Apache configuration file, I use a special variable, COLDATTIC_MAINTENANCE, to distinguish whether it should ask Mongrel for a web page, or should show a simple maintenance.html page. Here's how virtual host config looks like:

# ... normal website redirects

DocumentRoot /var/www/
<Directory /var/www/>
        Options FollowSymLinks
        AllowOverride None
        Order allow,deny
        Allow from all
# Redirect everything to index
RewriteCond /var/www/{REQUEST_URI} !-f
RewriteRule ^(.+)$ /maintenance.html [R]
ErrorDocument 503 /maintenance.html
Header always set Retry-After "18000"

This config makes the web server, if the -D COLDATTIC_MAINTENANCE is supplied to Apache's command line, to:

  • Redirect every request to maintenance.html
  • Show 503 Service Unavailable status code. Status code is important, because you should prevent web search engines think that this is an only page that's left on your site.
  • Notify bots that they should come in 5 hours (set this to average maintenance time).

To update the server I do the following:

  1. Set -D COLDATTIC_MAINTENANCE in Apache config (on my distro it's /etc/conf.d/apache2);
  2. Restarted system Apache by /etc/init.d/apache2 restart;
  3. Now, the apache is serving maintenance pages. I edit the config /etc/conf.d/apache2 back, and remove the maintenance define. However, I do not restart the server with new settings!
  4. I run a system update script, which, essentially, looks like update-software ; /etc/init.d/apache2 restart, so the server is automatically restarted without maintenance mode after the update is completed.
  5. Go to movies, since the update usually happens on early East Coast Sunday mornings, when it's Sunday Night in Russia, a good time to relax;
  6. Come home, log in to server, and deal with update failures >_<

I do not do the updates quite often, and they go well. After the update last Sunday, I noticed a sudden decrease of visits. This could be just a random spike, but after the visit number had decreased to less than ten the next day, I realized that the update broke something. I entered into the browser. It should have redirected to, but what I saw was a raw HTML text of the blog page!

How Ruby went off the Rails.

How come? I checked the logs of the website, and notices an abnormal stacktrace:

Tue Oct 18 15:52:05 +0000 2011: 
    Error calling Dispatcher.dispatch #<NoMethodError: private method `split' called for nil:NilClass>
/usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.14/lib/action_controller/cgi_process.rb:52:in `dispatch_cgi'
/usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.14/lib/action_controller/dispatcher.rb:101:in `dispatch_cgi'
/usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.14/lib/action_controller/dispatcher.rb:27:in `dispatch'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/rails.rb:76:in `process'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/rails.rb:74:in `synchronize'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/rails.rb:74:in `process'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:159:in `orig_process_client'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:158:in `each'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:158:in `orig_process_client'
/var/www/ `process_client'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:285:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:285:in `initialize'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:285:in `new'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:285:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:268:in `initialize'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:268:in `new'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:268:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:282:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:281:in `each'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:281:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/mongrel_rails:128:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/command.rb:212:in `run'
/usr/bin/mongrel_rails:8:in `load'


It only happened when the page request should result in redirect, but it worked for direct page accesses (that's why, perhaps, the visit counter was greater than zero: people could just user their bookmarks).

I ensured the localhost version is synchronized with the production, and launched WeBrick via ./script/server to play with the website outside the production server environment. But on my local machine it worked well! (No wonder, because there's no entry of my web application in the stack trace; the whole trace consists of web server and framework subroutines.)

Solution: monkey-patching

I googled for the error, and it appeared to be more than half a year old. This bug entry is, I guess, the first place where the error was spotted. In a nutshell, new version of Rails seems to have employed a previously unsupported use-case of Rails' web server interface, Rack. It interfered with, supposedly, partly incomplete implementation of Mongrel, a specialized HTTP webserver for Rails. And mongrel updates still haven't made its way to Gentoo distribution my web server runs (-1 to Gentoo maintainers reputation counter!).

The solution a lot of people (including me) used is here. It is a small Ruby file that should be placed... into your application startup directory (for Rails it's config/initializers/)! You don't have to patch your system libraries and make new packages for your distributions: all you have to do is to add a file to the userspace. The code there uses a well-known feature of Ruby, monkey patching. It alters the implementation of external library classes. I already demonstrated how this could be used to add some Perl-like features to Ruby.

So, fixing others' bugs is another, though less legitimate, way to employ monkey-patching.


Why did this happen in the first place? Two reasons:

  • Insufficient testing after software updates. I should have tried to enter the site as an anonymous user, instead of clicking on my own bookmarks. Without the fix above, the site worked well if the user who accessed it had cookies!
  • Deploy the exactly same environment on testing machine as on the production, and perform the tests there before upgrading.

I thought that Mongrel and the whole Rails stack was stable, but, I guess, it suffers from fragmentation. Ruby on Rails stack consists of a lot of components. While this fosters innovations, the interaction between their developers is not tight, and there also is no common governance; this are the reasons why such strange bugs get shipped in stable versions.

Read on | Comments (0) | Make a comment >>

How to Use Open3 and Avoid Dead Locks

Linux pipes and open3

Three articles on how open3 is implemented in Linux, and how it is properly used

Open3 is a common name for a function that spawns a process and substitutes its standard input, output and error streams with the pipes connected to the original process. This way you may interactively feed the child with data, and read the processed data back.

The problem with open3 is that its power renders it more dangerous, and its usage—more error-prone. As the famous Spider-Man quote shares, "with great power comes great responsibility", and Linux inter-process communication is not an exception. So here comes the effective usage advice number 1.

Avoid open3 unless necessary

For the purpose of clarity, throughout the post, standard input, output, and error file handlers are called "standard filehandlers"; the process that spawns, and the process that is spawned are the "parent", and the "child" respectively.

The first advice about a safe open3 usage is... to avoid its usage. Languages that are popular in Linux world usually come with a handful of tools for process spawning, such as:

Shell redirection instead of pipes

Instead of using open3 in your program, you may try to connect process inputs and outputs with Bash redirection utilities, invoking the shell via standard functions to spawn processes.

For instance, you may write system("find | gzip >archive.gz") in Perl to spawn a process that archives recursive directory listing into a file, instead of passing the data through the parent process. You may use a usual redirection into just a text file as well, if that's what you really need.

However, I wouldn't use intermediate files if they're only to dump information temporarily for an immediate processing by something else, like this:

I consider it a bad stlye, let alone that it requires extra time on disk access and synchronization, while it may be totally unnecessary. The arguments to the commands passed that way will also undergo the interpretation by the shell, so you'll need to escape your arguments, a painful and an error-prone process.

If you need the intermediate result saved in the files for debugging purposes, you stil may dump it in debug mode, saving cycles when your project is compiled for production usage.

  • system — spawn the process, inheriting parent's standard file handlers;
  • fork+exec — ditto, but, possibly, with additional customizations you learned about in the previous post, for instance (such as tuning environment or playing with file handlers);
  • popen—open only one pipe to the process, either for reading or writing (note that popen also suffers from some of the typical misuses of pipes).
  • shell redirection untilities—Bash has a number of measures to take control over how to output and redirect input and output of the process. A section in one of my older posts is devoted to pipes in Bash. What could be helpful is the ability of using shell interpreter when spawning a process with such functions as system in Perl. For instance, you may write system("find | gzip >archive.gz") in Perl to spawn a process that archives recursive directory listing into a file, instead of passing the data through the parent process (see side note). You may use a usual redirection into file as well, if that's what you really need.

If one of these tools fulfills your needs, then go for it! You can replace it with open3 at any time anyway—unlike houses, software is relatively easy to rebuild.

Where open3 should be used

However, if you'll find the alternatives listed above inefficient, or not powerful enough, you may opt our for using open3. I'll try to enumerate the cases where I would certainly advise using open3.

What is select?

The select function (man page) is a common name for a function to wait on file handlers. Assume you develop a task scheduler that connects to several machines via sockets; it should wait for the first socket to have the result in to schedule the next task to that specific machine (since others are obviously busy). How do you accomplish that?

Of course, there is a faster and more easy-to-use alternative than trying a non-blocking read on each socket once per several milliseconds. It is using the select() function (or one of its companions, such as poll() or kpoll) to perform a blocking wait on all of them. This function will return the filehandler list of those that have the data readily available in them—as soon as there will be at least one such descriptor!

You may find a plenty of manuals on how to use select; later in this post you'll see an example.

  1. several consumers of one source—if you are given a source of data that should be redirected to several consumers (such as some data that should be both printed onto the terminal and saved into an archived log file for history keeping reasons), you should connect a pipe to its output and error streams, and when you read some amount of data from there (for instance, when you read a line), send it to all the consumers you have. In some cases you can do it with Bash (not with sh), but the code with open3 should look much more clear.
  2. aggregation of several providers—if you maintain several data sources, which all should be aggregated into a common place (a text file, for instance) that has some concurrency issues, you might benefit from open3. You may spawn the processes, and then select() (see sidenote) their output streams, thus making writing of the data to the aggregator thread-safe.
  3. interactive data exchange with a continuously-running process—you can't avoid open3 if you spawn a process that responds to your input with some text to stdout. A lot of axillary processes that are launched thousands of times, have the interface of reading from stdin, and writing to stdout (such as SAT solvers), and you most likely don't have the time to write or read anything from the disk.

One of the examples of the application of these patterns (more specifically, a combination of the first and the second) is transparent logging—it's a blatant crime to use anything but open3 for this. By transparent logging I mean combining logs from the child processes into a common logging sink in the parent one. Usually it's done automatically: the system call just routes standard output of the child to that of parent. However, assume you spawn several concurrent processes, and unless you prefix each of them with a unique identifier, you'll get lost in the log quickly.

This may be solved by opening these processes with open3, and attaching prefixes to their output lines before printing them. Note also that this way you may control severity of logs: for instance, you might want treat standard error and output streams from the children differently, and that can not be achieved with a simple popen call.

Synchronous vs. asynchronous

In this post we will mostly study issues with a synchronous processing of data exchange with the child process. Being synchronous means that nothing else but the interaction with the process is performed while the child is alive. In other words, there is only one thread in the parent.

Some notes on asynchronous processing will be given at the end of this post. Still, I consider synchronous experience extremely valuable, as it helps to shape the vision what a good asynchronous interaction via open3 should be.

Issues with open3

So far I've been telling that open3 is not straightforward to use, but what causes thee complexity? Why could there be the "Dead Locks" referenced in the title of this post? I had to learn this in the hard way, tacking cumbersome bugs in our project, and here's what it was about.

Following the pattern "aggregation of several providers", I spawned the process that wrote to both stdout and stderr with the intent to combine them both in the archived log file. The code (simplified) looked like this:

I expected to get a file that has all the lines from the stdout of the child, and prefixed lines from the stderr of the child afterwards. However, sometimes the application just deadlocked!

A quick strace demonstrated that the parent hung up on read from child's stdout, and the child hung up... on write to its stderr! How could that be?

Remember that in the first post about pipes, I listed the limited capacity as one of the properties of the pipes. I stressed it on purpose, because it plays its role right here, when you try to use open3. The pipe that connected the parent to the child was full, and the child wanted to write an error message there. While the parent was still reading from the output pipe, because the child was still running, and its pipes were not closed! That's the open3 Dead Lock.

Of course, this could happen with any pair of pipes here. Assume the process just echoes every character it takes on the input to the output (of course, real programs will be doing a useful transformations, but for clarity we may assume it as identical). We want to feed it twice as much characters as the pipe's capacity.

You might think that this won't strike you unless you're dealing with extremely long inputs and outputs. Sorry to disappoint you, but, quoting the man 7 pipe:

In Linux versions before 2.6.11, the capacity of a pipe was the same as the system page size (e.g., 4096 bytes on i386). Since Linux 2.6.11, the pipe capacity is 65536 bytes.

It's not much, though, in certain cases, it's big enough to let badly written programs work. On a larger scale, we definitely need a generic, limit-agnostic solution.

How to prevent the dead lock

It's relatively simple to devise a generic rule of mitigating the effect of such a limitation. To avoid deadlocks with open3, you should clear each of the output pipes (and fill the input pipe) as soon as possible, and do not put a dependency between clearing a pipe and waiting for another pipe. So we need to watch closely to all the file handlers (up to 3) which we plan to read/write data from/to—and it's important that you wait for all of them at once! If you read the sidenote above, you already know that we could use select for this.

Using select to react to input promptly

So, a potential way to fix the program above is to write something like this:

This program, however, only outlines the approach to tackling deadlocks with open3. It is still prone to deadlocks, though less than the original one. Assume that the child started to print a line to its stdout, but haven't finished it, because it got a debugging interrupt. Having not finishing printing the line, it spits 100 Kb of debugging information to stderr with the intent to continue printing to stdout the normal output. However, the parent is still blocked in the out.readline() call, waiting for the line termination character to appear there, and the child gets blocked at the write to stderr, because the err pipe is full, and no one's going to remove data from it. Deadlock again. (You may play with this deadlock by open3-ing this sample program).

The issue here is that we still do not "remove data from pipes as soon as possible". For that, we need nonblocking reads, more low-level than those of the readline()- and scanf-like functions.

Using nonblocking reads and your own buffers

The problem with nonblocking low-level reads, as Capt. Obvious notes, is that they are low-level. We can't read more or less structured data from them. Assume that we want to read a number (a number of seconds to wait before launching the starship, for instance) from the child's stdout. If that debugging interrupt described above is triggered just in the middle of printing 1000, our nonblocking read will read 10 (before turning to reading from stderr), and act accordingly, launching the multi-billion-dollar ship prematurely. From the child's viewpoint, however, doing so is totally legitimate, since it printed 1000 and debugging information to the different output channels (stdout and stderr), and if the reader confuses these numbers, it's its problem.

Do not use strings as buffers (like I do here)

In the samples below we used "strings" as buffers. However, strings in the modern scripting languages (including those used here, as it's Ruby) consist of multi-byte characters with variable length (see Joel's post on Unicode), and not with one-byte symbols. On the other hand, in some less modern and "less scripting" languages, strings can not contain zero bytes, as they would be treated as the end of the string.

Therefore, "strings" are going to misbehave if chosen as a buffer for an abstract byte stream. I used them for simplicity, and for the sake of demonstration of open3 usage; in real programs, however, you should not use them.

Therefore, we need to handle these situations accordingly, adding another level of indirection between the child and the parent. We will store the data we read from pipes in the intermediate storage; we could name it a buffer. In fact, we are going to re-implement buffered read.

In the next example we will implement a linewise-open3, a subroutine that invokes user-defined callbacks at reading a complete line from stdin or stderr. You could play with it more, introducing scanf-like behavior (instead of these regexps you'll see). However, we have two more issues to discuss before getting to the code.

Reactive interaction with both input and output data

The select with callbacks works well for reading output with the input pipe closed at all. What should we do to if we want to write something to the input pipe of the child based what's being read from it?

To talk interactively with the child, you'll most likely need an asynchronous processing. However, there is one pattern which allows the exchange of data through both the input and the output in the synchronous mode. We already agreed that we will invoke user-defined callbacks after reading lines from stdin and stdout. However, we didn't use the return value of these callbacks in any way! The idea about this arises immediately:

If the user-defined callbacks to stdout and stderr lines return a non-empty string, we feed this string to stdin of the child as soon as possible. To get more control over the child, we treat NULL return value from a callback as a sign to close the standard input. We will also make our wrapper get a string as an input and feed it at the very beginning, as it is what could trigger the outputting of the data in the first place.

This still sounds not powerful enough, but you still have aplenty of asynchronous options, such as interrupting pselect with a specific signal, or setting up a separate thread for feeding data into the input pipe. We will omit these options in this blog post.

Watching for the child termination

As we're implementing a synchronous open3 wrapper, the natural assumption would be that at return from the wrapper the child should be dead and reaped. This way we can also return its exit status to the parent.

As we discussed in the previous post, process termination does not depend on the status of the pipes, so we should watch for it independently.

What we actually want is a version of select that watches for filehandlers and for the child's termination. If you know such a version of select (which I don't), make a comment, please. For now, let's search for another solution.

Such functionality could be simulated with a special wrapper thread that only waits for the child (with wait), and sends signal at its termination. This signal would interrupt select, and we would handle that.

In our project I implemented a simpler solution. It is based on the assumption that the more time the child is running, the less it's likely to terminate during the next second. So we can use timely wakeups to check for a process status (implemented as a non-null timeout to select), and increase the wait period with the course of time. Having the upper boundary for that period is a good idea as well.

Note that if a process has terminated, and the pipes are still open, we assume that the last nonblocking read will fetch us all the data we're interested in, and we may close the pipes. This may not be true, but in such specific cases you'll need specific solutions anyway.

Linewise-open3 code

Let's sum up what we're up to. Here's a listing of an open3 wrapper that prints the string supplied into stdin of the child, and then invokes one of two user-specified callbacks when a complete, \n-terminated line is read from stdin or stdout respectively. These callbacks may return more data to put into the stdin of the process. The execution terminates when the child is terminated. The wrapper returns the return code of the process (everything else is assumed to be done by callbacks).

I tested this program on random echo. You may also view the complete listing for open3 usage, and test it either with echo or with sshfs installed on your system.

Note also, that in our project we have also developed an open3 wrapper for Perl; you can view it here. It is less capable (without interactiveness), but it's live and working.

Notes on asynchronous open3 usage

In some cases you may trade the complexity for efficiency. The problem with the dead lock above was that we had a single thread designated to read from several pipes. Instead of introducing a buffering system, we may just spawn several threads, and attach each of them to its own endpoint. The burden of waking up the thread that actually has something to process doesn't vanish, it just gets imposed on the OS scheduler, which should contain fewer bugs. Or, to a language runtime scheduler (if it employs green threads, as in Ruby).

This may sound like an easier solution, especially in the multicore era, where developers cheer upon every possibility to develop a multithreaded program that makes the high-end hardware stall less. To me, it's a bit of an overkill for simple tasks (if the can be expressed in the terms of the interface discussed above). And in the context we used open3 in our project, too many competing threads had already started to become a problem.

Perhaps, I'll study the asynchronous option in other posts.


This post concludes what I initially planned for the series of blog posts of how to work with pipes in Linux. Another topic emerged, on the asynchronous interaction with an open3-driven process, but I don't know if I will write about it.

We have observed what are pipes, how they are used for inter-process communication, what options we have for spawning processes with pipes attached to them, and how it may be achieved in modern Linux scripting languages. However, we mainly focused on open3, studying details of its implementation, and the use-cases it's the most efficient in. We studied its quirks and how to avoid the traps set up among the joint of pipes we have to deal with.

I have learned this all spending a couple of days messing with processes that magically deadlocked without any obvious reasons, and with nontrivial multithreaded debugging. I hope these posts will help you when you will be working with pipes, so that you'll avoid my mistakes.

Read on | Comments (4) | Make a comment >>

How to Implement open3

Linux pipes and open3

Three articles on how open3 is implemented in Linux, and how it is properly used

Open3 (or popen3) is a common name for a library function to spawn a child process with its standard streams (input, output and error) both connected to newly created filehandlers in the original process. You may encounter that function in such languages as Perl, Ruby or Python.

Standard Linux C library, unfortunately, does not contain such a function. Only its less capable counterpart, popen, is a member of C library. The popen function, which is more common, opens only one pipe, either to the input or to the output of the process being spawned. This means that if you program in C you'll have to implement open3 on your own.

Another reason why you would have to re-implement open3, even if your standard (or non-standard) library provides it, is when you need more control over what happens during its call. For instance, Perl and Ruby 1.8 versions of open3 lack the ability to adjust the environment of the process being invoked (see my older rant). This functionality is why I re-implemented open3 in the project I work on.

The choice of Ruby was driven by the fact that I used it in the project I work on, and that it has a killer feature, unhandled exceptions (so that you don't have to check return code of each library call invoked, as in C). Besides, it's a lot less verbose.

The subsequent sections of this article depict sample implementation of open3 in Ruby. However, it relies on the certain system and C library calls I will also outline, so you'll be able to do the same in C or in any other language with access to such primitives.

Foundation for open3

The open3 implementation contains no magic. It relies on two powerful features of Linux system.


Good old forking comes to help in this case. The process spawning in Linux is done via fork call, cloning the current process, followed by exec call, replacing the newly spawned child with the command we wanted to invoke in the first place. It may sound unnecessarily complex, but in our case it has a feature that makes open3 as a library function possible. After fork, but prior to exec we can invoke commands in the context of the new process, while retaining the knowledge we had in the old one. How does it help us?

First, we may alter the environment of the child being invoked, which was my original motivation for reimplementing open3. Many scripting languages have a global, process-wide ENV hash with read/write access, which is used as the environment at execve system calls. If we need to alter the environment in the child process in a thread-safe way, we may modify ENV between fork and exec, so that it will be executed at the context of child process without interfering with the parent.

Second, some filehandlers (internal OS "pointers" to open files, pipes and sockets) preserve across exec! Namely, those that have the flag FD_CLOEXEC cleared. The flag is set by default (I guess, for all except standard 0, 1, and 2 descriptors), but we can change it via fcntl call.

So what we could do is to open pipes in parent process, and inculcate the knowledge about them into the child process before the target command is executed. How should we do the inculcation?

File handler renaming

The child process should spawn with the pipes instead of its STDIN, STDOUT, and STDERR. However, after forking, it already has some standard streams assigned by the OS or the language runtime. What should we do now? The answer is obvious: rename the relevant file handlers.

Closing and opening a file IO object in a high-level language (such as in C++) is not the same as "reopening" it, as the OS filehandler will not be retained. Which is extremely important to altering standard streams.

Language runtime libraries usually have a "reopen" functions (freopen in C, or IO.reopen in Ruby). The thing is that it substitutes the file pointed to by OS file handler, as well as the underlying file of the "language" file handler (this is an integer value in C, or IO object in Ruby or, say, C++).

Changing IO objects intrinsics left to the specific languages, OS filehandler swapping is performed via C library function dup2. A usual dup function just duplicates a filehandler. Called as dup2(keep,replace), it will close(replace), and re-name keep to replace. This renaming is done via fcntl system call, and should be relatively cheap. Additionally, this fcntl call will reset FD_CLOEXEC flag.

Forking and threading in POSIX is an interesting subject itself. I one had a lot of fun, reading this thread, then the FAQ of the newsgroup.

Note that fcntl and dup2 are async-signal-safe function, and therefore it can be called between the fork and exec safely in a multithreaded process.

Simple implementation

Okay, let's assemble all the tools together. Opening pipes is easy, as described before, then comes the fork, and renaming the file descriptors.

Instead of just assigning an environment variable, we may allow user to specify the actions they desire to be executed in child's context as a function pointer (or as a block, in Ruby).

Process termination control

From this simple implementation we have completely omitted an important way to control what's happening with the process. Controlling when it's terminated (or forcing it to be).

The thing is that whether the process is terminated does not depend on the status of its standard pipes. First, closing the standard input doesn't necessarily make the process terminate (as with the standard grep command). Actually, very few of them do. So if we've got all the results we wanted from our forked creature, we should have a means to kill it before it gets too late...

Second, the filehandlers we read from do not tell us whether the process is alive either. Remember that filehandlers may survive the exec if FD_CLOEXEC is not set? The process we spawned might, in turn, have given birth to a daemon, which we don't want to mess with. It could inherit the filehandlers, and just not use them. This actually happened with SSHFS mounter: the process wouldn't close the output pipes.

Therefore, pipes won't help us to control the process spawned. What could? Of course, its PID. The fork returns to us the process identifier of the child spawned, and our process, as a parent, has a full access to its status and to its return code. So, along with the pipe filehandlers, we may return child's PID to the caller for it to control the process in a more fine-grained way.

Anything else?

We've learned how to implement open3 itself. However, the bigger troubles are coming when we try to use it.

The thing is that open3 is a powerful tool, but with great power comes great responsibility. As outlined in the previous post, reads and writes to pipes may block at each endpoint, and this renders the usage of open3 error-prone. You should not use it for each process spawn. If a simpler system, fork, or popen would suffice, use them. But if it's not, you're welcome to read the next article on how to actually use open3 in an efficient and correct way.

Proceed to "How to Use Open3 and Avoid Dead Locks".

Read on | Comments (0) | Make a comment >>

More posts about ruby >>