Recent blog updates

Shown here are only posts related to web programming. You can view all posts here.

Nonlinear Effects when Web Server is at 100% CPU

I used to wonder what Error 502 meant when I saw it on my favorite websites. The error usually disappeared when I retried, and I found it's linked with the service being overwhelmed with requests at certain times. But why do I see it at all? Why doesn't the web server start many threads, leveraging the OS's CPU-sharing capabilities? And if it queues incoming requests, why doesn't it just delay their fulfillment, making them wait in line a bit longer? Just like, you know, in any cafe, bank, or public office where people form lines and eventually get what they came for?

There are three answers to this. First, a fast error is more important than serving every request. Second, interesting nonlinear effects appear when an at-capacity server doesn't use queueing. Which leads to the third point: if more requests arrive per second than you can process per second, you will never be able to serve them all. You have to drop some. Let's take a closer look.

Fast error is better than slow result

If a service consumes the responses of several other services, it is often in a situation where it can ignore the output of one of them and still provide value to the user. An example in the paper linked above is Google web search, which can skip displaying ads, maps, and other additional information, and simply serve search results.

This alone can be a good reason to drop a request, but it doesn't sound that convincing for a frontend server that faces a human end-user. Let's explore two ways of keeping all the requests.

Latency Grows Exponentially If You're Not Limiting Rate

Imagine we're not limiting the number of worker threads, and just start processing everything we receive. In other words, we do not keep a queue of requests, and rely on the operating system to provide fair distribution of computing power among worker threads. Say, if we have 4 cores, and every request takes 1 second of CPU work, we can serve at most 4 QPS; this is considered "at capacity".

What would happen if the rate grows beyond 4; say, to 5? Requests arriving in the first second would not complete before the next pack arrives. At the beginning of the second second, each of the 5 earlier requests has 20% of its work left; with 10 requests now sharing the CPUs, per-request execution speed halves, so the first pack completes only after 1.5 seconds. The requests that arrived in the second second have 0.4 seconds of work left when, at the beginning of the third second, 5 more requests arrive and halve the speed again; their latency comes to 2 seconds. The latency for the pack arriving in the third second is as much as 2¾ seconds, and so on. Here's an illustration of how CPU time is distributed among requests; requests that arrived in the same second share a color:

Let's assess how the latency grows with rate. In other words, let's compute the function L(t, r), the latency of a request arriving at moment t when the rate of incoming requests is r times the capacity. For instance, if our capacity is 4 QPS, but 5 requests arrive per second, then the rate parameter r = 1.25.

First, the number of in-flight requests at the beginning of a given second, N(t), grows linearly with t. Here, let QPS denote the incoming request rate, so the serving capacity is QPS/r. On the one hand, N(t) < QPS*t, the total number of requests that have arrived. On the other hand, the amount of CPU time owed to previous requests is QPS*t - QPS*t/r = QPS*t*(1-1/r), and since each unfinished request is owed at most 1 second, (1-1/r)*QPS*t < N(t). So (1-1/r)*QPS*t < N(t) < QPS*t, which means that N is a linear function of t, i.e. N(t) = a*t, where a > QPS*(1-1/r).

Second, normalize QPS = 1, so the capacity is 1/r; then the CPU time allotted to a request within the second [t, t+1) is at most (1/r)·(1/N(t)). The latency of the request is the minimal x such that (1/r)·(1/N(t) + 1/N(t+1) + ... + 1/N(t+x)) ≥ 1, i.e. S = 1/N(t) + 1/N(t+1) + ... + 1/N(t+x) ≥ r. Since N is linear, this sum equals 1/(a*t) + 1/(a*t + a) + 1/(a*t + 2a) + ... + 1/(a*t + xa). From integral calculus we recall that 1/t + 1/(t+1) + ... + 1/(t+x) can be approximated as ln(t+x) - ln(t), so S ≅ (1/a)·(ln(t+x) - ln t). Hence S ≥ r gives t+x ≥ t·exp(a*r), i.e. x ≳ t·exp(a*r).

We proved that L(t, r) ≳ t * exp(a*r), where a > (1-1/r) with QPS normalized to 1, so latency grows linearly with time, and exponentially with the rate of incoming requests, if we try to serve them all right away.

Practical results, however, seem to suggest that the exponent is sublinear with respect to r. Latency definitely grows faster than polynomially, but the true function in the exponent remains a mystery. The closest I've come so far is sqrt(r), but let me juggle it a bit first, and I'll share my calculations and graphs. Here's the simulation script; built-in rational numbers in Python are cool, btw.
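The linked script is Python, but the same processor-sharing simulation is easy to sketch in Ruby, whose built-in Rational class keeps the bookkeeping exact (capacity 4 and rate 5 are the figures from the example above):

```ruby
# Processor-sharing simulation: `rate` requests arrive at each integer second,
# each needing 1 CPU-second; `capacity` CPU-seconds per second are shared
# equally among all in-flight requests. Rational arithmetic avoids float drift.
def simulate_shared_cpu(rate, capacity, seconds)
  jobs = []                                  # pairs [arrival_time, remaining_work]
  latencies = Hash.new { |h, k| h[k] = [] }  # arrival time -> finished latencies
  seconds.times do |t|
    rate.times { jobs << [Rational(t), Rational(1)] }
    elapsed = Rational(0)
    while !jobs.empty? && elapsed < 1
      speed = [Rational(capacity, jobs.size), Rational(1)].min  # per-job speed
      # advance to the next event: a job finishing, or the end of this second
      dt = [jobs.map { |j| j[1] }.min / speed, 1 - elapsed].min
      jobs.each { |j| j[1] -= speed * dt }
      elapsed += dt
      finished, jobs = jobs.partition { |j| j[1].zero? }
      finished.each { |arr, _| latencies[arr] << t + elapsed - arr }
    end
  end
  latencies
end

lat = simulate_shared_cpu(5, 4, 5)
# the packs arriving at t=0, 1, 2 see latencies 3/2, 2, and 11/4 seconds,
# matching the figures in the text
lat.sort.each { |arr, ls| puts "pack arrived at t=#{arr}: latency #{ls.max}" }
```

Running it for longer horizons is how one can probe whether the growth in r is really exp(r) or something slower like exp(sqrt(r)).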

Newly arriving requests slow down the currently executing ones, prolonging their latency. There is a way to mitigate it: queueing.

Latency Grows Linearly with Queues

Imagine the most naive solution: a first-come, first-served queue. All incoming requests are stored in the queue and served in the order they arrive, the handler popping them from the queue as soon as it has capacity (a free CPU share or an idle worker thread).

When the server is below capacity, the queue is always empty, and the latency of each request is L0, the time it takes the handler to construct a response. What happens when the server goes over capacity? The latency of a request that arrives at time t (we assume a constant rate r) is L(t, r) = Q(t, r)*L0 + L0, where Q(t, r) is the queue size when the request arrives. The size of the queue is the number of unserved requests accumulated so far, which is t*(r-1). Therefore, L(t, r) = t*(r-1)*L0 + L0, concluding that with queues latency grows linearly with both rate and time.
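The arithmetic can be sanity-checked in a few lines (figures are illustrative: a capacity of 4 QPS gives L0 = 0.25 s, and r = 1.25):

```ruby
# Latency of a request arriving t seconds after the server went over
# capacity, with a FIFO queue: it waits out the backlog, then gets served.
def fifo_latency(t, r, l0)
  queue = t * (r - 1)   # unserved requests accumulated so far (in units of capacity)
  queue * l0 + l0
end

(0..4).each do |t|
  puts "t=#{t}s: latency #{fifo_latency(t, 1.25, 0.25)}s"
end
```

Each extra second over capacity adds the same fixed amount of latency: linear, not exponential.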

You Can't Serve More Requests than What You Can

This may seem obvious, but is often overlooked. If you start queueing requests to serve them later, then you need spare capacity dedicated to them later. Meanwhile, in the next second more requests arrive than you can serve in that second, and you still have the old requests, too. Now you have three options:

  1. Serve old requests first; then your latency will grow at least linearly for as long as you stay over capacity.
  2. Serve new requests first; then old requests will remain in the queue until the first second in which fewer requests arrive than you can serve. In most cases this means waiting hours, as usage patterns are usually cyclical.
  3. Discard some requests without spending CPU time on serving them. Which leads us to error 502.

In other words, we have to drop a fraction (r-1)/r of all requests regardless of which scheduling mechanism we use. Whether latency grows linearly or exponentially is almost irrelevant: it has to grow, and that is bad. Its exponential, turbulent growth without queueing is just an interesting artifact, although it gives you another reason not to start all the incoming requests at once: doing so would simply choke the OS scheduler.


Misuse of Caching in Web Applications

I noticed a pattern in some of my web software architecture failures, and I see a similar pattern in others' designs. In many cases, people simply misuse memcache, hurting the performance of their web applications.

If performance is your goal, you can't pretend that your computation and rendering are fast, and add transparent caching wherever they later turn out to be slow. Transparent caching is a fairy tale programmers have been telling one another, and one we've been writing tools and building hardware to sustain, for decades. It turns out it doesn't scale. A high-performance web application is designed around serving static content that--sometimes--gets recomputed

  1. under very specific conditions;
  2. only partially;
  3. after very specific events;
  4. most likely, by a component separate from the one serving the content.

Where would you keep such static content derived from the raw data (I will later call it "derived content")? Memory caches are often unfit for the job of storing derived data: your items may be removed from the cache ("evicted") if you don't have enough memory, or if another of your web applications or components works within the same cache. And evictions are much more severe than you might think.

Assume it takes your webapp several seconds to recompute the item you want to serve even if it hasn't changed; then all requests that come during this period are going to have at least these several seconds of latency. In modern multi-component web services, high latency for even a small fraction of requests does much more harm than it seems, as corroborated by this overview paper by Googlers, so you want to avoid it. Not to mention the simpler reasoning: if the eviction happens under heavy load, the pile of HTTP requests waiting for the data to be recomputed might kill you.

Is there a problem with caching data altogether? Yes and no. The idea of memoization--storing results of complex computation for later reuse--is perfectly fine. The problem with caching as I see it is that many now view caching exclusively as storing data in unreliable but fast-to-access caches (think memcached). This is not how caches in web applications have to work, despite being how they usually do.

Caching Primer

There is another storage you can use for precomputed results: the database (SQL, NoSQL, or NewSQL) where you probably store the rest of your data.

Assume you are presenting users with a set of analytics over the data they entered yesterday; say, a breakdown of yesterday's sales by zip code. Say it takes 20 seconds to compute all the data. Where would you store it?

What you could do is cache the result of the first request, spending 20 seconds to serve it. You can be smarter and add a cron job that hits that page every midnight. But if your data gets evicted, your next user will suffer these 20 seconds.

Instead, store the data in your persistent storage (the database), with the computation triggered by the same daily cron job. When a user requests the analytics, just serve it from there; it will never be delayed by 20 seconds, since the data aren't going anywhere unless you explicitly delete them, which you'll be wise not to do needlessly. It takes one database query to retrieve the results, and you can also cache this query in memcache for even faster responses. No 20-second latency anymore, just the latency of selecting a single row. Neat, right? Here's a lame picture to illustrate this; it shows that recompute-on-write is more efficient because writes are usually less frequent than reads:
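In code, the pattern looks roughly like this (a self-contained sketch with an in-memory Hash standing in for the persistent table; all names and figures are illustrative):

```ruby
# Recompute-on-write: the nightly cron job runs the expensive aggregation
# once and stores the result; every request is a single cheap read.
PERSISTENT_DB = {}   # stand-in for a real database table keyed by date

def recompute_daily_sales(date, sales)
  # the "20-second" computation: breakdown of sales by zip code
  breakdown = sales.group_by { |s| s[:zip] }
                   .transform_values { |rows| rows.sum { |r| r[:amount] } }
  PERSISTENT_DB[date] = breakdown        # written once, on the cron schedule
end

def serve_analytics(date)
  PERSISTENT_DB[date]                    # one cheap lookup per request
end

# the cron job fires once at midnight:
recompute_daily_sales("2012-04-01",
  [{ zip: "02139", amount: 10 }, { zip: "02139", amount: 5 },
   { zip: "10001", amount: 7 }])

puts serve_analytics("2012-04-01")
```

Because the derived row lives in the same durable store as the raw data, no eviction can ever force a user to wait out the recomputation.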

Why People Use Memcache Then

So if the world of using permanent storage for caches is all ponies and rainbows, why isn't it used everywhere? The reasons are a performance tradeoff, deliberate abuse of cache unreliability, and simpler storage management.

Performance Tradeoff

If your computation is as fast as querying the storage, there's no sense in using persistent storage as a cache. In the rare case where your latency is unsatisfactory but you're overprovisioned on CPU throughput, persistent caching will improve your app, but most likely it won't. Also, if you want the cache updated often in the background, triggered by user requests, you need a fast task-offload solution that avoids write contention (such as App Engine's Task Queues).

Abusing Cache Unreliability for Eventual Consistency

Cache expiration algorithms make the state of the cache eventually correct with respect to your changing data. To build a correct permanent-storage cache, you must make sure that the data in it are explicitly updated, or deleted when unused. An unreliable cache will recycle stale data for you: if a key is never going to be accessed again, it will simply be evicted in favor of more recent keys without occupying extra storage. The impermanence of memcache is an asset rather than a liability in such cases. One of my previous posts, about a sharded counter, contains an example of this.

The problem with efficient record recycling is that you can't update a record's access time in the persistent database on every read without a significant performance penalty (see the previous section on fast task offloading). With specialized in-memory caches, by contrast, you implicitly do this on every access. In some cases, a recurring background job that performs DELETE FROM cache WHERE created_at < NOW() - INTERVAL 600 SECOND will take care of this, but this isn't always possible.

Automatic Storage Management

A memcache can successfully and automatically operate when you don't have enough memory to store all the results. Its efficiency doesn't degrade as load increases, because more frequently accessed entries stay hot, and performance suffers only on the less "hot" entries, hence rarely. Then again, finding space for one extra column per database row is rarely a problem today.

Narrowing Down the Scope

Given the above limitations, storing intermediate results in the database makes perfect sense when the cost of computing them is greater than the cost of a database query, and when the data are tied to other database records. Say, storing a pre-rendered version of a blog post or a wiki page that uses custom markup (especially if it's badly implemented, like the one in this blog) makes sense: you simply update it on write, and remove it if the post is deleted. Storing the result of a 10-minute analytics query in a table rather than in a cache also makes sense (and some still rely on memcache to hold such data). Storing a join by key of two SQL tables is probably not worth it.

You need to design your application so that it is capable of updating derived data after writes. Relying on transparent caching will hurt you in the long run, because your application will be improperly architected.


Counting Visits Efficiently with "Wide Caching"

One of the features of the web forum engine, a clone of which I develop in Ruby on Rails, is that it displays some site usage statistics.

There are two kinds of statistics it displays. First, it tracks and shows how many times a particular message was read (i.e. clicked), not counting the author's clicks on his or her own messages. Second, it shows the generic site usage activity--unique visitors and page views--in the top-right corner.

In this post I'll tell how I solved the problem of calculating the page view and visitor statistics. Impatient developers may jump directly to the solution section, while others are welcome to read about the path I walked from the most naïve activity tracking to the Wide Cache.

Naïve solution

The naïve solution was to store all site accesses in a table in the application's MySQL database. Each line in this table represents one access to the site, keeping the user's hostname and the access time. On each access, a corresponding line is inserted into the table. Then, to display the gathered information, we run two simple DB queries to get the pageview information (the time in the query is the current time minus 10 minutes):

SELECT COUNT(DISTINCT host) FROM `activities` WHERE (created_at > '2012-04-01 18:33:25');
SELECT COUNT(*) FROM `activities` WHERE (created_at > '2012-04-01 18:33:25');

The writes to and reads from the activities table happened on each request. This was achieved via a before_filter in the root controller, which is usually named ApplicationController, as hinted in this StackOverflow question:
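The idea behind the filter can be sketched in plain Ruby: fork the slow write off into a child process so the response isn't delayed. Process.fork stands in here for what the spawn gem does (minus its connection re-initialization), and the host value and sleep duration are illustrative:

```ruby
# Offload the slow activity insert into a forked child so the request
# handler returns immediately (a sketch, not the original filter code).
def track_activity(host)
  pid = Process.fork do
    # in the real filter this would be something like:
    #   Activity.create!(host: host, created_at: Time.now)
    sleep 0.2               # stands in for the slow database write
  end
  Process.detach(pid)       # reap the child in the background; never block on it
end

started = Time.now
track_activity("203.0.113.7")
puts "filter returned after #{Time.now - started} s"   # far less than 0.2 s
```

The parent returns in milliseconds while the child does the insert; the catch, as measured below, is that forking itself is not free.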

The filter spawned another process that handled all the activity business, so it didn't prevent the main process from serving pages timely.

As you might expect, the performance measurements were disastrous: the application could serve only four requests per second, while with all the activity stuff turned off the throughput was about 40 rps.

Storing the table in memcached

The reason the naïve solution was not working as expected was lack of throughput. At first I thought that since we could defer the database writes and perform them after the request was served, we wouldn't have any problems. However, this only addressed one performance metric, response time, while what the solution lacked was throughput.

I also didn't want to introduce other technologies specifically for activity tracking. What I had at my disposal was memcached. Other components of my application used memcached to cache database reads and whole pages. We could, I thought, cache database writes with it as well.

Rails storage with a memcached backend supports two operations. The first is to write something under a string key, perhaps with a limited lifetime. The second is to read from a specified storage entry if anything has been written to it before and its lifetime is not over yet. That's basically all.

We could try to use memcached to store the SQL table itself. Indeed, the table may be viewed as an array of rows, so we could just read it, append a row, and write the result back. However, for the sake of performance, memcached doesn't support any locking, so we can't just store the table described in the naïve approach in a single cache bucket. Two threads may read the array, then both append the next row, and both write it back, one row being discarded. This is a classical race condition. And since my aim was to serve 30-40 requests per second, race conditions under this approach were inevitable.

Besides, even if we could use locking (for instance, via Linux file locks (flock)), it could perform even worse than the database, because it wouldn't be able to handle enough requests sequentially. We need a certain degree of concurrency here.

Wide caching

To mitigate the race conditions that happen during database writes, I used a technique I named "wide caching". I certainly reinvented the wheel here, but for the sake of clarity, I'll use this name in the rest of the post.

The race condition from the previous section can be mitigated by distributing the big "table" across smaller, independent tables stored in different memcached chunks. Writes to each chunk would still not be protected by a mutex, so a race condition remains possible. We could try to mitigate it with round-robin chunk allocation, i.e. a new thread writes to the chunk next to the one allocated to the previous thread.

This solution, however, could be improved.

First, round-robin allocation requires a central, synchronized authority to return proper chunks. This brings the same level of technological complexity as a single bucket with a lock.

Second, round-robin doesn't guarantee the lack of race conditions either. A thread may stall long enough for the round-robin to make a whole "round", and to consider the chunk currently in process again. To avoid this, the allocator should be even more complex.

Third, let's realize that we may trade precision for speed. The result will not be absolutely accurate, but people look at site usage statistics out of curiosity, and the figures need not be precise. We'll try to make them fast first.

This all suggests a simple idea: get rid of the allocator! Just use a random bucket in each thread.

This will not prevent race conditions either, but it will make them predictable. If the allocation is completely random, we can carry out experiments and extrapolate their results in time, being sure that they will be reproducible.

Decreasing effect of race conditions

The previous section addressed how many potential accesses hit the same cache bucket within a period of time. What also matters is how long each cache access takes: the shorter it is, the more writes per second to a single bucket we will handle correctly.

Memcache request structure

Rails provides a uniform interface to various caches that allows storing serialized Ruby structures in them. This picture shows the steps that happen during an update of a table in the cache, the scale roughly representing the time each step takes.

A typical hash bucket access consists of these steps:

  1. reading raw data from the cache;
  2. deserializing, i.e. converting from the character format to Ruby structures (the opposite of serializing);
  3. appending a row to the deserialized table;
  4. serializing back to the character format (the opposite of deserializing);
  5. writing raw data back to the cache.

We can see that steps 1, 2, 4, and 5 depend on the number of records in the array we store in the bucket. The more information we record, the longer they take, and when the data are large enough, the time becomes proportional to the size. And if we just store all the activity data in the cache, the size of the arrays being (de)serialized only grows with time!
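The steps above can be sketched with an in-memory Hash standing in for memcached and Marshal playing the role of the (de)serializer; the bucket name and rows are illustrative:

```ruby
CACHE = {}   # stand-in for memcached: holds only raw serialized strings

def append_row(bucket, row)
  raw   = CACHE[bucket]                     # 1. read raw data from the cache
  table = raw ? Marshal.load(raw) : []      # 2. deserialize into a Ruby array
  table << row                              # 3. append the new row
  CACHE[bucket] = Marshal.dump(table)       # 4. serialize and 5. write back
end

append_row("activity_bucket", ["203.0.113.7", 1_333_283_605])
append_row("activity_bucket", ["198.51.100.23", 1_333_283_606])
```

Every call re-reads and re-writes the whole table, which is exactly why the bucket size must be kept bounded.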

How could we decrease the size of the buckets? The wide cache already decreases the expected size by a factor of the width; can we do better?

It turns out we can. Activity data are time-based: the older the data, the less interesting they become. In our use case, we were only interested in the data for the latest 10 minutes. And a routine cleanup of tables that drops all records with old enough timestamps is not sufficient: the activity data for 10 minutes are still large (at 50 rps and a cache width of 50, the mean bucket size would be 600 records).

Time reminds us of another idea. Why do we load and then write back the old data in the table? They don't change anyway! Since we're using a generic cache solution and can't peek into its read/write algorithm, we can't do it on a per-record basis (i.e. not load the table at all, and just append the record being inserted). What can we do instead?

We may utilize another highly accessible resource, whose usage is, however, totally discouraged in building reliable distributed algorithms: the clock, which is easy and fast to access. The clock provides us with a natural discriminator: every, say, three seconds, abandon the old hash bucket, and start a new one that will be responsible for storing the requests of the next three seconds. Given the current time, we can always find the "layer" of the cache we should randomly select a bucket from.

Reading the data back

The discussion above was devoted to how to record the activity data. However, the data should also be displayed in a timely manner. How timely? Let's say that updating the data at least once per 30 seconds is good enough.

Then, every 30 seconds we should reconcile the activity data written into memcached, and "compress" them into an easily readable format. Since I already had the MySQL implementation, I piggybacked on it and merely inserted the activity information into the SQL table, so the reading procedures didn't have to change at all!

To get the statistics, we have to iterate over all the hash buckets the writing algorithm could have filled, because gathering the ids of the buckets actually filled would require additional synchronization (additional writes), and since under high load the majority of them will be filled anyway, we'd better just collect them all. Theoretically, we should collect all the buckets that might have been filled during the last 10 minutes, the period we show the activity for. However, if we run the collecting algorithm every 30 seconds, we only need to collect the data for the latest 30 seconds, since all older data should already have been collected.

We'll have another race condition here. A worker may get the current time, generate a hash bucket id X, and then stall for several seconds. During that stall, the collecting worker commits all the data to the MySQL table, including the piece stored in X. The first worker then wakes up and writes its visit information to X, from which it will never be read, so the request is lost.

To mitigate this race condition, we may collect and commit the data only from layers that are deep enough. This won't avoid it completely, but it decreases its probability to a degree at which we can ignore it.

The final Wide Caching solution

If we assemble all of the above, we'll get the final solution that looks like this:

When a page request arrives, it asks for the current site usage statistics as one of the steps of rendering the page. The statistics are requested from the `activities` table in the MySQL database, and are cached for a medium amount of time (30 seconds), because more frequent updates do not improve the user experience.

After the page has been served, the request notifies the wide cache about the access to the site. First, it determines the "row" of the cache by rounding the current time in seconds since the Epoch down to a number divisible by three. Then it determines the "column" by choosing a random number from 1 to 15. These numbers are concatenated to form a unique identifier of a memcache entry. The website then reads the table from that entry, appends a row, and updates the same entry.
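The row/column arithmetic just described fits in a few lines (a sketch; the key format is illustrative):

```ruby
WIDTH  = 15   # number of columns in a layer
HEIGHT = 3    # seconds covered by one row

# Pick the memcache entry for the current request: the row comes from the
# clock, the column is random, so no central allocator is needed.
def wide_cache_key(now = Time.now.to_i)
  row    = now - now % HEIGHT    # epoch seconds rounded down to a multiple of 3
  column = rand(1..WIDTH)
  "activity_#{row}_#{column}"
end

puts wide_cache_key(1_333_283_605)   # e.g. "activity_1333283604_7"
```

Two requests collide only when they land in the same second-triple and draw the same column, which is what makes the loss rate measurable and predictable.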

Dumping the collected accesses to the DB is performed like this. After the notification, the request also checks whether there is a live memcached entry with a special name and a lifetime of 30 seconds. If there is no such entry, the information in the DB is obsolete, so the algorithm re-creates the entry and starts committing the information into the MySQL DB.

While the information is being committed, other requests may also check the special entry and start the commit process. This is why the order in which the memcached entries are read is randomized: it minimizes the number of cases where the same information is committed twice.
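A sketch of the collection pass (an in-memory Hash stands in for memcached, and the returned rows would be bulk-inserted into the `activities` table; the names and the "deep enough" margin of two layers are illustrative):

```ruby
# Collect activity rows from all buckets old enough to be safely committed.
def collect_activity(cache, now, period: 30, width: 15, height: 3)
  settle = 2 * height                  # skip layers that may still be written to
  layer  = (now - period) - (now - period) % height
  keys   = []
  while layer <= now - settle
    (1..width).each { |col| keys << "activity_#{layer}_#{col}" }
    layer += height
  end
  rows = []
  keys.shuffle.each do |key|           # randomized order: concurrent collectors
    table = cache.delete(key)          # rarely commit the same bucket twice
    rows.concat(table) if table
  end
  rows
end
```

Buckets in the newest layers are deliberately left alone; they will be picked up by a later pass once no writer can still be holding their id.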

Here's the code. It really is much shorter than the post that describes it.

The latest version of the source code may be found here.


I mentioned that the setup contains race conditions that lead to losing some requests. With a cache width of 15 and a bucket height of 3 seconds, a payload of 35 requests per second made the tracker lose 350 out of 6800 requests. This is approximately 5% of the total, which is acceptable. Because request queries were randomized, we may conclude that 5%--given these figures for requests per second, cache width, and equipment--will be the average visit loss factor.


In previous sections, I claimed that spawn-ing threads with the spawn Rails gem, and writing to the database in the spawned processes/threads, is slow. Indeed, spawn re-initializes the database connection in the spawned thread, which is already a considerable burden on the server if several dozen threads are spawned every second. Here are the experimental details (see bug #67 where I first posted them):

Method                        Reqs. per sec.
No activity                   40.2
With activity; no spawning    27.4
With activity and spawning    13.0
With wide cache               36.7

This shows that (a) you should not use spawning for short requests, and (b) the wide cache is really fast enough.

Background periodic job or per-request?

In the algorithm described above, activity data collection is started when a request arrives and the application finds out that the currently cached activity stats are out of date. We were careful to make this case as painless as possible, and experiments show that the loss of throughput is acceptable as long as we don't spawn a thread for each such collection.

Surprisingly, this method starts performing badly when site activity is low rather than high. On a low-activity site, we can't rely on requests arriving every second; they may arrive less frequently than the activity cache expires. So, to support the low-activity case, we have to collect the information for caches that might have been filled in the latest 10 minutes (the maximum time a visit is still relevant for), not just 30 seconds (the minimum time since the previous collection). Otherwise, if users arrive less frequently than every 30 seconds, the engine would always show zero activity, which would make users visit the site even less.

This wouldn't matter if I didn't use the HAML template engine, which--and this is a known, sad bug--doesn't flush the page until the request is complete. Even putting the code into an after_filter doesn't help. Experiments demonstrate that activity data reconciliation may take up to 1 second, so some requests would be served with an extra 1-second delay, and when the rate of requests is low, these constitute a considerable fraction of them. And I surely don't want my users to experience frequent delays during their infrequent visits.

Spawn instead of after_filter? We have already seen that spawn-ing makes the server choke at the other extreme, when the load is high.

Luckily, there is a solution that suits both cases equally well: periodically invoke the activity data collection procedure, without tying it to requests. However, the periodic invocation of a task is not straightforward in Rails; I'll get back to it in another post.

Future work

The current implementation of the activity tracker is parametrized with cache attributes (lifetime and width). However, the Wide Cache could be parametrized further, with the procedures it executes, namely:

  1. Cache bucket updating
  2. Commit procedure
  3. Reading what we previously committed

I think I'm going to need the same kind of cache--one that doesn't always touch the database--for the first kind of statistics, the visits to individual posts. This parametrization will help me keep the code DRY and re-use the Wide Cache.

This will require refactoring visits and hits into a single function that calls a lambda passed in the constructor.

Other approaches

Parse apache/nginx/rails logs.

Indeed, the topmost web-serving layer already collects the activity data: it carefully logs each visit, with the URL being visited, the user's IP address, and the user agent. Instead of spending time on activity tracking in a comparatively slow Rails application, we could parse the logs of a "fast" web server.

I have seen production systems that display activity data based on parsing nginx logs, and it can be integrated into Rails in a way that doesn't look like a kludge. Perhaps free log parsers are already available on GitHub... but this solution just doesn't look cool enough to me.

Do not track activity

Does anyone care about the visits at all? Maybe we just shouldn't track them?

First, from my observation, everybody tries to stick a visit tracker into a website. Higher activity also signals that you're visiting something valuable. A Russian clone of the Facebook social network even used to display a fake registered-user counter on its homepage. Such fraud could only be justified by a huge benefit from displaying it, which means the statistics are considered valuable by at least some very popular sites.

Second, in my case, there was an easier answer. For political reasons, I should reimplement everything the engine I'm cloning contains: unless I reimplement everything, and manage to keep the site fast at the same time, my approach to the development will be considered a failure.

Use a specialized activity tracking solution

Too late: I have already implemented my own. But if you want activity data on your website, do not hesitate to use one. Doing it yourself is hard; see above.


In one of the discussions on the very forum I'm reimplementing, someone wondered why the view count on YouTube is not updated in real time for popular videos. Having gone through a series of trial-and-error experiments with visit tracking, I realize that the developers of YouTube surely had their reasons. My activity information is also delayed, while the traffic volume is still insignificant compared to even a single Lady Gaga video.

It seems that during the development of this tracker I have reinvented a lot of wheels. This is, I guess, how the most primitive database and file system caches look. Their algorithms are, of course, more sophisticated, and are not built on top of generic cache and serialization solutions, using their own custom ones instead.

But as with any other reinvention of the wheel, it was fun, and I surely got a lot of understanding about higher-load web serving while trying to optimize this component of the forum engine.


Ruby on Rails Errors and Solutions

When you develop something in Ruby on Rails, you will most likely encounter strange errors here and there. These errors may look like language or framework bugs, but they are not. Most likely, they are programming bugs you were not aware of before you encountered them.

The reasons for that are twofold. First, Ruby is a dynamically-typed language, and this alone introduces subtleties you should know about. Second, the web framework values convention over documentation. It means that methods and functions should just work as expected, without making the user read the whole documentation page.

And when something just doesn't work, you'll have trouble finding the cause in the API docs. You'll have to google the exception message, and hope someone has run into it before.

This post is another contribution to this Exception-Driven Development, and lists some errors a Ruby/Rails programmer may make. Some of the errors were found via web search (I tried to provide proper attribution where possible), and some of them I solved on my own. I'll try to keep this list up to date, and add more errors here as I write more Rails programs.

Generic Ruby Issues

Calling setter methods

You are setting a value of an ActiveRecord field from inside a method of your model, but it does nothing? You are doing it wrong. Perhaps, you write something like this:
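The original snippet is lost in this copy of the post; a minimal reconstruction of the pitfall, with illustrative model and field names of my own, would be:

```ruby
# Illustrative stand-in for an ActiveRecord model: attr_accessor plays
# the role of the generated field accessors.
class Order
  attr_accessor :status

  def cancel
    status = "cancelled"  # looks like a setter call, but see below
  end
end
```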

But the value is reset after this method. What's happening? The field = ... line actually creates a new local variable, rather than calling the object's field=(value) method ActiveRecord has created for you. Just use self to distinguish:
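A sketch of the fix, using the same illustrative names (not the original code):

```ruby
class Order
  attr_accessor :status

  def cancel
    self.status = "cancelled"  # explicit receiver: calls the status= setter
    status                     # reading the field works without self
  end
end
```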

Note that you don't have to do the same when you read from the field—unless you have created a local variable with the same name! This is why the second line of the method doesn't have to contain self in this case.

Sometimes this strikes you from a different angle. For instance, if you have a quality attribute in your model, and you want to set conversion quality with RMagick, you have to write:
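The original snippet is missing here. RMagick's Image#write evaluates its block with the receiver switched to an internal options object, which is why a bare quality inside the block no longer reaches your model. The stub below imitates that behavior to show the workaround: copy the attribute into a local variable before entering the block (all names are mine; real code would call img.write):

```ruby
# Stub imitating RMagick's behavior: the block is instance_eval'd
# against an options object (Image::Info in real RMagick).
class FakeImageInfo
  attr_accessor :quality
end

def write_image(&block)
  info = FakeImageInfo.new
  info.instance_eval(&block)  # inside the block, self is `info`
  info.quality
end

model_quality = 60  # imagine this is the model's quality attribute
# The local variable is captured by the closure, so it is still
# reachable even though `self` has changed inside the block:
result = write_image { self.quality = model_quality }
```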

because you can't easily access the outside's variable quality from inside the block, as it's routed to the RMagick object instead.

Other gotchas

You might have specified :false instead of false when passing a hash of attributes in Ruby. I devoted a separate post to this.

Surprising Bugs

An error fixed by clearing your cookies

If you have this error:

ActionController::InvalidAuthenticityToken in SessionsController#create 

RAILS_ROOT: /home/rails/app/
Application Trace | Framework Trace | Full Trace 
/home/rails/app/vendor/rails/actionpack/lib/action_controller/request_forgery_protection.rb:79:in `verify_authenticity_token'
/home/rails/app/vendor/rails/activesupport/lib/active_support/callbacks.rb:178:in `send'

you should clear your cookies. Perhaps you have deployed a different site under the same hostname. If this alone does not help, check whether you have set up the DOMAIN or SESSION_DOMAIN constants in your config (I had them in config/environments/production.rb), adjust them, and restart your server.

Strange remote_ip error

This error:

undefined method `remote_ip' for nil:NilClass

means that you accidentally named one of your actions "request". You can't do that: when the engine wants to get the current request, it calls your controller method instead. Rename your action.

Spawn plugin and Rails 3

Your spawn plugin is throwing this kind of error:

spawn> Exception in child[26055] - NameError: uninitialized class variable @@connection_handler in ActiveRecord::Base

That means that it's too old for your current version of Rails. Upgrade it to the latest commit it has (this is not on master, but on its edge branch), or use it as a gem. If you use Rails 3, here's the line for your Gemfile:

gem 'spawn', :git => 'git://', :branch => 'edge'

Commit 0e1ab99b worked for me (the corresponding gem version is 1.1). Source.

Unicode problems

UTF-8 regular expressions do not work by default? Add # coding: utf-8 as the first line of your Ruby file. If it's an ERB or HAML template, add this at the beginning anyway.

If you want to make your application completely Unicode, from the database to the tips of your views, you should do all of the following.

  • Make sure that all of your source code files are in UTF-8;
  • Make sure you added # coding: utf-8 at the beginning of any source .rb file (views, initializers, library files) that actually contains non-ASCII characters;
  • Make your database Unicode:
    • make sure you set the utf-8 character set for the database as a whole;
    • set the utf-8 character set for each table;
    • set the utf-8 character set for each text or string column in the table;
    This is especially important if you dump your database and load it from the dump on another server. In this case you may also need to execute SET NAMES 'utf8' in your mysql console;
  • the "default" mysql gem doesn't support Unicode. Use the mysql2 gem instead as your MySQL driver.

If you miss one of the steps, you may encounter problems. For instance, if you do everything except for the mysql2 gem, you may see these errors for your views:

  Showing app/views/layouts/application.html.erb where line #48  raised:
  incompatible character encodings: ASCII-8BIT and UTF-8

If you do not set up the character set of the database, migrations may create new tables with latin1 encoding, and this may result in ?????????? instead of non-ASCII characters in the data users enter into your DB.

Models, Databases, and Validations

Ever-failing presence validation

Do NOT add an empty? method to your objects unless you really mean it! This can stab you in the back when you least expect it.

Assume you want to validate that a belongs_to association is present at save time. You might use validates_presence_of for this. However, as specified in the documentation, it calls Rails' Object#blank? to check presence, which in turn calls the empty? method (documentation) if it exists.

In my forum engine, I used the empty? method to convey whether a post's body is empty. Therefore, you couldn't reply to posts with empty bodies, because validation considered such posts nonexistent.

Instead of using a complex validation, I suggest renaming the empty? method. When the author of the next model forgets about this gotcha, or never happens to learn about it, he or she will still try to use validates_presence_of. It's better to make it work, following the principle of least surprise.

Model name clashes

You can't name your models like standard Ruby classes, e.g. Thread (conflicts with the internal threading class) or Post (conflicts with the HTTP library).

It's interesting that the default scaffolding automatically names your model "Threads" when you instruct it to create a "Thread" model, to avoid the conflict. However, you still need to specify the proper class name in associations:
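A sketch of what such an association might look like (the schema details are my guess):

```ruby
class Posts < ActiveRecord::Base
  # the model class is Threads, but the association still reads naturally
  belongs_to :thread, :class_name => 'Threads'
end

class Threads < ActiveRecord::Base
  has_many :posts, :class_name => 'Posts'
end
```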

Check the scaffolded controller for other declarative methods that may also need adjustment.

Using several databases in the application

If you want to use several databases in your app (on a per-model basis), you can use built-in ActiveRecord capabilities. Just add a section to your config/database.yml with a distinctive name (e.g. second_db instead of production), and add this to your model:
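A sketch, assuming the database.yml section is named second_db as above (the model name is mine):

```ruby
class ExternalItem < ActiveRecord::Base
  # binds this model to the second_db section of config/database.yml
  establish_connection :second_db
end
```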

See the docs. Found here.

Troubles with making a form with validation but without database

Rails 3 has a built-in solution for creating forms that use ActiveRecord-style validations, but are not backed by a DB table. Indeed, not reusing this functionality would be a waste. But just including ActiveModel::Validations makes your form yield errors like this:

undefined method `to_key' for #<Loginpost:0x000000038de700>.

Listen to a short RailsCast on ActiveModel, and learn how to add some magic to your model. Here's how it should look:
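A sketch following that RailsCast's pattern, applied to the Loginpost class from the error message above (the attributes are my guess):

```ruby
class Loginpost
  include ActiveModel::Validations
  include ActiveModel::Conversion
  extend ActiveModel::Naming

  attr_accessor :login, :body
  validates_presence_of :login

  def initialize(attributes = {})
    attributes.each { |name, value| send("#{name}=", value) }
  end

  # tells form_for that this record is never saved to a database
  def persisted?
    false
  end
end
```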

Validation of non-database attributes

Want to validate the range of numbers in your temporary attributes? You write
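Presumably something like this (the attribute name and the bounds are illustrative):

```ruby
class Image < ActiveRecord::Base
  attr_accessor :quality  # a temporary attribute, no DB column behind it
  validates_numericality_of :quality, :greater_than => 0,
                                      :less_than_or_equal_to => 100
end
```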

but get a kind of "undefined method something_before_type_cast" error:

undefined method `quality_before_type_cast' for #<Image:0x7f579f5fe940>
/usr/lib64/ruby/gems/1.8/gems/activerecord-2.3.14/lib/active_record/attribute_methods.rb:255:in `method_missing'
/usr/lib64/ruby/gems/1.8/gems/activerecord-2.3.14/lib/active_record/validations.rb:1030:in `send'
/usr/lib64/ruby/gems/1.8/gems/activerecord-2.3.14/lib/active_record/validations.rb:1030:in `validates_numericality_of'
/usr/lib64/ruby/gems/1.8/gems/activerecord-2.3.14/lib/active_record/validations.rb:479:in `validates_each'
/usr/lib64/ruby/gems/1.8/gems/activerecord-2.3.14/lib/active_record/validations.rb:476:in `each'

The error trace hints at the solution:

For ActiveRecord attributes, these methods are automatically generated. However, for attributes that are not backed by DB columns, we should create them on our own.
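Continuing the illustrative Image example, the missing method can simply delegate to the plain attribute:

```ruby
class Image < ActiveRecord::Base
  attr_accessor :quality
  validates_numericality_of :quality, :greater_than => 0,
                                      :less_than_or_equal_to => 100

  # what validates_numericality_of expects to find for a real column
  def quality_before_type_cast
    quality
  end
end
```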

Uniqueness validation column name error

I created a model, added a validates_uniqueness_of call (a uniqueness validation), and tried to create a valid record. However, I kept getting the NoMethodError: undefined method `text?' for nil:NilClass error no matter how hard I tried:

NoMethodError: undefined method `text?' for nil:NilClass
        from /activerecord-3.2.2/lib/active_record/validations/uniqueness.rb:57:in `build_relation'
        from /activerecord-3.2.2/lib/active_record/validations/uniqueness.rb:25:in `validate_each'
        from /activemodel-3.2.2/lib/active_model/validator.rb:153:in `block in validate'
        from /activemodel-3.2.2/lib/active_model/validator.rb:150:in `each'

I checked the code, and realized that my uniqueness validation referred to an association name, instead of the database column name:
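A sketch of the mistake and the fix (model and column names are mine):

```ruby
class Subscription < ActiveRecord::Base
  belongs_to :user

  # wrong: :user is an association, not a column; yields the NoMethodError
  # validates_uniqueness_of :name, :scope => :user

  # right: refer to the actual foreign-key column
  validates_uniqueness_of :name, :scope => :user_id
end
```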

It seems that you must specify the exact database column name in the uniqueness validations since they are closely integrated with the database.

Connecting to a local database via TCP sockets

If you want to avoid using file sockets for connection to a local database, it's not enough to just remove the socket line. Rails will replace it with the default socket configuration if you use localhost as your host. To connect via a network socket, you have to change "localhost" to something else by adding an entry to /etc/hosts, for instance.


Accessing methods from console

Contrary to popular belief, there is a way to easily access helpers and other useful application methods from the console. Here are a couple of examples:

>> app.project_path(Project.first)
=> "/projects/130349783-with-attachments"
>> helper.link_to "Home", app.root_path
=> "<a href=\"/\">Home</a>"

You may read the complete guide here.

Get list of Rake/Capistrano tasks

When you use rake tasks, or a similar program (such as Capistrano), it's easy to forget what tasks it provides. Googling for the list doesn't help either. Where can you find them?

They're right there:

$ rake -T
$ cap -T

These commands print the list of all tasks. With cap -e task:name, you can get more extended help.

Redo specific migration

$ rake db:migrate:redo VERSION=20111122143422

Rake can also go several steps back or forward in the migration history; you now know how to see the docs (rake -T).

MySQL gem and Mac

If you install the mysql gem on Mac OS with MySQL 5.5, you may encounter problems that show up as a strange exception:

rake aborted!
Constant MysqlCompat::MysqlRes from mysql_compat/mysql_res.rb not found
Constant MysqlRes from mysql_res.rb not found

Tasks: TOP => db:migrate
(See full trace by running task with --trace)

It turns out that the problem is a wrong mysql gem version. Uninstall the existing gem, and install version 2.7, fixing some errors afterwards with a symlink:

$ gem uninstall mysql
$ gem install mysql -v 2.7 -- --with-mysql-config=/usr/local/mysql/bin/mysql_config
$ sudo ln -s /usr/local/mysql/lib/libmysqlclient.18.dylib /usr/lib/libmysqlclient.18.dylib

Source: Sergey Ler told me about it.

Using Ruby 1.9 on Ubuntu

Ruby 1.9's default virtual machine is superior to that of 1.8. Besides, version 1.9 supports OS-level threading, and has some multithreading bugs fixed.

If you are using Ubuntu, you have the "alternatives" mechanism for setting the Ruby version. But please, do not forget to update the alternatives for both the ruby and the gem commands to 1.9! Otherwise your gems (such as passenger) may end up in an improper place, and won't run successfully.

JavaScript in Rails

jQuery ajax events do not work

I used the Unobtrusive jQuery JavaScript gem for Rails. I read in its installation instructions that jQuery is among its "requirements". So I downloaded jQuery, put it into my assets folder, and then included the //= lines into application.js.

As a result, my jQuery didn't work. Or rather, it sometimes worked, but its AJAX events did not fire, although the server reported that the queries had been made. The mistake was that I should not have downloaded and placed the jQuery js file into the assets folder. I should only have added the "require" lines into application.js.

Thanks to StackOverflow... as usual.

Talking to Web APIs

Specifying POST and GET PARAMS at the same time

If you need to POST some HTTP data to a URL that has GET parameters, the documentation used to elide this case. The correct way is to create a URI object and supply it as a whole, not just uri.path.

res = Net::HTTP.post_form(URI('' + TOKEN + '&function=api&id=50'), {:post_param => post_value})

Alternatively, you may supply a URL instead of URI, but I recall we had issues with this, though the exact reasons are now forgotten.

Converting data structures to POST form data

It's strange that Rails doesn't have such a method (I couldn't find it in version 2.13). When you invoke the API of a web service, you sometimes need to convert arrays, hashes, and other structures to a result that looks like the HTML form encoding Rails itself uses. For instance, this structure:

should be "flattened" to a "form field name"-"form value" set like this:

The "solutions" found on the internet seem to have been written by programmers who are not aware of recursion. Here's the correct one I wrote:
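The snippets themselves are lost in this copy of the post; here is my own minimal sketch of such a recursive conversion (the function name is mine):

```ruby
# Recursively flatten nested hashes/arrays into Rails-style form pairs,
# e.g. {:user => {:tags => ["a"]}} becomes [["user[tags][]", "a"]].
def flatten_params(value, prefix = nil)
  case value
  when Hash
    value.flat_map do |key, sub|
      flatten_params(sub, prefix ? "#{prefix}[#{key}]" : key.to_s)
    end
  when Array
    value.flat_map { |sub| flatten_params(sub, "#{prefix}[]") }
  else
    [[prefix, value.to_s]]
  end
end
```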

Russian Support

Here's a gem that adds Russian support to the default form validation messages and helpers in Rails. Russian morphology differs from that of English: the plugin solves the issues with dates and with the proper plural forms of nouns, though the difference in noun forms between create and update messages is not addressed.


Optimizing Massive Data Views in Ruby on Rails

Ruby on Rails is a web application development framework best known for its flexibility and heavy use of metaprogramming. Rails programs, as one of my peers put it, "look like they explain what to do rather than how". That's true. This declarative nature of Rails, however, comes at a cost: at the cost of performance, of course.

A couple of days ago, I was developing a tree-shaped web board engine (some of you have probably forgotten what these are; here's the engine I'm creating a clone of). I designed a data model for users, authorization, threads, messages, message revision history, etc., and implemented a comprehensive set of operations on them, such as creating, displaying, editing, and so forth. The implementation looked nice; it consisted of carefully engineered bits of declarative programming.

This is how the index page of the forum should look like:

You may notice that all messages from all threads are displayed. However, when I tried to display several thousand messages on a single web page, which is an average figure for a board like this, it turned out to be deadly slow. Nobody would wait 32 seconds for the index page to show. Caching would help, but as several messages are posted every minute, most cache accesses would be misses.

In the rest of the post, I'll show you how the speed of page displaying and rendering may be increased by a factor of 50 (fifty).

Data model

The key reason why the original version was so surprisingly slow is that I designed a database layout that fits relational database requirements, in which the data span several tables that may be JOINed. A query to display the whole information should thus not be hard to make fast. Here's how the layout looked:

The image above was generated with the "MySQL Workbench" tool. If you supply it with a database with properly set up foreign keys (which I did manually, as Rails doesn't set them up for compatibility reasons), it will show you a nice entity-relationship diagram.

The messages to display consisted of the posts made to the newest threads, those created within the last day. Obviously, you could get all the necessary information from the database with a single SQL query with a lot of joins (but no unions), and even a "default" MySQL server executes such a query negligibly fast. Why was it so slow then?

How Rails makes your program slower

There are two main reasons why the naïve way of displaying these posts was slow. By the "naïve way" I mean a simple view that renders each post and its parameters by querying its association attributes, like this:

You saw the tree that should appear as a result of this above.

How would the controller look for this view? Due to the database design, it should select a thread, and render the tree growing from its root (head) post. So the controller would look like:

So, what happens when a view is rendered? For a single thread, all messages are fetched by a single SQL query, and are converted to an array. The ActiveRecord has_many association within a thread object automatically does this; this operation is relatively fast. Then, we assemble a tree structure of the posts (it works very fast, and is not shown here; the result is returned by the build_subtree method), and render these posts in the proper order.
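The build_subtree method itself is not shown in the post; a plausible shape for it, illustrated on plain hashes, is a linear-time grouping by parent_id:

```ruby
# Assemble the tree in O(n): group posts by parent_id, then attach
# children; the root post is the one without a parent.
def build_subtree(posts)
  children = Hash.new { |hash, key| hash[key] = [] }
  posts.each { |post| children[post[:parent_id]] << post }
  posts.each { |post| post[:children] = children[post[:id]] }
  posts.find { |post| post[:parent_id].nil? }
end
```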

Trying to render a post, Rails encounters its title method, which queries the latest revision of the post's title. This query is resolved by querying the TextContainer and, subsequently, the TextItem with number = 1 and the revision matching the latest_revision of the TextContainer. The same happens again when Rails renders the post's user name: it queries the user object for each of the posts separately. Therefore, for each post, Rails performs three separate queries to the database, each involving the whole of ActiveRecord's overhead for initiating a query, finalizing it, and packing the rows into individual objects.

These multiple, excessive database requests, followed by the conversion from SQL rows to "heavyweight" ActiveRecord models, accounted for approximately 25 of the 32 seconds of each call. Let's try to optimize them away.

Optimizing ActiveRecord queries

One might infer from the reasoning in the previous paragraphs that ActiveRecord is just a bad ORM framework. This is not true, in two ways. First, the main objective of ActiveRecord is to provide an ORM that works with little configuration, or even without configuration at all. Second, ActiveRecord is mainly used for representing single-table models; in our case, however, we deal with a data representation that spans several database tables.

This suggests a way to fix the performance here. Indeed, we already noted that all the data required to represent the tree of posts may be fetched with a single SELECT query. We can create an ActiveRecord model designed specifically for this query (you may call it a "view" if you like), reaping the benefit of having no configuration alongside. In the view template, we alter the methods that query the information, so that they match the fields fetched by the query.

We don't even have to alter the way the properties are queried in the view: we could introduce fake "proxy" objects to our new model, so that table_row.user returns a simple object with a login method initialized with a value from the query. In my case that would be overkill, but if you have a lot of views to fix, you may explore such an option.

For the view presented above, we'll need the following fields: id, title, created_at, empty_body (instead of body.empty?, see sidenote), marks (a serialized array automatically maintained by Rails' serialize class method), unreg_name, user_login (instead of user.login, see sidenote), host, clicks, and hidden. Not shown directly in the view, but used to build the post tree, is the parent_id field. We'll call the model comprising these fields FasterPost, and start with an empty model stored in app/models/faster_post.rb:

These posts will be fetched via the Thread class. We already had a has_many association ("Thread has many Posts"), and we modify it using the standard :finder_sql attribute:

settings_for is a global helper that denotes the current user; we'll return to it later.

While :finder_sql was originally intended for a slightly different purpose, we piggyback on its ability to execute an arbitrary SQL query (ActiveRecord has only two such ways; the second is find_by_sql).

Let's try to fetch data via this association, and we'll see an exception:

Mysql2::Error: Table 'database.faster_posts' doesn't exist: SHOW FIELDS FROM `faster_posts`

The original post model has a plural name (Posts) because the Ruby standard library contains the singular Post.

Indeed, ActiveRecord doesn't know what columns the resulting table should have. We need to override a couple of methods; as of Rails 3.1.3, these are columns and columns_hash. These methods are not really used here; we just need to supply some of ActiveRecord's internal structures through them. Instead of providing the actual column information for our raw SQL, we simply copy the column information from the Posts model (and cache it, as ActiveRecord itself does):
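As I read the description, the override boils down to borrowing Posts' metadata (a sketch; the exact internals vary between Rails versions):

```ruby
class FasterPost < ActiveRecord::Base
  # FasterPost has no table of its own: reuse the column metadata
  # of the real Posts model, cached the way ActiveRecord caches it
  def self.columns
    @columns ||= Posts.columns
  end

  def self.columns_hash
    @columns_hash ||= Posts.columns_hash
  end
end
```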

When we try to serialize this model for debugging purposes via the inspect method, we'll see that it's not really useful for our model: it doesn't show all the fields. We may find the original Object#inspect method useful here:

That's all that's needed to make this model work (see the sidenote for one more subtle issue). However, we may find out (with the profiler) that we still spend a lot of time inside some ActiveRecord methods. When a field method is called, it still goes through some of ActiveRecord's internal checks and conversions, which take a lot of time and are basically useless for our lightweight model. Even having to enter method_missing to route the method call to the actual field slows us down.

I found that ActiveRecord has a method that is close enough to the raw data fetched from the table, but skips some conversions (for instance, boolean values are not converted, and are fetched as strings instead). It is read_attribute_before_type_cast (note that its attribute should be a String, not a Symbol!). So each fast field method should call it directly as a "short cut". We can generate these methods for all the query fields with Ruby's metaprogramming capabilities:
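A sketch of such generation (the field list repeats the SELECT columns named earlier):

```ruby
class FasterPost < ActiveRecord::Base
  %w(id title created_at empty_body unreg_name user_login
     host clicks hidden parent_id).each do |field|
    # `field` is already a String, as read_attribute_before_type_cast requires
    define_method(field) { read_attribute_before_type_cast(field) }
  end
end
```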

Do not forget to add overrides, which should be declared after this call, for methods that require additional processing (such as serialization):

or return Boolean or Integer values:

After these fixes are put in place, and the view is altered accordingly, the database queries no longer take any significant time. Moreover, this solution also complies with the Model-View-Controller pattern, as the data are fetched in the controller before the view is rendered. This is probably one of the rare cases when compliance with design patterns actually boosts performance.

This cut about 20 seconds from the original 32-second runtime. But there still were 12 seconds to optimize away. How could this be done?

Optimizing the view

ActiveRecord optimization is not enough, though: a lot of time is spent in the view itself. I had to use a profiler to find out where, and I made several wondrous discoveries. Using the profiler is quite simple in Rails 3. All you have to do is start a server, and invoke in the console

$ rails profiler -r 5 'get("/")'

and open tmp/performance/ProfilerTest#test_get_process_time_graph.html in your browser. An annoying downside is that it prints its log to the server's console (but you're not trying to do anything else while profiling, are you?). Here are several optimization points I found.

Link Generation with link_to

One of the greatest offenders was the code that generates links. The usual link_to and autogenerated path methods, such as post_path(post), are very slow compared to the rest of the HTML-generating code. However, if we generate URLs in our view on our own, it will no longer be enough to edit config/routes.rb to change all the paths in the application. We'd lose the flexibility of Rails we love it so much for!

Instead, I used a kludge for link generation. I assumed that links to all posts look the same, and the only difference between them is the post's ID. Therefore, if we generate a link for one post the Right Way (with link_to and ..._path functions), dismantle it, and create the other links by simple string concatenation from the resultant parts, we'll avoid this overhead. Here's the helper I created:
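The helper itself is missing in this copy of the post; a sketch of the idea (helper name and markup are mine):

```ruby
# Build one link the Right Way, then stamp out the rest by concatenation.
def fast_post_link_generator(sample_post)
  sample_id = sample_post.id.to_s
  prefix, suffix = post_path(sample_post).split(sample_id, 2)
  proc do |post|
    "<a href=\"#{prefix}#{post.id}#{suffix}\">#{h post.title}</a>".html_safe
  end
end
```

In the view, you would create the generator once (say, link_gen = fast_post_link_generator(@posts.first)) and then call link_gen[post] for each post.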

It creates a link generator that generates the link in the fast way, provided all the links are the same. Calling this method from the view is as clean as calling the link_to:

The method returns a proc object (function), and brackets are used to call it with post parameter.


Authorization

Declarative Authorization is a very nice plugin for managing authorization. It's declarative, and it's very convenient to use once you have carefully read its manual and started naming your before_filters by its rules. However, I found out that it's terribly slow when you query authorization rights for thousands of posts.

I simply got rid of authorization on the index page, and moved it to the page of an individual post. But you could add more JOINs to your SQL, fetch user roles, and write some lightweight authorization methods instead, if you really need it.

Another way to optimize authorization could be the same kludge that was used for link generation. In most cases, if you are authorized to delete a post, you are also authorized to delete all posts in the thread (or in the whole forum). You can try to piggyback on such assumptions.

User-specific settings

How could we optimize user-specific settings on a large scale? Here's the excerpt from the view code I'm referring to:

This excerpt implements the user's ability to show/hide posts in his or her own view, without affecting what other users see. In the big SQL statement above, you could notice that we join hidden_posts_users to fetch the user-specific settings. Here we encounter the same problem as with the links, but the "simple" link generator should be slightly more complicated:
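The snippet is lost here; a plain-Ruby sketch of such a generator (markup and names are mine). The point is that the translation block runs during setup, not once per post:

```ruby
def fast_hide(&translate)
  # translate each label once, up front
  show_label = translate.call("Show subtree")
  hide_label = translate.call("Hide subtree")
  proc do |post_id, hidden|
    "<a href=\"/posts/#{post_id}/hide\">#{hidden ? show_label : hide_label}</a>"
  end
end
```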

but its usage will become even simpler:

Note that while you supply the block that translates the strings to the fast_hide function, the block will be called only once per string, at the very beginning, rather than thousands of times.


Translations

I won't dive into this too much, since it's obvious that you should not call the t method, which translates a string according to the current locale, thousands of times for the same string. Save the result of a single call into a variable, and reuse it. It won't change while you render parts of your view anyway.

However, this doesn't really save you a lot of CPU cycles. I found out that calling t("Show subtree") (see the example above) approximately 2500 times only takes 0.05 seconds. But multiplied by the number of strings you have to translate, this may grow large.

Caching serialized attributes

The default (de)serialization procedure of ActiveRecord's serialize generator does not cache its results. Caching may not be useful in all cases, of course. In my case, however, there was a total of four possible values for the serialized column ("the post has/has not a picture/a link inside"), and thousands of rows. To use caching (and to short-cut some of ActiveRecord's useless default code), you should just pass your own serializer to the method:
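The call itself is missing in this copy; my reconstruction of such a caching serializer (the class name is mine) would be:

```ruby
require 'yaml'

# With only four distinct values in the column, every load after the
# first is a plain hash lookup instead of a YAML parse.
class MarksCoder
  CACHE = {}

  def self.load(text)
    CACHE[text] ||= YAML.load(text || [].to_yaml)
  end

  def self.dump(object)
    object.to_yaml
  end
end

# In the model (requires ActiveRecord, so not runnable standalone):
#   serialize :marks, MarksCoder
```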

I'm not sure that this behavior is documented (I extracted it from the code, and can't find anything about it in the help), but it works.

Avoid excessive render calls

This may not be the case for you, but if you call render :partial thousands of times, it just can't be fast. Instead, use a single partial template that renders many items at once. My measurements show that this may save up to 20% of the ~11 seconds of runtime.

If your template is recursive, and you use HAML (not ERB), you'll probably have to move your code to a helper in some cases, and print raw HTML to keep the markup (for instance, if you use recursively-folded lists).

Raw HTML instead of Templates

If it's still not fast enough, you may print raw HTML yourself. I don't have any measurements of the effect of this operation alone, but you may try it. It will also clean up your profile, so you'll see the offenders more clearly.

Make sure to use the String Buffer pattern (I have encountered it in so many languages that I can call it a pattern). Instead of using the + operator to concatenate strings, allocate a buffer string, and append the other strings to it. In some languages, such as Java or OCaml, this is not enough, and you should use a separate StringBuffer object; in Ruby, however, Strings are mutable, and any string may serve as a buffer. By the way, HAML templates are already aware of this pattern (and, I guess, ERB templates are, too), so this is only needed if you have already chosen to print raw HTML on your own for other reasons (such as those described in the previous section).
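A minimal illustration of the pattern in Ruby:

```ruby
items = %w(first second third)
html = ''  # any string can serve as the buffer
items.each do |item|
  html << "<li>#{item}</li>"  # << appends in place; + would allocate a new string
end
```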

Other speed increase advice

As you might have noticed, my blog is very slow. It takes about 2 seconds to render any page. The blog was my first Ruby on Rails application ever, so it surely has some design mistakes. On the other hand, three years ago, Rails (version 2 at that time) was much less mature than it is today, and many speed improvements, such as monadic database querying, were only introduced in version 3.

I figured out why the rendering is slow, but today I just don't have enough time to fix the code. So I'll just describe what mistakes I made and how to avoid them.

Touch your parents when you change!

Good advice in real life as well, touching your parents when you change is also a good practice in Rails. Assume you need to sort threads by their update time (I'll use the forum data model as an example). The update time of a thread is the creation time of the latest post in it, or the latest update time of a post in it. When we click "edit" on a post, and complete updating it with a save call, the default behavior is that the updated_at field of the post's database record is set to the current time.

The updated_at field of the thread, however, is not updated automatically. Unless you arrange for this, you will have either to write SQL queries, or to load all threads and check all posts of each of them. Rails' belongs_to association has a :touch option that makes it automatically "update" the parent when you save a child. This solves the problem of fetching updated threads with little effort.
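A sketch of the option in the forum's terms (class names as used earlier in the post):

```ruby
class Posts < ActiveRecord::Base
  # saving or updating a post now also bumps the thread's updated_at
  belongs_to :thread, :class_name => 'Threads', :touch => true
end
```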

If you have an abstract "content" type, so that all text items on your site are stored and edited in a uniform way, you will find that :touch neatly integrates with polymorphic associations.

Unfortunately, :touch is only included in Rails 3.

Use monadic database queries

I used to fetch the last blog post in my engine by blog.posts_ordered.first, where posts_ordered is a non-standard ordering of posts. It first fetched all posts, reordered them, and selected the first, which was overly inefficient.

The solution is monadic queries (see the find method documentation). Rails 3 allows you to "chain" database queries, accumulating the conditions without executing the query. After you have chained all the conditions, you "bind" the query object, executing the query. It will look like this:
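A sketch of such a chained call (the ordering column is my guess):

```ruby
# Conditions accumulate without touching the database; the query runs
# only when `first` asks for a row, with ORDER BY and LIMIT 1 applied.
post = blog.posts.order('created_at DESC').first
```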

Only one post will be fetched and converted into the ActiveRecord object this way.

Unfortunately, this is only included in Rails 3. In Rails 2 I had to write my own "monadic" query wrappers.

Slow and Loving It

For many cases, however, the performance of Rails is good enough. It can't compete with pure C++ web applications, of course. Rather, Rails' stronger side is describing complex behavior with little effort, in just a few lines of code. A lot of the problems I encountered were due to bad performance of overly complicated data models, or to showing thousands of records on a single page. Rails is not unique in such poor performance; for instance, it takes Bugzilla 3 (written in Perl) something like 20 seconds to show a bug with seven hundred comments.

The cool thing about Rails, however, is that it's a flexible framework. If you really need to display a thousand records fast, you can do it. Nothing in the framework prevents you from creating shortcuts to its low-level functionality (the Ruby language plays an important role in this), and nothing prevents you from swapping its parts for your own fast code where it's really necessary. In the forum engine I used here as an example, only a couple of views should display thousands of records. Optimized models and views may be assigned only to these "hot parts" of the application. The rest of the functionality, at the same time, takes full advantage of the easily configurable framework, of declarative coding, and of rapid feature development, everything I love Rails for.

Read on | Comments (0) | Make a comment >>


Recently I ran a software update from Ruby on Rails version 2.3.5 to 2.3.14. You might have noticed the "maintenance" message all pages were redirected to last Sunday. Of course, after the update completed, I clicked the "My Blog" bookmark in my browser, and, having seen the page load correctly, prematurely considered the update a success.

I was wrong this time. But first,...

How I perform updates on my webserver

To automate the update process, I use a couple of nonstandard scripts. During the update, the web server becomes nearly unusable, so, instead of providing a poor experience to users, I decided to show a "maintenance" page while the update is in progress.

In my virtual host Apache configuration file, I use a special define, COLDATTIC_MAINTENANCE, to distinguish whether Apache should ask Mongrel for a web page, or show a simple maintenance.html page. Here's how the virtual host config looks:

# ... normal website redirects

DocumentRoot /var/www/
<Directory /var/www/>
        Options FollowSymLinks
        AllowOverride None
        Order allow,deny
        Allow from all
</Directory>

<IfDefine COLDATTIC_MAINTENANCE>
RewriteEngine On
# Answer 503 for everything that is not an existing file
RewriteCond /var/www/%{REQUEST_URI} !-f
RewriteRule ^(.+)$ - [R=503,L]
ErrorDocument 503 /maintenance.html
Header always set Retry-After "18000"
</IfDefine>

This config makes the web server, if -D COLDATTIC_MAINTENANCE is supplied on Apache's command line:

  • Redirect every request to maintenance.html;
  • Return the 503 Service Unavailable status code. The status code is important, because you should prevent web search engines from thinking that the maintenance page is the only page left on your site;
  • Notify bots that they should come back in 5 hours (18000 seconds; set this to your average maintenance time).

To update the server I do the following:

  1. Set -D COLDATTIC_MAINTENANCE in the Apache config (on my distro it's /etc/conf.d/apache2);
  2. Restart the system Apache with /etc/init.d/apache2 restart;
  3. Now Apache is serving maintenance pages. I edit the config /etc/conf.d/apache2 back, and remove the maintenance define. However, I do not restart the server with the new settings!
  4. Run the system update script, which, essentially, looks like update-software ; /etc/init.d/apache2 restart, so the server is automatically restarted without maintenance mode after the update is completed;
  5. Go to the movies, since the update usually happens on early East Coast Sunday mornings, when it's Sunday night in Russia, a good time to relax;
  6. Come home, log in to the server, and deal with update failures >_<

I do not do updates very often, and they usually go well. After the update last Sunday, however, I noticed a sudden decrease in visits. This could have been just a random spike, but after the visit number had decreased to less than ten the next day, I realized that the update had broken something. I entered the site's address into the browser. It should have redirected to the blog, but what I saw was the raw HTML text of the blog page!

How Ruby went off the Rails

How come? I checked the logs of the website, and noticed an abnormal stack trace:

Tue Oct 18 15:52:05 +0000 2011: 
    Error calling Dispatcher.dispatch #<NoMethodError: private method `split' called for nil:NilClass>
/usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.14/lib/action_controller/cgi_process.rb:52:in `dispatch_cgi'
/usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.14/lib/action_controller/dispatcher.rb:101:in `dispatch_cgi'
/usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.14/lib/action_controller/dispatcher.rb:27:in `dispatch'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/rails.rb:76:in `process'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/rails.rb:74:in `synchronize'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/rails.rb:74:in `process'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:159:in `orig_process_client'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:158:in `each'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:158:in `orig_process_client'
/var/www/ `process_client'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:285:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:285:in `initialize'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:285:in `new'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:285:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:268:in `initialize'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:268:in `new'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:268:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:282:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:281:in `each'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:281:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/mongrel_rails:128:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/command.rb:212:in `run'
/usr/bin/mongrel_rails:8:in `load'


It only happened when the page request was supposed to result in a redirect; direct page accesses worked (that's perhaps why the visit counter was greater than zero: people could just use their bookmarks).

I made sure the localhost version was synchronized with the production one, and launched WEBrick via ./script/server to play with the website outside the production server environment. But on my local machine it worked well! (No wonder, because there's no entry of my web application in the stack trace; the whole trace consists of web server and framework subroutines.)

Solution: monkey-patching

I googled for the error, and it appeared to be more than half a year old. This bug entry is, I guess, the first place where the error was spotted. In a nutshell, the new version of Rails seems to have employed a previously unsupported use case of Rails' web server interface, Rack. It interfered with the, supposedly, partly incomplete implementation in Mongrel, a specialized HTTP web server for Rails. And the Mongrel updates still haven't made their way into the Gentoo distribution my web server runs on (-1 to the Gentoo maintainers' reputation counter!).

The solution a lot of people (including me) used is here. It is a small Ruby file that should be placed... into your application startup directory (for Rails it's config/initializers/)! You don't have to patch your system libraries and make new packages for your distribution: all you have to do is add a file to the userspace. The code there uses a well-known feature of Ruby, monkey patching: it alters the implementation of external library classes at runtime. I have already demonstrated how this can be used to add some Perl-like features to Ruby.

So, fixing others' bugs is another, though less legitimate, way to employ monkey-patching.
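For readers unfamiliar with the technique, here is a toy illustration of monkey patching; this is not the actual Mongrel fix, and the shout method is a hypothetical example:

```ruby
# Monkey patching: reopen an existing class, even a standard library one,
# and add or redefine methods at runtime.
class String
  # Hypothetical helper added to every String in the program
  def shout
    upcase + "!"
  end
end

# All strings, including ones created before the patch, gain the method
puts "mongrel".shout  # prints MONGREL!
```

The same mechanism lets an initializer file override a buggy method deep inside Mongrel or Rails without touching the installed gems.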


Why did this happen in the first place? Two reasons:

  • Insufficient testing after software updates. I should have tried to enter the site as an anonymous user, instead of clicking on my own bookmarks. Without the fix above, the site worked well only if the user who accessed it had cookies!
  • No proper staging. I should deploy exactly the same environment on a testing machine as on production, and perform the tests there before upgrading.

I thought that Mongrel and the whole Rails stack were stable, but, I guess, the stack suffers from fragmentation. The Ruby on Rails stack consists of a lot of components. While this fosters innovation, the interaction among their developers is not tight, and there is no common governance; these are the reasons why such strange bugs get shipped in stable versions.

Read on | Comments (0) | Make a comment >>

How I applied for a web server developer, or why the fastest servers are written in C instead of C++

I realized why I was so aggressive in my attempts to guide one of the latest StackOverflow questions so that it would gain useful answers and wouldn't be closed as subjective. The question title is "Why are most really fast servers written in C instead of C++?", and I really wondered why they are. Because that's... a bit personal to me. So, let me tell you the story.

Some time ago, after my fiasco at getting a job in the USA, I thought it was a good idea to change my career to a less ivory-tower-ish one. I thought about what was going on in the industry, and decided that since everything was moving to the web, becoming a web server developer would be a fruitful choice. So I sent my resume to three companies that were hiring newbie Linux C++ web server developers. Being under the impression of the "Smart and Gets Things Done" stuff, I wrote my resume that way, and explicitly stated that I didn't know anything about web serving.


One company, which is the top Russian search engine, declined me with the note "Your skills are valuable, but we can't offer you a job now" (bold is mine). I still wonder what that means: am I good or bad? Should I get more experience and try again, or is that just a polite rejection? But well, alright, that's the number-one company, so I didn't frown upon that.


The other company was an emerging startup. They had a shiny big office with free coffee and a large restroom (at least that's what they said). A warning sign was that the team had failed its previous startup, but hell, why not give it a try? I was contacted by the CEO (it's a small startup, isn't it? Anyway, in one Silicon Valley startup I was also interviewed by the CEO, so that didn't bother me) and was sent a sample task.

The task was to develop a C++ anti-DDoS module for Apache that drops any IP address exceeding a connection limit per a prespecified time period. The task had been sent without any prior notice, and had a week-long time limit (who cares that I had plans for the week?). Period.

Well, alright. I didn't have any experience with Apache or web serving at all, so I started googling like crazy. After five hours of googling, I had learned how to create apache2 modules, devised a way to create a C++, not a C, module, and written a module that worked but kept its state thread-local (Apache starts a lot of threads). Essentially it was useless, but, well, spending more than five hours on a job application is commonly acknowledged overkill!

I thought it did prove that I was a fast learner and a capable programmer. I sent that solution with notes on how it should be improved to make a complete module. However, "the architect was not satisfied with my code," and I was asked to finish what I had been assigned. My note that I had already invested enough time for them to see whether I was good enough to do the job was answered with a hilarious letter. It happens that I don't profess the company's philosophy, which values results rather than capability! And unless I finished the code, I would not be accepted.

The startup I applied to has, I guess, already failed; they still are not spoken about: the whole Google output for them is about their SEO technologies. And they still require registration on their website from all applicants. Pathetic, isn't it?

Well, if the company values working for it for free, then it indeed has a different philosophy than mine. Good luck to those guys; I hope they won't fail many startups before they learn something.


The reasons why I don't and won't apply to Google deserve a separate post, I think.

The third was another top Russian Internet company. They assigned an interview in their shiny big office with a free ping-pong table and a PS3 available, as well as the usual free coffee and a large restroom! The interview started with the question "Why didn't you apply to Google?", apparently asked because of my Summer of Code participation (or because I was a candidate to replace a guy who had transferred to Google from there).

The interview went smoothly, even considering that I didn't know anything about how HTTP or web servers work. After the more usual questions, I was given a coding assignment. The clouds started gathering: they asked me to write a C program that parses an Apache log and counts how many different IP addresses appear in it. "Why not C++?" I asked, "it's easy and fast in C++!" The reply was that they wanted to judge my algorithmic skills, not my knowledge of standard libraries. Well, OK, that seemed fair. So, let's recall how I did it in programming contests... *rubbing hands*

In 20 minutes the program was ready, but the remote server I was ssh-ed into suddenly ran out of disk space, and the source somehow got erased after a failed attempt to save it (Vim let me down this time). I spent another 20 minutes coding it again; it was all OK, and the interviewer didn't even find anything in my code to criticize.

The interview seemed to have gone great. The architect who interviewed me happened to come from the same university as me, was a fan of Vim as well, and sounded pleasant. We proceeded to discussing my salary, I named my figure, and then it happened.

"Okay, now I must confess," the architect started, "We lied to you a bit. All our software is actually written in C. Although we were looking for a C++ developer, you'll have to do C programming mostly."


The thing, yet unknown to the interviewer, is that I hate C; I hate it as much as I love C++. Feeling deceived, I burst into explaining that writing in C is disgusting, since C is the new assembler, regardless of the fact that I can do it quite well... No wonder I didn't get this job either.

Of course, I asked why they did development in C. And the answer was, "well, we sometimes feel urges to start rewriting it all in C++, module by module. But then we question ourselves—"but why?", and leave things as they are".


And now I also question myself: "why?" The fact that servers are mostly developed in C was, I think, crucial for my life. Perhaps, if it weren't the case, I would already have said goodbye to science and my master's studies, and would have become a full-time corporate server developer. But sometimes at night I still wonder why... why didn't they rewrite it in C++? And that's why I wanted that SO question answered.

Read on | Comments (0) | Make a comment >>

SVG is useless

Recently I tried to add a feature to this site: displaying a Russian flag near links that point to websites whose language is Russian. Of course, I wanted its height to be equal to that of a letter.

So, basically, I want my image to scale to the size of a letter using the user's browser capabilities. "What format is meant for that?" I thought. Of course, Scalable Vector Graphics (wiki)! Then this stuff would be scaled with a simple style such as height: 1em.

But it doesn't work. I tried googling, and it seems that there's no way to make it work. The conclusion is that a Scalable Vector Graphics image just doesn't scale!

A typical attempt to resize SVG via HTML

Here's a sample code:
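A reconstruction of the kind of markup meant here; the file name is illustrative, and the behavior described reflects browsers at the time the post was written:

```html
<!-- Embed the SVG flag and ask the browser to scale it to letter height -->
<img src="/images/ru-flag.svg" style="height: 1em;" alt="in Russian" />
```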

And here's a sample result:

Vector graphics surely can scale, and what "scalable" should mean is that the format is designed in such a way that an image is easy to scale in size. Perhaps the format allows it, but the way it is integrated into HTML code doesn't.

It was a great surprise to me: I had misunderstood the word "scalable" in the definition of SVG. The ability to grow or shrink in size is not appealing enough to the standard's developers, who live in their ivory tower (SVG has been around for 10 years, and is still not fully supported anywhere). Here's what "scalable" really means:

To be scalable means to increase or decrease uniformly. In terms of graphics, scalable means not being limited to a single, fixed, pixel size. On the Web, scalable means that a particular technology can grow to a large number of files, a large number of users, a wide variety of applications. SVG, being a graphics technology for the Web, is scalable in both senses of the word.

SVG graphics are scalable to different display resolutions, so that for example printed output uses the full resolution of the printer and can be displayed at the same size on screens of different resolutions. The same SVG graphic can be placed at different sizes on the same Web page, and re-used at different sizes on different pages. SVG graphics can be magnified to see fine detail, or to aid those with low vision.

SVG graphics are scalable because the same SVG content can be a stand-alone graphic or can be referenced or included inside other SVG graphics, thereby allowing a complex illustration to be built up in parts, perhaps by several people. The symbol, marker and font capabilities promote re-use of graphical components, maximize the advantages of HTTP caching and avoid the need for a centralized registry of approved symbols.

from Concepts section of SVG v1.1 specification

Scalability happens to mean something cooler than just resizing, wow! The ability to be resized was lost among the cool practices SVG allows. But still, while the standard says "it can" about different sizes, in reality it can't. Totally useless.

Read on | Comments (3) | Make a comment >>

More posts about web-programming >>