Recently I updated Ruby on Rails from version 2.3.5 to 2.3.14. You might have noticed the "maintenance" message that all pages redirected to last Sunday. Of course, after the update completed, I clicked the "My Blog" bookmark in my browser, saw the page load correctly, and prematurely concluded that the update had gone well.

I was wrong this time. But first,...

How I perform updates on my webserver

I use a couple of custom scripts to automate updates. During an update the web server is nearly unusable, so instead of giving users a poor experience, I show a "maintenance" page while the update is in progress.

In my Apache virtual host configuration, I use a startup parameter, COLDATTIC_MAINTENANCE, to decide whether requests should go to Mongrel or be answered with a simple maintenance.html page. Here's what the virtual host config looks like:

<IfDefine !COLDATTIC_MAINTENANCE>
# ... normal website redirects
</IfDefine>

<IfDefine COLDATTIC_MAINTENANCE>
DocumentRoot /var/www/coldattic.info/htdocs/
<Directory /var/www/coldattic.info/htdocs>
        Options FollowSymLinks
        AllowOverride None
        Order allow,deny
        Allow from all
</Directory>
# Serve existing files (maintenance.html, images) as they are;
# answer every other request with 503 and the maintenance page
RewriteEngine On
RewriteCond /var/www/coldattic.info/htdocs%{REQUEST_URI} !-f
RewriteRule ^ - [R=503,L]
ErrorDocument 503 /maintenance.html
Header always set Retry-After "18000"
</IfDefine>

When -D COLDATTIC_MAINTENANCE is passed on Apache's command line, this config makes the web server:

  • Serve maintenance.html in response to every request;
  • Return the 503 Service Unavailable status code. The status code matters: it keeps search engines from concluding that the maintenance page is the only page left on your site;
  • Tell bots to come back in 5 hours (18000 seconds; set this to your typical maintenance time).
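To confirm that maintenance mode really answers with 503 and a Retry-After header (rather than, say, a redirect to a page served with 200), you can fetch a URL and inspect the response. Below is a minimal sketch in Ruby using only the standard library's Net::HTTP; the method names are hypothetical, and the check encodes only the properties listed above.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: true if the status and Retry-After value look like
# a proper maintenance response (HTTP 503 plus a delay in whole seconds).
def maintenance_response?(status, retry_after)
  status.to_s == '503' && !!(retry_after.to_s =~ /\A\d+\z/)
end

# Converts a Retry-After value in seconds to whole hours (18000 s -> 5 h).
def retry_after_hours(retry_after)
  Integer(retry_after) / 3600
end

# Fetches +url+ (Net::HTTP.get_response does not follow redirects)
# and applies the check above to the raw response.
def check_maintenance(url)
  response = Net::HTTP.get_response(URI.parse(url))
  maintenance_response?(response.code, response['Retry-After'])
end

# Usage, while Apache runs with -D COLDATTIC_MAINTENANCE:
#   check_maintenance('http://coldattic.info/')  # expected: true
```

Because the redirect is not followed, a config that merely redirects to maintenance.html fails this check instead of passing silently.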

To update the server I do the following:

  1. Add -D COLDATTIC_MAINTENANCE to the Apache startup options (on my distro they live in /etc/conf.d/apache2);
  2. Restart the system Apache with /etc/init.d/apache2 restart;
  3. Now Apache is serving the maintenance page. I edit /etc/conf.d/apache2 again and remove the maintenance define, but I do not restart the server with the new settings yet!
  4. Run the system update script, which essentially boils down to update-software ; /etc/init.d/apache2 restart, so the server automatically restarts without maintenance mode once the update is complete;
  5. Go to the movies, since the update usually happens early on Sunday morning East Coast time, when it's Sunday night in Russia, a good time to relax;
  6. Come home, log in to the server, and deal with update failures >_<
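Steps 1 and 3 amount to adding and removing one flag in the Apache options file. Here is a minimal Ruby sketch of that edit, assuming (as on Gentoo) that the file keeps the options on a single line of the form APACHE2_OPTS="..."; the method names are hypothetical, and the author's own scripts may well work differently.

```ruby
MAINTENANCE_DEFINE = 'COLDATTIC_MAINTENANCE'
# Assumed format of the options line in /etc/conf.d/apache2.
OPTS_LINE = /^APACHE2_OPTS="([^"]*)"/

# Returns a copy of +conf+ whose APACHE2_OPTS line contains
# "-D COLDATTIC_MAINTENANCE" exactly when +enabled+ is true.
def set_maintenance(conf, enabled)
  conf.sub(OPTS_LINE) do
    # Drop any existing copy of the define, then re-add it if requested.
    opts = $1.gsub(/\s*-D\s+#{MAINTENANCE_DEFINE}\b/, '').strip
    opts = "#{opts} -D #{MAINTENANCE_DEFINE}".strip if enabled
    %(APACHE2_OPTS="#{opts}")
  end
end

# Rewrites the options file in place (needs root).
def toggle_maintenance_file(enabled, path = '/etc/conf.d/apache2')
  File.write(path, set_maintenance(File.read(path), enabled))
end
```

Because set_maintenance is idempotent, running step 1 twice does not add the define twice, and step 3 restores the file exactly.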

I don't update very often, and updates usually go well. After last Sunday's update, however, I noticed a sudden drop in visits. It could have been a random fluctuation, but when the visit count fell below ten the next day, I realized the update had broken something. I typed coldattic.info into the browser. It should have redirected to http://coldattic.info/shvedsky/pro/blogs/a-foo-walks-into-a-bar/, but what I saw was the raw HTML text of the blog page!

How Ruby went off the Rails

How come? I checked the website's logs and noticed an unusual stack trace:

Tue Oct 18 15:52:05 +0000 2011: 
    Error calling Dispatcher.dispatch #<NoMethodError: private method `split' called for nil:NilClass>
/usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.14/lib/action_controller/cgi_process.rb:52:in `dispatch_cgi'
/usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.14/lib/action_controller/dispatcher.rb:101:in `dispatch_cgi'
/usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.14/lib/action_controller/dispatcher.rb:27:in `dispatch'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/rails.rb:76:in `process'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/rails.rb:74:in `synchronize'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/rails.rb:74:in `process'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:159:in `orig_process_client'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:158:in `each'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:158:in `orig_process_client'
/var/www/coldattic.info/main/coldattic/vendor/plugins/spawn/lib/patches.rb:61:in `process_client'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:285:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:285:in `initialize'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:285:in `new'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:285:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:268:in `initialize'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:268:in `new'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:268:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:282:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:281:in `each'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:281:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/mongrel_rails:128:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/command.rb:212:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/mongrel_rails:281
/usr/bin/mongrel_rails:8:in `load'
/usr/bin/mongrel_rails:8

Hm.

It happened only when a request should have resulted in a redirect; direct page accesses worked (which is perhaps why the visit counter wasn't zero: people could still use their bookmarks).

I made sure my local copy was in sync with production and launched WEBrick via ./script/server to experiment with the website outside the production server environment. But on my local machine it worked fine! (No wonder: the stack trace contains no frames from my web application; it consists entirely of web server and framework code.)
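A mismatch like this one (WEBrick locally, Mongrel 1.1.5 in production) is easier to spot if you compare the installed gems on both machines. Here is a minimal Ruby sketch, assuming you have saved the output of gem list from each machine to a text file, with lines of the form name (version, version); the helper names and file names are hypothetical.

```ruby
# Parses "gem list" lines such as "actionpack (2.3.14, 2.3.5)" into
# { "actionpack" => ["2.3.14", "2.3.5"] }. Versions are sorted lexically
# only so that the comparison does not depend on listing order.
def parse_gem_list(text)
  text.each_line.with_object({}) do |line, gems|
    next unless line =~ /\A(\S+) \(([^)]*)\)/
    name, versions = $1, $2
    gems[name] = versions.split(',').map { |v| v.strip.sub(/\Adefault: /, '') }.sort
  end
end

# Returns { name => [versions_a, versions_b] } for every gem whose
# installed versions differ between the two listings.
def gem_differences(list_a, list_b)
  a = parse_gem_list(list_a)
  b = parse_gem_list(list_b)
  (a.keys | b.keys).sort.each_with_object({}) do |name, diff|
    va = a.fetch(name, [])
    vb = b.fetch(name, [])
    diff[name] = [va, vb] unless va == vb
  end
end

# Usage:
#   gem_differences(File.read('gems-local.txt'), File.read('gems-production.txt'))
```

A gem present on only one machine shows up with an empty version list on the other side, which is exactly the kind of difference a local WEBrick run hides.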

Solution: monkey-patching

I googled the error, and it turned out to be more than half a year old. This bug entry is, I guess, the first place where it was spotted. In a nutshell, the new version of Rails seems to rely on a previously unsupported use case of Rack, the web server interface that Rails uses, and this clashes with the apparently incomplete implementation of that interface in Mongrel, an HTTP web server specialized for Rails. And Mongrel updates still haven't made their way into the Gentoo distribution my web server runs (-1 to the Gentoo maintainers' reputation counter!).

The solution that many people (including me) used is here. It is a small Ruby file that you place... into your application's startup directory (for Rails, config/initializers/)! You don't have to patch system libraries or build new packages for your distribution: all you have to do is add a file to your own application. The code relies on a well-known Ruby feature, monkey patching: it reopens classes from an external library and alters their implementation at runtime. I have already shown how this can be used to add some Perl-like features to Ruby.
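For readers unfamiliar with the technique, here is a toy illustration of monkey patching in Ruby. It is not the actual fix referenced above: the class HeaderStore and its method are hypothetical, standing in for a library method that calls split on a value that may be nil, much like the error in the log. The patch reopens the class, keeps the original method under another name, and wraps it with a nil guard.

```ruby
# --- "Library" code we don't control (hypothetical) ---
class HeaderStore
  def initialize(headers)
    @headers = headers
  end

  # Splits a multi-line header value into lines; raises NoMethodError
  # when the header is missing, because the value is then nil.
  def header_lines(name)
    @headers[name].split("\n")
  end
end

# --- Our patch, e.g. a file in config/initializers/ (hypothetical) ---
class HeaderStore
  # Keep the original implementation reachable under a new name.
  alias_method :header_lines_without_nil_guard, :header_lines

  # Return an empty list for missing headers, otherwise defer to the original.
  def header_lines(name)
    return [] if @headers[name].nil?
    header_lines_without_nil_guard(name)
  end
end

store = HeaderStore.new('Set-Cookie' => "a=1\nb=2")
store.header_lines('Set-Cookie')  # => ["a=1", "b=2"]
store.header_lines('Location')    # => [] instead of NoMethodError
```

Rails 2.3 code often does the same renaming with ActiveSupport's alias_method_chain; plain alias_method keeps the sketch free of Rails dependencies.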

So, fixing others' bugs is another, though less legitimate, way to employ monkey-patching.

***

Why did this happen in the first place? Two reasons:

  • Insufficient testing after software updates. I should have visited the site as an anonymous user instead of clicking my own bookmarks. Without the fix above, the site still worked fine for visitors who had cookies!
  • No testing environment that mirrors production. I should deploy exactly the same environment on a testing machine as on production and run the tests there before upgrading.
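A post-update check that behaves like a first-time visitor would have caught this bug. Here is a minimal Ruby sketch using Net::HTTP: it requests the front page without any cookies and expects a redirect (3xx with a Location header) whose target then loads with 200. The function names are hypothetical, and the expected behavior (the front page redirecting to a blog post) is taken from the description above.

```ruby
require 'net/http'
require 'uri'

# Pure check, easy to test: a redirect must be 3xx and carry a Location.
def redirect_ok?(status, location)
  !!(status.to_s =~ /\A3\d\d\z/) && !location.to_s.empty?
end

# Fetches +url+ as an anonymous visitor (Net::HTTP sends no cookies
# unless told to) and follows a single redirect by hand.
def anonymous_smoke_test(url)
  first = Net::HTTP.get_response(URI.parse(url))
  return false unless redirect_ok?(first.code, first['Location'])
  target = URI.join(url, first['Location'])  # handles relative Locations
  Net::HTTP.get_response(target).code == '200'
end

# Usage, right after the update:
#   anonymous_smoke_test('http://coldattic.info/')  # expected: true
```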

I thought that Mongrel and the whole Rails stack were stable, but I guess the stack suffers from fragmentation. The Ruby on Rails stack consists of many components. While this fosters innovation, their developers don't coordinate closely, and there is no common governance; these are the reasons such strange bugs get shipped in stable versions.