Recent blog updates

Logging in Expressive Languages

I am not aware of any strict mathematical definition of expressiveness of a language. Informally, a language is more expressive if it allows you to achieve the same with less keystrokes. From this point of view, for instance, Perl or Python are more expressive than C. Of course, you can't always achieve with Python most things that C is capable of at low level, and therefore one may question whether Python is really more capable than C. But capability has little to do with expressiveness. In certain domains, one language may be more expressive than the other, and in other it could be just vice versa. (For instance, Python is more expressive in string operations while C is more expressive in low-level raw data representation.)

One may notice, however, that the majority of languages are Turing complete, and you basically may express the same with all of them. What is the difference in expressiveness then, as seen by a Practicioner rather than a Computer Scientist?

I usually use the following criterion. If you write one non-splittable expression instead of a chain of statements, the language is more expressive. Compare the following two programs that compute a sum of an array in C:

...and in Ruby:

..and, in C, we also should find a place to deallocate the array when it's no longer needed.

You may notice that Ruby can achieve with a single expression what C needs several operators and variable declarations for. Three Ruby features contribute to this, anonymous lambda functions, syntax sugar to pass them to functions (such as inject), and to avoid the really useless variable declarations. Isn't it always that beautiful?

In some cases it's not.

What it takes, to log?

Now let's try to add some logging to the programs above. Assume we actually want to hook into the loop, and make it print the square result it has calculated to a logging stream.

Let's take a look at the code again.

C:

Ruby:

Ewww! Why does our Ruby code looks like C? Where is our one-liner? Where is the expressiveness? How come we're explicitly introducing a temporary variable just like in C?

I spotted such patterns, "compute-log-return" in a lot of places in my practice. For i instance, the BLAST codebase I maintain, contains these here and there, because it's written in an expressive functional language (OCaml), and uses logging as a primary debugging mechanism. Because of it, the code that should look beautiful looks as an old prostitute trying to mask their weary skin with tons of makeup, and I'm not happy with this.

Trying to get asleep on a plane, I suddenly realized that this logging could be achieved with a better code. Since "assign-debugprint-return" is a recurring pattern, let's define a helper function

Then, the resultant Ruby code will look like this:

Now our Ruby code looks like Ruby again.

Of course, this is not limited to Ruby. Though less versatile, the sane function may be implemented for the C code as well.

And this brings us back to the original comparison with 1 line in Ruby and five--in C.

Discussion

while you may write such function for a less expressive language (as seen above), it will "stand out" in a large C code sheet and look like ugly duckling. What may benefit most are languages that mix functional and imperative coding styles, such as the aforementioned Ruby and OCaml. The reason why this helper might be useful is that it's relevant to the functional languages in two different ways.

First, functional language allow the programmer to define anonymous function inline, which makes them be frequently passed as parameters to other functions. In C, defining a separate function is, inmost cases, uglier than just adding a convenient logging shortcut.

Second, logging an intermediate result contradicts the functional level paradigm. In functional languages you define a computation as a whole rather than specify concrete sequence of actions. In case of logging, you have to violate a chain of functions with a side-effect operation. Of course, you can try to make it look nice, as we did, but the very reason we need the kludge is that we're doing something a bit wrong.

But it's not limited to functional languages. Most expressive languages that do not require from user a strict specification of type of every identifier can. For example, we might want to debug Perl's regular expression substitution with using a similar function:

(we couldn't use log because it conflicted with the mathematical logarithm function that bears the same name).

***

In any case, the "compute-log-return" function that encapsulated the debug print is a widely useful pattern. And it's especially useful when you're writing a program in an expressive language. Do not let the programming cruft conceal the actual idea of your code, use loggers that hide all the details.

Read on | Comments (0) | Make a comment >>


Counting Visits Efficiently with "Wide Caching"

One of the features of the web forum engine, a clone of which I develop in Ruby on Rails, is that it displays some site usage statistics.

There are two kins of statistics it displays. First, it tracks and shows how many times a particular message was read (i.e. clicked), tracking if a user doesn't self-click his or her message. Second, it shows the generic site usage activity, unique visitors, and page views in the top-right corner.

In this post I'll tell how I solved the problem of calculation the page view and visitors statistics. Impatient developers may jump directly to the solution section, while others are welcome to read about the path I walked from the most naïve activity tracking to the Wide Cache.

Naïve solution

The naïve solution was to store all site accesses in a table in the Application MySQL database. Each line in this table would represent one access to a site, keeping a user's hostname, and the access time. At each access, a corresponding line is inserted into the table. Then, if we are to display the information gathered, we run two simple db queries to get the pageview information (the time is the current time minus 10 minutes):

SELECT COUNT(distinct host) FROM `activities` ;
SELECT COUNT(*) FROM `activities` WHERE (created_at < '2012-04-01 18:33:25');

The writes to and reads from the activities table happened at each request. This was achieved via a before_filter in the root controller which is usually named ApplicationController, as hinted in this StackOverflow question:

It spawned another process that handled all the activity business, and didn't prevent the main process from serving pages timely.

As you might expect, the performance measurements were disastrous: the application could serve only (four) requests per second, while with all activity stuff turned off the throughput was about 40 rps.

Storing the table in memcached

The reason why the naive solution was not working as expected was lack of throughput. At first, I thought that since we could defer the database writes, and perform them after the request was served, we wont have any problems. However, this only addressed one of the performance metrics, response time, while what solution lacked was throughput.

I also didn't want to introduce other technologies specifically for activity tracking. What I had at my disposal was memcached. Other components of my application used memcached to cache database reads and whole pages. We could, I thought, cache database writes with it as well.

Rails storage with memcached backend supports two operations. The first is to write something under a string key, perhaps with a limited lifetime. the second is to read from a specified storage entry if anything has been written to it before, and if its lifetime is not over yet. That's basically all.

We could try to use memcached to store the SQL table. itself. Indeed, that table may be viewed as an array of rows, so we could just read the table, append a row, and write the result back. However, for the sake of performance, memcached doesn't support any locking, so we can't just store the table described in the naive approach in a cache bucket. Two threads may read the array, then both append the next row, and both write, one row being discarded. This is a classical race condition. And my aim was to serve 30-40 requests per second, which means that race conditions that appear if this approach is used were inevitable.

Besides, even if we could use locking (for instance, via Linux named locks (flocks)), it could even perform worse than the database, because it wouldn't be able to handle enough requests sequentially. We need a certain degree of concurrency here.

Wide caching

To mitigate the race conditions that happen during database writes, I used a technique I named "wide caching". I certainly reinvented the wheel here, but for the sake of clarity, I'll use this name in the rest of the post.

The race condition from the previous section could be mitigated by distributing the big "table" into smaller, independent tables stored in different memcached chunks. Writes to each of the chunks would not be protected by a mutex, so a race condition would be possible. We could try to mitigate it by using a round-robin chunk allocation, i.e. a new thread writes to the chunk next to the one allocated to the previous thread.

This solution, however, could be improved.

First, round-robin allocation requires a central, synchronized authority to return proper chunks. This brings the same level of technological complexity as a single bucket with a lock.

Second, round-robin doesn't guarantee the lack of race conditions either. A thread may stall long enough for the round-robin to make a whole "round", and to consider the chunk currently in process again. To avoid this, the allocator should be even more complex.

Third, let's realize that we may trade precision for speed. The result will not be absolutely accurate. People look at the site usage statistics out of curiosity, and the figures may not be precise. We'll try to make them fast first.

This all suggests a simple idea: get rid of the allocator! Just use a random bucket in each thread.

This will not prevent us from race conditions either, but it will make them predictable. If the allocation is completely random, we can carry out experiments, and extend their results in time being sure that they will be reproducible.

Decreasing effect of race conditions

The previous paragraph consisted of reasons and measures that addressed the amount of potential accesses to the same cache bucket within a period of time. What also matters is how long this cache access is. The shorter it is, the more writes to a hash bucket per second we are going to handle correctly.

Memcache request structure

Rails provides a transient interface to various caches. The interface allows to store serialized Ruby structures in the cache. This picture shows the steps that happen during an update of a table in hash, the scale roughly representing the time each of the steps take.

A typical hash bucket access consists of these steps:

  1. reading raw data from cache;
  2. deserializing, converting to Ruby format from a character format (the opposite to serializing;
  3. appending a row to the deserialized table;
  4. serializing to the character format (the opposite to deserializing);
  5. writing raw data to the cache.

We can see that steps 1, 2, 4, and 5 depend on the amount of records in the array we store in the hash bucket. The more information we record, the more time they take, and when the data are large enough, the time becomes proportional to the size. And if we just store all the activity data in cache, the size of the arrays being (de)serialized only grows in time!

How could we decrease the size of buckets? The wide cache already decreases the expected size by a factor of width, could we do better?

It turns out we can. Activity is a time-based data. The older the data are, the less interesting they become. In our use-case, we were only interested in the data for the latest 10 minutes. Of course, the routine cleanup of tables that drops all records with old enough timestamps is not sufficient: the activity data for 10 minutes are large enough (with 50 rps, and 50 being the cache width, the mean size would be 600).

Time reminds us about another idea. Why do we load and then write the old data for the table? They don't change anyway! Since we're using a generic cache solution, and can't lurk into its read/write algorithm, we can't do it on a per-record basis (i.e. do not load the table, just append the record being inserted. What can we do instead?

We may utilize another highly-accessible resource, which usage is however totally discouraged in building a reliable distributed algorithm. We have the clock that are easy and fast to access. The clock provides us with a natural discriminator: each, say, three seconds, abandon the old hash bucket, and start a new that will be responsible for storing the requests for the next three seconds. Given current time, we can always find the "layer" of the cache we should randomly select the bucket from.

Reading the data back

The discussion above was devoted to how to record the activity data. However, the data should be displayed in a timely manner. How timely? Let's consider that if the data are updated at least once per 30 seconds, it's good enough.

Then, each 30 seconds we should reconcile the activity data written into the memcached, and "compress" it to an easily readable format. Since I already had MySQL implementation before, I piggybacked on this, and merely inserted the activity information to the SQL table, so the reading procedures do not have change at all!

To get the statistics, we'll have to iterate all hash buckets the writing algorithm could fill, because gathering the ids of the buckets filled will require additional synchronization (additional writes), and, since the majority of them will be filled under high load, we'd better just collect them all. Theoretically, we should collect all the buckets that might have been filled during the last 10 minutes, the period we show the activity for. However, if we run the collecting algorithm each 30 seconds, we can only collect the data for the latest 30 seconds as well, since all the previous data should have already been collected.

We'll have another race condition here. A worker may get the current time, generate a hash bucket id X, and then stall for several seconds. During that stall, the writing worker collects the commits all the data to the MySQL table, including the piece stored in X. The first worker wakes up, and writes the visit information to X, from which it will never be read, so the request is lost.

To mitigate this race condition, we may collect and commit the data only from the layers that are deep enough. This won't help to avoid it completely, but will decrease its probability to the degree at which we can ignore it.

The final Wide Caching solution

If we assemble all of the above, we'll get the final solution that looks like this:

When a page request arrives, it asks for the current site usage statistics as one of the steps during the page printing. The statistic data are requested from the `activity` table in MySQL database, and are cached for a medium amount of time (30 seconds), because more frequent updates do not improve user experience.

After the page has been served, the request notifies the wide cache about the access to the site. First, it determines the "row" of the cache by rounding the current time in seconds since the Epoch down to the number divisible by three. Then, it determines the "column" by choosing a random number from 1 to 15. These numbers are appended; they form a unique identifier of a memcache entry. The website then reads the table from that entry, appends a row, and updates the same entry.

Dumping the collected accesses to the DB is performs like this. After notification, the request also checks if there is a live memcached entry with a special name, and a lifetime equal to 30 seconds. If there is such entry, it means that the information in the DB is obsolete, so the algorithm starts to commit the information into the MySQL DB.

While the information is being committed, there may appear other requests that would also check the lifetime of the memcached entry, and start the commit process. This is why the order, in which the memcached entries are being read is randomized, so that the amount of cases when the same information is committed twice is minimized.

Here's the code. It really looks much shorter than this post that describes it.

The latest version of the source code may be found here.

Experiments

I mentioned that the setup contains race conditions that will lead to losing some requests. With the cache width of 15, and bucket height of 3 seconds, the payload of 35 requests per second made the tracker lose 350 out of 6800 requests. This is approximately 5% of total number of requests, which is acceptable. Because we randomized request queries, we may conclude that 5%—given these figures for requests per seconds, cache width, and the equipment—will be an average visit loss factor.

Spawning

In previous sections, I claimed that spawn-ing threads using spawn Rails gem, and writing to the database in the spawned processes/threads is slow. Indeed, spawn reinitializes the connection in the spawned thread, and this is already a considerable burden on the server if several dozens of threads are spawned each second. Here are the experimental details (see bug #67 where I first posted them info):

MethodReqs. per sec.
No activity40.2
With activity; no spawning27.4
With activity and spawning13.0
With wide cache36.7

This shows that (a) you should not use spawning for short requests, and (b) wide cache is really fast enough.

Background periodic job or per-request?

In the algorithm described above, activity data collection was started when a request arrives, and the application finds out that the currently cached activity stats are out of date. We were careful to make this case as painless as possible, but it has other implications that are not related to race conditions, and experiments show that the loss of throughput is acceptable if we don't try to spawn a thread for each this activity.

Surprisingly, this method starts performing badly when the site activity is low rather than high. On a low-activity site, we can't rely on requests coming each second. Rather, they may arrive even less frequently that activity cache times. So, to support the low-activity case, we have to collect the information for caches that might have been filled in the latest 10 minutes (the maximum value a visit information could still be relevant for), not 30 seconds (the lowest possible time since the previous data collection). Otherwise, if users arrive less frequently than 30 seconds, the engine would always show zero activity, which would make users visit the site even less.

This wouldn't be important, unless I used HAML template engine, which—and this is a known, sad bug—doesn't flush the page until the request is complete. Even putting the code to after_filter doesn't help. Experiments demonstrate that activity data reconciliation may take up to 1 second, so some requests will be served with an excessive 1-second delay, and when the rate of requests is low, this will constitute a considerable fraction of them. And I surely don't want my users to experience frequent delays during their infrequent visits.

Spawn instead of after_filter? We have already seen that spawn-ing will make our server choke at the other extreme, when the load is high.

Luckily, we have a solution that suits both cases equally well. It is to periodically invoke the activity data collection procedure, without tying it to requests. However, the periodic invocation of a task is not straightforward in Rails. I'll get back to it in another post.

Future work

The current implementation of the activity tracker is parametrized with cache attributes (cache lifetime, and width). However, this Wide Cache could be parametrized further, with the procedures that are executed, namely:

  1. Cache bucket updating
  2. Commit procedure
  3. Reading what we previously committed

I think that I'm going to need the same cache that doesn't always touch the database for the first kind of the activity, for visits of individual posts. This parametrization will help me to keep the code DRY, and to re-use the Wide Cache.

This will require refactoring of visits and hits to a single function that calls a lambda function passed in the constructor.

Other approaches

Parse apache/nginx/rails logs.

Indeed, the topmost web serving layer already collects the activity data: it carefully logs each visit, with the URL being visited, a user's IP address and user-agent. Instead of spending time on activity in a comparatively slow Rails application, we could use the logs of a "fast" web server.

I have seen production systems that display activity data based on parsing nginx logs, and it may be integrated into Rails in such a way that it doesn't look like a kludge. Perhaps, free log parsers are already available on github... but this solution just doesn't look cool enough to me.

Do no track activity

Does anyone care about the visits at all? Maybe we just shouldn't track them?

First, from my observation, everybody tries to stick a visit tracker into a website. A bigger activity also means that you're visiting something valuable. A Russian clone of the Facebook social network even used to display a fake registered user number counter at their homepage. Such fraud could only be justified by a huge benefit from displaying it, which means that the statistics is considered valuable by at least some very popular sites.

Second, in my case, there was an easier answer. I should reimplement everything the engine I'm trying to clone contains by politics-related reasons: unless I reimplement everything, and manage to keep the site fast at the same time, my approach to the development will be considered a failure.

Use a specialized activity tracking solution

Too late. I have already implemented my own. But if you want some activity data on your website, do not hesitate to use one. It is hard, see above.

Conclusion

In one of the discussions on the very forum I'm reimplementing, someone wondered, why the view count on Youtube is not update in realtime for popular videos. Having gotten through a series of trial-and-failure experiments with visit tracking, I realize that the developers of Youtube surely had their reasons. My activity information is also already delayed, while the traffic volume still insignificant compared to even a single Lady Gaga video.

It seems that during the development of this tracking I have reinvented a lot of wheels. This is how, I guess, the most primitive database and file systems caches look like. Their algorithms are, of course, more sophisticated, and are not built on top of the generic cache and serialization solutions, using their own, custom ones instead.

But as any other re-invention of a wheel, it was fun, and I surely got a lot of understanding about higher-load web serving while trying to optimize this component of the forum engine.

Read on | Comments (0) | Make a comment >>


Ruby on Rails Errors and Solutions

When you develop something in Ruby on Rails, you will most likely encounter strange errors here and there. These errors may look like language or framework bugs, but they are not. Most likely, they are programming bugs you were not aware of before you encountered them.

The reasons for that are twofold. First, Ruby is a dynamically-typed language, and this alone introduces subtleties you should know about. Second, the web framework values convention over documentation. It means that methods and functions should just work as expected, without making the user read the whole documentation page.

And when something just doesn't work, you'll have troubles finding the cause in the API docs. You'll have to google the exception message, and hope to find the cause.

This post is another contribution to this Exception-Driven Development, and lists some errors a Ruby/Rails programmer may make. Some of the errors were found via web search (I tried to provide a proper attribution where possible), and some of them were solved by my own. I'll try to keep this list up-to-date, and add more errors here as I write more Rails programs.

Generic Ruby Issues

Calling setter methods

You are setting a value of an ActiveRecord field from inside a method of your model, but it does nothing? You are doing it wrong. Perhaps, you write something like this:

But the value is reset after this method. What's happening? The field = ... line actually creates a new local variable, rather than calling the object's field=(value) method ActiveRecord has created for you. Just use self to distinguish:

Note that you don't have to do the same when you read from the field—unless you have created a local variable with the same name! This is why the second line of the method doesn't have to contain self in this case.

Sometimes this strikes you from a different angle. For instance, if you have a quality attribute in your model, and you want to set conversion quality with RMagick, you have to write:

because you can't easily access the outside's variable quality from inside the block, as it's routed to the RMagick object instead.

Other gotchas

You might have specified :false instead of false when you pass a hash attributes to Ruby. I devoted a separate post to this.

Surprising Bugs

Error to clean your cookies

If you have this error:

ActionController::InvalidAuthenticityToken in SessionsController#create 
ActionController::InvalidAuthenticityToken

RAILS_ROOT: /home/rails/app/
Application Trace | Framework Trace | Full Trace 
/home/rails/app/vendor/rails/actionpack/lib/action_controller/request_forgery_protection.rb:79:in `verify_authenticity_token'
/home/rails/app/vendor/rails/activesupport/lib/active_support/callbacks.rb:178:in `send'

you should clear your cookies. Perhaps, you have deployed a different site under the same hostname. If this alone does not help, check if you have set up constant DOMAIN or SESSION_DOMAIN in your config (I had them in config/environments/production.rb), adjust them, and restart your server.

Strange remote_ip error

This error:

undefined method `remote_ip' for nil:NilClass

means that you accidentally named one of your actions "request". And you can't name your actions "request", as there's some clash when the engine wants to get a current request, but calls your controller method instead. Rename your action.

Spawn plugin and Rails 3

Your spawn plugin is throwing this kind of error:

spawn> Exception in child[26055] - NameError: uninitialized class variable @@connection_handler in ActiveRecord::Base

That means that it's too old for your current version of Rails. Upgrade it—to the latest commit it has (this is not on the master, but on its edge branch), or use it as a gem. If you use Rails3, here's the line for your Gemfile:

gem 'spawn', :git => 'git://github.com/tra/spawn', :branch => 'edge'

Commit 0e1ab99b worked for me, the corresponding gem version is 1.1). Source.

Unicode problems

UTF-8 regular expressions do not work by default? Add #coding: utf-8 as the first line of your ruby file. If it's an ERB or HAML template, add this at the beginning anyway.

If you want to make your application completely Unicode, from the database to the tips of your views, you should do all of the following.

  • Make sure that all of your source code files are in UTF-8;
  • Make sure you added # coding: utf-8 at the beginning of any source .rb file (views, initializers, library files) that actually contains non-ASCII characters;
  • Make you database Unicode:
    • make sure you set utf-8 character set for the database as a whole;
    • set utf-8 character set for each table;
    • set utf-8 character set for each text or string column in the table;
    This is especially important if you dump your database and load if from the dump on another server. In this case you may also need to cast @SET names='utf-8' in your mysql console;
  • the "default" rails gem doesn't support Unicode. Use mysql2 gem instead as your MySQL driver.

If you miss one of the steps, you may encounter problems. For instance, if you do everything except for the mysql2 gem, you may see these errors for your views:

  Showing app/views/layouts/application.html.erb where line #48  raised:
  incompatible character encodings: ASCII-8BIT and UTF-8

If you do not set up the character set for the database, migrations may create new tables with latin1 encoding, and this may result in ?????????? instead of non-ascii lines in the data user enters to your DB.

Models, Databases, and Validations

Ever-failing presence validation

Do NOT add empty? method to your objects unless you really mean it! This can strike you from you back when you don't expect.

Assume you want to validate that a "belongs_to" association presents at save. You might use validates_presence_of for this. However, as specified in the documentation, it calls Rails' Object.blank? to check presence. It, in turn, calls empty? method (documentation) if it exists.

In my forum engine, I used empty? method to convey if the post's body is empty. Therefore, you couldn't answer to posts with empty bodies, because validation thought that such posts are nonexistent.

Instead of using a complex validation, I suggest renaming the empty? method. When the author of the next model forgets—or won't even happen to know—about this bug, he/she will still try to use validates_presence_of. It's better to make it work, following the principle of least surprise.

Model name clashes

You can't name your models like Ruby classes, i.e. Thread (conflicts with internal class for threading) or Post (conflicts with HTTP library).

It's interesting, that the default scaffolding automatically names your model "Threads" when you instruct it to create a "Thread" model to avoid conflict. However, you still need to specify the proper class name in associations:

Check the scaffolded controller for other declarative methods that may also need adjustment.

Using several databases in the application

If you want to specify several databases for your app (on a per-model basis), you can use a built-in ActiveRecord capabilities. Just add a section in your config/database.yml with a distinguished name (i.e. second_db instead of production), and add this to your model:

See the docs. Found here.

Troubles with making a form with validation but without database

Rails3 has a built-in solution for creating a form that use ActiveRecord-style validations, but are not backed up by a DB table. Indeed, not reusing this functionality would be a waste. But just including ActiveModel::Validations makes your form yeild errors like this:

undefined method `to_key' for #<Loginpost:0x000000038de700>.

Listen to a short RailsCast on ActiveModel, and learn how to add some magic to your model. Here's how it should look like:

Validation of non-database attributes

Want to validate the range of numbers in your temporary attributes? You write

but get a kind of "undefined method something_before_type_cast" error:

undefined method `quality_before_type_cast' for #<Image:0x7f579f5fe940>
/usr/lib64/ruby/gems/1.8/gems/activerecord-2.3.14/lib/active_record/attribute_methods.rb:255:in `method_missing'
/usr/lib64/ruby/gems/1.8/gems/activerecord-2.3.14/lib/active_record/validations.rb:1030:in `send'
/usr/lib64/ruby/gems/1.8/gems/activerecord-2.3.14/lib/active_record/validations.rb:1030:in `validates_numericality_of'
/usr/lib64/ruby/gems/1.8/gems/activerecord-2.3.14/lib/active_record/validations.rb:479:in `validates_each'
/usr/lib64/ruby/gems/1.8/gems/activerecord-2.3.14/lib/active_record/validations.rb:476:in `each'

The error trace makes a hint about the solution:

For ActiveRecord attributes, these methods are automatically generated. However, for our attributes that are not backed by DB columns, we should create on our own.

Uniqueness validation column name error

I created a model, added an `validates_uniqueness_on`—uniqueness validation—and tried to create a valid record. However, I was getting the NoMethodError: undefined method `text?' for nil:NilClass error no matter how hard I tried:

NoMethodError: undefined method `text?' for nil:NilClass
        from /activerecord-3.2.2/lib/active_record/validations/uniqueness.rb:57:in `build_relation'
        from /activerecord-3.2.2/lib/active_record/validations/uniqueness.rb:25:in `validate_each'
        from /activemodel-3.2.2/lib/active_model/validator.rb:153:in `block in validate'
        from /activemodel-3.2.2/lib/active_model/validator.rb:150:in `each'
...

I checked the code, and realized that my uniqueness validation referred to an association name, instead of the database column name:

It seems that you must specify the exact database column name in the uniqueness validations since they are closely integrated with the database.

Connecting to a local database via TCP sockets

If you want to avoid using file sockets for connection to a local database, it's not enough to just remove the socket line. Rails will replace it with the default socket configuration if you use localhost as your host. To connect via a network socket, you have to change "localhost" to something else by adding an entry to /etc/hosts, for instance.

Housekeeping

Get list of Rake/Capistrano tasks

When you use rake tasks, or a similar program (such as Capistrano), it's easy to forget what actions it provides. Googling for the list doesn't help either. Where can you find them?

They're right there:

These commands print the list of all tasks. With cap -e task:name, you can get a more extended help.

Redo specific migration

$ rake db:migrate:redo VERSION=20111122143422

Rake can also go several steps back/forward in migration application; you now know how to see the docs (rake -T).

Using Ruby 1.9 on Ubuntu

Ruby 1.9's default virtual machine is superior to that of 1.8. Besides, version 1.9 has

If you are using Ubuntu, you have "alternatives" mechanism for setting ruby version. But. please, do not forget to update alternatives both for ruby, and for gem command to 1.9! Otherwise your gems (such as passenger) may end up in an improper place, and won't run successfully.

MySQL gem and Mac

If you install mysql onto a MacOS with MySQL 5.5, you may encounter problems that show up as a strange exception:

rake aborted!
Constant MysqlCompat::MysqlRes from mysql_compat/mysql_res.rb not found
Constant MysqlRes from mysql_res.rb not found

Tasks: TOP => db:migrate
(See full trace by running task with --trace)

It turns out that the problem here is a wrong mysql gem version. Uninstall the existing gem, and install 2.7 version, fixing some errors afterwards with a symlink:

$ gem uninstall mysql
$ gem install mysql -v 2.7 -- --with-mysql-config=/usr/local/mysql/bin/mysql_config
$ sudo ln -s /usr/local/mysql/lib/libmysqlclient.18.dylib /usr/lib/libmysqlclient.18.dylib

Source: Sergey Ler told me about it.

Using Ruby 1.9 on Ubuntu

Ruby 1.9's default virtual machine is superior to that of 1.8. Besides, version 1.9 supports OS-level threading, and some multithreading bugs fixed.

If you are using Ubuntu, you have "alternatives" mechanism for setting ruby version. But. please, do not forget to update alternatives both for ruby, and for gem command to 1.9! Otherwise your gems (such as passenger) may end up in an improper place, and won't run successfully.

Talking to Web APIs

Specifying POST and GET PARAMS at the same time

If you are to POST some HTTP data to a URL that has GET parameters, documentation used to elide this case. The correct way to do this is to create URI object, and supply it as a whole, not just uri.path.

res = Net::HTTP.post_form(URI('http://example.com/webservice.php?token=' + TOKEN + '&function=api&id=50'), {:post_param => post_value})

Alternatively, you may supply a URL instead of URI, but I recall we had issues with this, though the exact reasons are now forgotten.

Converting data structures to POST form data

It's strange that Rails doesn't have such a method (I couldn't find it in version 2.13). When you invoke API of a web service, you sometimes need to convert arrays, hashes and other structures to a result that looks like a HTML form that Rails itself uses. For instance, this structure:

should be "flattened" to a "form field name"-"form value" set like this:

The "solutions" found in the internet seem to have been written by programmers who are not aware of recursion. Here's the correct one I wrote:

Russian Support

Here's a gem that adds Russian support to your Rails messages in default form validation and helpers. Russian morphology differs from that of English. In this plugin, the issues with date and proper clause of plural nouns are solved, though the difference in nouns on form create/update nouns is not.

Read on | Comments (0) | Make a comment >>


Optimizing Massive Data Views in Ruby on Rails

Ruby on Rails is a web application development framework most known for its flexibility and heavy use of metaprogramming. Rails programs, as one of my peers described, "look like they explain what to do rather than how". That's true. This declarative nature of Rails, however, comes at a cost. At a cost of performance, of course.

A couple of days ago, I was developing a tree-shaped web board engine (some of you have probably forgot what these are; here's the engine I'm creating a clone of). I designed a data model for users, authorization, threads, messages, message revision history, etc, and implemented a comprehensive set of operations with them, such as creating, displaying, editing, and so forth. The implementations looked nice; it consisted of carefully engineered bits of declarative programming.

This is how the index page of the forum should look like:

You may notice that all messages from all threads are displayed However, when I tried to display several thousands of messages on a single web page, which is an average figure for an board like this, it appeared deadly slow. Nobody would wait 32 seconds for the index page to show. Caching would help, but as several messages are posted each minute, most cache accesses would be misses.

In the rest of the post, I'll show you how the speed of page displaying and rendering may be increased by a factor of 50 (fifty).

Data model

The key source of why the original speed was so surprisingly slow is that I designed a database layout that fits a relational database requirements, in which the data span across several tables that may be join-ed. The query to display the whole information should thus not be slow to achieve. Here's how it looked like:

The image above was generated with "MySQL Workbench" tool. If you supply a database with properly set up foreign keys, which I did manually, as Rails doesn't set them up for compatibility reasons, it will show you a nice entity-relationship diagram.

The messages to display consisted of the posts made to the newest threads, which were created within the last day. It's obvious that you could get all the necessary information from the database with a single SQL query with a lot of joins though, but no unions, the query being executed negligibly fast by even the "default" MySQL server. Why was it so slow?

How Rails make you program slower

There are two main reasons why the naïve way of displaying these posts was slow. By the "naïve way" I mean a simple view that renders each post and its parameters by querying its association attributes, like this:

You saw the tree that should appear as a result of this above.

How the controller would look like for this view? Due to database design, it should select a thread, and render a tree of its root (head) post. So the controller would look like:

So, what happens when a view is rendered? For a single thread, all messages are fetched by a single SQL query, and are converted to an array. The ActiveRecord has_many association within a thread object automatically does this; this operation is relatively fast. Then, we assemble a tree structure of the posts (it works very fast, and is not shown here; the result is returned by the build_subtree method), and render these posts in the proper order.

Trying to render a post, Rails encounters its title function that queries the latest revision of a post's title. This query is resolved as querying the TextContainer, and, subsequently, the TextItem with number = 1, and revision matching the latest_revision of the TextContainer. Then the same is repeated when Rails renders the post's user name: it queries user object for each of the posts separately. Therefore, for each post, Rails performs three separate queries to the database, each involving the whole ActiveRecord's overhead for initiating a query, finalizing it, and packing the individual objects.

These multiple and excessive database requests, followed by the subsequent conversion from SQL to "heavyweight" ActiveRecord models, contributed to approximately 25 out of 32 seconds of inefficiency at each call. Let's try to optimize them.

Optimizing ActiveRecord queries

One might infer from the reasoning in the previous paragraphs that ActiveRecord is just a bad framework for ORM. This is not true in two ways. First, the main objective of ActiveRecord is to provide an ORM that works with little configuration, or even without configuration at all. Second, ActiveRecord is mainly used for representing single-table models; in our case, however, we deal with data representation that spans several database tables.

This suggests the way to fix performance here. Indeed, we already noted that all the data required to represent the tree of posts, may be fetched with a single SELECT query. We can create an ActiveRecord model designed specifically for this query (you may call it a "view" if you like) reaping the benefit of having no configuration alongside. In the view, we alter the methods that query the information, so that they match the fields

We don't even have to alter the way the properties are queried in the view by introducing fake "proxy" objects to our new model, like this. table_tow.user returns a simple object that has a login method initialized with a value from the query. In my case that would be an overkill, but if you have a lot of views to fix, you may explore such an option.

In the case of the view presented above, we'll need the following fields, id, title, created_at, empty_body (instead of body.empty?, see sidenote), marks (which is a serialized array automatically maintained by Rails' serialize class method), unreg_name, user_login (instead of user.login, see sidenote), host, clicks, hidden. Not shown directly in the view, but used to build the posts tree is parent_id field. We'll call the model to comprise these fields FasterPost, and start with an empty model stored in app/models/faster_post.rb:

These posts will be fetched via Thread class. We already had a has_many association "Thread has many Posts", and we modify it using a standard :finder_sql attribute:

The settings_for is a global helper that denotes the current user, we'll return to it later.

While :finder_sql was originally used for slightly different purpose, we piggyback on its ability to execute arbitrary SQL query (ActiveRecord has only two such ways, the second is find_by_sql).

Let's try to fetch data bia this association, and we'll see an exception:

Mysql2::Error: Table 'database.faster_posts' doesn't exist: SHOW FIELDS FROM `faster_posts`

The original post model has a plural name (Posts) because the ruby standard library contains the singluar Post

Indeed, ActiveRecord doesn't know what columns there should be in the resulting table. We need to override a couple of methods; as of Rails 3.1.3, these are columns, and columns_hash. These methods are not really used, and we need to supply some ActiveRecord's internal structures for that. Instead of supplying the actual column information for our raw SQL, we simply copy the column information from Posts model (and cache it, as ActiveRecord itself does):

When we try to serialize this model for debugging purposes via the inspect method, we'll see that for our model is not really useful: it doesn't show all the fields. We may find the original Object's inspect method useful here:

That's all what's needed to make this model work (see sidenote for one more subtle issue). However, we may find out (with the profiler) that we still spend a lot of time inside some ActiveRecord methods. When a field method is called, it still gets through some ActiveRecord internal checks and conversions, which take a lot of time, and are, basically, useless for our lightweight model. Even that we have to enter method_missing to route the method to the actual field slows us down.

I found that ActiveRecord has a method that is close enough to the data fetched from the table, but it doesn't perform some conversions (for instance, boolean values are not converted, and are fetched as strings instead). It is read_attribute_before_type_cast (note, that its attribute should be a String, not a Symbol!). So, each fast field method should directly call it as "short cut". We can generate these methods for all the query fields with Ruby metaprogramming capabilities:

Do not forget to add overrides, which should be declared after this call, for methods that require additional processing (such as serialization):

or return Boolean or Integer values:

After these fixes are put in place, and the view is altered accordingly, all the database queries will not take any significant time. Moreover, this solution also complies to Model-View-Controller pattern, as the data are fetched in the controller, before the view is rendered. This is, probably, one of the rare cases when compliance to the Design Patterns does boost performance.

This, originally, decreased the original runtime by approx. 20 seconds out of 32. But there still are 12 seconds to optimize away. How could this be done?

Optimizing the view

ActiveRecord optimization is not enough, though. A lot of time is spent in the view itself. I had to use a profiler to do that, and made several wondrous discoveries. Using profiler is quite simple in Rails 3. All you have to do is to start a server, and invoke in the console

$ rails profiler -r 5 'get("/")'

and open tmp/performance/ProfilerTest#test_get_process_time_graph.html in your browser. An annoying downside is that it prints its log to the server's console (but you're not trying to do anything while profiling, are you?) Here are several points to optimize I found.

Link Generation with link_to

One of the greatest offenders was the code that generates links. The usual link_to and autogenerated path methods, such as post_path(post) are very slow, compared to the rest of HTML-generating code. However, if we try to generate URLs in our view on our own, it may no longer be enough to edit config/routes.rb to change all the methods in the application. We'll lose the flexibility of Rails we love it so much for!

Instead, I used a kludge for link generations. I assumed that links to all posts look the same, and the difference between them is only the post's ID. Therefore, if we generate a link for one post the Right Way (with link_to and ..._path functions), dismantle it, and create other links as simple string concatenations from the resultant parts, we'll avoid this overhead. Here's the helper I created:

It creates a link generator that generates the link in the fast way, provided all the links are the same. Calling this method from the view is as clean as calling the link_to:

The method returns a proc object (function), and brackets are used to call it with post parameter.

Authorization

Declarative Authorization is a very nice plugin to manage authorization. It's declarative, and is very convenient to use once you carefully read its manual, and start naming your before_filters by its rules. However, I found out that it's terribly slow when you are to query authorization rights for thousands of posts.

I merely got rid of authorization at the index page, and moved it to a page of individual posts. But you could add more JOINs to your SQL, fetch user roles, and write some lightweight authorization methods instead, if you really need it.

Another way to optimize authorization could be the same kludge that was used in Links optimization. In most cases, if you are authorized to delete a post, you are also authorized to delete all posts in this threads (or in the whole forum). You can try to piggyback on these assumptions.

User-specific settings

How could we optimize user-specific settings on a large scale? Here's the excerpt from the view code I'm referring to:

This excerpt implements user's functionality to show/hide posts in its own view, not affecting what other users see. In the big SQL statement above you could notice that we join the hidden_posts_users to fetch the user-specific settings. Here we also encounter the same problem that with the links, but the "simple" link generator should be slightly more complicated:

but its usage will become even simpler:

Note that wile you supply the block that translates the strings thousands of times to the fast_hide function, it will be called only once, at the very beginning.

Translations

I won't dive into here too much, since it's obvious that you should not call t method that translates a string according to the current locale thousands of times for the same string. Save the result of a single call into a variable, and re-use it. It won't change as you render parts of your view anyway.

However, this doesn't save you really a lot of CPU cycles. I found out that calling t("Show subtree") (see the example above) approximately 2500 times only takes 0.05 seconds. But if you multiply that with the number of strings you have to translate, this may grow large.

Caching serialized attributes

The default (de)serialization procedure for ActiveRecord's serialize generator does not cache its results. Caching may not be useful in all cases, of course. However, in my case, there was a total of four possible values for the serialized column ("the post has/has not a picture/a link inside"), and thousands of rows. To use caching (and short cut some ActiveRecord's default useless code), you should just pass your own serializer to the method:

I'm not sure that this behavior is documented (I extracted it from the code, and can't find anything about it in the help), but it works.

Avoid excessive render calls.

This may not be the case for you, but if you call render :partial thousands of times, this just can't be fast. Instead, use a single partial template that renders many items at once. My measurements show that this may save up to 20% of ~ 11 seconds runtime.

If your template is recursive, and you user HAML (not ERB), you'll probably have to move your code in some cases to a helper, and print raw HTML in order to keep the markup (for instance, if you use recursively-folded lists).

Raw HTML instead of Templates

If it's still not fast enough, you may print raw HTML. I don't have any measurements of the effect of this operation alone, but you may try that. This will clean up your profile, and you'll see the offenders more clearly

Make sure to use String Buffer pattern (as I encountered it in many languages, I can call it a pattern). Instead or using + operator to concatenate strings, allocate a buffer string, and append other strings to it. In some languages, such as Java or OCaml, it's not enough, and you should use a separate StringBuffer objects, but, in Ruby, Strings are mutable, and any string may serve as a buffer. By the way, HAML templates are already aware of this pattern (and, I guess, ERB templates are, too), so this only should be used if you already chose to print raw HTML on your own by other reasons (such as described in the previous section).

Other speed increase advice

As you might have noticed, my blog is very slow. It takes about 2 seconds to render any page. The blog was my first Ruby on Rails application ever, so it surely had some design mistakes. On the other hand, three years ago, Rails (version 2 at that time) was much less mature than today, and many speed improvements, such as monadic database querying, was only introduced in version 3.

I figured out the reasons why the rendering is slow, but today I just don't have enough time to fix the code. I'll just describe what mistakes I made and how to avoid it.

Touch your parents when you change!

A good advice in the real life as well, touching parents is also a good practice in Rails. Assume that you need to sort threads by the update time of a thread (I'll use the forum data model as an example). An update time is a time of creation of a post in a thread, or an update time of a post in the thread. When we click "edit" on a post, and complete updating it with save call, the default behavior is that the updated_at field of the post's database record is updated to the current time.

The updated_at record of the thread, however, is not updated automatically. Unless you do this, you will have either to write SQL queries, or just to load all threads, and check all posts for each of them. Rails' belongs_to association has :touch option, that will make it automatically "update" parents when you save children. This will solve the problem of fetching updated threads with little effort.

If you have an abstract "content" type, so that all text items on your site are stored and edited in a uniform way, you will find that :touch neatly integrates with polymorphic associations.

Unfortunately, :touch is only included in Rails 3.

Use monadic database queries

I used to fetch the last blog post in my engine by blog.posts_ordered.first, where posts_ordered is a non-standard ordering of posts. It first fetched all posts, reordered them, and selected the first, which was overly inefficient.

The solution to that would be monadic queries (see find method documentation). Rails 3 allow you to "chain" database queries joining the conditions without executing the query. After you chained all the queries, you may "bind" the query object, and execute the query. It will look like this:

Only one post will be fetched and converted into the ActiveRecord object this way.

Unfortunately, this is only included in Rails 3. In Rails 2 I had to write my own "monadic" query wrappers.

Slow and Loving It

For many cases, however, the performance of Rails is good enough. It can't compete with pure C++ web applications, of course. Rather, Rail's stronger side is describing complex behavior with little effort, with just a few lines of code. A lot of problems I encountered were due to bad performance of overly complicated data models, or with showing thousands of records on a single page. Rails is not unique in such poor performance; for instance, it takes Bugzilla 3 (written in Perl), like, 20 seconds to show a bug with seven hundred comments.

The cool thing in Rails, however, that it's a flexible framework. If you really need to display a thousand of records fast, you can do it. Nothing in the framework prevents you from creating shortcuts to its low-level functionality (Ruby language plays an important role in this), and nothing prevents you from swapping its parts with your own fast code pieces, where it's really necessary. In the forum engine I used here as an example, only a couple of views should display thousands of records. Optimized models and views may be only assigned to these "hot parts" of the application. The rest of functionality, at the same time, takes the whole benefit of the easily configurable framework, of declarative coding, and of rapid feature development, everything I love Rails for.

Read on | Comments (0) | Make a comment >>


A Visit to the Computer History Museum

There are not many museums around the world that are specifically devoted to computers. The two I know about are Bletchley Park, located 70 km away from London, and Computer History Museum. The former is not about computers per se, though; rather, it's about the decrypting German messaging during the World War II that happened right there. And the latter is completely devoted to computers.

I visited this Computer History Museum two weeks ago, during my short trip to the Silicon Valley. Here's a couple of photos I made there.

Here's my personal favorite, a slide rule tie clip (as I love ties, I want such thing either!)

Oh, here's another Penn&Paper's solution. It's a clone of Microsoft Visio diagram sketching tool:

Nowadays, you may measure the uptime with a single shell command. Back then, there were devices specifically designed for this:

"Man" is a shorthand of "manual" in Linux word, and not what you, a feminist, thought about.

Need a man to learn what uptime is? Here's your man:

And, of course, a full-size working copy of a Babbage machine (I didn't see the demo, bu they say it works!)

The museum has more of these. Due to my schedule, I only had a bit more than an hour to visit it (note that there's no admission on Mondays and Tuesdays!) But if you'll be around, it's totally worth it. Don't miss it, it looks like this:

(And if you happen to find yourself thirsty, don't go to that bar across the street: the service is quite bad there.)

Read on | Comments (0) | Make a comment >>


The Most Stupid Mistake You Can Make with Ruby

Programming languages have strict and mostly very compressed syntax. The origins of that are twofold.

First, actively used programming languages have a considerable history, and were started decades ago. Back then, screen space was limited, and, before that, there was little storage available for source code. Second, programmers have technically-oriented minds and like to endow abstract symbols with complex abstractions instead of using words to describe them. That's why, I guess, many programming languages heavily depend on correct usage of individual characters. Even as few characters as one may make a difference in program behavior as well as in whether program compiles.

Such caveats in different languages include:

  1. C++ templates. You can't write map<int,vector<int>> when you define a mapping from integers to arrays of integers, because C++ parsers thinks that >> as a bit-shift operator in an improper place. A correct program differs by a single space between the < signs.
  2. Makefile tabs. A rule body in the Unix Makefiles should be indented with a Tab. Spaces do not work. Being a software with more than 30 years of history, make had it fixed only a year ago.
  3. CSS delimiters. When you define a cascading style sheet, amd want to define the same block of style attributes for a certain class inside a certain tag, you write the selector as tag .class. It's just a space away from tag.class that defines the styles only for tag elements of class class.
  4. C-style equality in conditional statements. If you're a seasoned professional, you should have already forgotten about this. In many languages, including C, Ruby, Python, Javascript, C# and others, if (a = 1) is always true, since it's an assignment of 1 to a, followed by checking the a's value for truthfulness. The correct version of that has one extra equality sign: if (a == 1). More confusion is created by languages, where the first version is the legitimate way to compare values, such as Pascal.

Imagine how surprised I was when I realized that I made a variation of mistake #4 in my Ruby code today! Here's what it was about.

Ruby hash niceness

Named parameter is a way to specify function arguments at call by name (rather than by order, as in the standard function call notation).

Here you may find how the Named Parameter is implemented in various programming languages.

To emulate a Named Parameter Idiom, Ruby uses hashes and some syntax sugar. The last parameter of a function may be a hash, which maps from parameter names to values. Syntax sugar allows a programmer to write

This sugar is not unique to Ruby; Perl also supports it.

instead of

The :name notation specifies a symbolic constant, which effectively is an immutable string that is defined by only one ancillary character.

Such hashes and symbols seem as a very useful feature. It allows you to emulate DSL-s; here's an example of Ruby on Rails web framework routing configuration:

Until one day, after an hour of debugging you find yourself having written something like this:

See what's wrong here? Indeed, here's how the code of the function called might look like:

So, options[:attribute_check] should evaluate to false boolean value, but... :false is a totally different thing; it's an immutable string of five characters that evaluates to true instead! Just one colon that lurked into the code, and it made it behaving very wrong way.

Just like in a C-style typo, some expressions that are evaluated as true in boolean context look like those that are evaluated as false, and you should be careful with the borderline.

New named attribute definition style in Ruby

New named attribute passing style was not designed to address this problem. However, the abundance of colons in such a code makes it look worrying in case there is a mistake like the above:

You see that the mistake is easy to spot, because the conjunction between the name and the parameter value is so ugly, that it immediately draws attention. However, if you actually need to specify symbol as a value, then you'll have to look ugly with this style.

Moreover, you can't erase the space between the parameter name and value, because for this code:

Ruby parser will think that it's false method in an attribute_check object, as :: is a scope resolution operator, just like in C++. Space matters again, as in typo #1 desccribed above.

People say that this style resembles that of C# or JSON. So, maybe, it is a good idea to migrate to it. Only two things prevent me from doing this so far: it's not portable to previous version of Ruby, 1.8 (though it slowly becomes obsolete), and I find the old style look much more cute :-)

***

This was yet another typo that makes our programs behave differently than we expect them to. And again, it was just one character that breaks the expected behavior of if statements that implicitly convert the condition to boolean type. Unfortunately, while the new Ruby named parameter syntactic sugar could help, it sometimes looks even worse to me.

I hope this post will help you avoid the similar mistake if you code in Ruby.

I would like to end with a joke about a mythical C++ programmer leaving hist last commit after having been fired:

Happy debugging to you, indeed!

Read on | Comments (0) | Make a comment >>


Relativity of Simultaneity in Distributed Computing

About a year ago, I described an allusion to physical phenomena in computing in "Caching and Levers". This post is devoted to a more complex theory, namely the Special Theory of Relativity (STR), and mostly to one of its implications, the "relativity of simultaneity".

Relativity of Simultaneity

Special Theory of Relativity may be approached as a set of equations for coordinate translation that take their effect as the speeds of objects approach the speed of light, c.

STR is based on two postulates, one of which states that light is always propagated in empty space with a constant velocity c regardless of the speed of the emitting body. This may be used to introduce a tricky paradox.

Train car experiment

As seen by a passenger of the train. The events happen simultaneously.

As seen by a spectator on the station. The "back" event happens earlier than the "front" event.

(pictures from Wikipedia)

Assume that a train car is passing you when you're standing on a station platform. The train car has a lamp installed exactly at its middle. When the train passes you, a lamp flashes, and then you watch when the light reaches the front and the back of the car. It's interesting that those who sit in the car will notice the light hitting the two walls simultaneously, while those who stand at a platform will notice the back wall lit earlier than the front one. This is implied by the postulate described above: as seen by a spectator on the platform, the light propagates with equal speed at all directions, so it will hit the rear wall before the front one, as the former moves toward the light source, while the latter reproaches it. See wikipedia for more info.

This paradox, known as relativity of simultaneity may be formulated more generically: whether several events are simultaneous, depends on the location one watches them.

But what does it have to do with the computing? Before I answer this, I'd like to share what I've learned from a CACM magazine.

Quiescent consistency

In the multicore age, the classical data structures we all know about are about to change. The reason for that is the increasing demand on computation speed and volume that careful "sequential" CPU chip engineering is no longer capable to provide.

This challenge makes us rethink our approach to algorithm and data structure design. If we want data strctures to be efficient, we no longer may expect them to behave as good old ones in the sequential edge.

In distributed programming, there is an approach to describe data structure behavior known as quiescent consistency. There is a number of consistency conditions, sequential consistency, linearizability and others. These conditions describe how an object behaves when there are several threads calling its methods.

A data structure possesses quiescent consistency if it is consistent between its states of quiescence, i.e. when there are no methods currently in progress. As soon as a quiescently consistent structure has no operations pending (i.e. reaches the quiescence), we may be sure that the executions of methods before this state and after this state is never interpositioned.

Imagine a quiescently consistent stack. An implementation of it is described in this CACM paper "Data Structures in the Multicore Age", the one where I first encountered the quiescent consistency concept. Assume the following sequence of events happen to the q.c. stack:

  1. Three pushes x,y,z
  2. Quiescence (the pushes are processed)
  3. Three more pushes a,b,c
  4. Quiescence (the pushes are processed)
  5. Three pops
  6. Quiescence (the pushes are processed)
  7. Three more pops
  8. Quiescence

Note that q.c. doesn't mean that a data structure guarantees nothing except for this specific consistency. Consistency condition only maps data structure behavior in a concurrent setting to a behavior in a single-threaded environment, i.e. it only limits the number of a multitude of different sequences of method calls that may happen for a specific set of multithreaded events. All the properties a data structure exhibit in this sequential processing should still apply.

Quiescent consistency guarantees that the first three pops return x, y, and z (in an arbitrary order), and the next three pops return a, b, and c, somehow intermixed as well.

Note that, unlike linearizability, quiescent consistency doesn't impose any ordering on results of pops if there was no quiescence between the initial pushes. For instance, if processing of the first and the third pushes do not overlap, the linearizability ensures the pops will respect this order, while q.c. (in case that the second push overlaps with both of them) doesn't ensure that.

Having noted that, q.c. looks like a very weak and useless property. However, it still implies correctness, which, however, is enough in many circumstances. Imagine that you need a pool of unique numbers. It is enough to use a q.c. counter; the numbers it returns will be unique, which should fulfill our initial purpose.

Do we really need stronger consistency?

However, the reasons why we may be satisfied with weaker consistency conditions are not constrained with the specific examples. Let's try to prove the opposite. Assume we're implementing a stack that is accessed from a number of threads. Quiescent consistency may be not enough because if a push of A precedes the push of B, then the pops should be ordered in the specific way, and quiescent consistency may not guarantee that.

But wait... We say, "If one push precedes the other," but do we really know what "precedes" mean? If two threads in our distributed computational system invoke push call independently, how can we be sure that one of them precedes the other? And is there any sense in defining a measure for that?

Let's reckon the paradox we described at the beginning. In one reference frame, one event precedes the other, and in a different reference frame it's vice versa. So whether one push happens before the other, depends on the place we look from, and is not—and may not be—determined by a central, ubiquitous authority. That's a law of the universe, and we can't change this in computing.

Surprisingly, it makes quiescent consistency be a fairly appropriate constraint for a distributed data structure. If there's no strict sense of precedence between events that happen in different places of our network then why can't we use this to develop a more efficient data structure? The correct answer is, "indeed, why not?", and such data structures are well-defined nowadays.

OSI model

OSI model is an abstraction that aims to make the design of distributed computation systems easier. The chain of events that happen during a process of data transmission is separated to several separated layers: each layer has its own protocol, which is independent of the actual implementation of the underlying layers. You may learn more at the wikipedia.

STR vs computing: the difference

We have successfully used a metaphor from physics in distributed computing, but there is an inherent limitation of applying the concept further. In physics, if we have a car of a specific length (as measured by its riders), we may install the light source in such a way that the events at two sides of it happen predictably simultaneous in the reference frame of the spectator at the station.

In computing, we can't do that. OSI model prevents us from predicting how fast the events happen by masking out the capabilities of a system at lower layers. In physical reality, we know that as soon as the light "hits" the back and front covers of the car, the event "happens" (or, more precisely, a piece of reflected light is launched towards the spectator). In computing, however, we don't even know how long it takes for a signal to reach another node. Moreover, the OSI model makes this impossible to predict. All the layers guarantee correctness, but do not have any timing conditions.

Conclusion

The absence of timing in the OSI model suggests that may be no sense in saying, "this structure may not be used, as it processed the request in the different order than they're issued in". The "order they're issued in" is inherently unpredictable in our computing models.

This inherent unpredictability of timings of processes and events, as described in OSI model, is why we really can't apply the special theory of relativity to computing. However, we still may notice that simultaneity in the world around us, as confirmed by physical experimentation, is relative. And the nature still works well!

And if the nature accepts it, why can't we learn from it, and allow our distributed systems to be less predictable, and trade this predictability for speed?..

Read on | Comments (1) | Make a comment >>


Typical Janitor's Failures

How easy is it, to write a program that removes a file or a folder? Okay, assume you have all the power a modern operating system provides you, including rm shell convenience command, and unlink system call. Do you really think the answer is still "very"?.. Then take a look at these examples.

Permission trap

Today one of my peers, Denis, was writing a program that somehow analyzed packages in Mandriva Linux distribution. Since trying to analyze the contents of a package without unpacking it looks more like gynecologist's work than that of a programmer, the software Denis wrote unpacked it first. Then it analyzed the package, and erased it with rm -rf in order to avoid disk pollution.

But sometimes it failed to erase it, which led to strange failures of the whole program.

The reason was the permission trap. Common sense tells us that if you can create a file, then you can remove it. This is wrong when it comes to uncommon permissions. Certain software that is very secure sets up specific attributes for certain files. In particular, it revoked writing permissions from certain catalogs, which prevented recursive removal of the unpacked files. You may try it yourself:

$ mkdir -p /tmp/try/this; chmod a-w /tmp/try; rm -rf /tmp/try
rm: cannot remove `/tmp/try/this': Permission denied

Oops. Even if you're the owner of the file, you won't remove it. Even --force doesn't help. To avoid this, I guess, you should chmod files recursively before removing them.

Why does Chromium remove its files twice?

By the way, I encountered a couple of engineers from PVS Studio when they visited Institute for System Programming, in which I worked; they were looking for a mentor for a Ph.D. thesis one of them was going to defend. Small world!

One more totally not interesting thing is that they are not going to support Linux because there is no enough users of their software on that platform. This is both funny and sad: on the one hand, Linux should have much more support, but on the other hand, if your static analysis tool is not decoupled enough to be somewhat platform-independent, then you do not deserve a Ph.D.

I read about this interesting issue here. Most likely, it won't hit you in the nearest years if you're using Linux, but for Windows users it might be interesting and maybe useful. Let me quote the piece of code they referenced

Why in the world would anyone try to remove a file twice determinedly? What was the programmer trying to achieve? Let's quote that article again:

A file can be definitely removed or cannot be removed at all only in textbooks and in some abstract world. In the real system it often happens that a file cannot be removed right now and can be removed an instance later. There may be many reasons for that: antivirus software, viruses, version control systems and whatever. Programmers often do not think of such cases. They believe that when you cannot remove a file you cannot remove it at all. But if you want to make everything well and avoid littering in directories, you should take these extraneous factors into account.

"Remove" vs. "unlink"

Okay, so you've managed to remove a file. The question is, have you freed some space on your disk? The answer is maybe.

Removing is called unlinking on purpose. What you do by "unlinking" is removing a reference from a directory index to a certain chunk on your hard disk. The chunk, nevertheless, may persist, especially if there are more references to it. One of the possibilities is hard links, but you explicitly created them. Another could be a badly working file system, but that was your choice.

However, there is a very harsh way to experience the difference between removing and unlinking. Assume you have a long-running program that writes something to its log file. The information it writes is not very useful, and you might want to discard it. As the program spends hours of working, the log file grows larger and larger, and you realize that you really should erase the log to prevent your disk from filling up. You rm -rf log_file, and... nothing happens. The log is still being written somewhere, and there is as little space as there was.

The thing is that you unlinked the file, but the file system's directory index wasn't the last place it was referenced from. Another place is the file handler of the program running, and it will be referenced until the file is closed. Otherwise, as far as I know, the only way to fix it is to shut the program down. So, do not turn on too much logging by default.

Uncannily removing sensitive files

Of course, you know this. It's like knowing that you should move carefully in a room full of rakes and nevertheless getting hit by one because it's just so obvious...

These are classical rm -rf $FOO/* when $FOO happens to be unset... These are when you type rm -rf /etc, and thus crush your system, instead of rm -rf etc to remove a backup copy of /etc you were so prudent to create. These are the famous bumblebee bugfix—or was it mere viral marketing?

***

Usually, you think that creating is easy, and destroying is simple. In many cases, however, it's not true. In Russia, for instance, it's much easier to open a business than to shut it. And when it gets to removing files, it's also sometimes true. So, if your program really depends on successful file removing, it should be more careful with such a seemingly simple operation.

Read on | Comments (2) | Make a comment >>


Do not use Btrfs!

"Btrfs" is a new, modern filesystem, which has always been a nice thing to try for any Linux aficionado. When you're bored, and if your Linux installation is too stable, you can always try one of new cool filesystems to make your machine faster, cooler, and less stable.

ReiserFS, named after its developer Hans Reiser, was the first Linux filesystem with journaling support. It also was known as the fastest file system to deal with large amounts of small files. It had none of the Btrfs' problems I describe here.

The successor of this filesystem, Reiser4 should have had even better small file support. It also was innovative, and allowed transient compression of content (which worked, by the way), but it has never been finished.

Hans, being in prison, doesn't continue the development of the new filesystem, and, it seems, that others just fail to keep the pace. reiser4 is still not included in Linux kernel. I'm pretty sure that it's not just because of non-technical "personality issues", but of architectural incompatibility with the kernel philosophy (and this) as well.

It's sad that prison seems to have disrupted Hans' mind. We probably will never have the same drive in Linux file system world.

After Hanz Reizer has been sentenced for a dozen of years in prison for killing his wife (I'm pretty sure he thought he had good reasons), we can't take reiser4 seriously anymore. I had a sore experience installing reiser4, but, unfortunately, it has never entered a stable state (and, most likely, never will).

When Hans, in addition to his wife, had killed my root filesystem through his unstable evil creation, I decided to install something cool again. My choice was Btrfs. It is a new fast filesystem, and it was claimed to be fast as hell. Moreover, it became the default filesystem for MeeGo mobile Linux OS standard, which is now nearly dead due to unrelated circumstances.

Aside from being just new, Btrfs offers a lot of unique features. One of them is that it's a copy-on-write filesystem, which allows a user to make snapshots of the whole file system on the same partition without consuming twice as much disk space. You can also roll back to a saved state immediately, without re-writing everything. However, I recently learned an already known way to make these operations fast with any filesystem, so Btrfs' advantage here is questionable. An "only" advantage left is its speed.

After I used Btrfs both at home and at work, I now advise to steer clear of it. At least, until its issues are resolved.

I'm not a kernel hacker of any sort, I haven't looked at Btrfs source code, and I never even read its specs. I just share my experience with it as a user. The post may already even be out-of date, and the state-of-the art tools and implementation may have already fixed the problems I describe.

Lack of proper tools

Btrfs still doesn't have file system checker that fixes the FS when computer was powered off or reset. Reference: its own wiki. I really don't understand how come Btrfs was used at real devices. I remember I really once had to run file-system check for my N900 mobile phone after an unfortunate battery exhaustion. Will my MeeGo device lose all my data after its battery dies?

And you know what? This "we still don't have file system recovery tool" message has been hanging there for more than a year. I guess, it will never be actually implemented.

So, it's already a good reason to never use it for sensitive data, such as /home partition.

Irrecoverable disk pollution

Okay, so, Btrfs is not very reliable to have, for instance, your /home partition on it. But we could use it for a temporary file system that doesn't store important data!.. could we?

I tried Btrfs on one of the nodes of our Linux Driver Verification cluster as a file system for a 130 Gb partition. Here's the workflow the filesystem underwent:

  1. Write a lot of small files
  2. Realize that the disk is nearly full
  3. Erase the small files written
  4. Go to step 1

On day I realised that the cluster node just was not working. I checked the logs and I realized that there was no space left on the device. I ran df -h to check how much space there really was, and it showed that there were... 13 gigabytes free!

It could mean only that Btrfs just damn secretly used more than 10% of disk for its own metadata. Using more disk space than the overall file size is normal, but here it uses it unpredictably: Btrfs lies about the real amount of space left. Some mailing list postings confirmed that, but I can't find the links now. The only suggestion available was to "balance" the btrfs with its tools. But it didn't work for me!. The file system remained in its unusable state.

Then I decided to try to remove some files. I typed rm -rf directory/, and... File removal resulted in a "no space left on device" error! So when I try to clear some space with rm it just outputs "no space left" errors. How nice.

However, it actually did remove the files, but the process merely took more time than usual (about 2 files removed per minute), and had less than 1% success rate. The recommended elsewhere echo >file/to/remove didn't work either. After 30 hours of removing, the filesystem has traversed the 130 Gb folder, and removed a couple of gigabytes successfully. The subsequent rm -rf cleared the disk in no time...

... just to enter the same crippled, unusable state, but with 27 Gb "free" (more than 20% of the filesystem) after a couple of weeks! That time, even rm -rfs couldn't help. We re-formatted the partition into ext4.

By the way, Edward Shishkin is the primary developer of reiser4 at the moment, i.e. he's a competitor of Btrfs. If it was not for my own experience, I wouldn't think that his observations and conclusions are grass-root.

I guess that my observations confirm Edward Shishkin's review of Btrfs: the filesystem is space-ineffeicient without any good reasons, and its tree balancing algorithms are apparently broken.

And it is also not fast!

Okay, but, initially, I mentioned that Btrfs was, at least, fast. Not just working with lots of small files should be faster, but the overall workflow should have had the speed increase.

It does not.

My home PC uses Btrfs as its root filesystem. A media storage, which I keep movies on, however, uses XFS. Now guess what happens when I try to play an HD movie from the XFS partition?

No, you're wrong. It doesn't play smoothly. It just stalls unless I increase the I/O priority of the movie player to the highest value. Because btrfs-transactio process wakes up each five seconds and sends my PC to a non-interruptible I/O sleep. This renders my PC unusable for less than 0.5 second, but it is enough to crush my HD movie experience: it just wouldn't play because disk throughput is not enough for a small, but significant amount of time.

In production, however, I also noticed that. A month ago I introduced a change to my BLAST project. I now realize that it was the Btrfs to blame, but back then I just saw that when a process is about to write to a file 10 times per second it sometimes stalls for a second before the write succeeds. This is intolerable for any serious file system.

Perhaps, during the common usage cycle, when files are not re-recorded several times per second, it wouldn't be an issue at all. However, the way I use file systems at work is not "common", and it should satisfy more requirements than those of users that just never write to their files.

***

Finally, the solution to Disk I/O speed seems to have come not from tricky file systems. Solid-state drives (SSDs) with their awesome read/write speeds have made filesystems much less relevant. The main challenge for file system developers would be to adapt to SSD's requirements. And it is natural! We all are pretty damn sure that sequential reads-writes should be faster than random I/O; we are used to placing our swap partitions at the beginning of the drive, to increase the speed they're accessed. We treat magnet disks requirements as granted without really noticing this.

Time to change the paradigm. As of today, if you want a faster disk I/O, the best choice would probably be an SSD, not a faster file system.

To me, as I described above, Btrfs doesn't seem a good choice both at home and in production. Its features do not overcome the fact that it's just unusable. Until it's fixed, do not use Btrfs.

Read on | Comments (0) | Make a comment >>


Ruby-ran-off-the-Rails

Recently I ran a software update from Ruby on Rails version 2.3.5 to 2.3.14. You might have noticed the "maintenance" message all pages were redirected to last Sunday. Of course, after the update completed, I clicked on a bookmark entry "My Blog" in the browser window, and, having acknowledged the correct loading, prematurely considered the update completed well.

I was wrong this time. But first,...

How I perform updates on my webserver

To automate the update process, I use a couple of nonstandard scripts. During the update, the web server becomes nearly unusable, so, instead of providing poor experience to users, I decided to show a "maintenance" page while the update is in progress.

In my virtual host Apache configuration file, I use a special variable, COLDATTIC_MAINTENANCE, to distinguish whether it should ask Mongrel for a web page, or should show a simple maintenance.html page. Here's how virtual host config looks like:

<IfDefine !COLDATTIC_MAINTENANCE>
# ... normal website redirects
</IfDefine>

<IfDefine COLDATTIC_MAINTENANCE>
DocumentRoot /var/www/coldattic.info/htdocs/
<Directory /var/www/coldattic.info/htdocs>
        Options FollowSymLinks
        AllowOverride None
        Order allow,deny
        Allow from all
</Directory>
# Redirect everything to index
RewriteCond /var/www/coldattic.info/htdocs%{REQUEST_URI} !-f
RewriteRule ^(.+)$ /maintenance.html [R]
ErrorDocument 503 /maintenance.html
Header always set Retry-After "18000"
</IfDefine>

This config makes the web server, if the -D COLDATTIC_MAINTENANCE is supplied to Apache's command line, to:

  • Redirect every request to maintenance.html
  • Show 503 Service Unavailable status code. Status code is important, because you should prevent web search engines think that this is an only page that's left on your site.
  • Notify bots that they should come in 5 hours (set this to average maintenance time).

To update the server I do the following:

  1. Set -D COLDATTIC_MAINTENANCE in Apache config (on my distro it's /etc/conf.d/apache2);
  2. Restarted system Apache by /etc/init.d/apache2 restart;
  3. Now, the apache is serving maintenance pages. I edit the config /etc/conf.d/apache2 back, and remove the maintenance define. However, I do not restart the server with new settings!
  4. I run a system update script, which, essentially, looks like update-software ; /etc/init.d/apache2 restart, so the server is automatically restarted without maintenance mode after the update is completed.
  5. Go to movies, since the update usually happens on early East Coast Sunday mornings, when it's Sunday Night in Russia, a good time to relax;
  6. Come home, log in to server, and deal with update failures >_<

I do not do the updates quite often, and they go well. After the update last Sunday, I noticed a sudden decrease of visits. This could be just a random spike, but after the visit number had decreased to less than ten the next day, I realized that the update broke something. I entered coldattic.info into the browser. It should have redirected to http://coldattic.info/shvedsky/pro/blogs/a-foo-walks-into-a-bar/, but what I saw was a raw HTML text of the blog page!

How Ruby went off the Rails.

How come? I checked the logs of the website, and notices an abnormal stacktrace:

Tue Oct 18 15:52:05 +0000 2011: 
    Error calling Dispatcher.dispatch #<NoMethodError: private method `split' called for nil:NilClass>
/usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.14/lib/action_controller/cgi_process.rb:52:in `dispatch_cgi'
/usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.14/lib/action_controller/dispatcher.rb:101:in `dispatch_cgi'
/usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.14/lib/action_controller/dispatcher.rb:27:in `dispatch'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/rails.rb:76:in `process'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/rails.rb:74:in `synchronize'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/rails.rb:74:in `process'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:159:in `orig_process_client'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:158:in `each'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:158:in `orig_process_client'
/var/www/coldattic.info/main/coldattic/vendor/plugins/spawn/lib/patches.rb:61:in `process_client'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:285:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:285:in `initialize'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:285:in `new'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:285:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:268:in `initialize'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:268:in `new'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:268:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:282:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:281:in `each'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:281:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/mongrel_rails:128:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/command.rb:212:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/mongrel_rails:281
/usr/bin/mongrel_rails:8:in `load'
/usr/bin/mongrel_rails:8

Hm.

It only happened when the page request should result in redirect, but it worked for direct page accesses (that's why, perhaps, the visit counter was greater than zero: people could just user their bookmarks).

I ensured the localhost version is synchronized with the production, and launched WeBrick via ./script/server to play with the website outside the production server environment. But on my local machine it worked well! (No wonder, because there's no entry of my web application in the stack trace; the whole trace consists of web server and framework subroutines.)

Solution: monkey-patching

I googled for the error, and it appeared to be more than half a year old. This bug entry is, I guess, the first place where the error was spotted. In a nutshell, new version of Rails seems to have employed a previously unsupported use-case of Rails' web server interface, Rack. It interfered with, supposedly, partly incomplete implementation of Mongrel, a specialized HTTP webserver for Rails. And mongrel updates still haven't made its way to Gentoo distribution my web server runs (-1 to Gentoo maintainers reputation counter!).

The solution a lot of people (including me) used is here. It is a small Ruby file that should be placed... into your application startup directory (for Rails it's config/initializers/)! You don't have to patch your system libraries and make new packages for your distributions: all you have to do is to add a file to the userspace. The code there uses a well-known feature of Ruby, monkey patching. It alters the implementation of external library classes. I already demonstrated how this could be used to add some Perl-like features to Ruby.

So, fixing others' bugs is another, though less legitimate, way to employ monkey-patching.

***

Why did this happen in the first place? Two reasons:

  • Insufficient testing after software updates. I should have tried to enter the site as an anonymous user, instead of clicking on my own bookmarks. Without the fix above, the site worked well if the user who accessed it had cookies!
  • Deploy the exactly same environment on testing machine as on the production, and perform the tests there before upgrading.

I thought that Mongrel and the whole Rails stack was stable, but, I guess, it suffers from fragmentation. Ruby on Rails stack consists of a lot of components. While this fosters innovations, the interaction between their developers is not tight, and there also is no common governance; this are the reasons why such strange bugs get shipped in stable versions.

Read on | Comments (0) | Make a comment >>


More posts >>