A Visit to the Computer History Museum

There are not many museums around the world that are specifically devoted to computers. The two I know about are Bletchley Park, located 70 km away from London, and Computer History Museum. The former is not about computers per se, though; rather, it's about the decrypting German messaging during the World War II that happened right there. And the latter is completely devoted to computers.

I visited this Computer History Museum two weeks ago, during my short trip to the Silicon Valley. Here's a couple of photos I made there.

Here's my personal favorite, a slide rule tie clip (as I love ties, I want such thing either!)

Oh, here's another Penn&Paper's solution. It's a clone of Microsoft Visio diagram sketching tool:

Nowadays, you may measure the uptime with a single shell command. Back then, there were devices specifically designed for this:

"Man" is a shorthand of "manual" in Linux word, and not what you, a feminist, thought about.

Need a man to learn what uptime is? Here's your man:

And, of course, a full-size working copy of a Babbage machine (I didn't see the demo, bu they say it works!)

The museum has more of these. Due to my schedule, I only had a bit more than an hour to visit it (note that there's no admission on Mondays and Tuesdays!) But if you'll be around, it's totally worth it. Don't miss it, it looks like this:

(And if you happen to find yourself thirsty, don't go to that bar across the street: the service is quite bad there.)

The Most Stupid Mistake You Can Make with Ruby

Programming languages have strict and mostly very compressed syntax. The origins of that are twofold.

First, actively used programming languages have a considerable history, and were started decades ago. Back then, screen space was limited, and, before that, there was little storage available for source code. Second, programmers have technically-oriented minds and like to endow abstract symbols with complex abstractions instead of using words to describe them. That's why, I guess, many programming languages heavily depend on correct usage of individual characters. Even as few characters as one may make a difference in program behavior as well as in whether program compiles.

Such caveats in different languages include:

  1. C++ templates. You can't write map<int,vector<int>> when you define a mapping from integers to arrays of integers, because C++ parsers thinks that >> as a bit-shift operator in an improper place. A correct program differs by a single space between the < signs.
  2. Makefile tabs. A rule body in the Unix Makefiles should be indented with a Tab. Spaces do not work. Being a software with more than 30 years of history, make had it fixed only a year ago.
  3. CSS delimiters. When you define a cascading style sheet, amd want to define the same block of style attributes for a certain class inside a certain tag, you write the selector as tag .class. It's just a space away from tag.class that defines the styles only for tag elements of class class.
  4. C-style equality in conditional statements. If you're a seasoned professional, you should have already forgotten about this. In many languages, including C, Ruby, Python, Javascript, C# and others, if (a = 1) is always true, since it's an assignment of 1 to a, followed by checking the a's value for truthfulness. The correct version of that has one extra equality sign: if (a == 1). More confusion is created by languages, where the first version is the legitimate way to compare values, such as Pascal.

Imagine how surprised I was when I realized that I made a variation of mistake #4 in my Ruby code today! Here's what it was about.

Ruby hash niceness

Named parameter is a way to specify function arguments at call by name (rather than by order, as in the standard function call notation).

Here you may find how the Named Parameter is implemented in various programming languages.

To emulate a Named Parameter Idiom, Ruby uses hashes and some syntax sugar. The last parameter of a function may be a hash, which maps from parameter names to values. Syntax sugar allows a programmer to write

This sugar is not unique to Ruby; Perl also supports it.

instead of

The :name notation specifies a symbolic constant, which effectively is an immutable string that is defined by only one ancillary character.

Such hashes and symbols seem as a very useful feature. It allows you to emulate DSL-s; here's an example of Ruby on Rails web framework routing configuration:

Until one day, after an hour of debugging you find yourself having written something like this:

See what's wrong here? Indeed, here's how the code of the function called might look like:

So, options[:attribute_check] should evaluate to false boolean value, but... :false is a totally different thing; it's an immutable string of five characters that evaluates to true instead! Just one colon that lurked into the code, and it made it behaving very wrong way.

Just like in a C-style typo, some expressions that are evaluated as true in boolean context look like those that are evaluated as false, and you should be careful with the borderline.

New named attribute definition style in Ruby

New named attribute passing style was not designed to address this problem. However, the abundance of colons in such a code makes it look worrying in case there is a mistake like the above:

You see that the mistake is easy to spot, because the conjunction between the name and the parameter value is so ugly, that it immediately draws attention. However, if you actually need to specify symbol as a value, then you'll have to look ugly with this style.

Moreover, you can't erase the space between the parameter name and value, because for this code:

Ruby parser will think that it's false method in an attribute_check object, as :: is a scope resolution operator, just like in C++. Space matters again, as in typo #1 desccribed above.

People say that this style resembles that of C# or JSON. So, maybe, it is a good idea to migrate to it. Only two things prevent me from doing this so far: it's not portable to previous version of Ruby, 1.8 (though it slowly becomes obsolete), and I find the old style look much more cute :-)

***

This was yet another typo that makes our programs behave differently than we expect them to. And again, it was just one character that breaks the expected behavior of if statements that implicitly convert the condition to boolean type. Unfortunately, while the new Ruby named parameter syntactic sugar could help, it sometimes looks even worse to me.

I hope this post will help you avoid the similar mistake if you code in Ruby.

I would like to end with a joke about a mythical C++ programmer leaving hist last commit after having been fired:

Happy debugging to you, indeed!

Relativity of Simultaneity in Distributed Computing

About a year ago, I described an allusion to physical phenomena in computing in "Caching and Levers". This post is devoted to a more complex theory, namely the Special Theory of Relativity (STR), and mostly to one of its implications, the "relativity of simultaneity".

Relativity of Simultaneity

Special Theory of Relativity may be approached as a set of equations for coordinate translation that take their effect as the speeds of objects approach the speed of light, c.

STR is based on two postulates, one of which states that light is always propagated in empty space with a constant velocity c regardless of the speed of the emitting body. This may be used to introduce a tricky paradox.

Train car experiment

As seen by a passenger of the train. The events happen simultaneously.

As seen by a spectator on the station. The "back" event happens earlier than the "front" event.

(pictures from Wikipedia)

Assume that a train car is passing you when you're standing on a station platform. The train car has a lamp installed exactly at its middle. When the train passes you, a lamp flashes, and then you watch when the light reaches the front and the back of the car. It's interesting that those who sit in the car will notice the light hitting the two walls simultaneously, while those who stand at a platform will notice the back wall lit earlier than the front one. This is implied by the postulate described above: as seen by a spectator on the platform, the light propagates with equal speed at all directions, so it will hit the rear wall before the front one, as the former moves toward the light source, while the latter reproaches it. See wikipedia for more info.

This paradox, known as relativity of simultaneity may be formulated more generically: whether several events are simultaneous, depends on the location one watches them.

But what does it have to do with the computing? Before I answer this, I'd like to share what I've learned from a CACM magazine.

Quiescent consistency

In the multicore age, the classical data structures we all know about are about to change. The reason for that is the increasing demand on computation speed and volume that careful "sequential" CPU chip engineering is no longer capable to provide.

This challenge makes us rethink our approach to algorithm and data structure design. If we want data strctures to be efficient, we no longer may expect them to behave as good old ones in the sequential edge.

In distributed programming, there is an approach to describe data structure behavior known as quiescent consistency. There is a number of consistency conditions, sequential consistency, linearizability and others. These conditions describe how an object behaves when there are several threads calling its methods.

A data structure possesses quiescent consistency if it is consistent between its states of quiescence, i.e. when there are no methods currently in progress. As soon as a quiescently consistent structure has no operations pending (i.e. reaches the quiescence), we may be sure that the executions of methods before this state and after this state is never interpositioned.

Imagine a quiescently consistent stack. An implementation of it is described in this CACM paper "Data Structures in the Multicore Age", the one where I first encountered the quiescent consistency concept. Assume the following sequence of events happen to the q.c. stack:

  1. Three pushes x,y,z
  2. Quiescence (the pushes are processed)
  3. Three more pushes a,b,c
  4. Quiescence (the pushes are processed)
  5. Three pops
  6. Quiescence (the pushes are processed)
  7. Three more pops
  8. Quiescence

Note that q.c. doesn't mean that a data structure guarantees nothing except for this specific consistency. Consistency condition only maps data structure behavior in a concurrent setting to a behavior in a single-threaded environment, i.e. it only limits the number of a multitude of different sequences of method calls that may happen for a specific set of multithreaded events. All the properties a data structure exhibit in this sequential processing should still apply.

Quiescent consistency guarantees that the first three pops return x, y, and z (in an arbitrary order), and the next three pops return a, b, and c, somehow intermixed as well.

Note that, unlike linearizability, quiescent consistency doesn't impose any ordering on results of pops if there was no quiescence between the initial pushes. For instance, if processing of the first and the third pushes do not overlap, the linearizability ensures the pops will respect this order, while q.c. (in case that the second push overlaps with both of them) doesn't ensure that.

Having noted that, q.c. looks like a very weak and useless property. However, it still implies correctness, which, however, is enough in many circumstances. Imagine that you need a pool of unique numbers. It is enough to use a q.c. counter; the numbers it returns will be unique, which should fulfill our initial purpose.

Do we really need stronger consistency?

However, the reasons why we may be satisfied with weaker consistency conditions are not constrained with the specific examples. Let's try to prove the opposite. Assume we're implementing a stack that is accessed from a number of threads. Quiescent consistency may be not enough because if a push of A precedes the push of B, then the pops should be ordered in the specific way, and quiescent consistency may not guarantee that.

But wait... We say, "If one push precedes the other," but do we really know what "precedes" mean? If two threads in our distributed computational system invoke push call independently, how can we be sure that one of them precedes the other? And is there any sense in defining a measure for that?

Let's reckon the paradox we described at the beginning. In one reference frame, one event precedes the other, and in a different reference frame it's vice versa. So whether one push happens before the other, depends on the place we look from, and is not—and may not be—determined by a central, ubiquitous authority. That's a law of the universe, and we can't change this in computing.

Surprisingly, it makes quiescent consistency be a fairly appropriate constraint for a distributed data structure. If there's no strict sense of precedence between events that happen in different places of our network then why can't we use this to develop a more efficient data structure? The correct answer is, "indeed, why not?", and such data structures are well-defined nowadays.

OSI model

OSI model is an abstraction that aims to make the design of distributed computation systems easier. The chain of events that happen during a process of data transmission is separated to several separated layers: each layer has its own protocol, which is independent of the actual implementation of the underlying layers. You may learn more at the wikipedia.

STR vs computing: the difference

We have successfully used a metaphor from physics in distributed computing, but there is an inherent limitation of applying the concept further. In physics, if we have a car of a specific length (as measured by its riders), we may install the light source in such a way that the events at two sides of it happen predictably simultaneous in the reference frame of the spectator at the station.

In computing, we can't do that. OSI model prevents us from predicting how fast the events happen by masking out the capabilities of a system at lower layers. In physical reality, we know that as soon as the light "hits" the back and front covers of the car, the event "happens" (or, more precisely, a piece of reflected light is launched towards the spectator). In computing, however, we don't even know how long it takes for a signal to reach another node. Moreover, the OSI model makes this impossible to predict. All the layers guarantee correctness, but do not have any timing conditions.

Conclusion

The absence of timing in the OSI model suggests that may be no sense in saying, "this structure may not be used, as it processed the request in the different order than they're issued in". The "order they're issued in" is inherently unpredictable in our computing models.

This inherent unpredictability of timings of processes and events, as described in OSI model, is why we really can't apply the special theory of relativity to computing. However, we still may notice that simultaneity in the world around us, as confirmed by physical experimentation, is relative. And the nature still works well!

And if the nature accepts it, why can't we learn from it, and allow our distributed systems to be less predictable, and trade this predictability for speed?..

Comments imported from the old website

Pavel Shved on 26 December 2011 commented:

And again, it appears that someone realized this before me.

Typical Janitor's Failures

How easy is it, to write a program that removes a file or a folder? Okay, assume you have all the power a modern operating system provides you, including rm shell convenience command, and unlink system call. Do you really think the answer is still "very"?.. Then take a look at these examples.

Permission trap

Today one of my peers, Denis, was writing a program that somehow analyzed packages in Mandriva Linux distribution. Since trying to analyze the contents of a package without unpacking it looks more like gynecologist's work than that of a programmer, the software Denis wrote unpacked it first. Then it analyzed the package, and erased it with rm -rf in order to avoid disk pollution.

But sometimes it failed to erase it, which led to strange failures of the whole program.

The reason was the permission trap. Common sense tells us that if you can create a file, then you can remove it. This is wrong when it comes to uncommon permissions. Certain software that is very secure sets up specific attributes for certain files. In particular, it revoked writing permissions from certain catalogs, which prevented recursive removal of the unpacked files. You may try it yourself:

$ mkdir -p /tmp/try/this; chmod a-w /tmp/try; rm -rf /tmp/try
rm: cannot remove `/tmp/try/this': Permission denied

Oops. Even if you're the owner of the file, you won't remove it. Even --force doesn't help. To avoid this, I guess, you should chmod files recursively before removing them.

Why does Chromium remove its files twice?

By the way, I encountered a couple of engineers from PVS Studio when they visited Institute for System Programming, in which I worked; they were looking for a mentor for a Ph.D. thesis one of them was going to defend. Small world!

I read about this interesting issue here. Most likely, it won't hit you in the nearest years if you're using Linux, but for Windows users it might be interesting and maybe useful. Let me quote the piece of code they referenced

Why in the world would anyone try to remove a file twice determinedly? What was the programmer trying to achieve? Let's quote that article again:

A file can be definitely removed or cannot be removed at all only in textbooks and in some abstract world. In the real system it often happens that a file cannot be removed right now and can be removed an instance later. There may be many reasons for that: antivirus software, viruses, version control systems and whatever. Programmers often do not think of such cases. They believe that when you cannot remove a file you cannot remove it at all. But if you want to make everything well and avoid littering in directories, you should take these extraneous factors into account.

"Remove" vs. "unlink"

Okay, so you've managed to remove a file. The question is, have you freed some space on your disk? The answer is maybe.

Removing is called unlinking on purpose. What you do by "unlinking" is removing a reference from a directory index to a certain chunk on your hard disk. The chunk, nevertheless, may persist, especially if there are more references to it. One of the possibilities is hard links, but you explicitly created them. Another could be a badly working file system, but that was your choice.

However, there is a very harsh way to experience the difference between removing and unlinking. Assume you have a long-running program that writes something to its log file. The information it writes is not very useful, and you might want to discard it. As the program spends hours of working, the log file grows larger and larger, and you realize that you really should erase the log to prevent your disk from filling up. You rm -rf log_file, and... nothing happens. The log is still being written somewhere, and there is as little space as there was.

The thing is that you unlinked the file, but the file system's directory index wasn't the last place it was referenced from. Another place is the file handler of the program running, and it will be referenced until the file is closed. Otherwise, as far as I know, the only way to fix it is to shut the program down. So, do not turn on too much logging by default.

Uncannily removing sensitive files

Of course, you know this. It's like knowing that you should move carefully in a room full of rakes and nevertheless getting hit by one because it's just so obvious...

These are classical rm -rf $FOO/* when $FOO happens to be unset... These are when you type rm -rf /etc, and thus crush your system, instead of rm -rf etc to remove a backup copy of /etc you were so prudent to create. These are the famous bumblebee bugfix—or was it mere viral marketing?

***

Usually, you think that creating is easy, and destroying is simple. In many cases, however, it's not true. In Russia, for instance, it's much easier to open a business than to shut it. And when it gets to removing files, it's also sometimes true. So, if your program really depends on successful file removing, it should be more careful with such a seemingly simple operation.

Comments imported from the old website

Maxim Ivanov on 21 November 2011 commented:

In case of abusive logging, I found handy to run ">/path/to/logfile", which effectively frees up space.

Pavel Shved on 17 December 2011 commented:

@Maxim, this sometimes works, and sometimes does not. I'm yet to figure out the exact conditions one may rely on this under.

Do not use Btrfs!

"Btrfs" is a new, modern filesystem, which has always been a nice thing to try for any Linux aficionado. When you're bored, and if your Linux installation is too stable, you can always try one of new cool filesystems to make your machine faster, cooler, and less stable.

ReiserFS, named after its developer Hans Reiser, was the first Linux filesystem with journaling support. It also was known as the fastest file system to deal with large amounts of small files. It had none of the Btrfs' problems I describe here.

The successor of this filesystem, Reiser4 should have had even better small file support. It also was innovative, and allowed transient compression of content (which worked, by the way), but it has never been finished.

Hans, being in prison, doesn't continue the development of the new filesystem, and, it seems, that others just fail to keep the pace. reiser4 is still not included in Linux kernel. I'm pretty sure that it's not just because of non-technical "personality issues", but of architectural incompatibility with the kernel philosophy (and this) as well.

It's sad that prison seems to have disrupted Hans' mind. We probably will never have the same drive in Linux file system world.

After Hanz Reizer has been sentenced for a decade in prison for killing his wife (sic!), we can't take reiser4 seriously anymore. I had a painful experience installing reiser4, and unsurprisingly, it never entered a stable state (and most likely, never will).

When Hans, in addition to his wife, had killed my root filesystem thanks to his unstable evil code blurb, I decided to ... install something cool again. My choice was Btrfs. It is a new fast filesystem, and it was claimed to be fast as hell. Moreover, it became the default filesystem for MeeGo mobile Linux OS standard, which is now nearly dead due to unrelated circumstances.

Aside from being just new, Btrfs offers a lot of unique features. One of them is that it's a copy-on-write filesystem, which allows a user to make snapshots of the whole file system on the same partition without consuming twice as much disk space. You can also roll back to a saved state immediately, without re-writing everything. However, I recently learned an already known way to make these operations fast with any filesystem, so Btrfs' advantage here is questionable. An "only" advantage left is its speed.

After I used Btrfs both at home and at work, I now advise to steer clear of it. At least, until its issues are resolved.

I'm not a kernel hacker of any sort, I haven't looked at Btrfs source code, and I never even read its specs. I just share my experience with it as a user. The post may already even be out-of date, and the state-of-the art tools and implementation may have already fixed the problems I describe.

Lack of proper tools

Btrfs still doesn't have file system checker that fixes the FS when computer was powered off or reset. Reference: its own wiki. I really don't understand how come Btrfs was used at real devices. I remember I really once had to run file-system check for my N900 mobile phone after an unfortunate battery exhaustion. Will my MeeGo device lose all my data after its battery dies?

And you know what? This "we still don't have file system recovery tool" message has been hanging there for more than a year. I guess, it will never be actually implemented.

So, it's already a good reason to never use it for sensitive data, such as /home partition.

Irrecoverable disk pollution

Okay, so, Btrfs is not very reliable to have, for instance, your /home partition on it. But we could use it for a temporary file system that doesn't store important data!.. could we?

I tried Btrfs on one of the nodes of our Linux Driver Verification cluster as a file system for a 130 Gb partition. Here's the workflow the filesystem underwent:

  1. Write a lot of small files
  2. Realize that the disk is nearly full
  3. Erase the small files written
  4. Go to step 1

On day I realised that the cluster node just was not working. I checked the logs and I realized that there was no space left on the device. I ran df -h to check how much space there really was, and it showed that there were... 13 gigabytes free!

It could mean only that Btrfs just damn secretly used more than 10% of disk for its own metadata. Using more disk space than the overall file size is normal, but here it uses it unpredictably: Btrfs lies about the real amount of space left. Some mailing list postings confirmed that, but I can't find the links now. The only suggestion available was to "balance" the btrfs with its tools. But it didn't work for me!. The file system remained in its unusable state.

Then I decided to try to remove some files. I typed rm -rf directory/, and... File removal resulted in a "no space left on device" error! So when I try to clear some space with rm it just outputs "no space left" errors. How nice.

However, it actually did remove the files, but the process merely took more time than usual (about 2 files removed per minute), and had less than 1% success rate. The recommended elsewhere echo >file/to/remove didn't work either. After 30 hours of removing, the filesystem has traversed the 130 Gb folder, and removed a couple of gigabytes successfully. The subsequent rm -rf cleared the disk in no time...

... just to enter the same crippled, unusable state, but with 27 Gb "free" (more than 20% of the filesystem) after a couple of weeks! That time, even rm -rfs couldn't help. We re-formatted the partition into ext4.

By the way, Edward Shishkin is the primary developer of reiser4 at the moment, i.e. he's a competitor of Btrfs. If it was not for my own experience, I wouldn't think that his observations and conclusions are grass-root.

I guess that my observations confirm Edward Shishkin's review of Btrfs: the filesystem is space-ineffeicient without any good reasons, and its tree balancing algorithms are apparently broken.

And it is also not fast!

Okay, but, initially, I mentioned that Btrfs was, at least, fast. Not just working with lots of small files should be faster, but the overall workflow should have had the speed increase.

It does not.

My home PC uses Btrfs as its root filesystem. A media storage, which I keep movies on, however, uses XFS. Now guess what happens when I try to play an HD movie from the XFS partition?

No, you're wrong. It doesn't play smoothly. It just stalls unless I increase the I/O priority of the movie player to the highest value. Because btrfs-transactio process wakes up each five seconds and sends my PC to a non-interruptible I/O sleep. This renders my PC unusable for less than 0.5 second, but it is enough to crush my HD movie experience: it just wouldn't play because disk throughput is not enough for a small, but significant amount of time.

In production, however, I also noticed that. A month ago I introduced a change to my BLAST project. I now realize that it was the Btrfs to blame, but back then I just saw that when a process is about to write to a file 10 times per second it sometimes stalls for a second before the write succeeds. This is intolerable for any serious file system.

Perhaps, during the common usage cycle, when files are not re-recorded several times per second, it wouldn't be an issue at all. However, the way I use file systems at work is not "common", and it should satisfy more requirements than those of users that just never write to their files.

***

Finally, the solution to Disk I/O speed seems to have come not from tricky file systems. Solid-state drives (SSDs) with their awesome read/write speeds have made filesystems much less relevant. The main challenge for file system developers would be to adapt to SSD's requirements. And it is natural! We all are pretty damn sure that sequential reads-writes should be faster than random I/O; we are used to placing our swap partitions at the beginning of the drive, to increase the speed they're accessed. We treat magnet disks requirements as granted without really noticing this.

Time to change the paradigm. As of today, if you want a faster disk I/O, the best choice would probably be an SSD, not a faster file system.

To me, as I described above, Btrfs doesn't seem a good choice both at home and in production. Its features do not overcome the fact that it's just unusable. Until it's fixed, do not use Btrfs.

Ruby-ran-off-the-Rails

Recently I ran a software update from Ruby on Rails version 2.3.5 to 2.3.14. You might have noticed the "maintenance" message all pages were redirected to last Sunday. Of course, after the update completed, I clicked on a bookmark entry "My Blog" in the browser window, and, having acknowledged the correct loading, prematurely considered the update completed well.

I was wrong this time. But first,...

How I perform updates on my webserver

To automate the update process, I use a couple of nonstandard scripts. During the update, the web server becomes nearly unusable, so, instead of providing poor experience to users, I decided to show a "maintenance" page while the update is in progress.

In my virtual host Apache configuration file, I use a special variable, COLDATTIC_MAINTENANCE, to distinguish whether it should ask Mongrel for a web page, or should show a simple maintenance.html page. Here's how virtual host config looks like:

<IfDefine !COLDATTIC_MAINTENANCE>
# ... normal website redirects
</IfDefine>

<IfDefine COLDATTIC_MAINTENANCE>
DocumentRoot /var/www/coldattic.info/htdocs/
<Directory /var/www/coldattic.info/htdocs>
        Options FollowSymLinks
        AllowOverride None
        Order allow,deny
        Allow from all
</Directory>
# Redirect everything to index
RewriteCond /var/www/coldattic.info/htdocs%{REQUEST_URI} !-f
RewriteRule ^(.+)$ /maintenance.html [R]
ErrorDocument 503 /maintenance.html
Header always set Retry-After "18000"
</IfDefine>

This config makes the web server, if the -D COLDATTIC_MAINTENANCE is supplied to Apache's command line, to:

  • Redirect every request to maintenance.html
  • Show 503 Service Unavailable status code. Status code is important, because you should prevent web search engines think that this is an only page that's left on your site.
  • Notify bots that they should come in 5 hours (set this to average maintenance time).

To update the server I do the following:

  1. Set -D COLDATTIC_MAINTENANCE in Apache config (on my distro it's /etc/conf.d/apache2);
  2. Restarted system Apache by /etc/init.d/apache2 restart;
  3. Now, the apache is serving maintenance pages. I edit the config /etc/conf.d/apache2 back, and remove the maintenance define. However, I do not restart the server with new settings!
  4. I run a system update script, which, essentially, looks like update-software ; /etc/init.d/apache2 restart, so the server is automatically restarted without maintenance mode after the update is completed.
  5. Go to movies, since the update usually happens on early East Coast Sunday mornings, when it's Sunday Night in Russia, a good time to relax;
  6. Come home, log in to server, and deal with update failures >_<

I do not do the updates quite often, and they go well. After the update last Sunday, I noticed a sudden decrease of visits. This could be just a random spike, but after the visit number had decreased to less than ten the next day, I realized that the update broke something. I entered coldattic.info into the browser. It should have redirected to http://coldattic.info/shvedsky/pro/blogs/a-foo-walks-into-a-bar/, but what I saw was a raw HTML text of the blog page!

How Ruby went off the Rails.

How come? I checked the logs of the website, and notices an abnormal stacktrace:

Tue Oct 18 15:52:05 +0000 2011: 
    Error calling Dispatcher.dispatch #<NoMethodError: private method `split' called for nil:NilClass>
/usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.14/lib/action_controller/cgi_process.rb:52:in `dispatch_cgi'
/usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.14/lib/action_controller/dispatcher.rb:101:in `dispatch_cgi'
/usr/lib64/ruby/gems/1.8/gems/actionpack-2.3.14/lib/action_controller/dispatcher.rb:27:in `dispatch'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/rails.rb:76:in `process'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/rails.rb:74:in `synchronize'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/rails.rb:74:in `process'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:159:in `orig_process_client'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:158:in `each'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:158:in `orig_process_client'
/var/www/coldattic.info/main/coldattic/vendor/plugins/spawn/lib/patches.rb:61:in `process_client'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:285:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:285:in `initialize'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:285:in `new'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:285:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:268:in `initialize'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:268:in `new'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel.rb:268:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:282:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:281:in `each'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/configurator.rb:281:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/mongrel_rails:128:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/../lib/mongrel/command.rb:212:in `run'
/usr/lib64/ruby/gems/1.8/gems/mongrel-1.1.5/bin/mongrel_rails:281
/usr/bin/mongrel_rails:8:in `load'
/usr/bin/mongrel_rails:8

Hm.

It only happened when the page request should result in redirect, but it worked for direct page accesses (that's why, perhaps, the visit counter was greater than zero: people could just user their bookmarks).

I ensured the localhost version is synchronized with the production, and launched WeBrick via ./script/server to play with the website outside the production server environment. But on my local machine it worked well! (No wonder, because there's no entry of my web application in the stack trace; the whole trace consists of web server and framework subroutines.)

Solution: monkey-patching

I googled for the error, and it appeared to be more than half a year old. This bug entry is, I guess, the first place where the error was spotted. In a nutshell, new version of Rails seems to have employed a previously unsupported use-case of Rails' web server interface, Rack. It interfered with, supposedly, partly incomplete implementation of Mongrel, a specialized HTTP webserver for Rails. And mongrel updates still haven't made its way to Gentoo distribution my web server runs (-1 to Gentoo maintainers reputation counter!).

The solution a lot of people (including me) used is here. It is a small Ruby file that should be placed... into your application startup directory (for Rails it's config/initializers/)! You don't have to patch your system libraries and make new packages for your distributions: all you have to do is to add a file to the userspace. The code there uses a well-known feature of Ruby, monkey patching. It alters the implementation of external library classes. I already demonstrated how this could be used to add some Perl-like features to Ruby.

So, fixing others' bugs is another, though less legitimate, way to employ monkey-patching.

***

Why did this happen in the first place? Two reasons:

  • Insufficient testing after software updates. I should have tried to enter the site as an anonymous user, instead of clicking on my own bookmarks. Without the fix above, the site worked well if the user who accessed it had cookies!
  • Deploy the exactly same environment on testing machine as on the production, and perform the tests there before upgrading.

I thought that Mongrel and the whole Rails stack was stable, but, I guess, it suffers from fragmentation. Ruby on Rails stack consists of a lot of components. While this fosters innovations, the interaction between their developers is not tight, and there also is no common governance; this are the reasons why such strange bugs get shipped in stable versions.

OCaml Hash Table Chronicles

While preparing the static verification tool I maintain, which is written in OCaml, to the Competition on Software Verification, I noticed a strange performance problem. Queries to a hash table structure seemed to take too much time. Of course, there were a lot of such queries, but a greater amount of queries to a much more complicated data structure, a Binary Decision Diagram, caused the CPU less trouble.

I've been troubleshooting the issue for several hours, and the spot where it hit me was, as a Dr. Watson would say after Sherlock Holmes explained him the case, elementary. Bet let's explicate the case from the very beginning.

What an OCaml hash table looks like

The hash table implementation in OCaml, in addition to the expected capabilities, can also keep the history. So if you add several values for the same key, you may either query the latest value added (via find method), or query them all in the order of appearance (via find_all counterpart). Of course, you can also remove the latest binding of a key (via the remove method), so that the subsequent finds will return the previous value it was bound to.

"Who would want that?" you might ask. It appears, that this feature is very convenient. First, you get a standard library data structure for "a map from something to a list of something" with nearly the same interface as the plain one, without "list". Second, which is more important, when you wrote a multitude of code boilerplates, and only then you realize that not only you need the current value for a key, but, under certain circumstances, the history of its updates as well, you might get really upset. Especially if you use such a strict language as OCaml. Then, you'll praise the opportunity to keep all the code the same, querying the history just where needed.

The latter was what happened in our project. And in this history querying we started to encounter performance problems.

Batteries not included

The first problem was that the standard OCaml hash table interface doesn't have remove_all function that clears the history of a given key! To do that, we had to call remove method as many times, as many values there were for the key. Our profiler demonstrated that it was here where nearly 25% of the runtime had been spent.

I integrated a hash table from "OCaml Batteries Included" project that provided an extension to the standard library. Its hash table did have the remove_all method, and I hoped it would help.

It didn't. It decreased the runtime of removing, but not dramatically; besides, the run times of finds and find_alls remained bigger than I felt they should've been.

Digging deeper...

It was time to dig deeper into the Hash table implementation. But I didn't frown upon that. OCaml is a functional language, and I've been playing with it for quite a while. Sometimes, here and there, I notice strikingly nice implementations of well-known algorithms I learned in the imperative terms (the most nice would be depth-first traversal). No wonder I took a look into its intrinsics.

Unfortunately, hash table is too imperative a data structure for its implementation to look too uncommon. It was a plain array with lists attached to each of its elements.

What was surprising is that storing history appeared to be simpler than not storing it! When an value for a key is added to the table, it's just inserted at the beginning of the bucket's list, and the usual, simple search for a matching key in the list would always hit the latest value bound! removeing the key is as simple: you do the same search, and the first value found is removed, making the value added before it available for the next lookup, just in the historical order.

No wonder that our "remove a value for a key as many times as many bindings there were" didn't work fast: its runtime was O(n²), n being the size of the bucket, while we should do it in O(n). Of course, the same worst-case-linear runtime of bucket length couldn't go anywhere, but at least wasting the valuable time in a useless quadratic loop was avoided.

Still, it wasn't fast enough. The integral time to remove all values decreased, but didn't become negligible. Why could that be?

Perhaps, there were just too many values? I inserted some debug prints to the implementation of hash table, and tried to measure how many iterations the remove_all made at each operation. Yes, indeed too many. Hash table bucket size, given a good hash function, should be close to the number of elements divided by the size of hash table's array. And OCaml's standard universal hash function should definitely be good, and it even had a C implementation tuned for speed, apparently. So, the solution could be to increase hash table size.

When the hash table size increase also didn't help, I was quite surprised. How come? I was really lost.

...Until I added a print of hash function values. Yes, they were repeated quite often, and even the keys that were just slightly different had the same hashes—and good hash functions have to assign dramatically different values to slightly different inputs.

I took a closer look to the hash functions, and noticed some magical numbers (10 and 100). What were they?

external hash_param : int -> int -> 'a -> int = "caml_hash_univ_param" "noalloc"

let hash x = hash_param 10 100 x

I referred to the documentation, and... uh, I wish it was what I started with!

val hash : 'a -> int

Hashtbl.hash x associates a positive integer to any value of any type. It is guaranteed that if x = y or Pervasives.compare x y = 0, then hash x = hash y. Moreover, hash always terminates, even on cyclic structures.

val hash_param : int -> int -> 'a -> int

Hashtbl.hash_param n m x computes a hash value for x, with the same properties as for hash. The two extra parameters n and m give more precise control over hashing. Hashing performs a depth-first, right-to-left traversal of the structure x, stopping after n meaningful nodes were encountered, or m nodes, meaningful or not, were encountered. Meaningful nodes are: integers; floating-point numbers; strings; characters; booleans; and constant constructors. Larger values of m and n means that more nodes are taken into account to compute the final hash value, and therefore collisions are less likely to happen. However, hashing takes longer. The parameters m and n govern the tradeoff between accuracy and speed.

Standard library documentation for hash table, see here.

The keys for the hash value were complex structures (think recursively folded C structures), and many of them shared large parts, which made hash values collide too often. I changed the hash function used for that table to Hashtbl.hash_param 100 100, and the whole table became as fast as a jaguar in a savanna.

Lessons learned

The first thing to see as a morale is that you should read documentation before doing hard debugging. Of course, feeling like a file system hacker is invaluable, but, if you are after delivering something, keeping it slow pays you back in the long run.

When I was writing this post (and it's another cool thing about writing a blog), I suddenly remembered another issue I had with another hash table. The keys also were complex structures, but that table didn't use them directly. Instead, the complex structures were converted to strings, and only they were used as keys. I thought it was too redundant, but the straightforward way, to use the structures as keys, was, surprisingly, slower. I merely rolled back the changes, but now I realize the background behind this. So, perhaps, converting structures to strings (especially when there's a natural 1-to-1 match) helps, as the whole structure will be considered to distinguish.

I realized that keeping the history of hash table updates was not a nice feature at all. Rather, it appeared to be just a small bonus with lack of support in all the aspects (I had to integrate code from the other library to add such a natural functionality as remove_all). Or, it can be used as saving time in return of leaving older unneeded bindings unremoved.

And the last, but not least, I saw a good example how different languages teach you new concepts and new ways to look at the things you're already used to. Such experience, even at cost of half a day, was totally worth it.

Comments imported from the old website

I'm just wondering - if Ocaml hash tables are slow by default in the manner you described, then isn't that a bug?

Regards, Faheem Mitha (faheem at faheem dot info).

Pavel Shved on 03 March 2012 commented:

Fahreem, Hash tables are just fine. The key is the hash function.

Most OCaml data are structures that contain structures that contain structures and so forth. This may easily be imagined as a graph. An ideal hash function would traverse the whole graph, and account for all its nodes. However, the default OCaml's hash function does not do this.

The reason is not that it's a bug. Rather, the default configuration tries to make it reasonably fast. It merely assumes that most differences in data structures used as hash keys are close to the "beginning" of the structures, to "roots" of the graph. So itstops exploration at a certain distance (10) and amount of nodes scanned (100), and this makes the hash functions perform both fast and distributed well.

In my particular case, however, the data structures were organized in such a way that the difference between hash keys was "buried" deep inside their graphs. That's why I should have configured my hash differently, because the assumption the default configuration worked well under was not the case. I should have read the documentation, and know this peculiarity.

Now that you have read my post, you know about this without digging through it yourself.

Looking for a Match, a Geekier Way

As unlikely as it seems, the "match" the heading refers to is not a regular expression match. While I am in a long-term but open relationship with computers, I also do pay a certain bit of attention to opposite sex subjects of my species. This involves two immediate conclusions:

  • I have a shortage of such subjects available for dating in my nearest social circle;
  • The more information technology-abundant the process of mating is the more I'm likely to succeed.

Which, in turn, makes it obvious that I like to visit dating sites. If your first guess was that I use a geek-oriented fully-automatic neural-bayessian-expert-system-network-powered social site, you would be wrong. Automated matching engines appeared a bad solution: the criteria they use are too generic; besides they're not as popular as those that target generic audience. Therefore, the only option I have is to look for the one at generic-purpose dating sites.

The problem with generic-purpose dating sites

These sites (I tried several) share a common issue: they are annoyingly inefficient.

What's the source of the inefficiency there? It's the way search results are presented and managed. Or, to be more specific, how they're unmanageable by design. When you try to look for a woman to date, you're presented with a search result page, which is sorted by a complex weighted formula. A large weight is assigned to the last visit date, however, an even larger weight is of a paid (or, sometimes, free) service to move yourself up in search results page.

No wonder that a lot of women you have already considered as a potential mate and have rejected keep showing up in search result, distracting your attention from those potentially dateable. This decreases the efficiency dramatically.

I already tried to apply a "geek-like" approach to another "real-life" problem I head, an overweight. The solution I found appeared to be quite efficient: the graph of my weight is still the biggest motivator for making myself less fat.

A solution

A solution to this would be to filter search results on server side, discarding girls a user has already seen. However, aside from the apparent marketing-related problems—how would you sell good places in search results if they're so easy to hide permanently?—user-specific search yield should suffer from performance impediments, especially on popular sites.

Therefore, a filter on the user side could solve the problem. And the web browsers (at lease, some of them), do already have such mechanism. User scripts.

I used Firefox and GreaseMonkey to make a simple script to improve the girl finding experience. Here it is:

This script sets up an event listener to the ENTER key. When you press it, it locates the parts of the web page that look like girl userinfo or like search results. If it finds the search result that has already been seen, it shades it, making the other girls more visible. If the script finds that I'm currently at the girl-specific user info page with a lot of details, it will strike out the user next time, because a girl I closely looked and rejected should be even less visible.

The script uses GM_setValue and GM_getValue special functions. Greasemonkey Firefox addon specifically provides them to access a permanent storage that persists across browser invocations.

Here goes the rant

The script, however, looks a bit more monstrous than it should, and there is indeed a number of issues for this. I found a nice JavaScript reference; it was not as unhelpful as many other ones, but some issues left unresolved anyway. Let's enumerate them.

Ajax loading

The simplest architecture for the script, of course, would be just to fix up a page after it's loaded. Why do we need a lot of creepy event listeners?

The key here is after it's loaded. We can't tell when the page, or a piece of its content is loaded when running a userscript. The dating site I targeted was a complex piece of software with a lot of frames, with ajax-powered search result loading, etc. It heavily suffered from the issue described in this StackOverflow question, and the solution presented there with onhashchange event didn't help.

Therefore, I had to retreat to the User Brain Slowness. A user should hit ENTER after the page is loaded, making the script parse everything correctly. However, this worked badly too, hence all these event handlers that seem excessive at first sight.

XML traversal

Okay, so assume we have successfully loaded the document HTML tree. How can I now traverse it, spending as few keystrokes as possible?

As far as I know, there's no way to do this. All ways are browser-specific, and do not look even closely as a foreach loop. You may see these ugly constructs back from 90s near the FIRST_ORDERED_NODE_TYPE constants in the code above.

***

Having tried this, I really do not understand why JavaScript-based interfaces are becoming industry standard. Perhaps, I know JavaScript too badly to see its full power (this is my first JS program). Or, perhaps, programmers that prefer and are capable to make convenient languages merely steered clear from web-programming, and preferred to lock themselves up in ivory towers...

The only conclusion I'm sure about is that, the night I wrote the script, programming successfully distracted me from women for several hours again. What a jealous activity it is!

So I Patched That Filesystem

If you don't know what an "abstraction leak" is, please, refer to this Joel's blog post

Finally, I patched it. It's been pending for several months, always being postponed as "we do not have time to do this right now." However, we fixed several abstraction leaks, which created an increased workload to the network, and I couldn't postpone it anymore.

UnionFS

UnionFS is a file system that makes a union of several directory trees. The directory trees mounted under a union are organized as a layer cake: whenever you try to find a file in a union, it looks up, in a user-specified order, for this file on each of the file systems involved, until it finds one.

However, the most interesting things happen if you try to write something into a union, or to delete a file from it. If you turn copy-on-write flag, then you may use a higher-level writable file system to mask files on lower level read-only file systems by copying the files to and modifying them at a higher writable layer, or by creating a special masking record on an upper writable file system, respectively.

UnionFS is useful to represent a file systems that originate from a large read-only directory trees (such as those burned on a CD) with just a few changes made by user (yes, think install CDs).

In our project, we used it to create a naïve transparent network FS implementation.

Two posts ago, I whined upon the issue with our filesystem setup for distributed computation. In short, the C preprocessor relied on a lot of file lookup failures when handling #include directives, and failed lookups were never cached in our naive UnionFS + SSH-FS software stack. The former, being a union of several layers of directory trees, always searched for absent files until it reached the lowest layer, which was a slow network. And the latter just didn't cache failed file requests.

As you already know, I am a fan of caching. Naturally, the solution seemed to me as to cache the failed requests somehow. But what if I cache a failed file request, but the file appears?

Thus, the goal was to devise a conceptual solution that doesn't look like a hack. Rather, the desired solution should have relied on some property of our file system that implied that caching of the failed lookups would be sound.

Monotonic file system

The property, however, wasn't too tough to discern. I named it as being a monotonic file system. Let's define a monotonic file system as one a file is never added to. Since the moment of mounting as a UnionFS layer, all read-only file systems that lay lower than the first writable layer are assumed monotonic.

While it wasn't really strictly the case for the file system used as a whole (it had files added to it), the subset of files that were accessed remotely was monotonic.

Implementation details

UnionFS has a mechanism of "masking" the filed on read-only layers as if they were deleted (see side note). My solution was to mask a file after a lookup for it has failed.

First, I needed to determine what system calls the C preprocessor uses for #include file lookups. FUSE, the mechanism UnionFS was based on, intercepts system calls to access the file system, and routes them to a user program that contains handlers for them. Therefore, I needed to know the syscalls involved; for this, I used the good old strace command:

$ strace -f -T  cpp file.c 2>&1 | grep xxxx
[pid 23961] read(3, "#include <xxxx.h>\n", 18) = 18 <0.000010>
[pid 23961] open("/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.2/include/xxxx.h", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or directory) <0.000019>
[pid 23961] open("/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.2/include-fixed/xxxx.h", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or directory) <0.000015>
[pid 23961] open("/usr/include/xxxx.h", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or directory) <0.000013>
[pid 23961] write(2, "file.c:1:18: fatal error: xxxx.h"..., 59file.c:1:18: fatal error: xxxx.h: No such file or directory) = 59 <0.000013>

The syscall time shown due to -T option helped me to determine whether I really succeeded in fixing the problem (the times should have dropped dramatically if it was fixed).

However, attaching to the FUSE's open call did not show any activity. Having turned on the debugging in UnionFS, I realized that I should have attached to the getattr syscall handler instead.

Since now I knew what syscall handler to interfere, I patched the handling of getattr:

diff --git a/src/opts.c b/src/opts.c
index 02a4e79..c6e7a17 100644
--- a/src/opts.c
+++ b/src/opts.c
@@ -228,6 +228,8 @@ static void print_help(const char *progname) {
 	"    -o chroot=path         chroot into this path. Use this if you \n"
         "                           want to have a union of \"/\" \n"
 	"    -o max_files=number    Increase the maximum number of open files\n"
+	"    -o monotonic           treat file system as monotonic (\"delete\""
+	"                                  nonexistent files)\n"
 	"\n",
 	progname);
 }
@@ -295,6 +297,9 @@ int unionfs_opt_proc(void *data, const char *arg, int key, struct fuse_args *out
 		case KEY_COW:
 			uopt.cow_enabled = true;
 			return 0;
+		case KEY_MONOTONIC:
+			uopt.monotonic = true;
+			return 0;
 		case KEY_STATFS_OMIT_RO:
 			uopt.statfs_omit_ro = true;
 			return 0;
diff --git a/src/opts.h b/src/opts.h
index 040956f..1cfb64b 100644
--- a/src/opts.h
+++ b/src/opts.h
@@ -22,6 +22,7 @@ typedef struct {
 
 	bool stats_enabled;
 	bool cow_enabled;
+	bool monotonic;
 	bool statfs_omit_ro;
 	int doexit;
 	int retval;
@@ -32,6 +33,7 @@ typedef struct {
 enum {
 	KEY_STATS,
 	KEY_COW,
+	KEY_MONOTONIC,
 	KEY_STATFS_OMIT_RO,
 	KEY_NOINITGROUPS,
 	KEY_CHROOT,
diff --git a/src/unionfs.c b/src/unionfs.c
index f05af45..a64499e 100644
--- a/src/unionfs.c
+++ b/src/unionfs.c
@@ -57,6 +57,7 @@ static struct fuse_opt unionfs_opts[] = {
 	FUSE_OPT_KEY("-V", KEY_VERSION),
 	FUSE_OPT_KEY("stats", KEY_STATS),
 	FUSE_OPT_KEY("cow", KEY_COW),
+	FUSE_OPT_KEY("monotonic", KEY_MONOTONIC),
 	FUSE_OPT_KEY("noinitgroups", KEY_NOINITGROUPS),
 	FUSE_OPT_KEY("statfs_omit_ro", KEY_STATFS_OMIT_RO),
 	FUSE_OPT_KEY("chroot=%s,", KEY_CHROOT),
@@ -188,6 +189,10 @@ static int unionfs_getattr(const char *path, struct stat *stbuf) {
 	}
 
 	int i = find_rorw_branch(path);
+
+	// If we got an ENOENT, and we assume a monotonic FS, then we should whiteout this file
+	if (i == -1 && errno == ENOENT) monotonic_unlink(path);
+
 	if (i == -1) return -errno;
 
 	char p[PATHLEN_MAX];
@@ -335,6 +340,9 @@ static int unionfs_open(const char *path, struct fuse_file_info *fi) {
 		i = find_rorw_branch(path);
 	}
 
+	// If we got an ENOENT, and we assume a monotonic FS, then we should whiteout this file
+	if (i == -1 && errno == ENOENT) monotonic_unlink(path);
+
 	if (i == -1) return -errno;
 
 	char p[PATHLEN_MAX];
diff --git a/src/unlink.c b/src/unlink.c
index 06a7fa2..7f3275e 100644
--- a/src/unlink.c
+++ b/src/unlink.c
@@ -96,3 +96,27 @@ int unionfs_unlink(const char *path) {
 
 	return -res;
 }
+
+/**
+ * whiteout file on the first rw branch available
+ */
+void monotonic_unlink(const char* path)
+{
+	DBG_IN();
+	// monotonic check
+	if (!uopt.monotonic) return;
+	// save errno: if we can't whiteout, we should return -ENOENT anyway
+	int old_errno = errno;
+
+	// find the highest RW branch (yes, not a typo)
+	int j;
+	j = find_lowest_rw_branch(uopt.nbranches);
+
+	// All branches are RO... then just return the error
+	if (j == -1) return;
+
+	// Whiteout the file
+	hide_file(path, j);
+	// if creating the file with the hide tag failed, restore old errno
+	errno = old_errno;
+}
diff --git a/src/unlink.h b/src/unlink.h
index 5223258..8dc145b 100644
--- a/src/unlink.h
+++ b/src/unlink.h
@@ -8,5 +8,6 @@
 #define UNLINK_H
 
 int unionfs_unlink(const char *path);
+void monotonic_unlink(const char* path);
 
 #endif

And that was it. That was my filesystem patch! It worked well: the preprocessing times, confirmed both by the strace timings and by the real experimentation have increased by a factor of more than fifty! This is what patching a file system for your particular case can yield.

***

I rarely understood what benefits they referred to when describing how they patch filesystem to make their corporate servers work better (i.e. faster or in a more reliable manner). Having patched UnionFS, I realized that two hour work may give a 50x performance improvement (well, Amdahl's law loweing it to 7x improvement)—just because you are not afraid of digging that far.

I doubt that the mainstream would accept my patch, as it's too specific, I guess. But the milestone has been reached: now I may proudly declare that I patched a file system.

Microsoft Office Needs Windows... a lot of them!

Microsoft Office

This is a photo of a Microsoft office (stolen from here), namely, the Roger Needham Building in the University of Cambridge, one of two buildings the Microsoft Research resides in. See how many Windows it features? It just couldn't be a coincidence...

Lately, I visited Microsoft Research PhD Summer School 2011 in Cambridge, Great Britain. The event has been organized by Microsoft for more than ten years. Researchers from all the institutes over the Europe that are partners of Microsoft Research (don't know what it means, but apparently the institute I work at is one of them) come there to learn something new about researching in general, to meet researches who work at Microsoft, and to get introduced to what Microsoft is interested in.

The event was mostly held in Roger Needham Building, where, as far as I know, most of the Microsoft Researchers in Cambridge nest. I haven't been given a chance to take a tour over, but it was obvious that this Microsoft Office really has a lot of Windows on purpose ;-)

Microsoft also blogged about the event, but here I'll share my personalized view.

Microsoft and Linux

My poster

That's how my poster looked like; check out the Tux at the upper-left corner! I know it's a total crap (there's just too much text), but this was my first experience, and the next time I'll do it right... or, at least, more right than this time. Yes, next time I will not use LaTeX for making a poster. (And here is the pdf).

As a part of the obligatory participation program, I made a poster presentation on the research I'm currently doing. Of course, I couldn't miss the chance to troll Microsoft with an image of penguin Tux, the mascot of Linux, and with a title that starts with "Linux" as well (see sidenote).

It turned out that Microsoft researchers are really open-minded; they didn't get xenophobic, they joked on this matter, and some of them even confessed that they sometimes even use Linux in their work.

Of course, the most of the school, though, was devoted to introductory "breadth-first" talks on what Microsoft is doing. The talks were made by practicing Microsoft researchers, and I'll briefly overview the ones that seemed the most insightful to me.

Research techniques introspection

Simon Peyton Jones who is known for his work on Glasgow Haskell Compiler, gave an excellent talk on how to make excellent talks at a research conference. Not about GHC, though, which, probably, could be more interesting.

One of his observations, which I also noticed a couple of conferences ago, was that it's easy to be just good enough because most talks on an average research conference are really boring.

He also made a talk on how to write a good research paper, but it went beyond that, and Simon shared his view on how to interweave presentation into the research. He advised to start writing a paper at a very early stage, starting from the "middle", the "core" of the paper. This would help you to shape your thoughts, and to understand what your research is actually about better, he said. However, some people reported that this recipe may be not universal, as it (luckily?) fits the way Microsoft evaluates their employees.

Another good insight was about about some techniques of how to present a research made. I learned that the approach to presentation of our scientific results we've been taught here, in Russia, is not the only one possible. The key difference with what Simon explained is the fundamental motivation why the research has been done. I have always been taught that in each and every paper I should justify my research with how bad, wrong and suboptimal the results previously discovered were, and here I come, with my research has the sole purpose to improve over them. Simon has introduced another approach. The key motivation for the research, in his opinion, was, "look, I've created something new! Isn't it cool? Look, how efficiently it works! And, by the way, foo and bar have also been discovered, and I sincerely thank their authors for they have inspired me".

Of course, it's hard to tell that one of them is the right way to go. However, I consider papers written the second way much more pleasant both to read and to write.

My Website is Closed for Christmas

Kenji Takeda making a talk on cloud computing

Kenji Takeda gave an overview of what Microsoft is doing in clouds. Aside from flying here and there on planes, Microsoft also builds datacenters, both at software and at hardware level. I learned that an interesting solution was developed for cooling datacenters. Instead of installing fans that just move the air away, simple physical observation could be used to improve over such an inefficient and sometimes failing solution as mechanical fans. The idea is to utilize the phenomenon that the hot air is lighter, and moves up. If the upper part of a large room is open, the hot air is removed from there, thus creating the draught that sucks the cold air in from the holes in the bottom.

After the lecture, we had a brief discussion of how to manage utilization of the cloud resources at peaks of the demand, when there's just no enough power to serve all the requests per minute, such as on Christmas. The answer was that it may be achieved by raising the price at certain periods. Indeed, if we are running a scientific computation (such as genome sequencing), it can be safely paused for Christmas, whereas large shops or entertainment information providers may take the resources instead. Raising price is the way to make it self-manage, driven by reducing expenses and maximizing income for all the parties involved.

As a side-effect, it may turn out that some non-commercial services that utilize cloud computing services, such as my website, may also get temporarily shut down—just as a ground shop would be closed on holiday. So "this website is closed for Christmas" is not a distant future for us.

Biology and Ecology, and... Microsoft?

Biological simulation in Sci-Fi

This is how computer simulation in biology is portrayed in the "Hollow Man" sci-fi movie by Paul Verhoeven. As far as I know, Paul used to be a researcher, and really tried to make it look real. But you may notice that the protagonist is running a simulation of a molecule he designed, and it's not model checking: the science was not that advanced at the time of shooting.

Turns out that Microsoft really carries out a lot of research. For instance, heard a talk about fighting cancer with model checking, which was created with help of Thomas Henzinger, a well-known scientist among software verification specialists. Model checking helps us to fight cancer by discovering and verifying models of internal chemical processes in human cell—without simulations or actual experiment. These models, shaped as finite state machines, are first verified by computers, and if the automaton found is proven sane, the real experiment is carried out to confirm the relevance of the model.

I've seen something like this in the "Hollow Man" sci-fi movie, but it turned out to be the reality of a Microsoft Research laboratory! And even more, while the Hollow Man ran simulations (see sidenote), model checking techniques do not require even a model of the experiment: the assumptions about the nature of the processes being studied are verified as a mathematical abstraction instead of as a physical object, thus providing more reliable result.

Another interesting domain is ecological research. Microsoft gathers and works on providing representation of a lot of ecology-related data. Drew, a researcher from Microsoft, also shared that humanity starts experiencing difficulties with water! He said, that today we literally have to mine water as deep as hundreds of meters. And some of the drink water sources are actually getting exhausted: you have to dig deeper and deeper to get to the actual water!.. However, I'm not an ecologist, so that was just an idle curiosity there.

Anthony Hoare and yet another philosophy lecture

Sir Anthony gave a boring talk about the history of computer science projected on the development of philosophy. Ever since the times when I was visiting philosophy lectures, which were obligatory for my postgraduate studies, dozing off became an instant reaction to the history of philosophy.

I couldn't miss the chance to ask a couple of questions to such a well-known person. I, actually, had one prepared. Several months before, I heard a talk on some applications of model checking. Irina Shevtsova advocated the use of models in software verification. Roughly speaking, she insisted that a program should come with additional information to verify its correctness. Ascribing this to Hoare, she explained that the auxiliary information should be a model, a finite-state machine to depict a programmer's—or a customer's—intent. This sounded too controversial to me, but the moderator of the seminar shot my attempts to start a discussion on that matter.

So, I asked Mr. Hoare whether it was really what he meant. It turned out that it indeed was not the model what Hoare was talking about, but mere assertions. Any assertion you put becomes a piece of meta-information on what the correct behavior of the program is. The non-violation of these assertions can be automatically, or, at least, manually, verified, and the stringent adherence of any execution to all the assertions is what makes a program complying to the developer's intent. Can't wait for another encounter with Irina, where I'll have a chance to explain (oh, let's be honest, to brag about) this misunderstanding.

Terminator

Byron Cook gave an awesome talk on termination checking techniques. This is not about the extinction of the humanity by machines (well, if it is, then just indirectly), but about automatically proving that a given program will terminate. I was aware about the techniques of proving that a program will never violate a given property while running (this is called "safety property"), and I even maintain a tool that performs such a verification. However, the details of proving termination (that the program will not get stuck in an infinite loop) somehow remained being not learned by me. Byron has filled this gap with his awesome talk. I won't try to explain it briefly, and will redirect you to the article in CACM he co-authored (or, judging by his talk, made the biggest contribution to).

Cambridge

King's Colledge

This is, perhaps, the most well-known view of Cambridge—shot on my own camera! At the time of making this picture, I was on a boad being punted, the even having been kindly organized by Microsoft Research.

The event location was Cambridge, a beautiful village an hour of a train ride away from London. We quickly got used to a mild smell of potassium the town greeted us with, and the remaining dominant flavor of science encouraged the dissemination of the knowledge at the school.

We couldn't help walking through all the Cambridge at our free time, including, but not limited to several pubs (authentic English ale is awesome!) But the details of the trip are really off-topic here; what I can safely share is that if you get to England, it's totally worth it to spend a day taking a tour over Cambridge.

Conclusion?

What was most valuable to me personally in this School is the breadth. I learned a few about a lot, and that's how you really enjoy getting new knowledge. I met one person who worked in a similar domain (and I missed a change to have a good discussion with another guy—jeez, I should really learn how to approach people!), and had an interesting discussion with him. I learned something new about how to present my research well, and... Well, I learned that Microsoft is not that evil! Isn't this what the organizers hoped for in the first place?..