This blog is now a Hugo-powered static website on AWS Elastic Beanstalk

February 15, 2018 1475 word 7 min read

Ten years ago, I thought this website should be a content platform with video hosting, blog platform, and other cool things. It was an essential learning experience: I’ve chased performance bugs and reinvented highload wheels. Today, armed with several years of Industry Experience™, I am ready to present the highest-grade advice of a seasoned web engineer.

The right architecture for this blog is a set of static webpages served off of a file system by a simple web server such as Apache or nginx. One tiny shared VM would serve way more visitors than my content could ever attract.

And I did just that. There’s no dynamic content here anymore. So… wordpress you say?

Why do this again?

I hated managing the website manually, so why did I choose to endure one more manual-ish setup? First, Wordpress is for wimps who love PHP, especially the hosted one. I was going to yet again immerse into whatever the current hype was in order to learn about the latest cool tech. And the hype of 2017 was

container-based (Docker)
mobile-first
implemented in Go
deployed onto Managed Cloud
with managed services for email, database, etc
…and maybe even serverless?

Everyone seems to like Amazon. I would have chosen Google Cloud Platform, of course, if I were to optimize for quality and reliability. However I’ve chosen AWS because its a) the hype; b) not where I work. I’ve had enough exposure to Google Cloud as an insider, and I did want to expand my horizons.

But how would I choose what exactly to do? What should my architecture achieve? Of course, it should be security, long-term maintainability, and… money.

Security

My previous version of the blog ran on Gentoo Linux. It was hell and it became unmaintainable. I rant about it at length in my previous post.

Anything that is quickly updateable would work. I used whatever linux. I literally don’t remember what it is; I need to look….

$ head -n 1 Dockerfile
FROM nginx:latest

What is it? I guess it’s Ubuntu or something else Debian-based, but I literally don’t care. Ok, Security–done, next.

Let’s optimize for… money!

Another motivation I had was to reduce the cost. Before the migration, the single VM that was serving my shitty Ruby on Rails app cost ~\$50 / month. Gosh, I could buy a brand new computer every year. So, can I do better than to shell out the whopping \$720 / year for this website few people read? Can I do, say, \$10 or \$20/month?

It sometimes gets worse. As the fascinating Interview with an Anonymous Data Scientist article puts it,

…spot prices on their GPU compute instances were \$26 an hour for a four-GP machine, and \$6.50 an hour for a one-GP machine. That’s the first time I’ve seen a computer that has human wages…

Turns out, I could get it cheaper but only if I didn’t use all the managed services. Managed is costly.

The smallest (tiniest) MySQL managed database costs \$12 / month on AWS. It supplies me with 1 cpu and 1 Gig of memory. This blog doesn’t need 1 damn CPU dedicated to the DB! It doesn’t even need a DB! It needs to copy static pages from basic file system to your screen!

Rabbit holing is another problem. So what if I want a AWS managed Git? Yes sure! That’d bee free or \$1 a month. Plus access to the Keys. The Keys would be \$1/month. Oh, and the logging of the key usage? That would be another whatever per access unless there’s 10k accesses but don’t worry, for most workflows that’d be fine!..

ugh. Can I get something more predictable? One way is to search clarity, and the other to get rid of this.

Getting rid of this. And of that.

Turns out, I can get by on the internet with free services for code storage, bug tracking, and file, photo and video storage. The \$1/month I pay to Google covers 100 Gb of crap, which I’m yet to fill. GitHub and Youtube are the staples. I’ve explained more on private git and other things in the previous post.

Do I even need rendering?

So what about converting human-writeable rich-text formats to HTML? Wordpress would be too lame, but I can’t do my own rendering engine anymore. The highlight of the previous version was of course the custom context-free parser generator that compiles a formal grammar into Ruby code. It took it sweet 1-2 seconds (!) to render each page of this website (not a joke).

That thing burns in hell and gets replaced with Markdown.

There would be no database. The contents (i.e. the text on this and other pages) would be versioned in Git and stored on a “local” disk (disks that are attached to only 1 machine are considered local even if they are actually remote, which is how cloud architectures now work).

If I wanted to change the contents or to post something new, here’s what my workflow would look like:

SSH onto the server
Use vim to add or edit a file in a special folder.
Use git to push the change into the “release” branch.
Run a ruby script that would use Markdown to format all the blog pages into HTML. It would use ls to get the list of all pages and make the blog archives page. That’s not much different from a DB-based architecture: ater all, Databases evolved out of simple collection of files arranged into folders.
rsync the code onto the remote disk.

That’s it. nginx would serve the generated pages. Since making a new post invalidates all pages anyway because you’d see it pop up in the sidebar to the left, there’s even no need to be smart about it!

What a genius idea! It’s so obvious that I’m surprised nobody…

…and here’s a list of 450 static site generators

…welp.

I chose Hugo because I wanted to play with Go.

Well, ok, now what do we do with Docker?

Docker can combine the local disk contents with a recipe called Dockerfile to produce a VM-like image that could serve a website.

Now, it would be a fun idea to have a self-hosted Docker image. The image would contain the Git repositories for website content and its own Docker files, and it would build itself and redeploy itself using AWS APIs. I think it could work…

But let’s start with something simpler. Let’s simply build Docker from another Docker container. It’s easy enough and it protects me from the loss of my home machine. In the end, the workflow would work like so:

Docker image Build contains all the build tools, including the Hugo website builder.
Static contents (including the images and Markdown files) are in a git-versioned local folder.
The Build image runs hugo to create a folder with HTML, CSS, and other files that constitute the entirety of the website.
Another Dockerfile describes the “Final” image, which combines [nginx:latest][nginx-docker] and the static files created in the previoius step.
The Script deploys it to Amazon Elastic Beanstalk.
a Makefile connects it all together.

Here’s a diagram of what it looks like:

And in the end, you get this website.

Amazon Elastic Beanstalk speed run

The speed run of how to set up autoscaled container service on Amazon Cloud is in a separate post.

A speed run is a playthrough of an otherwise fun game done as fast as possible (primarily, on video). Setting up AWS is a very fun game. For example, it's easy to set up ECS, and then discover, halfway through, that you've made some wrong choices at the beginning that are unfixable, and you have to start over.

I wrote a speed-run of AWS Container game. Check it out, and after that you can enjoy speed runs of less fun games on youtube.

But did it work?

Yes it did.

It works, it is blazingly fast, and it’s pretty cheap for a managed platform with rolling updates. My bill is \$0.85 per day, which brings it to… $26 a month. In a quirk of Amazon Billing, it says I’m paying the most (65%) for the Load Balancer rather than for “running the instances” (27%). All these costs are bogus anyway.

Believe me, I tried to delete the Load Balancer (this kills the service) or switch to single-instance configuration (this simply doesn't switch and quickly reverts back--sic!). I couldn't bring it below \$25, and I'm OK with that. Although, I could run this on App Engine for free...

I’ve achieved my goals: cut costs by almost a factor of 3, and reduced the page load speed by a factor of 10-100x.

I fixed my blog. But I set out to fix it not to blog about fixing it; I wanted to explore something more interesting. But perhaps, next time. ;-)

A Farewell to Gentoo

January 25, 2018 854 word 5 min read

As I mentioned in my previous post, the previous version of this blog ran on a VM powered by Gentoo Linux. Partly, that was the reason it was such a big mess and frankly, a security hazard.

You see, I’ve become literally scared to update Gentoo. Installing updates on Gentoo is like a challenging puzzle game. Installing Gentoo updates is an NP-hard problem. It is a drain on your time, it’s a family tragedy and it is plain and simple a security threat. But let’s start at the very beginning, when I first saw a Linux thingy at my grad school….

At the beginning, there was Windows 3.11 for Workgroups. The first computers I interacted with ran MS-DOS or Windows 3.11. Then Windows 95, and 98, and finally Windows XP. I thought Windows was all there is.

And then I went to a CS class in college, and wham! Gentoo.

I immediately fell in love with these green [ ok ] marks that show when a portion of the system has comkpleted loading. Unlike the never-ending scrollbar of Windows XP, it fosters immediate connection with the inner workfings of the machine. You feel involved. You feel in the know. You feel powerful.

So when I needed to choose a Linux distro to complete my coursework, it was Gentoo.

The main feature of Gentoo is that you build everything from sources. Nothing connects you to the inner workings than you literally witnessing the gcc invocations as they churn through your kernel you manually configured, through the window manager, or a new version of perl. That’s right, every single package–including the Kernel–is rebuilt on your local machine. Why?

One tihng, is that you can enable unsafe optimizations and tie everything to your machine. Those off-the-shelf distros have to work on a number of machines, but with Gentoo, you can compile everything with gcc -O3 --arch=icore7 -fenable-unsafe-life-choices.

It is insanely satisfying to watch. You haven’t lived if you’ve never seen Linux software compile. If you haven’t seen it, watch it. It’s worth it. It’s like watching fire.

Another selling point, you can disable the features and Reduce Bloat(tm). You don’t want to build a desktop environment? Fine–all your packages will compile without the GUI bindings. You never use PHP? Fine, no PHP bindings. You don’t like bzip2? (Wait, what?) You can disable that too! You just specify it in the USE flags in your make.conf, like USE="-pgp -gtk -qt4 -bzip2", and then when you emerge your packages, they’ll build without them. (emerge is the Gentoo’s apt-get install).

Awesome. Wait, what did you say about Bzip2? You can compile your system without bzip and only with gzip? Why do you even care? That’s because you’re a college kid with a lot of time on your hands. Emerge on.

So I emerge. Back in 2005, it took really long to compile KDE 3. We would leave it overnight to compile, and pray that it would not fail because our particular USE flags selection didn’t make it fail.

And then you try to update it. emerge -uDpav, I still remember it. It recompiles all your updates.

… or not. If you somehow forget to update the system (e.g. you leave for a vacation, or your cron crashes) then come back in two weeks and try to update it… it will fail to compile. That’s when you’re introduced to dependency twister.

Since the system is its own build environment, every next version should be buildable on top of the previous version. But sometimes it’s just not. It just doesn’t build. Some library is too old, but in order to compile a new version, you need to downgrade another library. Or worse, build dependencies form loops. Imagine dependency Foo needs a new version of library Bar to compile, and the new version of library Bar requies a new version of Foo–this actually sometimes happens.

Then, you’d have to resolve them by temporarily disabling or re-enabling USE flags. Or randomly rebuilding subsets of your packages (via helper tools like revdep-rebuild). Or applying the updates in the correct order, but you need to figure out the order first.

It’s 2017 and you still have to do it; nothing changed.

As a result, your system quickly rots and becomes a security hazard. A computer that hasn’t been updated for years, and is open to the network is a security risk. My access logs showed that automated bots were constantly trying to hack the website (polling URLs like /wp-admin/admin.php). So that’s it. Unless the system can security updates quickly and reliably, it’s a security hazard. Gentoo can not.

I got tired playing dependency twister around the time I graduated. Also, I got tired of trying to update Ruby’s ActiveRecord every now and then. Nothing like doing this for several years, I really makes you appreciate App Engine and simialr products.

So I bid Gentoo farewell and moved on. I moved on to Whatever Linux that makes my Docker Containers up to date… which is now I believe Ubuntu? I don’t really know and I no longer care.

Good bye, [ ok ]. I will miss you.

A New Look

December 16, 2017 249 word 2 min read

This website just got a new look (with mobile layout), a new web hosting, and a new technology that powers it. I’ll post the summary in the coming days. And otherwise, just enjoy fewer lags and 500 errors. (Except today when I accidentally routed the traffic to the wrong Load Balancer. >_<)

I’ll be writing separate posts about these migrations, but here’s what went where.

Website hosting moved from Rackspace Cloud hellscape of a single manually managed Gentoo Linux instance to Amazon Elastic Beanstalk. Apart from manually managing a VM, managing a Gentoo VM is its own special hell.
Image hosting moved to Google Photos. Especially since Google photos is not as much of a walled garden as it used to be.
Video hosting dies–why have it when we have youtube and vimeo. Yes, I had my own small video hosting service. I’m glad it died.
Code hosting moved to my Github page.
- Private code hosting moved to GitLab.com which seems to offer free private git repo hosting.
- I will no longer host a Redmine instance at dev.coldattic.info. My toy projects such as Muni LED Sign and Evil Board Engine are on Github now.
Email filters. The server used to run an IMAP filtering script that was scanning my gmail inbox and sorting mail. I’ve created filters in Gmail interface instead.

Non-essential services have been happily shut down. And I hope you’re enjoying the new site and 0ms page load times. Shoot me an email if something’s not right.

Division by Zero? How About Multiplication by Zero

March 17, 2016 192 word 1 min read

Crash due to division by zero? How about multiplication by zero?

I recently transferred to the next level of floating point arithmetic caveats. Division by zero is something we all know about, but multiplication by zero can be as harmful.

An algorithm computed weight of entities, then converted them to integers, as in

This code may fail the assertion. Imagine y is large (say, 1000), so exp(y) no longer fits double. The value of exp(y) will be +Infinity in this case. Surprisingly, it will correctly understand that INT_MAX is less than +Infinity, and will pass the check as expected. But here's when multiplication by zero kicks in.

What will happen if x is 0, and exp(y) is infinity? Any result will be mathematically nonsense, unless it's a special value of "Not A Number". min() would also return NaN, and integer conversion will happily convert it to... -2147483648. The assertion fails, and the production job crashes because it does not expect the result to be negative. We're multiplying two positive floating point numbers, how can it be negative?

Yet it is. All because of multiplication by zero.

How Reader Mutexes Can Deadlock

March 10, 2014 664 word 4 min read

Sample Deadlock

Translucent areas depict waiting for something; incomplete lock statements have dashed border. Note that it doesn't matter in which order the top two acquisitions are made.

Can a cryptic entanglement of your mutex locks lead to a deadlock? It sure can. Deadlocking is the second thing your parents tell you about mutexes: if one thread acquires A, then acquires B before releasing A, and the other does the same in the reverse order, the threads may potentially deadlock. And deadlock they will if the first two acquisitions are picked from separate threads. Here's the code of two threads, and sidenote depicts the problematic execution:

But what about Reader/writer locks also known as "shared/exclusive" locks? Let's recap what these are first. Sometimes, to achieve greater efficiency, a mutex implementation supports two flavors of locking: Reader and Writer (otherwise known as "shared" and "exclusive"). If several threads only want to read from a shared variable, there's no need for each of them to wait for others. That's where you'd use a ReaderLock operation on a mutex guarding the variable. If thread wants to write, it invokes WriterLock which means "do not run any readers of writers while I'm holding the lock". Here's a wiki entry for reference, and here's standard Java API.

Seemingly OK Reader Lock Execution

We no longer have a "must-before" relation between B locks in two threads, so they don't deadlock. This looks OK, but it actually is not!

So imagine that both threads X and Y happen to use one of the locks as Reader lock? It seemingly should prevent deadlocking: if, say, B is a reader lock, then the execution specified above will make progress: B.ReaderLock() in thread X will not block waiting for thread Y to release it... right? Here's the code for clarity:

Turns out, reader locks can deadlock. You just need to make reader lock wait for another reader lock's release; how?

Many mutual exclusion implementations make acquiring threads "form a line" of some sort to ensure fairness: no thread should wait forever for a lock. Then a threads that tries to acquire a lock--either shared or exclusive--waits until all threads that called L.WrLock() earlier exit their critical sections. Fairness is especially important when you have reader and writer locks: if you'd allow any reader lock to proceed while there is another reader holding the lock, your writers could "starve" waiting for quiescence among readers, which may never happen on a highly contended lock.

So, to make a reader lock wait on another reader lock, we need a writer lock between them.

Deadlocking Reader Lock Execution

Here's how three threads can interleave such that you have a deadlock between reader mutex locks. The "blocked by" relationship between these reader locks transitively hops over a writer lock in some other thread Z.

Assume, that in the execution described earlier, before Thread X attempts to acquire the reader lock B, thread Z chips in, invokes B.WrLock(), and only then X calls B.RdLock(). The second X's Y.RdLock() starts to wait for the Z to acquire and then release B because of fairness concerns discussed above. Z's B.WrLock() waits for Y to release B.RdLock(). Y waits for X to release A. No thread makes progress, and here's the deadlock. Here's sample code of all three threads:

Note that you will have at least one writer lock for B somewhere (because if all acquisitions are Reader Locks, there's no point in having the lock at all.) Therefore, the only way to prevent this kind of deadlock is to not distinguish reader and writer locks when reasoning about progress guarantees.

This kind of deadlock needs at least three threads in order to bite you, but don't dismiss it outright! If the Internet taught us that a million monkeys in front of typewriters will not eventually recreate all the body of Shakespeare's work. they would at least trigger all possible race conditions in our typewriters no matter how contrived the corresponding executions seem.

Nonlinear Effects when Web Server is at 100% CPU

January 2, 2014 1176 word 6 min read

I used to wonder what this Error 502 meant when I observed when I visited my favorite websites. When I retried the error usually seemed to disappear, and I found it's linked with the service being overwhelmed with requests at certain times. But why do I see it? Why doesn't the web server start many threads leveraging OS capabilities of CPU sharing? If it queues incoming requests, why doesn't it just delay their fulfillment, just making them wait in a line longer? Just like, you know, in any cafe, bank, and public office where people form lines, and get what they needed--eventually?

There are three answers to this. First, small-latency error is more important than serving all requests. Second, there are interesting nonlinear effects that happen when the server is at capacity if you aren't using queueing. Which leads to the third point, if more requests arrive than what you can process in one second, you will not be able to serve them, ever. You have to drop them. Let's take a closer look.

Fast error is better than slow result

If a service is consuming responses of several other services, it's often in a situation where it can ignore output of one of them, and still provide value to the user. An example in the paper linked above is Google web search that can skip displaying ads, maps, and other additional information, and simply serve search results.

This alone could be a good reason to drop a request, but this doesn't sound that convincing for the frontend server that faces a human end-user. Let's explore two ways of keeping the requests.

Latency Grows Exponentially If You're Not Limiting Rate

Imagine we're not limiting the number of worker threads, and just start processing everything we receive. In other words, we do not keep a queue of requests, and rely on the operating system to provide fair distribution of computing power among working threads. Say, if we have 4 cores, and every requests takes 1 second of CPU work, we can serve at 4 QPS; this is considered "at capacity".

What would happen if a rate grows beyond 4; say, to 5? Requests arrived at the beginning would not complete when a pack of new requests arrives. At the beginning of 2nd second, 5 earlier requests each would have 20% left, and execution speed would drop twice, so they would be completed only after 1.5 seconds. At the beginning of second 2, requests that arrived at second 1 would have completed in 0.4 seconds, but 5 more requests arrive, decreasing the speed by a factor of two again. The latency for these requests would be 2 seconds. The latency for the next pack arriving at 3rd second would be as much as 2¾ seconds, etc. Here's an illustration on how CPU time is distributed among requests, requests arrived at the same second have the same color:

Let's assess how the latency grows with rate. In other words, let's compute function L(t, r), latency of a request arrived at the moment t if the rate of requests is r times higher than capacity. For instance, if our capacity is 4 QPS, but there are 5 incoming requests, then the rate parameter r = 1.25.

First, the number of in-flight requests at the beginning of a given second N(t) grows linearly with t. Indeed, N(t) < QPS*t; on the other hand, the amount of CPU time owed to previous requests is QPS*t*(r-1)/r, which equals QPS*t*(1-1/r). So, (1-1/r)*QPS*t < N(t) < QPS*t, which means that N is linear function of t, e.g. N(t) = a*t, where a > QPS*(1-1/r).

Second, the CPU time allotted to a request witinh a second [t, t+1) is at most 1/N(t). The latency of the request is then the minimal x such that S = 1/N(t) + 1/N(t+1) + ... + 1/N(t+x) ≥ r. Since N is linear, this sum equals to 1/(a*t) + 1/(a*t + a) + 1/(a*t + 2a) + ... 1/(a*t + xa). From integral calculus we recall we can approximate the sum 1/2 + 1/3 + ... + 1/x as ln(x), so S ≅ 1/a * (ln x - ln t)). Hence, x ≥ t * exp(a*r).

We proved that L(t, r) ≳ t * exp(a*r) where a > (1-1/r), so latency grows linearly with time, and exponentially with rate of incoming requests if we try to serve them all right away.

Practical results, however, seem to suggest that exponent power is sublinear with respect to r. Latency definitely grows greater than polynomially, but the true function in the power remains a mystery. The closest I came to so far is sqrt(r), but let me juggle it a bit first, and I'll share my calculations and graphs. Here's the simulation script; built-in rational numbers in Python are cool, btw.

Newly arriving requests slow down the currently executing ones, prolonging their latency. There is a way to mitigate it: queueing.

Latency Grows Linearly with Queues

Imagine the most naive solution: first come, first served queue. All incoming requests are stored in the queue, and are served in the order they arrive, handler popping them from the queue as soon as it has capacity (a free CPU share or an idle working thread).

When the server is below capacity, the queue is always empty, and the latency of each request is L0, which is the time it takes handler to construct a response. What happens when the server reaches capacity? The latency of a request that arrives at time t (we assume constant rate r) is L(t, r) = Q(t, r)*L0 + L0, where Q(t) is the queue size when the request arrives. The size of the queue is the number of undone requests, which is t*(r-1). Therefore, L(t, r) = t*(r-1)*L0 + L0, concluding that latency grows linearly with both rate and time with queues.

You Can't Serve More Requests than What You Can

This may seem obvious, but is often overlooked. If you start queueing requests and serve them later, then you need to have the capacity dedicated to these requests later. Meanwhile, the next second more requests arrive than you can serve that second, and you have these old requests, too. Now you have three options:

Serve old requests first, but then your latency will start growing at least linearly as you spend more time over capacity;
Serve new requests first, but then old requests will remain in the queue until the first second where more requests arrive. In most cases, this would mean waiting hours as usage pattern are usually circular.
Discard some requests without spending CPU time on serving them. Which leads us to error 502.

In other words, we have to drop (r-1) of all requests regardless of which scheduling mechanism we use. Whether or not the latency grows linearly or exponentially is irrelevant: it has to grow, and this is bad. Its exponential, turbulent growth without queueing is just an interesting artefact because you have another reason to avoid starting all the incoming requests at once: it would simply choke OS scheduler.

Light Sensor with Arduino for the Muni Sign

October 2, 2013 1185 word 6 min read

"SF Muni Sign at Home" series

Several post on how I was building a San Francisco Muni sign that shows train arrival times at home.

SF Muni LED Sign at Home with Raspberry Pi — first post where I introduced the sign;
Light Sensor with Arduino for the Muni Sign — how to add light sensor to shut the sign off when it's dark.
Tidbyt: Someone's making the Muni Sign — someone's Kickstarter project to make a similar art piece!

When I installed SF Muni sign in my apartment, I realized it had contradictory requirements when it comes to positioning. On one hand, I want it to be visible from my couch, where I sit in the morning. On the other hand, I would want it powered off when I sit on the very same spot in a darkened room, say, when I'm watching a movie. Manually turning it on and off (a custom remote control?) would undermine the whole idea of having a fully automated always-on sign. Installing an automated power on/off knob by itself is easy, but what would trigger it?

Note that powering the LED sign on and off was not a part of an official interface, but I resolved it by rendering an empty picture. I don't know how much energy it saves, but since the USB port is cold after a day of it being "powered off," I can guess that a lot.

I thought of using a schedule: say, power it in the morning, and power it off in the evening. That would probably do. This would be at indirect measurement, though. Schedule is not the problem here: the actual problem is the contrast between the lit sign and the background. What if I can measure the luminosity of the room directly? I have a computer that is much closer to hardware than what I used before, so why not install a light sensor?

Minus Pi

What we need is to somehow read the value of current ambient light, and transform it into digital form for our scripts to read. A light sensor is basically a diode that changes its resistance based on the outside light condition, and we need to read current that flows through its circuit. Unfortunately, the Raspberry Pi itself can not do this: it lacks digital-to-analog converter. Luckily for us, other small Pi-like computers do.

Arduino Nano

One of Arduino boards, Nano. I chose the smallest one that had analog input. There are more arduinos available.

Plus Arduino

One of the computers capable of this is Arduino. Unlike Pi, it does not run as a full-featured Linux machine. Instead, you load up a program from another computer, and Arduino then runs this program. Well, that's not too different from what happens on the Pi, but you definitely can't load Linux on small Arduinos.

So, since Light Sensor can't be connected to Pi, I bought Arduino Nano for $10. You probably can buy a different board (the important thing to look for is an "analog port",) but I bought the smallest because Pi is already large enough. The board I bought is, like, ten times smaller than Pi, and that's what I looked for.

Here's how it looks like, mounted:

Mounted Pi+Arduino+Sign

Here's how the whole thing looks on my wall when it's bright enough.

Circuit

Since my electronic circuitry skills are somewhere at high school Science classes level, I used this video as a guide. Here's the circuit that guy describes:

Please excuse my mad circuitry skills.

When the light is off

Note that the sign is shut off when there's no enough light. Fancy LEDs are conveniently hidden behind the corner, and don't face my couch. :-)

The components used are really cheap. Arduino itself cost $15. Light sensors and resistors (I used 1kOhm) are sold for five bucks a hundred, but I went for a more expensive option from RadioShack because the store is close to where I live. Since I don't know how to solder, I used female-to-female jumper wires to connect the components (I cut one wire's connector at one side with scissors to connect to the analog input A0.

Conclusion

So it worked as a charm. I played a bit with circuitry (see the notes below,) and can now watch TV without bright red LED lights imprinting on my retina. And I have a much weirder and colorful installation on my wall, which is by itself immensely cute as well as useful.

Programming notes

After I assembled the scheme, I proceeded to install SDK. I didn't try it on my desktop computer, and did this directly on Pi (because why borther?) The apt-get install arduino didn't work, and was showing some error I forgot to write down, but the advise I googled was to update first:

sudo apt-get upgrade && sudo apt-get update
sudo install arduino

Then I launched SDK from a remote console by typing arduino into my console (I made sure I ssh-ed with -X option, like ssh -X pi@192.168.100.100), and had the GUI pop up on my desktop while actually running on Pi. I pasted the program you can find in contrib, and pressed "Upload." It didn't work because I first had to select "Arduino Nano w/ATmega328" under "Tools -> Board" menu. This was all programming required: Arduino ran this program every time it was powered on, but I still needed to collect the digital data. The numbers were printed to the USB serial port, and I could read them either from Arduino SDK or by cat /dev/ttyUSB0 from a console.

This program only made sure that the numbers are printed onto the serial port, and a separate "collector" software that interacts with the sign client was still necessary. The collector script I wrote was communicating with muni sign program by creating a file, thus signaling that the sign should be shut down. I added another initscript and used rc-update to auto-start it.

When I first tried to check if the sensor works, I saw that it changes very abruptly with the luminosity. The value read 830-840 unless I completely shoved it under the cushion (where it read as 0). My physics intuition nudged me to decrease resistance, which I did, and achieved greater smoothiness. Now it could tell dimly lit room from a completely lit one, which was my original goal. I use threshold value of 100, but it will vary with resistor and the sensor you use. Oh, and light sensor is one-sided, so if the value you're reading is about zero no matter how bright the room is, you probably should insert it backwards.

I also couldn't find a way to assign ttyUSB ports, so that LED sign is always on /dev/ttyUSB0, and Arduino's serial port is always on /dev/ttyUSB1, and they get assigned randomly. I just reboot Pi a couple of times until it gets it.

That's all, and I hope this will help me reassemble it again.

Comments imported from the old website

https://profiles.google.com/104964979771807566427 on 25 November 2013 commented:

I'd prefer using some kind of ADC (http://en.wikipedia.org/wiki/Analog-to-digital_converter) to measure analog signal level.

Misuse of Caching in Web Applications

September 11, 2013 1116 word 6 min read

I noticed a pattern in some of my web software architecture failures, and I also notice similar pattern in others' designs. In many cases, people simply misuse memcache, hurting performance of their web applications.

If performance is your goal, you can’t pretend that your computations and rendering is fast, and add transparent caching where it later appears to be slow. Transparent caching is a fairy tale programmers have been telling one another, and we’ve also been writing tools and building hardware to remain in for decades. Turns out, it doesn’t scale. A high-performance web application is designed around serving static content that--sometimes--get recomputed

under very specific conditions;
only partially;
after very specific events;
most likely, by a component separate from the one serving the content.

Where would you keep such a static content derived from the raw data (I will later call it ”derived content”)? Memory caches are often unfit for the job of storing derived data: your items may be removed from cache ("evicted") if you don't have enough memory, or if your another web application or component is working within the same cache. And evictions are much more severe than you might think.

Assume it takes your webapp several seconds to recompute the item you want to serve even if it hasn’t changed, and all requests that come during this period are going to have at least these several seconds of latency. In modern multi-component web-services high latency for a small fraction of requests makes much more harm than it seems, as corroborated by this overview paper by Googlers, so you might want to avoid it. Not to mention the simple reasoning that if this eviction happens under heavy load, the stack of HTTP requests waiting for the data to be computed might kill you.

Is there a problem with caching data altogether? Yes, and no. The idea of memoization--storing results of complex computation for later reuse--is perfectly fine. The problem with caching as I see it is that many now view caching exclusively as storing data in unreliable but fast-to-access caches (think memcached). This is not how caches in web applications have to work despite that’s how they usually do.

Caching Primer

There is another storage you can use for precomputed results, the Database (SQL, or NoSQL, or NewSQL) where you probably store the rest of your data.

Assume you are presenting users with a set of analytics over the data they entered yesterday; say some breakdown of yesterday’s sales by zip code. Say, it takes 20 seconds to compute all the data. Where would you store it?

What you could do is to cache the result of the first request, spending 20 seconds to serve it. You can be smarter, and add a cron job that hits that page every midnight. But if your data gets evicted, your next user will suffer these 20 seconds.

Instead, store the data in your persistent storage (database), the storage query being triggered by the same daily cron job. When the user requests the analytics, just serve it from there; it will never delay by 20 seconds since the data aren’t going anywhere unless you explicitly delete them, which you’ll be wise not to do needlessly. It will take you one database query to retrieve the results, and you can also cache this query in memcache for even faster results. No 20-second latency anymore, just the latency of one database select of a single row. Neat, right? Here’s a lame picture to illustrate this that shows that recomputing-on-write is more efficient as writes are less frequent usually:

Why People Use Memcache Then

So if the world of using permanent storage for caches is all ponies and rainbows, why isn’t it used everywhere? The reasons are performance tradeoff, abusing unreliability of caches, and simple storage management.

Preformance Tradeoff

If your computation is as fast as querying storage, there’s no sense to use persistent storage as cache. In rare cases when you have insufficient latency, but overprovisioned on CPU throughput persistent caching will improve your app, but most likely not. Also, if you think of updating cache often in background as triggered by user requests, you need a fast task offload solution that avoids write contention (sich as App Engine’s Task Queues).

Abusing Cache Unreliability for Eventual Consistency

Cache expiration algorithms make the state of the cache eventually correct with respect your changing data. To build a correct permanent storage cache, you must make sure that the data in this cache are explicitly updated or deleted if unused. Unreliable cache will recycle stale data for you: if a key is never going to be accessed, it will be simply evicted in favor of more recent keys without occupying extra storage. Impersistence of memcache is an asset rather than liability in such cases. One of my previous posts, about sharded counter, contains an example of this.

The problem with efficient record recycling is that you can’t update record’s access time in the persistent database every time you access it without significant performance penalty (see the previous section on fast task offloading). On the other hand, you implicitly do this on every access with specialized in-memory caches. In some cases, a background recurring thread that performs DELETE FROM cache WHERE created_at > NOW() - 600 will take care of this, but this isn’t always possible.

Automatic Storage Management

A memcache can successfully and automatically operate when you don’t have enough memory to store all the results. Its efficiency doesn’t roll downhill with the pace increase because more frequently accessed entries stay hot, while performance suffers only on less “hot” entries hence rarely. Finding space for storage of one extra column per database row is rarely a problem today, though.

Narrowing Down the Scope

Given the above limitations, storing intermediate results in database makes perfect sense when the cost of computing these results is greater than the cost of database query, and when the data are tied to other database records. Say, storing a pre-rendered version of a blog post or a wiki page that uses a custom markup (especially if it’s badly implemented like the one in this blog) makes sense: you simply update it on write, and remove if the post is deleted. Storing a result of 10-minutes analytics query in a table rather than in a cache also makes sense (and some still rely on memcache to hold such data). Storing a join by key of two SQL tables is probably not worth it.

You need to design your application so that it is capable of making derived data updates after writes. Relying on caching will hurt you in the long run, because your application will be improperly architectured.

SF Muni LED Sign at Home with Raspberry Pi

August 8, 2013 2589 word 13 min read

Running like this

My Android phone's stock alarm clock wakes me up into a cold summer San Francisco morning. I wallow around a bit, but eventually get up to enjoy yet another day. An hour later, I'm running in my dress shoes, tie waving in the wind like that of an anime superhero schoolboy protagonist, towards a light rail train stop. Twenty seconds before I get there, the train departs, right before an unanticipated 20-minute break in the service. I eat a bagel I don't want in a cafe nearby just to sit there and work while waiting; my day, having barely started, is already ruined.

"SF Muni Sign at Home" series

How I built a San Francisco Muni sign that shows train arrival times at home.

SF Muni LED Sign at Home with Raspberry Pi — first post where I introduced the sign;
Light Sensor with Arduino for the Muni Sign — how to add light sensor to shut the sign off when it's dark.
Tidbyt: Someone's making the Muni Sign — someone's Kickstarter project to make a similar art piece!

Could I have made it? Of course, I could. I could have not re-done my tie three times striving to get perfect length; I could have not procrastinated on reddit for five minutes while sipping tea, I could have postponed paying my utilities bill. On the other hand, I don't want to finish too early and wait at home instead of in a cafe. I need something to let me know, at all times, when the next train would be.

Observations

San Francisco "Muni" public transportation vehicles, like those of many other modern big cities, are equipped with real-time GPS trackers that publish location data on the internet. My smartphone has QuickMuni app installed, so I could easily be checking it. I would pick up my phone, unlock it, wait for all the Adnroid lags, wait while the app downloads the upcoming train times, and repeat this every five minutes. This would be implausible. If only I had something like a wall clock so that it constantly displays upcoming train time predictions...

One morning I located the item I wanted in the wild. Actually, I've been seeing it every day, but had no idea that's what I needed all along.

That is it! I can mount a similar sign on my wall, drive it with something like Raspberry Pi or even with my desktop PC. I'd write some software to download, process, and display the train arrival times! I can even teach it to show weather forecast to quickly see if I need to carry an umbrella.

The Result

The sign I used is probably not the only one with Perl API: here's something else on CPAN that is probably about these guys, but I didn't try them so I can't vouch that they work.

Well, let's go for it. I bought a Raspberry Pi from Amazon, and purchased this sign from BrightLEDSigns.com. I was looking for at least 16 rows of LEDs (to squeeze two lines of text in) and for a simple interface to draw pictures on it, as I didn't want to spend time to figure out opcodes or some other low-level, loosely documented coding. I got the sign for the $89 sale price.

After several nights of hacking, I finally had it: my own, always-on customized dashboard with relevant morning information.

This sign shows that the next 38-Geary buses will come to a specific, pre-configured stop in 2, 7, 14, 27, and 40 minutes; ellipsis means a long wait. The next line shows fewer arrivals for the "backup" 5-Fulton line. The sign also shows that it's 55 degrees Farenheit outside right now, and it will be 56 at 8pm (that's not a bug; that's San Francisco weather for you).

I also implemented a rendition of a real Muni sign just for the fun of it. It has some bugs, and it's slightly different from the real thing, but I didn't want to spend much time on that since I was not going to use it anyway (pull requests welcome). It looks good for demonstration purposes, though. It's animated, here is a video!

The rest of the post describes what components I assembled this thing from, how I configured the Raspberry Pi, what software I used to program the dashboard and to utilize external dependencies, and what problems and bugs I encountered on the way.

Source Materials

Here's the complete list of electronic components I used:

Programmable LED sign - programmable with a simple API, monochrome, at least 16 pixels high, bought from this vendor the 16x96 one, with Perl API and USB interface.
Raspberry PI - The famous $25 energy-efficient computer I managed to buy on Amazon for $40 at the time;
USB Wi-fi dongle - found an unused one, origially bought on Amazon (newer Pi boards feature a built-in Wi-Fi)
SD-card 8 Gb - Raspberry Pi uses SD-cards as "hard" disks to boot from. I found one in my closet.
Raspberry Pi case - bought on Amazon too; not required, but you are less afraid of moving a working piece around;
Micro-USB power cord - found unused in my closet
Mouse, Keyboard, HDMI cable (or a less digital RCA cable), and USB hub - found in my closet; used for initial set-up only and not required to run, update, or write software

Total cost: around $200.

Source Code

As usual, I'll begin with the source code, which is on github. The source code of the patched Next Muni gem is here.

How to Run

There's basic documentation in the README file, and sample command lines are in the daemon startup script for this spec. Basically, you select a stop on nextmuni.com to find out your stop ID, and run client/client.rb --stopId 12345. Alternatively, you can run a morning dashboard by specifying up to two routes and stops in text, and add an URL to retrieve weather from weather.gov:

PI configuration notes

During the first run, connect the mouse, the keyboard, and the Wi-Fi dongle (if you need one) through the USB-hub. Follow up the Getting Started guide to download the initial Pi OS image onto the SD card. Boot it up; install the default Raspbian Debian-based distro.

Do select "automatically boot desktop environment on startup" when asked. While you won't need it, it's the easiest way to configure Wi-Fi. Once the desktop appears, select "Wi-Fi configuration" under one of the menus accessed via the Start button, and configure the wi-fi card you inserted to connect to the router. Make sure you also configure it to start up wi-fi automatically at boot.

Configure SSH server, and make sure it starts up at boot. Either configure hostname, or write down IP address. Reboot and try to SSH from another machine. If this works, disconnect mouse, keyboard, and HDMI: you don't need them anymore.

Make sure git is installed via sudo apt-get install git, download the sources, and follow the README instructions to install language runtimes. Play with the script's command line options to figure out what you want to run, and configure a daemon by installing the initscript. Once confident enough in resilience of the setup, mount this onto your wall.

Results

So the solution helped: I stopped running like a Japanese schoolboy, and started to control my morning procrastination, too. As a bonus, I had a ballpark estimate of how much it should cost taxpayers to program this sign, and, of course, I had a lot of fun too. Finally, I now have a geeky item in my living room... although it looks to others as a stream of stock prices rather than of Muni arrival times.

Notes

The rest are less "hot" sections, detailing my experience with the tools I used for the job.

Ruby

Raspberry Pi's default Linux, as a Debian derivative, contain decent support for Ruby, including 1.9. which is a bit old already, though. It takes Pi several seconds to load the interpreter and a dozen of gems I needed for this sign, but it's OK since the program only starts up once when the system is powered up.

LED Sign

So the sign I used provided a Perl API, so it would be natural to write the client program in Perl, too... not really. While I have a lot of experience programming in Perl, I would much rather prefer writing in a more interesting language, such as Python or Ruby. That was one of the reasons why I wrote a small wrapper over the Perl API, so that I could exec this simple program from any other language.

Another reason why I wrote this API was that the sign does not support two-line text in its built-in text renderer. The API allows you to supply a line of text, and the sign will happily render it, and even add some effects, but I wanted two lines in a smaller font instead. API doesn't have a "render image" function, but the sign allows you to create your own font glyphs, and insert them into text. The obvious idea of printing one 16x96 "symbol" worked. The wrapper does exactly this.

The sign can also loop through several images--just like the real Muni sign. API makes this very easy to program: this happens automatically when you send several messages. Integration with Pi went smoothly: just use $sign->send(device => "/dev/ttyUSB0"); in your program.

There were several glitches, though. First, each glyph is automatically centered, unless it's greater than about 80 pixels wide, in which case it's aligned to the left. Looping through differently aligned messages looks badly, so I force all images I send to the sign as 16x96, padding them with black pixels.

Second, there is no way to power off the sign programmatically. To power it off, I need to physically remove the USB connection, and turn it off because it starts using the battery otherwise. The sign manufacturer's intent probably was that you don't re-program the sign often, so you would program it once and then turn on and off manually. But we're programmers too; isn't working around things our bread and butter? Displaying an all-black picture did the trick of turning the sign "off".

Third, I refresh the sign each 30 seconds to ensure timely prediction, but it takes 1-2 seconds to update the sign. I suppose this is because the sign contains its own CPU and programming, or because I didn't try to play with send function arguments.

Although its internal plumbing probably affects the update speed, and I saw examples of offloading all the synchronization and flashing of the lamps to Raspberry Pi completely with different, more low-level signs, that would be too time-consuming for me to bother with. I also don't know how long the sign will last when it's re-programmed twice a minute. So far, I'm quite happy with the Bright LED sign I used and its API.

Font Rendering

I expected to spend zero time on font rendering, but it took much, much longer than I expected. Well, I was expecting that writing a custom font rendering engine would be simple, but boy was I proven wrong.

First, I couldn't find a "raster" font that is just a set of bitmaps. I ended up downloading "pixelated" TTF fonts and rendering them with FontConfig library. I didn't want the sign to have a lot of dependencies, so I pre-rendeded the glyphs on my Pi in a ready-to-use manner with this script, and wrote a small library to operate them. The most fun part is that I could easily add new glyphs, such as rainfall indicators.

We also take a lot of font rendering for granted, such as aligning to center, word wrapping, kerning. I implemented some of them in a very simple manner (in Ruby!) but I feel that a couple more features would make me ragequit in favor of a real font rendering library instead.

Accessing Muni Predictions

Another reason why I chose Ruby was the Muni library in Ruby, ready-to-use for getting predictions from the NextBus system. Or so I thought: the library was missing some functionality, such as refining routes or getting predictions for all routes of a stop (this is what a Muni sign displays). It also pulled in the whole Rail stack only to display nicely formatted time string! (like, convert interval in seconds to something fancy as "over 1 hour").

I fixed some of these things, and bundled the gem with the original source ~~for your convenience~~ because it was easier than forking it and adding it into an official repository.

Specify routeConfig!

Apparently, official Muni monitors operate in the same terms: routeConfig is one of the API commands of NextBus service.

Anyway, NextBus system has a nice API, which is seemingly free to use if the use is limited or distributed among clients. A spawn of this site, nextmuni.com is devoted to SF Muni.

The only thing I didn't find was the actual names of the

Weather Forecast Retrieval

Oh, well, here comes the easy part," I thought when I finished with everything else, and decided to throw in weather forecasts. I was wrong in a very interesting way.

Turns out, weather prediction APIs are usually not free. The websites I found at the top of Google search all required signups and including your personal key to all API calls. Many had free limited versions but required payment for hourly forecast--and I was mainly interested in conditions at a specific place at a specific hour of the day (San Francisco weather isn't very stable). Not that I'm opposed to paid services per se, or to parsing web pages with regexps, but I don't need that much from them to pay, do I?

I proceeded with a free government weather.gov service, which is actually kind of cool despite the web design of the last century. The service is limited to the U.S., but San Francisco Muni predictions are not very useful outside of the country either. To use it, you need to provide a link to XML table for your city/area, which would look like this: http://forecast.weather.gov/MapClick.php?lat=37.77493&lon=-122.41942&FcstType=digitalDWML.

I didn't use their API, because it's a full-blown SOAP, and ain't nobody got time for that.

Raspberry Pi

Surprisingly, I don't have much to say about this. It just works! I didn't have to use it, and I could have used a desktop PC. The best part about Pi is that it works backwards, too: I successfully replaced desktop PC with it without doing anything differently. The device's operating system is simply a variation of Debian Linux, as found on desktops. I didn't have to debug through COM-port nor use specialized software to upload programs: I downloaded compilers and interpreters directly from the internet onto the device!

And now it's neatly powered from a single micro-usb port, and doesn't hike my electric bill despite being always on.

Future work

That's all I have to say for now. I need to add a couple of things, such as Muni Messages that remind users about non-working elevators and fare increases, to make the Muni experience complete. I also need automatic sign power-off when unused. I heard, there are motion detectors for Raspberry Pi, but a simple schedule would be nice, too.

Comments imported from the old website

Pavel Shved on 23 April 2014 commented:

Here's a python version of similar code https://github.com/metral/led_sign/ . Mike Metral created it in order to implement a sports ticker in Python; here's his blog post https://medium.com/p/875c73a5339 .

Multithreaded Consensus Versus Practice

May 20, 2013 1700 word 8 min read

Many programmers complain that most of what they've learned in their universities is completely useless in their daily work. Anecdotal evidence, a survey among some MIPT alumni, suggests that the median fraction of courses actually useful to a former student is about 5%. That's like taking one class per week instead of whole week of studying, so could you simply follow those wise brats who indeed spent the whole week smoking weed and boozing?

I'm sure that you simply should look closer. In this post, I tell how a portion of academic knowledge appeared in my work out of the blue.

The Challenge

Last week, we've been trying to improve concurrency of a multithreaded messaging system. When multiple threads wait for incoming messages with distinct labels, one and only one of them has to do the work of parsing the incoming message and directing it to the thread that actually waits for it. Choosing a designated parser thread was not possible, so the waiting threads should somehow assign the work to one and only one of them.

This is not a hard problem, but we had some minor block when we first approached it. So I suggested to step back and take a look at the theory.

That Sounds Familiar

I recall that making several threads designate one and only one of them as a "worker" is called "consensus problem" (Wikipedia provides) a slightly different but equivalent definition.) I also recalled, from my reading of TAoMP ("The Art of Multiprocessor Programming" by Herlihy and Shavit, link), that it was used to demonstrate limitations of various synchronization primitives, and most of them were incapable of solving it. That's right, in the world of formalized computing, some software problems can be proved computationally infeasible. An by "infeasible" I don't mean making a decent AI in first-person shooters, which is definitely infeasible, that's more like Turing machines and Halting problem. So maybe "mutex" is one of the tools that can't solve Consensus?

Who Can and Can Not Solve Consensus

A concurrent object is wait-free if there exists and upper limit N on time/instruction count it takes every call to complete, such that the same limit remains in effect as the number of thread grows indefinitely. Very few concurrent algorithms satisfy this property, and those who do are considered impractical.

Chapter 5 of the book is devoted to assessing relative strength of synchronization primitive operations, and it uses their capability to provide a wait-free solution to consensus problem as a measure of their strength. It later turns out, in chapter 6, that Consensus problem is universal enough to construct a wrapper that turns any single-threaded object into a thread-safe one. So, which synchronization primitives are universal?

Consensus vs. Registers

Assume you only have “simple” memory cells or registers at your disposal, which can turn either way when multiple threads write to them simultaneously. Given these, you only can solve Consensus in a wait-free manner for no more than one thread. (In other words, they are completely useless).

The complete proof is sketched in the book. It's constructed as follows. Assume Consensus can be solved for two threads. At one point, the memory state must be such that the next operation over one of these registers in some thread will determine the outcome. Then all the cases of these operations are enumerated, and viable executions, which usually end up with one thread “running solo,” are constructed. They lead to a situation where the algorithm should return different results based on the same thread-local data, which can't happen.

In particular, this can't happen if you're trying to solve Consensus for two or more threads with simple registers. It is scientifically thus proven that one needs more sophisticated synchronization primitives than just reads and writes to one memory cell. Which could these be?

Consensus vs. Multi-Register Operations

Round Robin Table

That's a sample round robin table with some competition results. Threads fill a similar data structure concurrently when they solve Consensus with multi-cell atomic writes.

If you can atomically write to N*(N+1)/2 memory cells rather than just one, you can solve Consensus for N threads. The impossibility of solving it for more than N follows the same scheme as outlined in the previous section, but the sample solution is very interesting.

Imagine that threads compete in a round robin tournament; well, not really compete, but simply fill the table with its results. Each game is either a win or a lose, and the score is a number of wins against threads that played at least one game.

When a thread participates in the Consensus, it atomically writes “I lose” in N-1 cells corresponding to its results; it also writes something in the “against itself” cell to distinguish itself from those who did not participate yet. If all threads do that, then, whenever somebody examines the table and computes the score of each thread. the thread who has the highest score will never change. The highest-score thread will always be the same one even if the reads are not atomic, and results change as one is reading the table.

Consensus vs. Complex Data Structures

Section 5.4 of the book demonstrates a proof that synchronized FIFO queues can't solve Consensus for more than two threads. It also claims that the same is true for “many similar data types, such as sets, stacks, double-ended queues, and priority queues,” which I don't have any reason to not believe.

Consensus vs. Atomic Integer Operations

By “atomic integer operations” I mean such functions as swap(&x, v) (set x to v and return the previous value) and compareAndSet(&x, e, u) (set x to u if x == e; do not change otherwise; return whether the value was equal to e) and others. Infrequently used, they are somewhat known to system programmers.

Turns out, these operations are not created equal. For instance, swap can't solve Consensus for more than two threads, while comareAndSet can solve it for infinitely many threads with a very simple algorithm:

Mutex Solution Paradox

The book begins with demonstration how one can implement mutexes using only simple registers. I formulated one of such implementations as a puzzle in my previous post. If we could use mutexes succesfully to solve Consensus, we could have replaced them with register implementations, which is impossible. Does it mean that we can't solve consensus with mutexes for more than one thread? Well, what about this simple algorithm?

More than this, we can implement the all-powerful compareAndSet with a mutex by wrapping the comparison and setting into a mutex' lock-unlock. Yet the book claims it's not possible. How come?

Different Flavors of “Solve”

A concurrent object is starvation-free if every call to its method eventually completes regardless of how threads are scheduled. For instance, an algorithm that hangs if only one thread is (unfairly) scheduled, while others sleep indefinitely for no reason, is not starvation-free per se, but may be such if OS promises not to do this, and be fair. It is weaker than wait-freedom because it doesn't require a global bound on the call length.

The mutex-based consensus algorithm is only correct if mutex has strong enough progress condition, for instance, if it's starvation-free (wait-freedom is not required here; see the last exercise to the Chapter 5 in the book). Mutexes implemented with simple registers are not starvation-free hence they don't completely “solve” Consensus. The theoretical results on impossibility given above only provide proofs for wait-free algorithms, however, it depends on the actual implementation of mutex whether the resultant algorithm will be wait-free or not.

It indeed is a corollary of the results discussed above that you can't make a starvation-free lock with simple registers only. Threads in such locks usually “spin” waiting for the memory to enter a state that signals this particular thread (and only it) to enter the critical section. If, at this point, some other thread “runs solo,” you are stuck forever. If you apply the register "weakness" proof scheme to a register-based mutex, this infinite loop will eliminate the contradiction that same thread-local data will lead to different results: they'll lead to endless spinning instead. Notice how that's not the case for compareAndSet() solution to Consensus presented above.

So the resolution to the paradox described above is that the mutex-based program is a solution only if the underlying mutex is starvation-free--in addition to simply being a correct mutual exclusion algorithm. To implement a starvation-free mutex, you need to have compareAndSet or similarly strong primitive operations, but they may be concealed within the lock or the OS implementation, and don't have to appear in your program explicitly.

Consensus vs. Humans

Consensus and Humans

Here's how choices of individual subjects changed over time during the experiment on how humans solve Consesus. Read more here.

Another interesting example of a computational system that can's solve consensus problem is a social network of human beings. A group of researchers conducted an experiment where several people were given material incentives to reach a specific consensus (an equivalent of threads proposing their own value), and were set out to reach one. They could not communicate in any way except for announcing the current value each person agrees to to their neighbors, i.e. they didn't have a designated leader or a common strategy.

The experiments demonstrated that people can't reach consensus in this settings. An interesting finding, though, was that the subjects quickly created a "bipartisan" system with only two colors, and started to juggle them around. Sometimes, new colors appeared as an attempt to unite under a different color, but these attempts quickly died off. Doesn't it remind of anything?..

Unsurprising Conclusion

So, it turns out, the theory didn't help us much with the original task. It helped us understand that thread nomination is a separate, studied problem that can be solved given the right tools. However, our initial choice of tools, the "mutex" was correct, and the knowledge of theory couldn't improve our initial choice. I'm glad it made me read "The Art Of Multiprocessor Programming" book more thoroughly than I otherwise would, and I do not consider it a waste of time. Multicore computing is getting more ubiquitous, and knowing the theory won't hurt.