I noticed a pattern in some of my web software architecture failures, and I also notice similar pattern in others' designs. In many cases, people simply misuse memcache, hurting performance of their web applications.

If performance is your goal, you can’t pretend that your computations and rendering is fast, and add transparent caching where it later appears to be slow. Transparent caching is a fairy tale programmers have been telling one another, and we’ve also been writing tools and building hardware to remain in for decades. Turns out, it doesn’t scale. A high-performance web application is designed around serving static content that--sometimes--get recomputed

1. under very specific conditions;
2. only partially;
3. after very specific events;
4. most likely, by a component separate from the one serving the content.

Where would you keep such a static content derived from the raw data (I will later call it ”derived content”)? Memory caches are often unfit for the job of storing derived data: your items may be removed from cache ("evicted") if you don't have enough memory, or if your another web application or component is working within the same cache. And evictions are much more severe than you might think.

Assume it takes your webapp several seconds to recompute the item you want to serve even if it hasn’t changed, and all requests that come during this period are going to have at least these several seconds of latency. In modern multi-component web-services high latency for a small fraction of requests makes much more harm than it seems, as corroborated by this overview paper by Googlers, so you might want to avoid it. Not to mention the simple reasoning that if this eviction happens under heavy load, the stack of HTTP requests waiting for the data to be computed might kill you.

Is there a problem with caching data altogether? Yes, and no. The idea of memoization--storing results of complex computation for later reuse--is perfectly fine. The problem with caching as I see it is that many now view caching exclusively as storing data in unreliable but fast-to-access caches (think memcached). This is not how caches in web applications have to work despite that’s how they usually do.

### Caching Primer

There is another storage you can use for precomputed results, the Database (SQL, or NoSQL, or NewSQL) where you probably store the rest of your data.

Assume you are presenting users with a set of analytics over the data they entered yesterday; say some breakdown of yesterday’s sales by zip code. Say, it takes 20 seconds to compute all the data. Where would you store it?

What you could do is to cache the result of the first request, spending 20 seconds to serve it. You can be smarter, and add a cron job that hits that page every midnight. But if your data gets evicted, your next user will suffer these 20 seconds.

Instead, store the data in your persistent storage (database), the storage query being triggered by the same daily cron job. When the user requests the analytics, just serve it from there; it will never delay by 20 seconds since the data aren’t going anywhere unless you explicitly delete them, which you’ll be wise not to do needlessly. It will take you one database query to retrieve the results, and you can also cache this query in memcache for even faster results. No 20-second latency anymore, just the latency of one database select of a single row. Neat, right? Here’s a lame picture to illustrate this that shows that recomputing-on-write is more efficient as writes are less frequent usually:

### Why People Use Memcache Then

So if the world of using permanent storage for caches is all ponies and rainbows, why isn’t it used everywhere? The reasons are performance tradeoff, abusing unreliability of caches, and simple storage management.

If your computation is as fast as querying storage, there’s no sense to use persistent storage as cache. In rare cases when you have insufficient latency, but overprovisioned on CPU throughput persistent caching will improve your app, but most likely not. Also, if you think of updating cache often in background as triggered by user requests, you need a fast task offload solution that avoids write contention (sich as App Engine’s Task Queues).

#### Abusing Cache Unreliability for Eventual Consistency

Cache expiration algorithms make the state of the cache eventually correct with respect your changing data. To build a correct permanent storage cache, you must make sure that the data in this cache are explicitly updated or deleted if unused. Unreliable cache will recycle stale data for you: if a key is never going to be accessed, it will be simply evicted in favor of more recent keys without occupying extra storage. Impersistence of memcache is an asset rather than liability in such cases. One of my previous posts, about sharded counter, contains an example of this.

The problem with efficient record recycling is that you can’t update record’s access time in the persistent database every time you access it without significant performance penalty (see the previous section on fast task offloading). On the other hand, you implicitly do this on every access with specialized in-memory caches. In some cases, a background recurring thread that performs DELETE FROM cache WHERE created_at > NOW() - 600 will take care of this, but this isn’t always possible.

#### Automatic Storage Management

A memcache can successfully and automatically operate when you don’t have enough memory to store all the results. Its efficiency doesn’t roll downhill with the pace increase because more frequently accessed entries stay hot, while performance suffers only on less “hot” entries hence rarely. Finding space for storage of one extra column per database row is rarely a problem today, though.

### Narrowing Down the Scope

Given the above limitations, storing intermediate results in database makes perfect sense when the cost of computing these results is greater than the cost of database query, and when the data are tied to other database records. Say, storing a pre-rendered version of a blog post or a wiki page that uses a custom markup (especially if it’s badly implemented like the one in this blog) makes sense: you simply update it on write, and remove if the post is deleted. Storing a result of 10-minutes analytics query in a table rather than in a cache also makes sense (and some still rely on memcache to hold such data). Storing a join by key of two SQL tables is probably not worth it.

You need to design your application so that it is capable of making derived data updates after writes. Relying on caching will hurt you in the long run, because your application will be improperly architectured.

#### "SF Muni Sign at Home" series

Several post on how I was building a San Francisco Muni sign that shows train arrival times at home.

Running like this

My android phone wakes me up with its stock alarm clock into a cold San Francisco summer morning. I lie around a bit, but eventually get up to enjoy yet another day. An hour later, I'm running in my dress shoes, tie waving in the wind like that of an anime superhero schoolboy, towards a light rail train stop. Twenty seconds before I get there, the train leaves, right before an unanticipated 20-minute break in the service. I eat a bagel I don't want in a cafe nearby to sit there and work while waiting; my day, having barely started, is already ruined.

Could I have made it? Of course, I could. I could have not re-tied my tie in order to drape it half inch shorter; I could have not procrastinated on reddit for five minutes while sipping tea, I could have postponed paying my bill to the evening. On the other hand, I don't want to finish too early and wait at home instead of in a cafe. I need something to let me know, at all times, when the next train is.

### Observations

San Francisco "Muni" public transportation vehicles, like those of many other modern big cities, are equipped with real-time GPS trackers, and publish location data on the internet. My smartphone has QuickMuni app installed, so I could easily be checking it. I would pick up my phone, unlock it, wait for all the lags Adnroid greets me with, wait while the app downloads the upcoming train times, and repeat this every five minutes. This would be implausible. If only I had something like a wall clock so that it constantly displays upcoming train time predictions...

One morning I located the item I wanted in the wild. Actually, I've been seeing it every day, but was not realizing that it would solve my problem.

That is it! I can mount a similar sign on my wall, drive it with something like Raspberry Pi or even my desktop PC, and write some software to download, process, and display the train arrival times! I can even teach it to show weather forecast to quickly see if I need to take an umbrella.

### The Result

The sign I used is probably not the only one with Perl API: here's something else on CPAN that is probably about these guys, but I didn't try them so I can't vouch that they work.

### Source Code

As usual, I'll begin with the source code, which is on github. Here's a local mirror, just in case The source code of a patched Next Muni gem is here.

#### How to Run

There's a small documentation in the README file, and sample command lines are in the daemon startup script for this spec. Basically, you select a stop on nextmuni.com to find out your stop ID, and run client/client.rb --stopId 12345. Alternatively, you can run a morning dashboard by specifying up to two routes and stops in text, and add an URL to retrieve weather from weather.gov:

### PI configuration notes

During the first run, connect mouse, keyboard, and Wi-Fi card through the USB-hub. Follow up the Getting Started guide to download initial image onto SD card. Boot it up, install the default Raspbian Debian-based distro.

Do select "automatically boot desktop environment on startup" when asked. While you won't need it, it's the easiest way to configure Wi-Fi. Once at the desktop, select "Wi-Fi configuration" under one of the menus accessed via Start button, and configure the wi-fi card you inserted to connect to the router. Make sure you also configure it to start up wi-fi automatically at boot.

Configure SSH server, and make sure it starts up at boot. Either configure hostname, or write down IP address. Reboot and try to SSH from another machine. If this works, disconnect mouse, keyboard, and HDMI: you don't need them anymore.

Make sure git is installed via sudo apt-get install git, download the sources, and follow the README instructions to install language runtimes. Play with script's command line options to figure out what you want to run, and configure a daemon by installing the initscript. Once confident enough in resilience of the setup, mount this onto your wall.

### Results

So the solution helped: I stopped running like a Japanese schoolboy, and started to control my morning procrastination, too. As a bonus, I had a ballpark estimation of how much it should cost taxpayers to program this sign, and, of course, I had a lot of fun, too. Finally, I now have a geeky item in my living room... although it looks to others as a stream of stock prices rather than of Muni arrival times.

### Notes

The rest are less "hot" sections, detailing my experience with the tools I used for the job.

#### Ruby

Raspberry Pi's default Linux, as a Debian derivative, contain decent support for Ruby, including 1.9. which is a bit old already, though. It takes Pi several seconds to load the interpreter and a dozen of gems I needed for this sign, but it's OK since the program only starts up once when the system is powered up.

#### LED Sign

So the sign I used provided a Perl API, so it would be natural to write the client program in Perl, too... not really. While I have a lot of experience programming in Perl, I would much rather prefer writing in a more interesting language, such as Python or Ruby. That was one of the reasons why I wrote a small wrapper over the Perl API, so that I could exec this simple program from any other language.

Another reason why I wrote this API was that the sign does not support two-line text in its built-in text renderer. The API allows you to supply a line of text, and the sign will happily render it, and even add some effects, but I wanted two lines in a smaller font instead. API doesn't have a "render image" function, but the sign allows you to create your own font glyphs, and insert them into text. The obvious idea of printing one 16x96 "symbol" worked. The wrapper does exactly this.

The sign can also loop through several images--just like the real Muni sign. API makes this very easy to program: this happens automatically when you send several messages. Integration with Pi went smoothly: just use sign->send(device => "/dev/ttyUSB0"); in your program. There were several glitches, though. First, each glyph is automatically centered, unless it's greater than about 80 pixels wide, in which case it's aligned to the left. Looping through differently aligned messages looks badly, so I force all images I send to the sign as 16x96, padding them with black pixels. Second, there is no way to power off the sign programmatically. To power it off, I need to physically remove the USB connection, and turn it off because it starts using the battery otherwise. The sign manufacturer's intent probably was that you don't re-program the sign often, so you would program it once and then turn on and off manually. But we're programmers too; isn't working around things our bread and butter? Displaying an all-black picture did the trick of turning the sign "off". Third, I refresh the sign each 30 seconds to ensure timely prediction, but it takes 1-2 seconds to update the sign. I suppose this is because the sign contains its own CPU and programming, or because I didn't try to play with send function arguments. Although its internal plumbing probably affects the update speed, and I saw examples of offloading all the synchronization and flashing of the lamps to Raspberry Pi completely with different, more low-level signs, that would be too time-consuming for me to bother with. I also don't know how long the sign will last when it's re-programmed twice a minute. So far, I'm quite happy with the Bright LED sign I used and its API. #### Font Rendering I expected to spend zero time on font rendering, but it took much, much longer than I expected. Well, I was planning to write my own simple font rendering engine, but several hidden rocks were waiting for me beneath the thin sand. First, I couldn't find a "raster" font that is just a set of bitmaps. I ended up downloading "pixelated" TTF fonts and rendering them with FontConfig library. I didn't want the sign to have a lot of dependencies, so I pre-rendeded the glyphs on my Pi in a ready-to-use manner with this script, and wrote a small library to operate them. The most fun part is that I could easily add new glyphs, such as rainfall indicators. We also take a lot of font rendering for granted, such as aligning to center, word wrapping, kerning. I implemented some of them in a very simple manner (in Ruby!) but I feel that a couple more features would make me ragequit in favor of a real font rendering library instead. #### Accessing Muni Predictions Another reason why I chose Ruby was the Muni library in Ruby, ready-to-use for getting predictions from the NextBus system. Or so I thought: the library was missing some functionality, such as refining routes or getting predictions for all routes of a stop (this is what a Muni sign displays). It also pulled in the whole Rail stack only to display nicely formatted time string! (like, convert interval in seconds to something fancy as "over 1 hour"). I fixed some of these things, and bundled the gem with the original source for your convenience because it was easier than forking it and adding it into an official repository. #### Specify routeConfig! Apparently, official Muni monitors operate in the same terms: routeConfig is one of the API commands of NextBus service. Anyway, NextBus system has a nice API, which is seemingly free to use if the use is limited or distributed among clients. A spawn of this site, nextmuni.com is devoted to SF Muni. The only thing I didn't find was the actual names of the #### Weather Forecast Retrieval Oh, well, here comes the easy part," I thought when I finished with everything else, and decided to throw in weather forecasts. I was wrong in a very interesting way. Turns out, weather prediction APIs are usually not free. The websites I found at the top of Google search all required signups and including your personal key to all API calls. Many had free limited versions but required payment for hourly forecast--and I was mainly interested in conditions at a specific place at a specific hour of the day (San Francisco weather isn't very stable). Not that I'm opposed to paid services per se, or to parsing web pages with regexps, but I don't need that much from them to pay, do I? I proceeded with a free government weather.gov service, which is actually kind of cool despite the web design of the last century. The service is limited to the U.S., but San Francisco Muni predictions are not very useful outside of the country either. To use it, you need to provide a link to XML table for your city/area, which would look like this: http://forecast.weather.gov/MapClick.php?lat=37.77493&lon=-122.41942&FcstType=digitalDWML. I didn't use their API, because it's a full-blown SOAP, and ain't nobody got time for that. #### Raspberry Pi Surprisingly, I don't have much to say about this. It just works! I didn't have to use it, and I could have used a desktop PC. The best part about Pi is that it works backwards, too: I successfully replaced desktop PC with it without doing anything differently. The device's operating system is simply a variation of Debian Linux, as found on desktops. I didn't have to debug through COM-port nor use specialized software to upload programs: I downloaded compilers and interpreters directly from the internet onto the device! And now it's neatly powered from a single micro-usb port, and doesn't hike my electric bill despite being always on. #### Future work That's all I have to say for now. I need to add a couple of things, such as Muni Messages that remind users about non-working elevators and fare increases, to make the Muni experience complete. I also need automatic sign power-off when unused. I heard, there are motion detectors for Raspberry Pi, but a simple schedule would be nice, too. # Comments imported from the old website Pavel Shved on 23 April 2014 commented: Here's a python version of similar code https://github.com/metral/led_sign/ . Mike Metral created it in order to implement a sports ticker in Python; here's his blog post https://medium.com/p/875c73a5339 . Many programmers complain that most of what they've learned in their universities is completely useless in their daily work. Anecdotal evidence, a survey among some MIPT alumni, suggests that the median fraction of courses actually useful to a former student is about 5%. That's like taking one class per week instead of whole week of studying, so could you simply follow those wise brats who indeed spent the whole week smoking weed and boozing? I'm sure that you simply should look closer. In this post, I tell how a portion of academic knowledge appeared in my work out of the blue. ### The Challenge Last week, we've been trying to improve concurrency of a multithreaded messaging system. When multiple threads wait for incoming messages with distinct labels, one and only one of them has to do the work of parsing the incoming message and directing it to the thread that actually waits for it. Choosing a designated parser thread was not possible, so the waiting threads should somehow assign the work to one and only one of them. This is not a hard problem, but we had some minor block when we first approached it. So I suggested to step back and take a look at the theory. ### That Sounds Familiar I recall that making several threads designate one and only one of them as a "worker" is called "consensus problem" (Wikipedia provides) a slightly different but equivalent definition.) I also recalled, from my reading of TAoMP ("The Art of Multiprocessor Programming" by Herlihy and Shavit, link), that it was used to demonstrate limitations of various synchronization primitives, and most of them were incapable of solving it. That's right, in the world of formalized computing, some software problems can be proved computationally infeasible. An by "infeasible" I don't mean making a decent AI in first-person shooters, which is definitely infeasible, that's more like Turing machines and Halting problem. So maybe "mutex" is one of the tools that can't solve Consensus? ### Who Can and Can Not Solve Consensus A concurrent object is wait-free if there exists and upper limit N on time/instruction count it takes every call to complete, such that the same limit remains in effect as the number of thread grows indefinitely. Very few concurrent algorithms satisfy this property, and those who do are considered impractical. Chapter 5 of the book is devoted to assessing relative strength of synchronization primitive operations, and it uses their capability to provide a wait-free solution to consensus problem as a measure of their strength. It later turns out, in chapter 6, that Consensus problem is universal enough to construct a wrapper that turns any single-threaded object into a thread-safe one. So, which synchronization primitives are universal? #### Consensus vs. Registers Assume you only have “simple” memory cells or registers at your disposal, which can turn either way when multiple threads write to them simultaneously. Given these, you only can solve Consensus in a wait-free manner for no more than one thread. (In other words, they are completely useless). The complete proof is sketched in the book. It's constructed as follows. Assume Consensus can be solved for two threads. At one point, the memory state must be such that the next operation over one of these registers in some thread will determine the outcome. Then all the cases of these operations are enumerated, and viable executions, which usually end up with one thread “running solo,” are constructed. They lead to a situation where the algorithm should return different results based on the same thread-local data, which can't happen. In particular, this can't happen if you're trying to solve Consensus for two or more threads with simple registers. It is scientifically thus proven that one needs more sophisticated synchronization primitives than just reads and writes to one memory cell. Which could these be? #### Consensus vs. Multi-Register Operations #### Round Robin Table That's a sample round robin table with some competition results. Threads fill a similar data structure concurrently when they solve Consensus with multi-cell atomic writes. If you can atomically write to N*(N+1)/2 memory cells rather than just one, you can solve Consensus for N threads. The impossibility of solving it for more than N follows the same scheme as outlined in the previous section, but the sample solution is very interesting. Imagine that threads compete in a round robin tournament; well, not really compete, but simply fill the table with its results. Each game is either a win or a lose, and the score is a number of wins against threads that played at least one game. When a thread participates in the Consensus, it atomically writes “I lose” in N-1 cells corresponding to its results; it also writes something in the “against itself” cell to distinguish itself from those who did not participate yet. If all threads do that, then, whenever somebody examines the table and computes the score of each thread. the thread who has the highest score will never change. The highest-score thread will always be the same one even if the reads are not atomic, and results change as one is reading the table. #### Consensus vs. Complex Data Structures Section 5.4 of the book demonstrates a proof that synchronized FIFO queues can't solve Consensus for more than two threads. It also claims that the same is true for “many similar data types, such as sets, stacks, double-ended queues, and priority queues,” which I don't have any reason to not believe. #### Consensus vs. Atomic Integer Operations By “atomic integer operations” I mean such functions as swap(&x, v) (set x to v and return the previous value) and compareAndSet(&x, e, u) (set x to u if x == e; do not change otherwise; return whether the value was equal to e) and others. Infrequently used, they are somewhat known to system programmers. Turns out, these operations are not created equal. For instance, swap can't solve Consensus for more than two threads, while comareAndSet can solve it for infinitely many threads with a very simple algorithm: ### Mutex Solution Paradox The book begins with demonstration how one can implement mutexes using only simple registers. I formulated one of such implementations as a puzzle in my previous post. If we could use mutexes succesfully to solve Consensus, we could have replaced them with register implementations, which is impossible. Does it mean that we can't solve consensus with mutexes for more than one thread? Well, what about this simple algorithm? More than this, we can implement the all-powerful compareAndSet with a mutex by wrapping the comparison and setting into a mutex' lock-unlock. Yet the book claims it's not possible. How come? ### Different Flavors of “Solve” A concurrent object is starvation-free if every call to its method eventually completes regardless of how threads are scheduled. For instance, an algorithm that hangs if only one thread is (unfairly) scheduled, while others sleep indefinitely for no reason, is not starvation-free per se, but may be such if OS promises not to do this, and be fair. It is weaker than wait-freedom because it doesn't require a global bound on the call length. The mutex-based consensus algorithm is only correct if mutex has strong enough progress condition, for instance, if it's starvation-free (wait-freedom is not required here; see the last exercise to the Chapter 5 in the book). Mutexes implemented with simple registers are not starvation-free hence they don't completely “solve” Consensus. The theoretical results on impossibility given above only provide proofs for wait-free algorithms, however, it depends on the actual implementation of mutex whether the resultant algorithm will be wait-free or not. It indeed is a corollary of the results discussed above that you can't make a starvation-free lock with simple registers only. Threads in such locks usually “spin” waiting for the memory to enter a state that signals this particular thread (and only it) to enter the critical section. If, at this point, some other thread “runs solo,” you are stuck forever. If you apply the register "weakness" proof scheme to a register-based mutex, this infinite loop will eliminate the contradiction that same thread-local data will lead to different results: they'll lead to endless spinning instead. Notice how that's not the case for compareAndSet() solution to Consensus presented above. So the resolution to the paradox described above is that the mutex-based program is a solution only if the underlying mutex is starvation-free--in addition to simply being a correct mutual exclusion algorithm. To implement a starvation-free mutex, you need to have compareAndSet or similarly strong primitive operations, but they may be concealed within the lock or the OS implementation, and don't have to appear in your program explicitly. ### Consensus vs. Humans #### Consensus and Humans Here's how choices of individual subjects changed over time during the experiment on how humans solve Consesus. Read more here. Another interesting example of a computational system that can's solve consensus problem is a social network of human beings. A group of researchers conducted an experiment where several people were given material incentives to reach a specific consensus (an equivalent of threads proposing their own value), and were set out to reach one. They could not communicate in any way except for announcing the current value each person agrees to to their neighbors, i.e. they didn't have a designated leader or a common strategy. The experiments demonstrated that people can't reach consensus in this settings. An interesting finding, though, was that the subjects quickly created a "bipartisan" system with only two colors, and started to juggle them around. Sometimes, new colors appeared as an attempt to unite under a different color, but these attempts quickly died off. Doesn't it remind of anything?.. ### Unsurprising Conclusion So, it turns out, the theory didn't help us much with the original task. It helped us understand that thread nomination is a separate, studied problem that can be solved given the right tools. However, our initial choice of tools, the "mutex" was correct, and the knowledge of theory couldn't improve our initial choice. I'm glad it made me read "The Art Of Multiprocessor Programming" book more thoroughly than I otherwise would, and I do not consider it a waste of time. Multicore computing is getting more ubiquitous, and knowing the theory won't hurt. I wrote earlier that I know very few decent programming puzzles. To be more precise, I know one, The Prisoners Riddle. So I was reading that The Art of Multiprocessor Programming book (I already made a post influenced by what I read there, and I'll soon make another one--stay tuned). Some very interesting things the book tells about can be formulated as puzzles. Here's one. Two threads need a mutex object with Lock and Unlock methods, but they only have three usual shared boolean variables. How to implement such a mutex? Every thread can call GetThreadId() function which returns 0 for one thread and 1 to the other, and the boolean variables are automatically initialized to false before threads execute. If you solved it, congratulations; you have just reinvented this wheel. Since I know the solution, I can't really understand if it's a good riddle or not. It's definitely too hard for a job interview, but still, it can be a good one. So I need your comments. What do you think? Could you solve it? As you already know, binary search is an O(log N) way to search if a particular value exists in a sorted array. There are some reports on how "most of our implementation" of binary searches are broken. The mistake discussed in that post is the overflow when calculating the index of the middle element. But regardless of whether you get this bit right, there's a different problem with all binary search implementations I've seen. They're just too damn complicated and hard to get right out of your memory. Which is exactly what you want, say, at a job interview. ## Recap Let's recap what binary search algorithm is, at a high level. We are given a sorted array, and a value we're looking for. Is the value in the array? Here's the basic recursive algorithm: 1. Check if the middle element is the value? Yes--good, found it. 2. Is it smaller than what we're looking for? Yes--repeat the search in the array between the middle and the end of the original array. 3. So it must be larger then--repeat the search in the array between the beginning and the middle of the original array. If you want a more detailed basic explanation, see other articles, e.g. this one. However, be mindful that many "simple" implementations make mistakes at the corner cases or present algorithms that require more special cases than needed. They might work for you but for me, they seem harder to remember. For example, the algorithm linked above has three if-else branches whereas we can do away with two. Read on how. ## Key points Almost always, if you're relying on your memory, you should prefer simple algorithms to complex, like you should implement a treap instead of an AVL tree. So here's a sequence of steps that helps me to write binary search correctly. 1. Simplicity: solve "bisection" rather than "binary search" 2. Consistency: use consistent definition of search bounds 3. Overflows: avoid overflows for integer operations 4. Termination: ensure termination Let's explore them one by one. ### 1. Simplicity: Solve "Bisection" rather than "Binary Search" From now on, let's redefine the problem. We aren't searching for anything anymore. "Searching" for an element in array is a bit hard to define. Say, what should "searching" return if the element doesn't exist, or there are several equal elements? That's a bit too confusing. Instead, let's bisect an array, pose a problem that always have a single solution no matter what. Bisection is defined as follows: given a boolean function f, find the smallest k such that, for all i < k, f(a[i]) == false, and for all j >= k, f(a[j] == true). In other words, we are to find the smallest k such that for all i within array bounds f(a[i]) == (i<k). If the array is sorted in such a way that f(a[i]) is monotonic, being first false and then true, this problem always has a single solution. In some cases, when f(a[i]) is false for all array elements, the resultant k is greater by one than the last index of the array, but this is still not a special case because this is exactly the answer as defined by bold text above. This is more powerful than the binary "search", which can be reduced to "bisection" as follows. In order to find if an element x exists in the array a of length N, we can do the following: Now let's proceed with solving the bisection, now that we've defined it. ### 2. Consistency: use consistent definition of search bounds Let's consider the array zero-indexed with length N, so the indexes range from 0 to N. The answer to our bisection problem is between 0 and N inclusively. Yes, N would be outside of the array bounds, but the answer to bisection can indeed be outside of the bounds if the entire array is smaller than the element we're searching for!) We start by assigning i to 0, and j to N. No special cases allowed: we can do without them at all here. So at each step, we will check the value at index k, where i <= k < j. Inspecting this value will allow to reduce the range size. ### 3. Overflows: Avoid Overflows for Integer Operations We literally have 1 integer operation here: given the bounds, compute the pivot point. i.e. the point at the middle of the array we'll apply f to to make a step. How hard can that be to compute the middle of an interval right? Turns out, quite hard. This problem is discussed in the article I linked above more. If we lived in the world of unbounded integers, it would simply be (i+j)/2, but we live in the world of integer overflows. If i and j are positive integers, which makes sense for array indexes in C++, the never-overflowing expression would be: Another useful property of this expression (in C++, at least) is that it's never equal to j if i < j. Here's why it's important. ### 4. Termination: ensure termination A basic way to make sure that your program terminates some day is to construct such a "loop variant" that strictly decreases with each loop iteration, and which can't decrease forever. For our bisection, the natural variant is the length of the part of the original array we currently look for the answer in, which is max(j-i, 0). It can't decrease forever because its value would reach zero at some point, from which there's nowhere to proceed. We only need to prove that this variant decreases each iteration by at least one. Fun fact: special programs can prove that a program terminates automatically, without human intervention! One of these programs is named... Terminator. You can read more about this in a very accessible article "Proving Program Termination" by Byron Cook et al. I first learned about it in a summer school where Mr. Cook himself gave a talk about this. First step is to make sure that the time it reaches zero would be the last iteration of the loop (otherwise it wouldn't decrease!), so we use while (i < j). Now assume that we've understood that the pivot point s does not satisfy the target function f. Our key difference with most binary search algorithms is that we should assign the new i to s+1 rather than s. Given our definition of the problem, s is never the answer in this case, and we should keep our candidate range as small as possible. If f(a[s]) is true, then s could still be the answer, so it does not apply here. Here's our iteration step now: We mentioned in the previous section that i <= s < j, hence i < s + 1 and s < j, so each of assignments decreases j-i by at least 1 regardless of f's value. This proves that this loop never hangs up forever. ### The complete program So here's our bisection that works for all sorted arrays (in the sense of f(a[i]) being first false then true as i increases). The search itself in terms of finding if a value exists in a sorted array, would be then written (I'm just repeating what I wrote above, no surprises here): ### Proof By Authority And GNU C++ STL implementation of binary_search follows the same guidelines I posted here. In my version of GCC 4.3.4 binary search looks like this: Where lower_bound is the bisection function. ### Conclusion Here's my version of bisection and binary search I usually use when I need one. It is designed to be simple, and simple things are easier to get right. Just make sure your loop variant decreases, you do not overflow, and you get rid of as much special cases as possible. # Comments imported from the old website Pavel Shved on 13 July 2013 commented: Many thanks to Martin Ward who pointed to inaccuracies in termination proof description. The binary search is a little bit more complicated than usually presented. An exercise that can help a lot in writing a correct version of binary search is given here http://arxiv.org/abs/1402.4843 I highly recommend looking at it. Without it, it looks like guessing game. If we exclude recursive solution, there are no more than 4-5 correct implementations of the algorithm. All solutions including the one you gave can be derived from the table given in exercises. (Actually I am collecting implementations and showing how to do that.) Your example is optimized so it is not a good starting point. j in your code is keeping the index which is one after the last element in the selected subarray. You need to explain this explicitely. GNU code is optimized for several reasons, one of them is speeding up on assembly level. I am saying this is an advanced level what you are writing about. Nobody can use it as a starting point. It is hard to come up with a google-friendly name for the ||= construct you see in some programming languages quite often, "pipe-pipe-equals" being the is the closest (other names include "double-pipe equals," "or-equal," or "double-or equals"). Why should we name this monster in the first place? While cryptic at first sight, it is a very convenient shorthand that, I believe, allows us to write cleaner code. Let's see how it works on this example: Essentially this ||= construct is equivalent to other convenience operators, such as +=. x += 2; means x = x + 2. The "or" operator || usually denotes a weak logical "or" that doesn't try to compute the left-hand side if the right-hand side is true. Usually, meaningful values evaluate to true, while the only evaluating to false are NULL pointers/empty values, zeroes, boolean Falses, and, sometimes, empty strings. Alternatively, you could set a default hash value, but ||= is used more widely, and also takes exactly one line... doesn't it? ;-) The statement is used mostly to set a variable to a "default" value if it's "unset" at some point. This gives the most visible advantage when you initialized dictionary elements in a loop. Here's a piece that saves counts of each array element into a dictionary (Ruby): In some languages (Perl, Javascript), you could not even bother with this, as += 1 on an unset value would result in its assignment to 1. If you don't initialize output[x], you'll get a runtime error for trying to increment NULL. The advantage of ||= against other ways is that you don't repeat anything. You could have written the same piece as Oh yes... we forgot about the most natural use of ||=, for booleans. Here's how we'd check if an array contains zeroes if we need to iterate over its elements for something else. Pipe-pipe-equals is rarely used this "natural" way, though. But here you have to type output[x] twice and add two more lines of code, which is a complete waste of screen space and, possibly, computing resources if the interpreter doesn't optimize the duplicated computation out. Let's take a look how pipe-pipe-equals works in different languages (we've already seen it in action in Ruby). ### Perl Perl was the first interpreted language I learned, and the first place I saw the pipe-pipe-equals operator in. It works as expected, but is used less often than the direct version of the weak logical "or" to specify default parameters of a function. Here's how you'd put a default to a hash bucket: This works in a strict mode, too. Note that, in Perl, many things evaluate to false, including empty strings (I once even tried to emulate it in Ruby). To restrict this action to undefs only, use //= in Perl6 instead, which sinks to its right like the Pisa tower, and look as if you're trying to put a division while drunk. ### Python Python has no equivalent of this operator. You have to type the very[long(expression)] twice, and it will be computed twice. You have several different ways to do this: The engineers more experienced in Python programming, though, assure me that this is not a big deal since you have a different way of supplying default arguments for a function, which seemingly covers half of the use cases for pipe-pipe-equals (this doesn't prevent Ruby from having both, though). Another half is covered by dictionary's dict.get(key, []) method, so that the code piece #1 can be written in a succinct manner. But I still miss it. ### Bash (Linux shell) Bash? Its language is so simplistic, and it looks creepy; how come it would have an equivalent of such a beautiful shortcut? Here it is: This assigns y to variable if the former is unset or null (as per Bash manual). I mostly used it to set default parameters for environment variables user may or may not set before invoking the script. ### C++ While |= is a syntactically correct expression (C++ does not have the "short-circuit" version of this expression), it doesn't do what we discussed here. C++ is statically typed, so the result of the standard logical "or" is boolean. Retaining the nice semantics we find in a dynamically typed language would require it to be "either the type of the left-hand side or the type of the right-hand side". This is hard to pull in a pass-by-value statically typed language. Pass-by-value semantics also means that not everything can be assigned a NULL, and not everything can be converted to boolean value. C++ has default arguments as well as Python, so the same reasoning could apply here. You'll have to be more verbose in C++. That's probably why only |= expression is available, which is only useful if its left-hand-side is bool (see sidebar above for similar usage.) ### OCaml Everything said above about C++ applies to OCaml as well. Moreover, OCaml, as a functional language, doesn't have a flawless support for mutation, and pipe-pipe-equals statement its inherently mutational. However, its matching operator would require us to use the very_long_variable twice. However, OCaml and other functional languages have a very interesting construct called "option". If something has "X option" type, it may contain either "nothing" or a value of x. Then, this value may be "unpacked" trough pattern matching: let very_long_variable = match very_long_variable with None -> y | Some t -> t here, t is not an shorthand for another very long expression; instead, it's just an identifier, written as is. The match...with allows us to "unpack" values of structured (algebraic) types with shorthands like this. Since this was too long, OCaml has made this a library function Option#default: let very_long_variable = Option.default very_long_variable y Anyway, OCaml programs are even more explicit than those in C++ and Python, so trying to tie pipe-pipe-equals into them is quite pointless. ### Ruby "Erm, we saw how it is in Ruby at the beginning," you might think. Well, I lied to you a bit. The thing is that, in Ruby, it is not strictly equivalent to an assignment to a result of logical "or". Which do you think x ||= y is equivalent to? In Ruby, and only for ||= and &&=, it's the second. If you assign to something other than a local variables, what looks like an assignment is actually a method call (think properties,) and, if so, this assignment does not happen at all if the left-hand side of ||= is false. Which makes sense, but looks like a special case. Read more here. ### Why You Might not Need This Some argue that this operator is mostly useless, especially if their favourite language doesn't have it. Here are some arguments they list. #### Redundancy Indeed, this expression is redundant. You can do the same in a multiple different ways, All the examples I demonstrated above showed how to write essentially a simple if statement in a very short-hand form. The number of characters spared is probably not worth it to include support for this feature to a language designer's must-have checklist. The statement discussed decreases the redundancy in code in return of broader language definition; each language seeks a balance between these, and often leaves the pipe-pipe-equals aside. #### Default Function Arguments The ability to specify default function argument (C++, Python, Ruby, but not Bash or Perl) covers many use-cases for ||=. However, this doesn't help fancy loops that fill complex structures, one of which we showed above. Nor helps it when you have to specify the default parameter anyway but the concrete value is not known at the time of coding, and is an optional field user may or may not fill. #### Confusion It is confusing. The pipe-pipe-equals requires explanations how it works. It becomes a special snowflake, different from its "mathematical" counterparts +=, if a complex language wants to make its very useful (see Ruby section above). While "confusion" as an excuse of not doing something is my pet peeve (and I tried to explain why), it indeed requires some effort to understand the mechanics. I'm sure that you'll love it once you understand it, and the way I learned about this operator is not by reading a textbook or "The Most Confusing and Obscure Programming Language Features Possible" newsletter, but by reading someone's code. ### To Use or Not To Use? When it comes to deciding whether or not to use pipe-pipe-equals in the "creepy" way we discussed throughout the post, the criticism somehow fades out. If the language supports this, it will be used. I haven't encountered any coding style document that bans this feature. In Ruby and Perl, it is considered as good of an idiom as many. So the answer on the question of whether or not you sohuld use pipe-pipe-equals (||=) in your programs is, definitely, "yes". If I was to explain Zeno's paradoxes to a layperson, I'd start with the Zeno's Flat Screen TV Paradox. It is very practical: most have firsthand experience in real-life application of this deeply philosophical problem each time we shop for something. That's just too much choice! (pic taken here). Thankfully, ancient Greek philosophers are to the rescue. Imagine you come to a store, and see two TVs: a cheaper 30 inch one for500, and a more expensive 50 inch one for $1000; otherwise the TVs are identical. Which one would you buy? Zeno's Flat Screen TV paradox states that the only rational choice would be to buy a 50'' TV. Here's the proof. Imagine you buy a 30'' TV instead. You bring it home, unpack it, start watching your favorite show, and a salesman knocks on your door. "Hey, I see your TV is only 30'' long," he says. "Would you like to upgrade to a 31'' TV for just 25 bucks?" Well, 25 bucks isn't that much. It's like a half tank of gas, two cinema tickets, or a cab ride across the city. For that price, you permanently get one inch of screen diagonal! Only a fool wouldn't agree. You seal the deal, the salesman takes your old TV, and installs a new one, and you enjoy your 31 inch LED flat screen. A moment later, a salesman knocks onto your door and asks if you'd like to increase a diagonal of your TV for 1 inch at only a$25 price. This is as good of a deal as the previous one, Even better: as the diagonal length increases, you get more screen space per inch for the same price! You seal the deal, and get a 32 inch TV. But before you even install it, a salesman knocks onto your door with a great offer...

This repeats until you end up paying $500 + 20*$25 = \$1000 for a 50'' inch TV set--just as you could have done at the store in the first place. Since you got it as a series of increasingly lucrative deals, there is no sense to buy a 30'' one, and you should definitely go for a thousand dollar 50''. Which means that, a choice of a 30'' TV is never optimal.

Quot Erat Demonstrandum.

Note that it's sensible to manufacture smaller TVs in smaller quantities, because the comparison with a smaller screen is the gist of the paradox. Meanwhile, people have been buying 30'' televisions over and over, while this is completely irrational! And that's the Zeno's Flatscreen TV Paradox.

I remember this picture of my typical day two years ago. As sunny Russian winter morning finally banishes the dull winter night, I finally say "bye" to my girlfriend and leave for historical part of Moscow. Where I'm heading to, in a three-story, old cottage that predates even the communist government, lies my beloved office. I'm alone there. It is deserted, but not because of lay-offs, a party I'm not invited to, or because it's a one-man startup. It is just Sunday.

I log in to my workstation, and start doing it. Checking experiment results. Debugging new experiment frameworks. Just thinking, trying to push limitations of a suboptimal algorithm beyond what the humanty knows so far. Being productive, though less than usual: the presence of peers doing something pressures you to check reddit less frequently.

That was my typical Sunday afternoon two years ago. Today is Sunday too, and I'm writing this very sentence riding a ferry over the Bay rather than sitting in an empty office. To answer why, first I should understand why I went to the office on Sundays in the first place.

### Integrity

Of course you've realized that the Sunday office was my shelter from the girlfriend, because it would be impossible to work at home in her presence. Keeping off of distractions is the reason why one would prefer to go to the office to do work, not just on Sunday, but on any other day. Other reasons exist, too.

Sadly, I couldn't find any reference to "anchoring" in a more reputable context than, say, site about fitness or life-coaching. This could be more "scientific" foundation behind it.

Some people take keeping everything off from home to the extreme. For instance, one successful finance consultant doesn't even have a fridge, let alone a PC, at home, because cooking and dining do not belong there as well as working.

One of them is psychological "anchoring". Office is associated in your mind as a place to do work, so your mere presence there, in formal clothing and in working hours, sets the right mood, and makes you more productive than relaxing in a comfy chair wearing slippers and a robe. The reverse is useful too: keeping work off of your home as much as possible makes your relaxation there more fulfilling.

Purely practical reasons kick in, too. At work, you have a fast intranet with a fast access to your cluster. It's somewhat amusing that at every place I worked as a programmer there was a cluster to which a large part of computation was offloaded. First it was a cluster that runs experiments, then it was a distributed software compiling and packaging system, and then the product itself was a distributed system that runs user's code. You definitely want to access this and other helpful corporate resources, such as repositories, with low latency.

Moreover, on your office machine you have a ready environment you work in, with editors opened, and consoles chdir-ed, so you can just dive in in and start instantly where you left. Finally, your corporate policies could prevent you from working from anywhere except the office for security reasons.

However, the reason why I no longer work on weekends is not that office became a less appalling place. The list above misses one crucial component.

### Alone in the Ivory Tower

After all, I was not a biologist, so I didn't need any help from a fellow researcher to stare at my tube.

When I was a research intern, working on weekends was easy. I spent a lot of time coding new algorithms and experimenting with them, reading papers written by others, writing my own, or just thinking on stuff. I spent total of 2/3 of the overall time on these activities. They all have one thing in common: I could do this alone.

While interacting with others is a very important part of a researcher's work I used to underestimate, you could work alone for a good chunks of time and still be somewhat productive. No wonder I did. Besides, our project had only 4-5 people, and interacting with them just couldn't occupy much enough time.

### Industry and a Larger Project

When I, as a Software Engineer, moved to what they usually call "Industry," I also tried to establish the stream of working on weekends. I managed to sneak in on two Sundays, but that was it. With a much more "officy" office, and still a challenging project, it didn't work anymore.

This is a real picture from Android patch workflow. Many workflows in a larger team look no less insane.

The work now demanded collaboration. The team was 5 times larger, and most of the work I was doing required to check with others frequently. Sometimes it was as ridiculous as making three meetings over three weeks to agree over a certain feature or change that took 15 minutes to implement. More often, you were just stuck waiting on others' decisions, on others' feedback, and on others' work as programmers or sysadmins.

I could find side projects and less important things a programmer always has in his backlog that required no collaboration, but unimportant work was never worth a Sunday spent in the office.

Besides, working on weekends now involved making peers feel uncomfortable. Some practical applications involved assigning a security guard to protect me working, which required me to apply for each weekend work permit separately as per company's policies. Application could be automated, but making another person waste his time was too much. I went twice, when the presence of the guard was justified by sysadmins and managers doing other work they couldn't postpone, and never bothered with it afterwards.

### Let's Multiply By Ten

Although I spent weekends just like the normal people, I still stayed longer hours to get more work done. I still made security guards uncomfortable because they have much less opportunities to close earlier, but it was now legally my time anyway.

Now let's multiply by ten the size of the team, the volume of the codebase and the complexity of tasks, which describes my shift to the next job. The amount of work that could be done without the communication also shrunk by the same factor.

Email became the primary work tool. Most of the work is triggered either by notifications from others, then it is discussed with the peers who are much familiar with the codebase. When I am stuck with something I can spend hours in the late evening trying to debug the problem on my own, or I can ask my peers who worked on the project for a long time, and get it resolved within minutes.

Not only the office stopped to enjoy my visits on Sundays, I started to work late hours less. The communication with peers is no longer just red tape, it is the only way to be productive. And this communication is so much more efficient in the office than over e-mail, video conferencing, and phone calls, that you being in the office is inevitable. Studies showed ridiculously big numbers for how the bandwidth of the information our brain receives in real-world communication is larger than what we get by a video call. And it's more fulfilling, too.

To increase my productivity I even started watching my sleep cycle, and I now try hard to move my typical shift from 12-9pm to the normal 9-6pm. It doesn't work out well. I try.

### Christmas as a Way to Enjoy an Empty Office

This amount of communication sometimes becomes overwhelming. The incoming mail speed starts to become more than the speed with which you can read it. No wonder that people started to value the Empty Office, when distractions and stress decrease, and you can just enjoy exercising your favorite art. Of special value is the US Christmas holiday, where you have two days others usually take an unpaid leave on so you can work with fewer interruptions.

Few come on weekends to achieve the same effect, of course, because it is still not productive; but a legitimate reason to have a less populated office is of understandable value.

### ***

I still like my job, and am as passionate about it as when I was younger; I still can handle 6-day work week. But, on Sunday, I'd rather read about something in the comfort of my home, go out with my friends, or visit some local nature. Hiking usually ends with a ferry ride with an awe-inspiring view on San Francisco skyline I can't really enjoy because—let's face it—I open my chromebook and read or write something about programming. I just don't do it in the office anymore.

This post is not about programming at all, but just about solving old problems with new computing tools. The old problem is finding the neighbourhood to rent the apartment in, and the tool is a map. Not just a map, but the Google Map. While looking at map when renting is a good advice, Google Map has a killer feature that unlocks a very new option.

Google Map can quickly recalculate the best public transportation route between two points on-the-fly while you drag one of them. This allows you to quickly learn what neighbourhoods provide a good commute to your specific office. You quickly uncover all possible commute options you have by moving the start marker around a neighborhood in order to filter then out the worst from your Craigslist search.

You may have other priorities rather than just getting the best commute, of course: quality schools, places to go out, parks nearby, et cetera.  I only describe how to filter by one criteria, and it doesn't have to be primary.

I only have four "small" requirements to daily commute:

• Public transportation. Driving a car in a big city is a nightmare, and commuting by bus or train allows you to read an interesting book or magazine (well, for now);
• Zero transfers. Each transfer drains your lifepower, and robs you of several minutes you could have been standing calmly, reading.
• Short total walk (5-10 minutes). Walking extra 10 minutes on a flat surface will have zero effect on your health, but will rob you of 10 minutes--or even more if you cross a lot of traffic lights.
• Overall time about 30 minutes. While you can do something while commuting, in-city public transport distracts you too much. 10 minute shorter commute saves you about 2.5 days per year, so investing 20 more minutes into commute time screening repays itself.

So for me, an ideal commute is a 30-minute ride on a bus or train directly to the office. And without Google Map I wouldn't know that I actually can accomplish this, and live in a cheaper neighbourhood at the same time. The map also uncovered a train stop I didn't pay much attention to--right next to the office.

So, here's a 6-minute video how to scan the area with a couple of tips of Google Maps usage.

That was exactly how I scanned the city when I was looking for my new home. It turned out that I avoided the areas many people prefer to live in, but I found the best place specifically tuned for me instead.

Please, bear in mind that I know very little about the City of San Francisco, so I might have missed something, or discarded certain areas too vigorously. But knowledge may play against you: say, you know that you can arrive to Embarcadero by any train, and this prevents you from even trying to find closer public transport stops to your specific office, which Google Map does for you automatically.

There are more features in the Map I didn't show, and more ideas still implausible. You can set more preferences (click "More options," and select Bus/Train or prefer "Fewer Transfers"). You could use waypoint search if you were looking for a good commute for two persons at once (make a route between two offices, add a waypoint, which represents the home, and drag it over the map), but waypoints don't work for public transport for some reason, and this doesn't scale well for three, sadly. And if you know more tips, make your videos and comment here, I'd be happy to hear about them!

"Forget these silly runtime errors you had in Ruby: C++ is statically-typed!" Levap, a seasoned engineer, was enlightening the new hire. "You no longer have to wait while the sample run or unit tests finish to learn that your variables have the wrong type, nor you have to worry about this happening in production".

So, Paul, the younger engineer, stopped worrying. He went back to implementing a subclass that, unlike its parent, would only perform a low-level blocking write of data stored in a buffer object instead of more complex writing semantics found in the parent:

The new Write function would assert that the second parameter was false, and would just call a regular system's write, familiar to any C programmer. The only exception was that it fetched the target from the SimpleWriter object's private variables, and it wrote data chunk-by-chunk until finished. Hence, it didn't have to return the number of bytes written, and threw a fatal error on a severe error.

The yonger engineer soon found out that creating NeverBlockingWriter in place of SimpleWriter only required modifying method, and should involve no data copying. In a dynamic language, he would do just that by replacing the SimpleWrite method of the object with a nonblocking-only version, but in C++, Paul had to instantiate a whole new class and copy the information from the parent one to it. An alternative could be to force the superclass to be friends with its own child, which sounded more suitable for a dumb scene in a sitcom. Oh, if he only had the powers of Ruby...

Paul first merely called the parent's SimpleWrite(buffer, block), but then decided that calling SystemWrite directly was clearer and faster, so he fixed the code shortly afterwards:

This code looked right, but didn't work. What was worse, Paul planned to get the code working on the last hour before Christmas, and he couldn't, which created a sense of incompleteness and incompetence for the whole holiday.

What happened was that Paul updated the name of function to call, but forgot to update its arguments. Normally, C++ would barf a compile error in an instant. In this case, however, the trap was perfect: C++ silently, without a single warning, cast the pointer to Buffer to pointer to void, and boolean true to one. The program expected the other endpoint to reply to the data written, so the lack of the complete write created a very weird bug. Apparently, these functions were not different enough for C++ to show a compile error.

The fixed version immediately demonstrated that everything else was OK:

"Statically typed, my ass," were Paul's next words. Nobody heard him, since he stayed later to fight this bug.

Paul saved the code, and finally left the office. On his train, he shed a tear for OCaml that distinguished between integer and boolean, and promised that next time he saw a weird bug, he would check if he fell into another loophole in C++'s "static" typing.

"Lol, just pay more attention next time, smartass," the seasoned engineer advised.

# Comments imported from the old website

http://www.koryavov.net/ on 04 January 2013 commented:

Forget C++. Use Java! It is a true statically typed language. :)

Petar Petrov on 29 January 2013 commented:

From: http://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html I tried using: -Wconversion what people on StackOverflow recommended too. NADA! no err no warn :( there must be a way to atleast warn on implicit conversation!

Reproducible on G++3.7

Pavel Shved on 08 February 2013 commented:

Well, technically, it isn't a bug. The conversions are very well defined, and should be expected. You are very unlikely to encounter them at the same time, this unlikeliness is why the post appeared in the first place :)