Recent blog updates

Shown here are only posts related to C++. You can view all posts here.

Division by Zero? How About Multiplication by Zero

Crash due to division by zero? How about multiplication by zero?

I recently advanced to the next level of floating-point arithmetic caveats. Division by zero is something we all know about, but multiplication by zero can be just as harmful.

An algorithm computed weights of entities and then converted them to integers, as in the following:
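
A minimal sketch of that pattern (the names and the exact capping expression are my assumptions, not the original code):

    #include <algorithm>
    #include <cassert>
    #include <climits>
    #include <cmath>

    int weight_to_int(double x, double y) {
        // Cap the weight so that it fits into an int... or so we think.
        double weight = std::min(x * std::exp(y), (double)INT_MAX);
        int result = (int)weight;
        assert(result >= 0);  // fails when x == 0 and exp(y) == +Infinity
        return result;
    }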

This code may fail the assertion. Imagine y is large (say, 1000), so exp(y) no longer fits into a double. The value of exp(y) will be +Infinity in this case. Surprisingly, min() will correctly understand that INT_MAX is less than +Infinity, and the result will be capped as expected. But here's where multiplication by zero kicks in.

What will happen if x is 0 and exp(y) is infinity? Any ordinary result would be mathematical nonsense, so the product is the special value NaN, "Not a Number". min() will also return NaN, and the integer conversion will happily convert it to... -2147483648. The assertion fails, and the production job crashes because it does not expect the result to be negative. We're multiplying two nonnegative floating-point numbers; how can the result be negative?

Yet it is. All because of multiplication by zero.

Read on | Comments (0) | Make a comment >>


How to Get Binary Search Right

As you already know, binary search is an O(log N) way to check whether a particular value exists in a sorted array. There are reports that "most of our implementations" of binary search are broken. The mistake discussed in that post is the overflow when calculating the index of the middle element. But regardless of whether you get this bit right, there's a different problem with all the binary search implementations I've seen.

They're just too damn complicated and hard to get right from memory.

Key points

Almost always, if you're relying on your memory, you should prefer simple algorithms to complex ones, just as you should implement a treap instead of an AVL tree. So here's a sequence of steps that helps me write binary search correctly.

Bisection instead of "binary search"

Starting from this point, we aren't searching for anything. Searching for an element in an array has too complex a definition. What should we return if the element doesn't exist, or if there are several equal elements? That's a bit too confusing. Instead, let's bisect an array: pose a problem that always has a single solution, no matter what.

Given a boolean function f, find the smallest k such that, for all i < k, f(a[i]) == false, and for all j >= k, f(a[j]) == true. In other words, we are to find the smallest k such that, for all i within array bounds, f(a[i]) == (i >= k). If the array is sorted in such a way that f(a[i]) is monotonic, being first false and then true, this problem always has a single solution. In the case when f(a[i]) is false for all array elements, the resultant k is greater by one than the last index of the array, but this is still not a special case, because this is exactly the answer as defined by the bold text above.

This is more powerful than binary search, which can be reduced to this problem that I'll call "bisection" here. Indeed, to find an element x in an array a of length N, we bisect a with f(t) = (t >= x); then x exists in the array if and only if the resulting k satisfies k < N and a[k] == x.

Now let's get back to solving the bisection, once we've defined it.

Well-defined bounds

The answer to our bisection problem is between 0 and N inclusive, so we will start by assigning i to 0 and j to N. No special cases allowed: we can do without them entirely. At each step, the answer we seek will satisfy i <= k <= j, and within that range we will have no additional information that could narrow it down.

Non-overflowing pivot point

It's easy to make an overflow mistake when computing the pivot point, i.e. the point in the middle of the array we'll apply f to in order to make a step. This problem is discussed in more detail in the article I linked above. If we were in the world of unbounded integers, it would simply be (i+j)/2, but we live in the world of integer overflows. If i and j are positive integers, which makes sense for array indexes in C++, the never-overflowing expression would be:
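
In code, that would be something like (calling the pivot s, as below):

    s = i + (j - i) / 2;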

Another useful property of this expression (in C++, at least) is that it's never equal to j if i < j. Here's why it's important.

Well-founded termination

A basic way to make sure that your program terminates someday is to construct a "loop variant" that strictly decreases with each loop iteration and that can't decrease forever. For our bisection, the natural variant is the length of the part of the original array in which we currently look for the answer, which is max(j-i, 0). It can't decrease forever because its value would reach zero at some point, from which there's nowhere to proceed. We only need to prove that this variant decreases by at least one on each iteration.

Fun fact: special programs can automatically prove that a program terminates, without human intervention! One of these programs is named... Terminator. You can read more about this in the very accessible article "Proving Program Termination" by Byron Cook et al. I first learned about it at a summer school where Mr. Cook himself gave a talk about it.

The first step is to make sure that the moment it reaches zero is the last iteration of the loop (otherwise it wouldn't decrease!), so we use while (i < j).

Now assume we've found that the pivot point s does not satisfy the target function f. Our key difference from most binary search algorithms is that we should assign the new i to s+1 rather than s. Given our definition of the problem, s is never the answer in this case, and we should keep our candidate range as small as possible. If f(a[s]) is true, then s could still be the answer, so the same reasoning does not apply, and we assign j to s. Here's our iteration step now:
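
Something like this, continuing with the names used above:

    size_t s = i + (j - i) / 2;
    if (!f(a[s]))
        i = s + 1;   // s is certainly not the answer
    else
        j = s;       // s may still be the answer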

We mentioned in the previous section that i <= s < j, hence i < s + 1 and s < j, so each of the assignments decreases j-i by at least 1 regardless of f's value. This proves that the loop never hangs forever.

The complete program

So here's our bisection, which works for all sorted arrays (in the sense of f(a[i]) being first false and then true as i increases).
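
A minimal sketch, assuming the array holds ints and f is passed in as a function object:

    #include <functional>
    #include <vector>

    // Returns the smallest k such that f(a[i]) == (i >= k).
    size_t bisect(const std::vector<int>& a, const std::function<bool(int)>& f) {
        size_t i = 0, j = a.size();      // the answer is always within [i, j]
        while (i < j) {
            size_t s = i + (j - i) / 2;  // non-overflowing pivot, i <= s < j
            if (!f(a[s]))
                i = s + 1;               // s is certainly not the answer
            else
                j = s;                   // s may still be the answer
        }
        return i;                        // i == j is the answer
    }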

The search itself, in terms of finding whether a value exists in a sorted array, would then be written as follows (I'm just repeating what I wrote above, no surprises here):
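
Again a sketch on top of the bisect above (the helper name is mine):

    bool contains(const std::vector<int>& a, int x) {
        size_t k = bisect(a, [x](int t) { return t >= x; });
        return k < a.size() && a[k] == x;
    }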

Proof By Authority

The GNU C++ STL implementation of binary_search follows the same guidelines I posted here. In my version, GCC 4.3.4, binary search looks like this:
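
Quoting the gist from memory rather than verbatim (the real libstdc++ source also carries concept-check boilerplate), the shape is this:

    template<typename _ForwardIterator, typename _Tp>
    bool binary_search(_ForwardIterator __first, _ForwardIterator __last,
                       const _Tp& __val)
    {
        _ForwardIterator __i = std::lower_bound(__first, __last, __val);
        return __i != __last && !(__val < *__i);
    }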

Where lower_bound is the bisection function.

Conclusion

Here's my version of bisection and binary search that I usually use when I need one. It is designed to be simple, and simple things are easier to get right. Just make sure your loop variant decreases, you do not overflow, and you get rid of as many special cases as possible.

Read on | Comments (2) | Make a comment >>


The Pipe-Pipe-Equals

It is hard to come up with a google-friendly name for the ||= construct you see quite often in some programming languages, "pipe-pipe-equals" being the closest (other names include "double-pipe equals," "or-equal," or "double-or equals"). Why should we name this monster in the first place? While cryptic at first sight, it is a very convenient shorthand that, I believe, allows us to write cleaner code. Let's see how it works on this example:

Essentially this ||= construct is equivalent to other convenience operators, such as +=: x += 2; means x = x + 2. The "or" operator || usually denotes a weak logical "or" that doesn't try to compute the right-hand side if the left-hand side is true. Usually, meaningful values evaluate to true, while the only things evaluating to false are NULL pointers/empty values, zeroes, boolean Falses, and, sometimes, empty strings.

Alternatively, you could set a default hash value, but ||= is used more widely, and also takes exactly one line... doesn't it? ;-)

The statement is mostly used to set a variable to a "default" value if it's "unset" at some point. This gives the most visible advantage when you initialize dictionary elements in a loop. Here's a piece that saves counts of each array element into a dictionary (Ruby):

In some languages (Perl, Javascript), you needn't even bother with this, as += 1 on an unset value results in its assignment to 1.

If you don't initialize output[x], you'll get a runtime error for trying to increment NULL. The advantage of ||= over other ways is that you don't repeat anything. You could have written the same piece as

Oh yes... we forgot about the most natural use of ||=, for booleans. Here's how we'd check if an array contains zeroes if we need to iterate over its elements for something else.

Pipe-pipe-equals is rarely used this "natural" way, though.

But here you have to type output[x] twice and add two more lines of code, which is a complete waste of screen space and, possibly, of computing resources if the interpreter doesn't optimize the duplicated computation away. Let's take a look at how pipe-pipe-equals works in different languages (we've already seen it in action in Ruby).

Perl

Perl was the first interpreted language I learned, and the first place I saw the pipe-pipe-equals operator. It works as expected, but is used less often than the direct version of the weak logical "or" for specifying default parameters of a function. Here's how you'd put a default into a hash bucket:

This works in strict mode, too. Note that, in Perl, many things evaluate to false, including empty strings (I once even tried to emulate that in Ruby). To restrict this action to undefs only, use //= instead (available since Perl 5.10), which sinks to the right like the Pisa tower and looks as if you're trying to type a division while drunk.

Python

Python has no equivalent of this operator. You have to type the very[long(expression)] twice, and it will be computed twice. You have several different ways to do this:

Engineers more experienced in Python programming, though, assure me that this is not a big deal, since Python has a different way of supplying default arguments to a function, which seemingly covers half of the use cases for pipe-pipe-equals (this doesn't prevent Ruby from having both, though). The other half is covered by the dictionary's dict.get(key, []) method, so that code piece #1 can be written in a succinct manner. But I still miss it.

Bash (Linux shell)

Bash? Its language is so simplistic, and it looks creepy; how come it has an equivalent of such a beautiful shortcut? Here it is:

This assigns y to variable if the latter is unset or null (as per the Bash manual). I mostly used it to set default parameters for environment variables the user may or may not set before invoking the script.

C++

While x |= y is a syntactically correct expression in C++ (the language has no "short-circuit" version of it), it doesn't do what we've discussed here.

C++ is statically typed, so the result of the built-in logical "or" is boolean. Retaining the nice semantics we find in a dynamically typed language would require the result to be "either the type of the left-hand side or the type of the right-hand side". This is hard to pull off in a pass-by-value, statically typed language.

Pass-by-value semantics also means that not everything can be assigned a NULL, and not everything can be converted to a boolean value. C++ has default arguments just like Python, so the same reasoning could apply here. You'll have to be more verbose in C++. That's probably why only the |= expression is available, which is only useful if its left-hand side is bool (see the sidebar above for similar usage).
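
A minimal sketch of that boolean |= usage (the function and variable names are mine):

    #include <vector>

    // Accumulate a flag while iterating over the elements for other reasons.
    bool contains_zero(const std::vector<int>& values) {
        bool has_zero = false;
        for (int v : values)
            has_zero |= (v == 0);  // no short circuit, but correct for bools
        return has_zero;
    }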

OCaml

Everything said above about C++ applies to OCaml as well. Moreover, OCaml, as a functional language, doesn't embrace mutation, and the pipe-pipe-equals statement is inherently mutational. Its matching operator would also require us to use the very_long_variable twice. However, OCaml and other functional languages have a very interesting construct called "option". If something has type "X option", it may contain either "nothing" or a value of type X. Such a value may then be "unpacked" through pattern matching:

let very_long_variable = match very_long_variable with None -> y | Some t -> t

Here, t is not a shorthand for another very long expression; it's just an identifier, written as is. The match...with construct allows us to "unpack" values of structured (algebraic) types with shorthands like this. Since this was too long, there is a library function, Option.default, that does it for us:

let very_long_variable = Option.default very_long_variable y

Anyway, OCaml programs are even more explicit than those in C++ and Python, so trying to tie pipe-pipe-equals into them is quite pointless.

Ruby

"Erm, we saw how it is in Ruby at the beginning," you might think. Well, I lied to you a bit. The thing is that, in Ruby, it is not strictly equivalent to an assignment to a result of logical "or". Which do you think x ||= y is equivalent to?

In Ruby, and only for ||= and &&=, it's the second. If you assign to something other than a local variable, what looks like an assignment is actually a method call (think properties), and, if so, this call does not happen at all if the left-hand side of ||= evaluates to true. Which makes sense, but looks like a special case. Read more here.

Why You Might not Need This

Some argue that this operator is mostly useless, especially if their favourite language doesn't have it. Here are some arguments they list.

Redundancy

Indeed, this expression is redundant. You can do the same thing in multiple different ways. All the examples I demonstrated above showed how to write what is essentially a simple if statement in a very short-hand form. The number of characters spared is probably not worth adding support for this feature to a language designer's must-have checklist.

The statement discussed decreases redundancy in code in return for a broader language definition; each language seeks a balance between these, and often leaves the pipe-pipe-equals aside.

Default Function Arguments

The ability to specify default function arguments (C++, Python, Ruby, but not Bash or Perl) covers many use cases for ||=. However, it doesn't help with fancy loops that fill complex structures, one of which we showed above. Nor does it help when you have to specify the default parameter anyway, but the concrete value is not known at the time of coding and is an optional field the user may or may not fill.

Confusion

It is confusing. The pipe-pipe-equals requires an explanation of how it works. It becomes a special snowflake, different from its "mathematical" counterparts such as +=, when a complex language wants to make it really useful (see the Ruby section above). While "confusion" as an excuse for not doing something is my pet peeve (and I tried to explain why), it indeed requires some effort to understand the mechanics. I'm sure that you'll love it once you understand it, and the way I learned about this operator was not by reading a textbook or "The Most Confusing and Obscure Programming Language Features Possible" newsletter, but by reading someone's code.

To Use or Not To Use?

When it comes to deciding whether or not to use pipe-pipe-equals in the "creepy" way we discussed throughout the post, the criticism somehow fades out. If the language supports it, it gets used. I haven't encountered any coding style document that bans this feature. In Ruby and Perl, it is considered as good an idiom as any. So the answer to the question of whether or not you should use pipe-pipe-equals (||=) in your programs is, definitely, "yes".

Read on | Comments (0) | Make a comment >>


Not So Static

"Forget these silly runtime errors you had in Ruby: C++ is statically-typed!" Levap, a seasoned engineer, was enlightening the new hire. "You no longer have to wait while the sample run or unit tests finish to learn that your variables have the wrong type, nor you have to worry about this happening in production".

So, Paul, the younger engineer, stopped worrying. He went back to implementing a subclass that, unlike its parent, would only perform a low-level blocking write of the data stored in a buffer object, instead of the more complex writing semantics found in the parent:

The new Write function would assert that the second parameter was false, and would just call the regular system write familiar to any C programmer. The only differences were that it fetched the target from the SimpleWriter object's private variables and that it wrote the data chunk by chunk until finished. Hence, it didn't have to return the number of bytes written, and it threw a fatal error on a severe failure.

The younger engineer soon found out that creating a NeverBlockingWriter in place of a SimpleWriter only required modifying one method, and should involve no data copying. In a dynamic language, he would do just that by replacing the SimpleWrite method of the object with a nonblocking-only version, but in C++, Paul had to instantiate a whole new class and copy the information from the parent object into it. An alternative could be to force the superclass to be friends with its own child, which sounded more suitable for a dumb scene in a sitcom. Oh, if only he had the powers of Ruby...

Paul at first merely called the parent's SimpleWrite(buffer, block), but then decided that calling SystemWrite directly was clearer and faster, so he fixed the code shortly afterwards:

This code looked right, but didn't work. What was worse, Paul had planned to get the code working in the last hour before Christmas, and he couldn't, which created a sense of incompleteness and incompetence for the whole holiday.

What happened was that Paul updated the name of the function to call, but forgot to update its arguments. Normally, C++ would barf a compile error in an instant. In this case, however, the trap was perfect: C++ silently, without a single warning, cast the pointer to Buffer to a pointer to void, and the boolean true to one. The program expected the other endpoint to reply to the data written, so the incomplete write created a very weird bug. Apparently, these functions were not different enough for C++ to show a compile error.
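
A minimal sketch of the trap, with the signatures guessed from the story:

    #include <cstddef>

    struct Buffer { const char* data; size_t size; };

    struct SimpleWriter {
        virtual void SimpleWrite(Buffer* buffer, bool block) { /* complex writing */ }
        void SystemWrite(const void* data, size_t size) { /* plain write() loop */ }
    };

    struct NeverBlockingWriter : SimpleWriter {
        void SimpleWrite(Buffer* buffer, bool /*block*/) override {
            // Intended: SystemWrite(buffer->data, buffer->size);
            SystemWrite(buffer, true);  // Buffer* silently becomes const void*,
                                        // true silently becomes size_t(1)
        }
    };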

The fixed version immediately demonstrated that everything else was OK:

"Statically typed, my ass," were Paul's next words. Nobody heard him, since he stayed later to fight this bug.

Paul saved the code and finally left the office. On his train, he shed a tear for OCaml, which distinguishes between integers and booleans, and promised that the next time he saw a weird bug, he would check whether he had fallen into another loophole in C++'s "static" typing.

"Lol, just pay more attention next time, smartass," the seasoned engineer advised.

Read on | Comments (3) | Make a comment >>


Syntax elements? User-defined functions!

Many languages, especially the non-functional ones that originate from the 20th century, have immutable syntax. Symbols such as +, (, or . have a predefined meaning that can't be changed by the user.

Based on this, when you see such a symbol, you assume that its meaning is predefined by the specification of the language. But that's not always true.

C++

The most beautiful example of operator overloading in C++ is "lazy evaluation". Assume you have defined a matrix class. You may want to use operator overloading to work with it, so you could write:
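
Something along these lines (A, B, C, and D being matrices):

    Matrix M = A * B * C * D;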

The expression at the right-hand side will be evaluated like this: first A*B will be computed and stored in some internal temporary variable tmp1, then (tmp1*C) will be computed, and stored in tmp2, etc.

However, it's known that the evaluation of a matrix multiplication chain can be optimized. So instead of the straightforward definition, where multiplying two matrices returns a matrix, we could define multiplication in such a way that it returns a special "chain handler". This "handler" would accumulate references to all matrices involved in the multiplication and perform the actual multiplication only once, when it gets converted to the matrix M as part of the assignment.
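
A minimal sketch of that idea (all the names are mine, and a real implementation would be considerably more careful):

    #include <vector>

    struct Matrix { /* data and a real multiply live here */ };

    // operator* only records operands; the actual multiplication runs once,
    // in the conversion back to Matrix, where the order can be optimized.
    struct MatrixChain {
        std::vector<const Matrix*> operands;

        MatrixChain operator*(const Matrix& m) const {
            MatrixChain next = *this;
            next.operands.push_back(&m);
            return next;
        }
        operator Matrix() const {
            Matrix result;
            // ...pick the optimal multiplication order over `operands` here...
            return result;
        }
    };

    inline MatrixChain operator*(const Matrix& a, const Matrix& b) {
        return MatrixChain{{&a, &b}};
    }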

C++ is notorious for its operator overloading. You can imbue (nearly) any operator used in expressions with any meaning. This means that a simple-looking expression, such as "a = b + c", can actually invoke a complex user-defined procedure.

Of course, you can't redefine just anything. Syntax elements used to define control flow can't be redefined, and neither can a few operators (such as :: or ?:). And, most importantly, you can't create your own operators. But even with the powers you have, you can hide userspace functions behind the usual syntax.

Linux Shell

If you have written or read Bash programs, you have probably encountered constructs like this:

The [...] "conditional" expressions are used to query file attributes and to compare strings. They are builtin shell operators...or not? Let's check:

pavel@lonely ~ $ ls -l "/usr/bin/["
-rwxr-xr-x 1 root root 35032 Jul 20 01:53 /usr/bin/[

What the hell? So these brackets are actually small programs that merely take the subsequent tokens as command-line arguments? Well, they could be:

pavel@lonely ~ $ "/usr/bin/[" -f /bin/bash ] && echo Yes
Yes

Thankfully, Bash itself doesn't invoke that program (it uses a built-in implementation of this functionality). You can check this with strace or by temporarily removing the file. Perhaps the file is kept for compatibility with other shells.

This example demonstrates that what you think of as syntax elements could actually be user-defined functions.

OCaml

The Haskell language has a special notation for making an arbitrary function infix, i.e. you can make an "operator" out of a plain function:

Prelude> min 1 2
1
Prelude> 1 `min` 2
1

However, another functional language, OCaml, doesn't have such syntax. It can define a function as infix, but it can't "convert" an existing one to infix as shown above. And there's no hope for this to work: it would require adding extra syntax to the language specification... would it?

# let ( /* ) x y = y x
  and ( */ ) x y = x y
  in 1 /*min*/ 2 ;;
- : int = 1

The idea is to define some functions that look and feel like "syntax" while leaving the language definition intact. The above code defines infix functions named "/*" and "*/". And if you saw code like 1 /*min*/ 2 at a distance from the definition of these "functions", you would really think it's some kind of OCaml syntax.

Here's the origin of this idea.

***

Modern languages give users more freedom to define and redefine the meaning of what looks like syntax elements. This freedom can be used to write more beautiful and concise code, as shown in the examples above. By the way, do you know more examples?

Read on | Comments (4) | Make a comment >>


Deleted functions in C++0x and binary compatibility

Having read a recent revision of the C++ ABI standard, I noticed that certain bits of it had been revised (the version number of the standard nevertheless hasn't changed). In particular, a vtable entry for a mysterious "deleted virtual function" was documented. I thought that it could influence binary compatibility, and here's what I found out.

What are deleted functions for?

C++0x introduces another form of a function definition -- the deleted function:
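
A typical example, prohibiting copying, looks roughly like this:

    class NonCopyable {
    public:
        NonCopyable() {}
        NonCopyable(const NonCopyable&) = delete;             // no copying
        NonCopyable& operator=(const NonCopyable&) = delete;  // no assignment
    };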

When a function is deleted, calling it is prohibited, but substituting it is not a failure. Deleting is mainly to be used instead of making constructors private and providing abort()ing definitions for them (to explicitly prohibit copying objects, as in the example above):
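
For comparison, a sketch of that older idiom (C++03 style):

    #include <cstdlib>

    class NonCopyable03 {
    public:
        NonCopyable03() {}
    private:
        // Private, with abort()ing bodies: copying fails to compile outside
        // the class, and aborts if the class itself misuses it.
        NonCopyable03(const NonCopyable03&) { std::abort(); }
        NonCopyable03& operator=(const NonCopyable03&) { std::abort(); }
    };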

Another use-case is to prohibit certain type conversions. The use cases are well described at the relevant wikipedia entry.

In other words, a deleted function means "the function is declared, but it actually doesn't exist."

What is SFINAE?

SFINAE is an acronym for Substitution Failure Is Not An Error. It is used for tricky compile-time, template-based lookups that add a lot of flexibility to the C++ language. The most obvious case, from the Wikipedia entry, is this:
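
Roughly the canonical example (paraphrased from memory, not quoted):

    #include <iostream>

    struct Test { typedef int Type; };

    template <typename T>
    void f(typename T::Type) { std::cout << "definition #1\n"; }  // needs T::Type

    template <typename T>
    void f(T) { std::cout << "definition #2\n"; }                 // fallback

    int main() {
        f<Test>(10);  // #1: Test::Type exists
        f<int>(10);   // #2: int::Type would fail, but that's not an error
    }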

However, SFINAE sometimes seems to empower C++ beyond the imagination of an average C++ programmer. It can be used to:

Yes, C++ can do this; with pain and squeaking, but it can. A good explanation of the WTF??! can be found here.

That's all deleted functions could be used for. They can't benefit SFINAE, since deleted functions do not influence lookup rules: a deleted function is still considered a proper substitution; this, in turn, leads to a compile error, because deleted functions can't be used in any way, even if they're never actually called. However, this is sometimes a benefit, since we may want certain lookups to fail instead of silently picking a badly working substitution.

So the main purpose of deleted functions is additional compile-time checks. They don't even alter the way code that uses certain functions and classes works; they're mere checks. However, hiding constructors raises issues that concern binary compatibility. Let's see if there's anything interesting.

Binary compatibility issues

In the problem of binary compatibility we consider two versions of a C++ library: A and B. We try to write B in such a way that any program built against the A library keeps functioning when only the B library is present in the system. I usually restrict the analysis to the GCC compiler, because I don't know much about the other ones :-) (and because of the means described in the other blog post about C++).

If a function is declared deleted, then a) it's considered inline, and b) it can't be called by a complying program.

The property b) is interesting since it influences source-code compatibility. In my research I assumed that binary compatibility of libraries should only be retained for source compatible programs, i.e. for those which compile unmodified with both versions of a library.

This means that if the delete specifier is removed in library B from a function that was deleted in library A, this can't influence anything, because that function wasn't referred to by any program built against the A library. Moreover, in the case of a virtual function, the claim from the ABI standard cited above means the virtual table doesn't change in size.

This also means that if a function gains a delete specifier in the B library, source compatibility requires all programs built against the A library... simply not to use that function. Consequently, anything can happen at the binary level with this function (note again that its slot in the vtable is kept), and it won't influence any of the older programs.

API modifications: would delete help?

So for source-compatible programs, deleted functions introduce absolutely no problems. I'd like to refrain from analyzing source-incompatible programs because, strictly speaking, we could then dive into an unfruitful analysis of the compatibility of, say, a sound library with an instant-messaging library.

However, there are practical patterns of API changes that break source compatibility but still make sense. For example, we want to deprecate a function so that newly linked programs (against the B library) will use a safer equivalent, while those linked against A keep working. Would delete be of help here? (That's actually the first thing I thought about when I saw the delete specifier.)

Unfortunately, it wouldn't. Deleted functions don't have a body; it's just not emitted, because they're inline and... erm, uncallable, by definition. Moreover, providing a body for a function previously declared deleted is a violation of the one-definition rule. So, unless you're doing dirty hacks where the header you compile the library with and the header you expose to the user are different, you're out of luck here.

What would help is making functions private. If you want to safely "delete" your virtual function, you should just make it private (and ditch all references to it from your internals, of course). The virtual tables won't change in any way. For non-members, where "private" is not available, the best you can do is put a line in your documentation saying "DON'T CALL THIS!"

Of course, after some time you may declare the function deleted, thus making all programs that still use it unable to start (the dynamic linker will signal an unmet dependency). If the function was virtual, no problems with virtual tables will arise either. That seems like a sane and safe deprecation path (normal, then private, then deleted).

No problems can arise if a private abort()ing function is declared deleted in the new library version. Such a function couldn't have been referenced by complying programs (unless, for some insane reason, you called it from your public inline member functions), so no dependency on its symbol would have been introduced.

Conclusion

The first thing I thought when I saw the delete specifier was, "Wow, that is probably cool stuff for binary compatibility!" Indeed, it helps to remove obsolete functions (and break compatibility when that's intentional), but in most cases (when the function is non-virtual) it's no different from just removing the function. So deleted functions don't bring any new concepts to binary compatibility. To be fair, they don't bring in anything harmful either.

Read on | Comments (0) | Make a comment >>


How I applied for a web server developer, or why the fastest servers are written in C instead of C++

I realized why I was so aggressive in attempts to guide one of the latest StackOverflow questions so it would gain useful answers and wouldn't be closed as subjective. The question title is "Why are most really fast servers written in C instead of C++", and I really wondered why they are. Because that's... a bit personal to me. So, let me tell you this story.

Some time ago, after I suffered a fiasco trying to get a job in the USA, I thought it was a good idea to change my career to a less ivory-tower-ish one. I thought about what was going on in the industry and decided that, since everything was moving to the web, it would be a fruitful choice to become a web server developer. So I sent my resume to three companies that were hiring a newbie Linux C++ web server developer. Being under the impression of the "Smart and Gets Things Done" stuff, I wrote my resume that way and explicitly stated that I didn't know anything about web serving.

First

One company, the top Russian search engine, declined me with a note: "Your skills are valuable, but we can't offer you a job now" (bold is mine). I still wonder what that means: am I good or bad? Should I get more experience and try again, or is that just a polite rejection? But well, alright, that's the number-one company, so I didn't frown upon that.

Second

The other company was an emerging startup. They had a shiny big office with free coffee and a large restroom (at least that's what they said). A warning sign was that the team had failed its previous startup, but hell, why not give it a try? I was contacted by the CEO (it's a small startup, isn't it? Anyway, in one Silicon Valley startup I was also interviewed by a CEO, so that didn't bother me) and was sent a sample task.

The task was to develop a C++ anti-DDoS module that drops any IP that exceeds a connection limit within a prespecified time window. The task had been sent without any prior notice and had a week-long time limit (who cares that I had plans for the week?). Period.

Well, alright. I didn't have any experience with Apache or web serving at all, so I started googling like crazy. After five hours of googling, I had learned how to create apache2 modules, devised a way to create a C++ (not C) module, and written a module that worked but was thread-local (Apache starts a lot of threads). Essentially it was useless, but, well, spending more than 5 hours on a job application is commonly acknowledged to be overkill!

I thought this did prove that I was a fast learner and a capable programmer. I sent that solution with notes on how it should be improved to make a complete module. However, "the architect was not satisfied with my code," and I was asked to finish what I had been assigned. My note that I had already invested enough time for them to see whether I was good enough to do the job was answered with a hilarious letter. It turns out I don't profess the company's philosophy, which values results rather than capability! And unless I finished the code, I wouldn't be accepted.

The startup I applied to has, I guess, already failed; they still aren't spoken about--the whole Google output for them is their SEO technologies. And they still require registration on their website from all applicants--pathetic, isn't it?

Well, if the company values working for it for free, then it indeed has a different philosophy than mine. Good luck to these guys; I hope they won't fail too many startups before they learn something.

Third

The reasons why I don't and won't apply to Google deserve a separate post, I think.

The third was another top Russian Internet company. They scheduled an interview in their shiny big office with a free pong table and a PS3 available, as well as the usual free coffee and a large restroom! The interview started with the question "Why didn't you apply to Google?" -- apparently because of my Summer of Code participation (or because I was a candidate to replace a guy who had transferred to Google from there).

The interview went smoothly, even considering that I didn't know anything about how HTTP or web servers work. After the more usual questions, I was given a coding assignment. The clouds started gathering: they asked me to write a C program to parse an Apache log and detect how many different IP addresses were in it. "Why not C++?" I asked, "it's easy and fast in C++!" The reply was that they wanted to judge my algorithm skills, not my knowledge of standard libraries. Well, OK, that seemed fair. So, let's recall how I did it in programming contests... *rubbing hands*

In 20 minutes the program was ready, but the remote server I was ssh-ed into suddenly ran out of space, and the source somehow got erased after a failed attempt to save it (Vim let me down this time). I spent another 20 minutes coding it again; it was all OK, and the interviewer didn't even find anything in my code to criticize.

The interview seemed to have gone great. The architect who interviewed me happened to come from the same university as me, was a fan of Vim as well, and sounded pleasant. We proceeded to discussing my salary, I named my figure, and then it happened.

"Okay, now I must confess," the architect started, "We lied to you a bit. All our software is actually written in C. Although we were looking for a C++ developer, you'll have to do C programming mostly."

O_O

The thing, yet unknown to the interviewer, is that I hate C; I hate it as much as I love C++. Feeling deceived, I burst into explaining that writing in C is disgusting, since C is the new assembler, regardless of the fact that I can do it quite well... No wonder I didn't get this job either.

Of course, I asked why they did development in C. And the answer was, "well, we sometimes feel urges to start rewriting it all in C++, module by module. But then we question ourselves—"but why?", and leave things as they are".

***

And now I also question myself: "why?" The fact that servers are mostly developed in C was, I think, crucial for my life. Perhaps, if it weren't the case, I would already have said goodbye to science and my masters studies, and would have become a full-time corporate server developer. But sometimes at night I still wonder why... why didn't they rewrite it in C++? And that's why I wanted that SO question answered.

Read on | Comments (0) | Make a comment >>


Binary compatibility of C++ Shared Libraries in Linux

A year ago I completed my first "scientific" paper. I presented it at a local conference for junior researchers. It actually didn't have much effect: as far as I understood, no one at the conference was into the subject (ah, no, I'm wrong, there was one guy who developed boost::build). I have wanted to fix it (it contains minor bugs) for a very long time, but since I haven't done it yet, I suppose I never will.

So here it is. The PDF text is freely available online, and there's even a PPT presentation.

Small abstract: the paper aims to describe why C++ libraries may lose binary compatibility. It also analyzes whether certain compatibility restrictions may be relinquished if the library's specification only allows its users a subset of C++. No one had done such an analysis before.

The paper is also a somewhat successful attempt to analyze C++ with the scientific method. It turned out that this language doesn't lend itself to such analysis, since it's just too huge. But anyway, this is the most complete--and thus the most useless--compatibility guide. It was quoted by the GCC folks in their manual, but that's the only known reference, and I have no idea how they found my article.

The developer of ABI Compliance Checker used the knowledge gained during this study, and this checker happens to be a working tool. This proves that my studies haven't been useless or erroneous.

Well, this weekend I'm submitting another article to a conference, and I hope it will be less useless to the world. Wish me luck.

Read on | Comments (0) | Make a comment >>


Busting C++ myths: virtual function calls are slow

Here and there across the Internet you sometimes see unproven claims that "C++ is slow," "the C++ ABI is unstable," etc. The rationale behind these claims is usually not provided by the speakers. Yet many people believe in these myths, having never seen a proof of any of them, so I'd like to bust a couple.

What C++ is good at is the ratio between the ability to describe abstract, high-level concepts and the performance of the compiled program. No wonder the argument about the performance penalty of these concepts never gets old.

Background

One of these concepts is, of course, virtual functions. Their calling mechanism is often called "slow," or, at least, "unclear". Of course, "unclear" should mean "hard to predict," but most likely it means "I couldn't/didn't try to understand it". Let's fix this.

What about other compilers?

Indeed, I only take into account how the depths of C++ are implemented in one compiler, and that's the C++ compiler from the famous GNU Compiler Collection.

What happens with other compilers, you may ask. Actually, I don't know. And I think it's a bit off-topic here. The purpose of this article is to advocate for C++ as a language by showing that there exists at least one reasonably fast implementation of it. If it happens that your compiler generates worse code and the assessments made here don't hold for it, address this question to your compiler vendor.

The C++ language is governed by a standard, but the way it's implemented under the hood is left for a compiler vendor to define. However, developers of certain compilers attempt to govern parts of the "binary" behavior of their C++ implementation by separate standards. In this article I will follow the "standard" that GCC follows: the Itanium C++ ABI. Contrary to its title, it's followed on other architectures as well, particularly on x86.

How are the virtual functions implemented?

Since very old times, virtual functions have been implemented via "virtual tables". Being successors of function pointers as a way to control program flow (as opposed to pointers as a way to implement first-class functions), virtual functions hide these pointers in an implementation-defined structure named, again, a "virtual table" (or just "vtable"). The vtable is shared by all objects of a given runtime type, and a pointer to the proper vtable is stored in each object.

A key to understanding a vtable and why it's performant is that it's not a table, but, rather, an array:

(vtable for QTableWidget class)

00  QTableWidget::metaObject() const
01  QTableWidget::qt_metacast(char const*)
02  QTableWidget::qt_metacall(QMetaObject::Call, int, void**)
03  QTableWidget::~QTableWidget()
04  QTableWidget::~QTableWidget()
05  QTableWidget::event(QEvent*)
06  QObject::eventFilter(QObject*, QEvent*)
07  QTableView::timerEvent(QTimerEvent*)
08  QObject::childEvent(QChildEvent*)
09  QObject::customEvent(QEvent*)
10  QObject::connectNotify(char const*)
11  QObject::disconnectNotify(char const*)

(snipped)

For each function, the index is known at compile time. For example, if the code calls tableWidgetPtr->timerEvent(event), the calling algorithm loads the function pointer from the vtable[7] cell (number 7 corresponds to the timerEvent() member function) and calls it instead. It is not that simple in all cases. Sometimes (we'll show it later) it is a bit harder. But there's one common truth: no dynamic dispatching by function name, actual base class, etc. ever happens during a virtual function call. The number "7" is precompiled into the object code, not looked up at runtime. C++ is not a modern interpreted language, where you can run an object method the user has just typed into a textbox.
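
A hand-written illustration of what such a call boils down to (a sketch with invented names, not actual compiler output):

    struct Event {};

    struct Widget {
        virtual void timerEvent(Event*) {}
    };

    void dispatch(Widget* w, Event* e) {
        // Conceptually, w->timerEvent(e) becomes:
        //   vtable = load(w);          // vtable pointer sits at the object's start
        //   func   = load(vtable + k); // k, the slot of timerEvent, is a constant
        //   call func(w, e);
        w->timerEvent(e);
    }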

Performance assessment

To measure performance I will count "shifted-pointer dereferences" (loading base[offset] from memory into a register) and jump instructions. I am not too much into assembly language, but, I guess, these operations take more time than arithmetic on registers and the like.

So, how much slower is it than calling a usual, non-virtual function? For this simple case (single inheritance), in addition to the mere jump instruction that happens on a usual function call, it takes loading the vtable pointer vp from the body of the object and loading one pointer from an already known index of the virtual table, vp[index_of_func] (probably triggering a cache miss).

One constant-shifted pointer dereference. How short should your function be for it to be noticeable?

More tricky inheritance schemes

What is "inheritance branch"?

A less common term I use here is inheritance branch. For a given inheritance graph, that's a maximal chain where each class is a primary base of the previous one. A primary base is the first non-virtual base class with virtual functions or, if there is none, the first nearly empty virtual base or, if none, the first virtual base. By "first" I mean "first in the list of base classes in the declaration of the derived one". (See here, case 2b, for reference.)

Here's an example inheritance hierarchy:

The inheritance branches are red, green, olive, blue, and cyan. That's the order in which the sub-objects will be laid out in the complete, most derived object (each branch has one virtual table and is laid out near the same offset). First, branches with non-virtual roots are laid out in depth-first order; then branches whose roots are virtual are laid out in the same depth-first order.

For more tricky inheritance schemes the processing is more complex. If your class has a multiple-inheritance hierarchy, it will actually have several vtables, one for each separate inheritance branch (see the side note). Suppose you have a hierarchy of the form
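
Something like this, reconstructed from the description below:

    struct A { virtual void aa() {} };
    struct B { virtual void bb() {} };
    struct C : public A, public B {};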

Then the first inheritance branch of C will be C-A, and the second one will be B. The thing is that the virtual table pointer for a class B inside C is located right at the start of the (B*)c subobject of a C c object (so the first vtable pointer is located right at the first bytes of the complete object). Therefore a pointer to C can be dereferenced immediately to get the proper virtual table only for an aa() call. For a bb() call, the pointer cp must first be adjusted to point to the B-in-C subobject, and only then is the dispatching mechanism described above for the single-inheritance case performed. The adjustment is by a fixed offset, known at compile time.

But can you immediately call the body of the final overrider for a function dispatched this way? Assume that cp actually points to an object of a more derived class:
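
One possible shape of such a class, guessed from the description that follows (building on the A/B/C sketch above; with a virtual base, D's own member i is laid out before the C subobject in the complete object):

    struct D : public virtual C {
        int i;
        virtual void bb() { i = 42; }  // the final overrider of B::bb
    };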

If a pointer to B-in-C were passed as the "this" pointer to bb's implementation, then this->i, implemented as *(this + offset_of_i), would reach beyond the object's boundary (i is at the first bytes of the object, while B-in-C is allocated after i). An adjustment of the this pointer is needed for such a call. A special function that performs a small adjustment before jumping to the actual user's function is called a "thunk". Pointers to such thunks are inserted into vtables instead of raw function pointers. In our example, the thunk emitted for the bb-in-D virtual function would adjust the pointer by an offset (known at compile time for non-virtual bases, and taken from a special part of the vtable for virtual bases).

Therefore, compared to a usual function call, we need up to three pointer loads (4 in the case of calling through a virtual base) with one cache miss and--sometimes--a jump. Since thunks may be laid out consecutively, jumps may be replaced by mere additions:

    a2: /* Thunk for non-virtual base A2.  */
        this += offsetof (B, A1) - offsetof (B, A2)
        /* Fall through.  */
    a1: /* Thunk for non-virtual base A1.  */
        this += offsetof (B, C) - offsetof (B, A1)
        /* Fall through.  */
    f:  /* Non-adjusting entry point.  */

But I won't try to judge the performance of this: compiler authors surely know better when to do this optimization. We'll assume that the result of the optimization is not slower than the straightforward case.

Calling virtual functions via interface pointers

The interface pattern is implemented in C++ via "public virtual" base classes. Therefore, calling a virtual function through a pointer to an interface base requires 3 pointer dereferences and a jump. We need 3 dereferences instead of 4, since the compiler knows that the subobject the pointer points to contains a virtual function declaration, so we can save one offset lookup (the one needed to find the proper vtable pointer).
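
For reference, a minimal sketch of that pattern (names invented):

    struct IWriter {
        virtual void write(const char* data, int size) = 0;
        virtual ~IWriter() {}
    };

    struct FileWriter : public virtual IWriter {  // "public virtual" interface base
        void write(const char* data, int size) override {}
    };

    void use(IWriter* w) { w->write("hi", 2); }   // dispatch through the interface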

Conclusion

Summing up, the overhead of a virtual function call compared to a call of a usual function is the following (all shifts are known at compile time):

  1. for simple single inheritance -- one constant-shifted pointer dereference (1 cache miss);
  2. for calling through interfaces (virtual bases) -- 3 constant-shifted pointer dereferences and a jump (that may be substituted with several increments/decrements of a value in a register);
  3. more complex inheritance schemes may lead to 4 pointer derefs (2 cache misses).

Busted?

Read on | Comments (1) | Make a comment >>


More posts about c++ >>