Busting C++ myths: virtual function calls are slow

November 30, 2009 1457 word 7 min read

Here and there, over the Internet, you sometimes see unproved claims that "C++ is slow", "C++ ABI is unstable" etc. The rationale behind these claims is usually not provided by the speakers. However, many people believe in these myths, having never seen a proof of any. so I'd like to bust a couple of them.

What C++ is good at is the ratio between plausibility to describe abstract and high-level concepts and performance of the compiled program. No wonder that the argument about the performance penalty of these concepts never gets old.

Background

On of these concepts is, of course, virtual functions. Their calling mechanisms are often called "slow", or, at least, "unclear". Of course, "unclear" should mean "hard to predict", but most likely it means "I couldn't/didn't try to understand it". Let's fix this.

What about other compilers?

Indeed, I only take into account the implementation of C++ depths on one compiler, and that's the C++ compiler from the famous GNU Compiler Collection.

What happens on other compilers, you may ask. Actually I don't know. And I think it's a bit out-of-topic here. The purpose of this article is to advocate C++ as a language by showing that there exists at least one reasonably fast implementation of it. If it happens that your compiler generates worse code and the assessments made here don't hold for it, address your compiler vendor with this question.

C++ language is governed by a standard, but the way it's implemented under the hood is left for a compiler vendor to define. However, developers of certain compilers attempt to govern parts of "binary" behavior of their implementation of C++ language by separate standards. In this article I will follow the "standard", that GCC follows: Itanium C++ ABI. Contrary to the title, it's followed on other architectures as well, particularly, on x86.

How are the virtual functions implemented?

From very old times virtual functions were implemented as "vritual tables". Being successors of function pointers as the way to control program flow (contrary to pointers as a way to implement first-class functions), virtual functions hide these pointers into an implementation-defined structure named, again, "virtual table" (or just "vtable"). Vtable is common for all objects of a giver runtime type, and the pointer to the proper vtable is stored in each object.

A key to understanding a vtable and why it's performant is that it's not a table, but, rather, an array:

(vtable for QTableWidget class)

00  QTableWidget::metaObject() const
01  QTableWidget::qt_metacast(char const*)
02  QTableWidget::qt_metacall(QMetaObject::Call, int, void**)
03  QTableWidget::~QTableWidget()
04  QTableWidget::~QTableWidget()
05  QTableWidget::event(QEvent*)
06  QObject::eventFilter(QObject*, QEvent*)
07  QTableView::timerEvent(QTimerEvent*)
08  QObject::childEvent(QChildEvent*)
09  QObject::customEvent(QEvent*)
010 QObject::connectNotify(char const*)
011 QObject::disconnectNotify(char const*)

(snipped)

For each function the index is known at compile time. For example, if the code calls tableWidgetPtr->timerEvent(event), the calling algorithm loads function pointer from vtable[7] cell (number 7 corresponds to timerEvent() member function) and calls it instead. Not for all cases it is that simple. Sometimes (we'll show it later) it is a bit more hard. But there's one common truth: no dynamic distpatching by function name, actual base class etc ever happens during virtual function call. This "7" number is precompiled into the object code, not looked up at runtime. C++ is not a modern interpreted language, where you can run an object method a user has just written in a textbox.

Performance assessment

To measure performance I will take into account "shifted-pointer dereferences" (loading base[offset] from memory into a register) and jump instructions. I am not too much into assembly language, but, I guess, these operations take more time than arithmetics on registers and stuff.

So, how slower is it than calling a usual, non-virtual function? For this simple case (single-inheritance) it takes, in addition to mere jump instruction that happens on a usual function call, loading a vtable pointer vp from the body of the object and loading one pointer from an already known index of a virtual table vp[index_of_func] (probably, triggering a cache-miss).

One constant-shifted pointer dereference. How short should your function be for it to be noticeable?

More tricky inheritance schemes

What is "inheritance branch"?

A non-common term I use here is inheritance branch. For a given inheritance graph that's a maximal chain where each class is a primary base of previous one. A primary base is the first non-virtual class with virtual functions or, if none, a first nearly empty virtual base or, if none, a first virtual base. By "first" it's meant "first one in list of base classes in the declaration of derived ones". (See here, case 2b for reference.)

Here's the example of inheritance hierarchy:

The inheritance branches are red, green, olive, blue, cyan. That's the order the sub-objects will be laid out in the complete, most derived one (ecah branch has one virtual table and is laid out near the same offset). First, branches with non-virtual roots are laid out in "depth-first" order; then, branches, roots of which are virtual, are laid out in the same depth-first order.

For more tricky inheritance schemes the processing will be more complex. If your class has a multiple-inheritance hierarchy, it will actually have several vtables, one for each separate inheritance branch (see side note). Consider you have an hierarchy of form

Then the first inheritance branch of C will be C-A, and the second one will be B. The thing is that a virtual table pointer for some class B in C is located right under pointer to (B*)c subobject of C c object (so, the first vtable pointer is located right at the first bytes of the complete object). Therefore a pointer to C can be immediately dereferenced to get a proper virtual table only for aa() function call. For bb() function call the pointer cp should be adjusted to point to B-in-C subobject and only then a dispatching mechanism described above, for single-inheritance case, should be performed. The adjustment is by a fixed, known at compile-time offset.

But you can't immediately call the body of the final overrider for a function distpatched this way? Assume that cp actually contains object of more derived class:

If a pointer to B-in-C would be passed as "this" pointer to bb's implementation, the this->i, being implemented as *(this+offset_of_i) would result into breaking the object's border (i is at the first bytes of the object, while "B-in-C" is allocated after i). An adjustment to this pointer is needed for such caller. A special function that does small adjustment before jumping to an actual user's function is called "thunk". Pointers to these thunks are inserted in vtables instead of raw function pointers. In our example, a thunk emitted for bb-in-D virtual function would adjust pointer by an offset (known at compile time for non-virtual bases and taken from a special part of vtable for virtual bases).

Therefore, as compared to usual function call, we need up to three pointer loads (4 in case of calling a virtual base) with one cache miss and--sometimes--a jump. Since thunks may be consecutive, jumps may be replaced by mere additions:

    a2: /* Thunk for non-virtual base A2.  */
        this += offsetof (B, A1) - offsetof (B, A2)
        /* Fall through.  */
    a1: /* Thunk for non-virtual base A1.  */
        this += offsetof (B, C) - offsetof (B, A1)
        /* Fall through.  */
    f:  /* Non-adjusting entry point.  */

But I won't try to judge its performance: compiler authors surely know better, when to do this optimization. We'll assume that the result of optimization is not slower than the straightforward case.

Calling virtual functions via interface pointers

Interface pattern is implemented in C++ as "public virtual" classes. Therefore, calling virtual function through the pointer to interface base requires 3 pointer adjusts and a jump. We need 3 adjusts instead of 4, since compiler knows that the subobject pointer points to contains a virtual function declaration, so we can save one offset (the one needed to look up for a proper vtable pointer).

Conclusion

Summing it up, overhead for a virtual function call compared to a call of usual function is the following (all shifts are known at compile-time):

for simple single inheritance -- one constant-shifted pointer dereference (1 cache miss);
for calling through interfaces (virtual bases) -- 3 constant-shifted pointer dereferences and a jump (that may be substituted with several increments/decrements of a value in a register);
more complex inheritance schemes may lead to 4 pointer derefs (2 cache misses).

Busted?

Comments imported from the old website

Gautier Portet (id) on 25 September 2010 commented:

Virtual methods are slow. Not because of the vtable implementation, but just because they are called. Virtuals cannot be inlined, and that's slow. But this is only noticeable in high performance loops...

Programming as Controlling Mindless Zombies

November 30, 2009 646 word 4 min read

The question that bothers many developers and, maybe, even people in related topics is "What is Programming?"

Indeed, many of us do it every day (more seasoned ones taking a break on weekends), but who really bothered himself with a philosophy of what he's doing? Since I'm fond of making up my own philosophy, I'll try to pore upon the subject in several blog posts. In this one I'd address one side of it, and you, having read the title, already know which.

Computer programming is indeed controlling a horde of mindless zombies. Let's check definition of a zombie on wikipedia:

A zombie is a creature that appears in books and popular culture typically as a reanimated dead or a mindless human being. Stories of zombies originated in the Afro-Caribbean spiritual belief system of Vodou, which told of the people being controlled as laborers by a powerful wizard

First of all, computers are dumb. They really are. Just compare the time you need to devise an idea, to think it out with the time you have to spend to explain it to a computer. To a machine that understands only ugly, primitive texts that do not make sense to any human, unless he's a programmer. How many times did you scold your computer for being not as smart as you? For not understanding that here he shouldn't do this, because it's obvious that he should do that instead?

Recently I had been writing an OCaml program. OCaml is a very smart language that can infer the types of identifiers by looking at operations applied to them. That's right, he's smart--as long as things go the way he used to perceiving. But as soon as you write "separate the list to a list concatenated with an element" instead of "separate the list to an element concatenated with a list", he becomes unbelievably dumb. If it was a russian bureaucrat, I would even think he's doing it on purpose to extort a bribe...

So, the computers are totally incapable of doing anything not told to them explicitly. That's quite a good definition of being mindless.

But these mindless machines happen to do some useful work. And the complexity of this work, usually inhandleable by any human, sometimes gives a false sense that these machines possesses intellect. But that's not intellect of their own. The intellect within a binary is that of the programmer who coded it, of the tester who checked that the dumb machine doesn't crush all around, of the manager who gathered them, and of millions of people who invested their mindpower into compilers and interpreters--the things that make one mindless machine teach another. I can't help quoting the famous Conway's law:

organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations

(quoting via Wikipedia). This law is a direct consequence of the fact that computers are dumb, and it's we, the humans, the organizations, that inculcate our own properties to them.

So the result of computer programming is a horde of creatures that carry the will of their masters. Well, sometimes not exactly what the masters meant, thus you see strange error messages here and there, where you really don't expect them. That's because the creatures are so dumb, that a separate programming engineering discipline is devoted to "talking to zombies" (it's also known as "coding"), and the discipline is not practiced well enough by some developers.

And the last thing that concludes the post. While it's always hard to instruct a zombie, it's sometimes also hard to make an already instructed zombie to fulfill your needs as user's. Therefore, users are sometimes required to posses the thing these creatures do not have: brains...

Mindless creatures that look for brains? That just can't sound more zomby! And it's precisely what we deal with while programming.

About this blog

November 30, 2009 708 word 4 min read

Hello! My name Pavel Shved, 21, I am a russian programmer and I'm a blogger now. More info about me is here. And this is is my blog, because of which I wrote this web-site.

On this blog, I express my own opinions only, and not those of my employer, or any of my former employers. Only my personal stuff belongs here. The employers' both official and non-official opinions and announcements reside elsewhere.

What is this blog about?

The title of this blog is self-descriptive and is well recognized by its target audience as a nice joke. Yep, it's about programming, the things that surround it, the things that bother me in it, the things I find interesting or useful.

I just start to learn the world of programming. Although I already know some stuff, there's still a lot to discover. And the things I encounter on that road are what I write about here. My posts diverge into two categories:

Serious post: sometimes I find out that something I learned is not described, known, or presented well. Or maybe I just want to feel like a teacher that has something to say. Then I write a "serious" blog post about it.
Relax post: and sometimes I notice amusing things, curiously connected concepts, or just want to express a non-common thought on a non-technical controversial question. Then I--you got it right--post in my blog.

I don't try to focus on particular technologies; neither in the blog, nor in career. Different ones amuse me over time, I praise them, study them, use them, criticize them, rant at them and finally condemn to personal oblivion (Delphi, for example, has made all the way through it). However, I'm just starting, so who knows what this blog project will turn into. Anyway, check the tag cloud to get a rough estimate of what I blog most about.

You stole design from "Coding Horror"!

No, I didn't! The design became just... well... like this.

It appears that Jeff Atwood's blog "Coding Horror", which I, among other blogs, regularly read, might really have influenced the design. But I noticed it post factum. After I completed the design that fits my sense of beauty, is logical, and easy to implement, it appeared very much like Jeff's one.

Maybe such set of CSSs can't be considered "design" at all, maybe it's not that uncommon, but, well, I'll just satisfy my shame with putting the proper acknowledgment. Sorry, Jeff.

Commenting

To comment a post, you should identify yourself. For now, you may login via OpenID. The OpenID you log in with will be published near your comment. Here's an explanation why the engine does that.

Commenting functionality is intentionally limited to single-threaded line. Please, be literate and polite.

Language

The language used here is... well... Any Russian would perceive it outright as "English", while any English-speaking reader would discern "bad English" in it, and I would accept is as compliment.

Nevertheless, you may comment in any language you want, provided I understand it (these are Russian, English and bad German).

Links to external sites, language of which is Russian, will be marked with small Russian flag, like this.

Why not Russian?

Indeed, although I was born and still, unfortunately, live in Russia, the fact that I blog in English is somewhat surprising. There are many reasons I use English. First, it's easier to write technical texts in English than in Russian. Second, English is an industrial standard for programming; many names and captions are in English and translating them all would be cumbersome. Third, I want to improve my English. And at last, it will increase my audience.

I was going to make a multilingual version of the site, but I realized that I just hate translating, as it doesn't create more information than it were before.

So, thank you and I hope you'll enjoy reading.

Comments imported from the old website

Roman Kecher on 05 January 2010 commented:

I liked the blog's name alot :)

I'll make an effort to visit from time to time as your blog seems pretty interesting. Good luck.

Pavel Shved on 07 January 2010 commented:

Thank you for feedback! Being the first commenter, you also made me fix the comments %).