Recent blog updates

Shown here are only posts related to gcc.

Deleted functions in C++0x and binary compatibility

Having read a recent version of the C++ ABI standard, I noticed that certain bits of it had been revised (the version number of the standard, nevertheless, hasn't changed). In particular, the vtable entry for a mysterious "deleted virtual function" is now documented. I thought this could influence binary compatibility, and here's what I found out.

What are deleted functions for?

C++0x introduces another form of function definition: the deleted function.
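A minimal sketch of the syntax (NonCopyable is an illustrative name of my own):

    // Copying of NonCopyable is prohibited at compile time:
    class NonCopyable {
    public:
        NonCopyable() {}
        NonCopyable(const NonCopyable&) = delete;            // no copy construction
        NonCopyable& operator=(const NonCopyable&) = delete; // no copy assignment
    };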

When a function is deleted, calling it is prohibited, but substituting it (in template contexts) is not a failure. This is mainly intended to replace the old trick of making constructors private and providing abort()ing definitions for them in order to explicitly prohibit copying objects, as in the example above.
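For comparison, here's a sketch of the old idiom being replaced (names are mine again): the copy operations are declared private and left without bodies, so misuse fails at link time (or at run time, if abort()ing bodies are provided):

    class NonCopyableOld {
    public:
        NonCopyableOld() {}
    private:
        NonCopyableOld(const NonCopyableOld&);            // intentionally not defined
        NonCopyableOld& operator=(const NonCopyableOld&); // intentionally not defined
    };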

Another use case is to prohibit certain type conversions. The use cases are well described in the relevant Wikipedia entry.
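For instance (a sketch of my own, not taken from that entry), deleting an overload rejects an unwanted implicit conversion at compile time:

    void take_double(double d) {}
    void take_double(int) = delete; // passing an int is now a compile error

    // take_double(3.14); // OK
    // take_double(7);    // error: use of deleted function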

In other words, a deleted function is one that "is declared but actually doesn't exist."

What is SFINAE?

SFINAE is an acronym for Substitution Failure Is Not An Error. It is used for tricky compile-time, template-based lookups that add a lot of flexibility to the C++ language. The most obvious case comes from the Wikipedia entry.
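It looks roughly like this (reproduced from memory, so treat the details as a sketch): the first overload of test() is viable only for types that define a nested Type; for all other types its substitution silently fails, and the ellipsis overload is picked instead:

    #include <iostream>

    template <typename T>
    struct has_typedef_Type {
        typedef char yes[1];
        typedef char no[2];

        template <typename C> static yes& test(typename C::Type*); // SFINAE here
        template <typename C> static no&  test(...);               // fallback

        static const bool value = sizeof(test<T>(0)) == sizeof(yes);
    };

    struct Foo { typedef int Type; };

    int main() {
        std::cout << has_typedef_Type<Foo>::value << std::endl; // prints 1
        std::cout << has_typedef_Type<int>::value << std::endl; // prints 0
    }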

However, sometimes SFINAE seems to empower C++ beyond the imagination of an average C++ programmer; it enables tricks you wouldn't expect from the language at all, such as checking at compile time whether a class defines a particular member.

Yes, C++ can do this; with pain and squeak, but it can. A good explanation of the "WTF??!" involved can be found here.

That's all deleted functions can be used for. They can't benefit SFINAE, since deleted functions do not influence lookup rules: a deleted function is still considered a proper substitution, and this, in turn, leads to a compile error, because deleted functions can't be used in any way, even when they're never actually called. Sometimes, however, that is exactly the benefit, since we may want certain lookups to fail instead of silently picking a badly-working substitution.
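Here is a sketch of my own contrasting the two behaviors: a deleted function stays in the overload set, and when it wins, compilation fails loudly instead of a worse match being picked silently.

    template <typename T> void serialize(T) = delete; // reject everything...
    void serialize(int) {}                            // ...except int

    int main() {
        serialize(42);      // OK: the non-template overload is an exact match
        // serialize(3.14); // error: the deleted template is the best match
    }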

So the main purpose of deleted functions is additional compile-time checking: not altering the way code that uses certain functions and classes works, but mere checking. However, hiding constructors raises issues that concern binary compatibility. Let's check if there's anything interesting.

Binary compatibility issues

In the problem of binary compatibility we consider two versions of a C++ library: A and B. We try to write B in such a way that any program built against the A library keeps functioning when only the B library is present in the system. I usually restrict this to the GCC compiler, because I don't know much about the other ones :-) (and because of the means described in the other blog post about C++).

If a function is declared deleted, then a) it's considered inline, and b) it can't be called by a complying program.

Property b) is interesting since it influences source-code compatibility. In my research I assumed that binary compatibility of libraries only needs to be retained for source-compatible programs, i.e. those which compile unmodified against both versions of a library.

This means that if a function deleted in library A loses its delete specifier in library B, nothing can be affected, because that function wasn't referred to by any program built against the A library. Moreover, in the case of a virtual function, the ABI clause cited above keeps the virtual table the same size.

This also means that if a function gains a delete specifier in library B, source compatibility requires that all programs built against the A library... simply never use that function. Consequently, anything can happen to this function at the binary level (note again that its vtable slot is kept), and it won't influence any of the older programs.

API modifications: would delete help?

So for source-compatible programs deleted functions introduce absolutely no problems. I'd like to refrain from analyzing source-incompatible programs because, strictly speaking, we would then dive into the unfruitful analysis of compatibility between, say, a sound library and an instant-messaging library.

However, there are practical patterns of API changes that break source compatibility but still make sense. For example, we may want to deprecate a function so that programs newly linked against the B library use a safer equivalent, while those linked against A keep working. Would delete be of help here? (That's actually the first thing I thought of when I saw the delete specifier.)

Unfortunately, it wouldn't. Deleted functions don't have a body; none is ever emitted, because they're inline and... erm, inapplicable to calls, by definition. Moreover, providing a body for a function previously declared deleted is a violation of the one-definition rule. So unless you're doing dirty hacks, where the header you compile the library with and the header you expose to the user are different, you're out of luck here.

What would help is making functions private. If you want to safely "delete" your virtual function, you should just make it private (and ditch all references to it from your internals, of course). The virtual tables won't change in any way. For non-members, where "private" is not available, the best you can do is put a line in your documentation: "DON'T CALL THIS!"

Of course, after some time you may declare your function deleted, thus making all programs that still use it unable to start (the dynamic linker will signal an unmet dependency). If the function was virtual, no problems with virtual tables will arise either. That seems like a sane and safe deprecation method (normal, then private, then deleted).
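Here is that path sketched as three successive versions of one class (Api is an illustrative name of mine):

    // Version A: the function is public and callable.
    struct Api_vA { virtual void legacyCall() {} };

    // Version B: the function goes private; the vtable layout is unchanged,
    // so old binaries keep working, but newly compiled code can't call it.
    struct Api_vB { private: virtual void legacyCall() {} };

    // Version C: the function is deleted; its vtable slot is still reserved,
    // but no symbol is emitted, so stale programs fail at load time.
    struct Api_vC { virtual void legacyCall() = delete; };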

No problems can arise if a private abort()ing function is declared deleted in the new library version either. Such a function couldn't have been referenced from complying programs (unless, for some insane reason, you used it in your public inline member functions), so no dependency on its symbol would ever have been introduced.

Conclusion

The first thing I thought when I saw the delete specifier was, "Wow, that is probably cool stuff for binary compatibility!" Indeed, it helps to remove obsolete functions (and break compatibility, where that's intentional), but in most cases (when the function is non-virtual) it's no different from just removing the function. So deleted functions don't bring any new concepts to binary compatibility. To be fair, though, they don't bring in anything harmful either.



Binary compatibility of C++ Shared Libraries in Linux

A year ago I completed my first "scientific" paper. I presented it at a local conference for junior researchers. It didn't actually have much effect: as far as I could tell, no one at the conference was into the subject (ah, no, I'm wrong, there was one guy who developed boost::build). I had wanted to fix it (it contained minor bugs) for a very long time, but since I still haven't, I suppose I never will.

So here it is. The PDF text is freely available online, and there's even a PPT presentation.

Small abstract: the paper aims to describe why C++ libraries may lose binary compatibility. It also analyzes whether certain compatibility restrictions may be relaxed when the library's specification restricts its users to a subset of C++. No one had done such an analysis before.

The paper is also a somewhat successful attempt to analyze C++ with the scientific method. It turned out that the language does not lend itself to such analysis: it's just too huge. But anyway, this is the most complete--and thus the most useless--compatibility guide. It was quoted by the GCC folks in their manual, but that's the only known reference; and I have no idea how they found my article.

The developer of the ABI Compliance Checker used the knowledge gained during this study, and that checker happens to be a working tool. This proves that my studies haven't been useless or erroneous.

Well, this weekend I'm submitting another article to a conference, and I hope it will be less useless to the world. Wish me luck.



Busting C++ myths: virtual function calls are slow

Here and there over the Internet you sometimes see unproved claims that "C++ is slow", "the C++ ABI is unstable", etc. The rationale behind these claims is usually not provided by the speakers. However, many people believe these myths without ever having seen a proof of any of them. So I'd like to bust a couple of them.

What C++ is good at is the ratio between its capability to describe abstract, high-level concepts and the performance of the compiled program. No wonder the argument about the performance penalty of these concepts never gets old.

Background

One of these concepts is, of course, virtual functions. Their calling mechanism is often called "slow" or, at least, "unclear". Of course, "unclear" should mean "hard to predict", but most likely it means "I couldn't/didn't try to understand it". Let's fix this.

What about other compilers?

Indeed, I only take into account the implementation of C++ internals in one compiler, and that's the C++ compiler from the famous GNU Compiler Collection.

What happens in other compilers, you may ask. Actually, I don't know, and I think it's a bit off-topic here. The purpose of this article is to advocate C++ as a language by showing that there exists at least one reasonably fast implementation of it. If your compiler happens to generate worse code and the assessments made here don't hold for it, address this question to your compiler vendor.

The C++ language is governed by a standard, but how it's implemented under the hood is left for a compiler vendor to define. However, developers of certain compilers attempt to govern parts of the "binary" behavior of their C++ implementation by separate standards. In this article I will follow the "standard" that GCC follows: the Itanium C++ ABI. Contrary to its title, it is followed on other architectures as well, particularly on x86.

How are the virtual functions implemented?

Since very old times virtual functions have been implemented via "virtual tables". Being successors of function pointers as a way to control program flow (as opposed to pointers as a way to implement first-class functions), virtual functions hide these pointers in an implementation-defined structure named, again, a "virtual table" (or just "vtable"). The vtable is shared by all objects of a given runtime type, and a pointer to the proper vtable is stored in each object.

A key to understanding a vtable and why it's performant is that it's not a table, but, rather, an array:

(vtable for QTableWidget class)

00  QTableWidget::metaObject() const
01  QTableWidget::qt_metacast(char const*)
02  QTableWidget::qt_metacall(QMetaObject::Call, int, void**)
03  QTableWidget::~QTableWidget()
04  QTableWidget::~QTableWidget()
05  QTableWidget::event(QEvent*)
06  QObject::eventFilter(QObject*, QEvent*)
07  QTableView::timerEvent(QTimerEvent*)
08  QObject::childEvent(QChildEvent*)
09  QObject::customEvent(QEvent*)
10  QObject::connectNotify(char const*)
11  QObject::disconnectNotify(char const*)

(snipped)

For each function the index is known at compile time. For example, if the code calls tableWidgetPtr->timerEvent(event), the calling algorithm loads the function pointer from the vtable[7] cell (number 7 corresponds to the timerEvent() member function) and calls it instead. It is not that simple in all cases; sometimes (as we'll show later) it is a bit more complicated. But there's one common truth: no dynamic dispatching by function name, actual base class, etc. ever happens during a virtual function call. This number 7 is precompiled into the object code, not looked up at runtime. C++ is not a modern interpreted language, where you can run an object method a user has just written in a textbox.
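To make the scheme tangible, here is a hand-rolled model of the machinery (purely illustrative; no real compiler emits exactly this): the only run-time work is one pointer load from the object and one indirect call through a fixed slot.

    #include <cstdio>

    struct Shape; // a "class" carrying one virtual-table pointer
    typedef void (*Method)(Shape*);

    struct Shape {
        const Method* vtable; // points to the table shared by the whole type
    };

    void circle_draw(Shape*)  { std::printf("circle::draw\n"); }
    void circle_scale(Shape*) { std::printf("circle::scale\n"); }

    const Method circle_vtable[] = { circle_draw, circle_scale };

    int main() {
        Shape c = { circle_vtable };
        Shape* p = &c;
        p->vtable[1](p); // "slot 1" is baked in at compile time, like the 7 above
        return 0;
    }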

Performance assessment

To measure performance I will count "shifted-pointer dereferences" (loading base[offset] from memory into a register) and jump instructions. I am not too much into assembly language but, I guess, these operations take more time than arithmetic on registers and the like.

So, how much slower is it than calling a usual, non-virtual function? For this simple (single-inheritance) case it takes, in addition to the mere jump instruction that a usual function call performs, loading the vtable pointer vp from the body of the object and then loading one pointer from an already known index of the virtual table, vp[index_of_func] (probably triggering a cache miss).

One constant-shifted pointer dereference. How short should your function be for it to be noticeable?

More tricky inheritance schemes

What is "inheritance branch"?

An uncommon term I use here is inheritance branch. For a given inheritance graph, that's a maximal chain in which each class is a primary base of the previous one. A primary base is the first non-virtual base class with virtual functions or, if there is none, the first nearly empty virtual base or, failing that, the first virtual base. By "first" I mean "first in the list of base classes in the declaration of the derived one". (See here, case 2b, for reference.)

Here's an example of an inheritance hierarchy:

The inheritance branches are red, green, olive, blue, and cyan. That's the order in which the sub-objects are laid out in the complete, most derived object (each branch has one virtual table and is laid out near the same offset). First, branches with non-virtual roots are laid out in depth-first order; then branches whose roots are virtual are laid out, in the same depth-first order.

For more tricky inheritance schemes the processing is more complex. If your class has a multiple-inheritance hierarchy, it will actually have several vtables, one for each separate inheritance branch (see the side note). Suppose you have a hierarchy of the following form.
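(A reconstruction of mine matching the discussion below: A and B are unrelated polymorphic bases of C.)

    struct A { virtual void aa() {} };
    struct B { virtual void bb() {} };
    struct C : public A, public B {};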

Then the first inheritance branch of C will be C-A, and the second one will be B. The thing is that the virtual table pointer for the B-in-C subobject is located right at the address that (B*)&c points to inside a C object c (and thus the first vtable pointer occupies the very first bytes of the complete object). Therefore a pointer to C can be immediately dereferenced to get a proper virtual table only for an aa() call. For a bb() call, the pointer cp must first be adjusted to point to the B-in-C subobject, and only then is the dispatching mechanism described above for the single-inheritance case performed. The adjustment is by a fixed offset known at compile time.

But can the body of the final overrider be called immediately for a function dispatched this way? Not always. Assume that cp actually points to an object of a more derived class.
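A reconstruction of such a class (mine as well): D overrides bb(), which was first declared in B, and touches its own member through "this".

    struct D : public C {
        int i;
        virtual void bb() { i = 42; } // "this" must point at D's layout here
    };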

If a pointer to B-in-C were passed as the "this" pointer to bb's implementation, then this->i, implemented as *(this + offset_of_i), would end up outside the intended data, breaking the object's borders (the offset of i is computed from the start of the complete object, while "this" would point into the middle of it, at B-in-C). An adjustment to the this pointer is needed for such a caller. A special function that does this small adjustment before jumping to the actual user's function is called a "thunk". Pointers to these thunks are inserted into vtables instead of raw function pointers. In our example, the thunk emitted for bb-in-D would adjust the pointer by an offset (known at compile time for non-virtual bases, and taken from a special part of the vtable for virtual bases).

Therefore, compared to a usual function call, we need up to three pointer loads (four in the case of calling through a virtual base) with one cache miss and, sometimes, an extra jump. Since thunks may be laid out consecutively, jumps may be replaced by mere additions:

    a2: /* Thunk for non-virtual base A2.  */
        this += offsetof (B, A1) - offsetof (B, A2)
        /* Fall through.  */
    a1: /* Thunk for non-virtual base A1.  */
        this += offsetof (B, C) - offsetof (B, A1)
        /* Fall through.  */
    f:  /* Non-adjusting entry point.  */

But I won't try to judge its performance: compiler authors surely know better, when to do this optimization. We'll assume that the result of optimization is not slower than the straightforward case.

Calling virtual functions via interface pointers

The interface pattern is implemented in C++ with "public virtual" base classes. Therefore, calling a virtual function through a pointer to an interface base requires three pointer loads and a jump. We need three instead of four because the compiler knows that the subobject the pointer points to declares virtual functions, so one offset lookup (the one needed to find the proper vtable pointer) can be saved.
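A sketch of what "interface" means here (the names are mine):

    struct IRenderable {                       // the interface
        virtual void render() = 0;
        virtual ~IRenderable() {}
    };

    struct Widget : public virtual IRenderable {
        virtual void render() {}
    };

    void draw(IRenderable* r) { r->render(); } // dispatch through a virtual base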

Conclusion

Summing it up, the overhead of a virtual function call compared to a call of a usual function is the following (all shifts are known at compile time):

  1. for simple single inheritance -- one constant-shifted pointer dereference (1 cache miss);
  2. for calling through interfaces (virtual bases) -- 3 constant-shifted pointer dereferences and a jump (which may be substituted with several increments/decrements of a value in a register);
  3. more complex inheritance schemes may lead to 4 pointer dereferences (2 cache misses).

Busted?


