Recent blog updates

Shown here are only posts related to c++. You can view all posts here.

Syntax elements? User-defined functions!

Many languages, especially those non-functional ones that originate from 20th century, have immutable syntax. Such symbols as +, (, or . have predefined meaning, which can't be changed by user.

Based on this, when you see a similar symbol, you think that its meaning is predefined by specification of the language. But it's not always true.

C++

The most beautiful example of operator overloading in C++ is "lazy evaluation". Assume you defined a matrix class. You may want to use operator overloading to work with it, so you could write:

The expression at the right-hand side will be evaluated like this: first A*B will be computed and stored in some internal temporary variable tmp1, then (tmp1*C) will be computed, and stored in tmp2, etc.

However, it's known that evaluation of matrix multiplication chain can be optimized. So instead of straightforward definition where multiplying two matrices returns a matrix, we could define it in such a way that it returns a special "chain handler". This "handler" would accumulate references to all matrices involved in the multiplication, and perform the actual multiplication only once, when it gets converted to matrix M as a part of the assignment.

C++ is notorious for its operator overloading. You can imbue any meaning to (nearly) any operator used in expressions. This means, that any simple expression, such as "a = b + c" could actually invoke a complex user-defined procedure.

Of course, you can't redefine anything. Syntax elements used to define control-flow can't be redefined, as well as some operators that provide. And, most importantly, you can't create your own operators. But even with the powers you have, you can hide some userspace functions behind the usual syntax.

Linux Shell

If you wrote or read Bash programs, you probably encountered such constructs:

The [...] "conditional" expressions are used to query file attributes and to compare strings. They are builtin shell operators...or not? Let's check:

pavel@lonely ~ $ ls -l "/usr/bin/["
-rwxr-xr-x 1 root root 35032 Jul 20 01:53 /usr/bin/[

What the hell? So these brackets are actually small programs that merely take the subsequent tokens as command-line arguments? Well, they could be:

pavel@lonely ~ $ "/usr/bin/[" -f /bin/bash ] && echo Yes
Yes

Thankfully, Bash itself doesn't invoke them (it uses built-in implementation of this functionality). You can check it with strace or by temporarily removing this file. Perhaps, this file is kept for compatibility with other shells.

This example demonstrates that what you think of as syntax elements could actually be a user-defined function.

OCaml

Haskell language has a special notation to make an arbitrary function infix. I.e. you could make an "operator" out of a simple function:

Prelude> min 1 2
1
Prelude> 1 `min` 2
1

However, another functional language, OCaml, doesn't have such syntax. It can define function as infix, but not "convert" it to infix as shown above. And there's no hope for it to work: it would require introducing additional syntax to language specification... would it?

# let ( /* ) x y = y x
  and ( */ ) x y = x y
  in 1 /*min*/ 2 ;;
- : int = 1

The idea is to define some functions that look and feel like "syntax", leaving language definition intact. The above code defines infix functions named "/*" and "*/". And if you saw code like this (1 /*min*/ 2) at distance from the definition of these "functions", you would really thing it's some kind of OCaml syntax.

Here's the origin of this idea.

***

Modern languages give users more freedom to define and re-define meaning of what looks like syntax elements. This freedom can be used to write more beautiful and concise code, as shown in the examples above. By the way, do you know more?

Read on | Comments (2) | Make a comment >>


Deleted functions in C++0x and binary compatibility

Having read a recent version of C++ ABI standard, I noticed that certain bits of it were revised (the version of the standard hasn't nevertheless been changed). Particularly, a vtable entry of a mysterious "deleted virtual function" was documented. I thought that it could influence binary compatibility, and here's what I found out.

What deleted functions are for?

C++0x introduces another form of a function definition -- deleted function:

When a function is deleted, it means that calling it is prohibited, but substitution of it is not a failure. This is mainly to be used instead of making constructors private and providing abort()ing definitions to them (to explicitly prohibit copying objects, as in example above):

Another use-case is to prohibit certain type conversions. The use cases are well described at the relevant wikipedia entry.

In other words deleted function means "the function is declared but it actually doesn't exist."

What is SFINAE?

SFINAE is an acronym for Substitution Failure Is Not An Error. It is used for tricky compile-time template-based lookups that add a lot of flexibility to C++ language. The most obvious case, from the Wikipedia entry is this:

However, sometimes SFINAE seems to empower C++ beyond imagination of an average C++ programmer. It can be used to:

Yes, C++ can do this; with pain and squeak, but it can. A good explanation of WTF??! can be found here.

That's all it could be used for. It can't benefit SFINAE, since deleted functions do not influence lookup rules, so a deleted function should be considered a proper substitution; this, in turn, leads to compile error because deleted functions can't be used in any way, even if they're not called. However, this sometimes is of benefit, since we may want certain lookups to fail, instead of silently picking a badly-working substitution.

So the main purpose of deleted functions is additional compile-time checks. Not even altering the way a code that uses certain functions and classes works, but mere checking. However, hiding constructors arises issues that concern binary compatibility. Let's check if there's anything interesting.

Binary compatibility issues

In the problem of binary compatibility we consider two versions of a C++ library: A and B. We try to write B in such a way that any program built against A lib keeps functioning if there's just a B library in the system. I usually restrict it to GCC compiler, because I don't know much about the other ones :-) (and because of the means described in the other blog post about C++).

If a function is declared deleted, then a) it's considered inline, and b) it can't be called by a complying program.

The property b) is interesting since it influences source-code compatibility. In my research I assumed that binary compatibility of libraries should only be retained for source compatible programs, i.e. for those which compile unmodified with both versions of a library.

This means that if from a deleted function in library A the delete specifier was removed in library B, then it can't influence anything, because that function wasn't referred to in any program built against A library. Moreover, if it's the case of a virtual function, the claim cited above of ABI standard makes virtual table not change in size.

This also means that if a function gains delete specifier in B library, the source compatibility requires all programs that are built against A library... just not use that function. This means that anything can happen on binary level (note again that a slot in vtable is kept) with this function, and it won't influence any of older programs.

API modifications: would delete help?

So for source-compatible programs deleted functions introduce absolutely no problems. I'd like to refrain from analyzing source incompatible programs, because, strictly speaking, we could then dive into unfruitful analysis of compatibility of sound library and instant-messaging libs.

However, there are practical patterns of API changes that break source compatibility but still make sense. For example, we want to deprecate a function, so that newly linked programs (against B library) will use a safer equivalent, while those linked against A will keep working. Would delete be of help here? (That's actually the first thing I thought about when I saw delete modifier).

Unfortunately, it wouldn't. Deleted function don't have a body; it's just not emitted, because they're inline and... erm, inapplicable to calls, by definition. Moreover, providing a body for a function previously declared "deleted" is a violation of one-definition rule. So, unless you're doing dirty hacks, when the header you compile library with and the header you expose to user are different, you're out of luck here.

What would help is making functions private. If you want to safely "delete" your virtual function, you should just make it private (and ditch all references to it from your internals, of course). The virtual tables won't change in any way. For non-members, where "private" is not available, the best you could is putting a line in your documentation "DON'T CALL THIS!"

Of course, after some time you may declare your function deleted thus making all programs that still use it incapable to start (dynamic linker will signal unmet dependency). If the function was virtual, no problems with virtual tables will arise as well. That seems like a sane and safe deprecation method (normal-private-deleted).

No problems may arise if a private abort()ing function was declared deleted in the new library version. Such function couldn't be referenced from the complying programs (unless you for an insane reason did this in your public inline member functions). So no dependency on their symbols would be introduced.

Conclusion

The first thing I thought when I saw a delete specifier was, "Wow, that is probably cool stuff for binary compatibility!" Indeed, it helps to remove obsolete functions (and break compatibility if it's intentional), but in most cases (when it's non-virtual) it's no different from just removing this function. So deleted functions don't bring any new concepts to binary compatibility. Nevertheless, to be fair, they don't bring in anything harmful as well.

Read on | Comments (0) | Make a comment >>


How I applied for a web server developer, or why the fastest servers are written in C instead of C++

I realized why I was so aggressive in attempts to guide one of the latest StackOverflow questions so it would gain useful answers and wouldn't be closed as subjective. The question title is "Why are most really fast servers written in C instead of C++", and I really wondered why they are. Because that's... a bit personal to me. So, let me tell you this story.

Some time ago, after I suffered a fiasco to get a job in the USA, I thought it was a good idea to change my career to a less ivory-tower-ish. I thought about what was going on in the industry, and decided that since everything was moving to web, it would be a fruitful choice to become a web server developer. So I sent my resume to three companies that were hiring a newbie Linux C++ web server developer. Being under impression of "Smart and Gets things done" stuff, I wrote a resume that way and explicitly stated that I don't know anything about web serving.

First

One company, which is the top Russian search engine, declined me with a note "Your skills are valuable, but we can't offer you a job now" (bold is mine). I still wonder what that means: am I good or bad? Should I get more experience and try again or that's just a polite rejection? But well, alright, that's the number-one company, so I didn't frown upon that.

Second

The other company was an emerging startup. They had a shiny big office with free coffee, and a large restroom (at least that's what they said). A sign of warning was that the team failed its previous startup, but hell, why not give it a try? I was contacted by the CEO (it's a small startup, isn't it? Anyway in one Silicon Valley startup I was also interviewed by a CEO, so that didn't bother me) and was sent a sample task.

The task was to develop a C++ anti-DDOS module that drops any IP that exceeds a connection limit per a prespecified time. The task had been sent without any prior notice, and had a week-long time limit (who cares that I have plans for the week?). Period.

Well, alright. I didn't have any experience with Apache and web serving at all, so I started googling like crazy. After five hours of googling, I learned how to create apache2 modules, I devised a way to create a C++, not a C, module and I wrote a module that worked but was thread-local (apache starts a lot of threads). Essentially it was useless, but, well, spending more than 5 hours for a job application is a commonly acknowledged overkill!

I thought it did prove that I was a fast learner and a capable programmer. I sent that solution with notes how it should be improved to make up a complete module. However, "the architect was not satisfied with my code," and I was asked to finish what I had been assigned to. A note that I already had invested enough time for them to see if I was capable good enough to do the job was replied with a hilarious letter. It happens that I don't profess the company's philosophy that values result rather than capability! And unless I finish the code I'm not accepted.

The startup I applied to has, I guess, already failed; they still are not spoken about--whole google output is their SEO technologies. And they still require a registration on their website to all applicants--pathetic, isn't it?

Well, if the company values working for it for free, then it indeed is of a different philosophy than me. Good luck to these guys, I hope they won't fail many startups until they learn something.

Third

The reasons why I don't and won't apply to Google deserve a separate post, I think.

The third was another top Russian Internet company. They assigned an interview in their shiny big office with free a pong table, PS3 avaliable, as well as the usual free coffee and a large restroom! The interview started with a question "Why didn't you apply to Google?" -- apparently because of my Summer of Code participation (or because I was a candidate to replace a guy who transferred to Google from there).

The interview went smoothly, even considering that I didn't know anything about how HTTP or web servers work. After more usual questions, I was given a coding assignment. The clouds started gathering: they asked to write a C program to parse apache log and detect how many different IP addresses were in it. "Why not C++?" I asked, "it's easy and fast in C++!" The reply was that they wanted to judge my algorithm skills, not the knowledge of standard libraries. Well, OK, that seemed fair.. So, let's reckon how I did it in programming contests... *rubbing hands*

In 20 minutes the program was ready, but the remote server I was ssh-ed to suddenly went out of space, and the source somehow managed to get erased after a failed attempt to save it (Vim let me down this time). I spent another 20 minutes to code it again; it was all OK, and the interviewer even didn't see anything to criticize my code about.

The interview seemed to have gone great. The architect that interviewed me happened to come from the same university as me, was a fan of Vim as well, and sounded pleasant. We proceeded to discussing my salary, I made an offer, and there it happened.

"Okay, now I must confess," the architect started, "We lied to you a bit. All our software is actually written in C. Although we were looking for a C++ developer, you'll have to do C programming mostly."

O_O

The thing, yet unknown to the interviewer, is that I hate C; I hate it as much as I love C++. Feeling so deceived, I burst into explaining that writing in C is disgusting, since C is the new assembler, regardless of that I can do it quite well... No wonder I didn't get this job either.

Of course, I asked why they did development in C. And the answer was, "well, we sometimes feel urges to start rewriting it all in C++, module by module. But then we question ourselves—"but why?", and leave things as they are".

***

And now I also question myself—"why?" The fact that servers are mostly developed in C, I think, was crucial for my life. Perhaps, if it weren't the case, I would already have said goodbye to science and my masters studies, and would become a full-time corporate server developer. But sometimes at nights I still wonder why... why didn't they rewrite it in C++? And that's why I wanted that SO question answered.

Read on | Comments (0) | Make a comment >>


Binary compatibility of C++ Shared Libraries in Linux

A year ago I've completed my first "scientific" paper. I presented it in a local conference for junior researches. It actually didn't have much effect: as far as I understood no one at the conference was into the subject (ah, no, I'm wrong, there was one guy who developed boost::build). I wanted to fix it (since it contained minor bugs) for a very long time. But since I haven't done it, then, I suppose, I will never manage to fix it.

So here it is. A pdf text is available online freely, and there's even a ppt presentation.

Small abstract: it aims to describe why C++ libraries may lose binary compatibility. It also analyzes if certain compatibility restrictions may be relinquished if the user of the library is only allowed to use a subset of C++ by library's specifications. No one did such analysis before.

The paper is also a somewhat successful attempt to analyze C++ with scientific method. It appeared that this language is not eligible for such analysis, since it's just too huge. But anyway this is the most complete--and thus the most useless--compatibility guide. It was quoted by GCC folks in their manual, but that's an only known reference; and I have no idea how they found my article.

The developer of ABI Compliance Checker used the knowledge gained during this study, and this checker happens to be a working tool. This proves that my studies haven't been useless or erroneous.

Well, on this weekend I'm committing another article to a conference; and I hope it will be less useless to the world. Wish me luck.

Read on | Comments (0) | Make a comment >>


Busting C++ myths: virtual function calls are slow

This small article begins a series of "Busting C++ myths" posts.

Here and there, over the Internet, you sometimes see unproved claims that "C++ is slow", "C++ ABI is unstable" etc. The rationale behind these claims is usually not provided by the speakers. However, many people believe in these myths, having never seen a proof of any. so I'd like to bust a couple of them.

What C++ is good at is the ratio between plausibility to describe abstract and high-level concepts and performance of the compiled program. No wonder that the argument about the performance penalty of these concepts never gets old.

Background

On of these concepts is, of course, virtual functions. Their calling mechanisms are often called "slow", or, at least, "unclear". Of course, "unclear" should mean "hard to predict", but most likely it means "I couldn't/didn't try to understand it". Let's fix this.

What about other compilers?

Indeed, I only take into account the implementation of C++ depths on one compiler, and that's the C++ compiler from the famous GNU Compiler Collection.

What happens on other compilers, you may ask. Actually I don't know. And I think it's a bit out-of-topic here. The purpose of this article is to advocate C++ as a language by showing that there exists at least one reasonably fast implementation of it. If it happens that your compiler generates worse code and the assessments made here don't hold for it, address your compiler vendor with this question.

C++ language is governed by a standard, but the way it's implemented under the hood is left for a compiler vendor to define. However, developers of certain compilers attempt to govern parts of "binary" behavior of their implementation of C++ language by separate standards. In this article I will follow the "standard", that GCC follows: Itanium C++ ABI. Contrary to the title, it's followed on other architectures as well, particularly, on x86.

How are the virtual functions implemented?

From very old times virtual functions were implemented as "vritual tables". Being successors of function pointers as the way to control program flow (contrary to pointers as a way to implement first-class functions), virtual functions hide these pointers into an implementation-defined structure named, again, "virtual table" (or just "vtable"). Vtable is common for all objects of a giver runtime type, and the pointer to the proper vtable is stored in each object.

A key to understanding a vtable and why it's performant is that it's not a table, but, rather, an array:

(vtable for QTableWidget class)

00  QTableWidget::metaObject() const
01  QTableWidget::qt_metacast(char const*)
02  QTableWidget::qt_metacall(QMetaObject::Call, int, void**)
03  QTableWidget::~QTableWidget()
04  QTableWidget::~QTableWidget()
05  QTableWidget::event(QEvent*)
06  QObject::eventFilter(QObject*, QEvent*)
07  QTableView::timerEvent(QTimerEvent*)
08  QObject::childEvent(QChildEvent*)
09  QObject::customEvent(QEvent*)
010 QObject::connectNotify(char const*)
011 QObject::disconnectNotify(char const*)

(snipped)

For each function the index is known at compile time. For example, if the code calls tableWidgetPtr->timerEvent(event), the calling algorithm loads function pointer from vtable[7] cell (number 7 corresponds to timerEvent() member function) and calls it instead. Not for all cases it is that simple. Sometimes (we'll show it later) it is a bit more hard. But there's one common truth: no dynamic distpatching by function name, actual base class etc ever happens during virtual function call. This "7" number is precompiled into the object code, not looked up at runtime. C++ is not a modern interpreted language, where you can run an object method a user has just written in a textbox.

Performance assessment

To measure performance I will take into account "shifted-pointer dereferences" (loading base[offset] from memory into a register) and jump instructions. I am not too much into assembly language, but, I guess, these operations take more time than arithmetics on registers and stuff.

So, how slower is it than calling a usual, non-virtual function? For this simple case (single-inheritance) it takes, in addition to mere jump instruction that happens on a usual function call, loading a vtable pointer vp from the body of the object and loading one pointer from an already known index of a virtual table vp[index_of_func] (probably, triggering a cache-miss).

One constant-shifted pointer dereference. How short should your function be for it to be noticeable?

More tricky inheritance schemes

What is "inheritance branch"?

A non-common term I use here is inheritance branch. For a given inheritance graph that's a maximal chain where each class is a primary base of previous one. A primary base is the first non-virtual class with virtual functions or, if none, a first nearly empty virtual base or, if none, a first virtual base. By "first" it's meant "first one in list of base classes in the declaration of derived ones". (See here, case 2b for reference.)

Here's the example of inheritance hierarchy:

The inheritance branches are red, green, olive, blue, cyan. That's the order the sub-objects will be laid out in the complete, most derived one (ecah branch has one virtual table and is laid out near the same offset). First, branches with non-virtual roots are laid out in "depth-first" order; then, branches, roots of which are virtual, are laid out in the same depth-first order.

For more tricky inheritance schemes the processing will be more complex. If your class has a multiple-inheritance hierarchy, it will actually have several vtables, one for each separate inheritance branch (see side note). Consider you have an hierarchy of form

Then the first inheritance branch of C will be C-A, and the second one will be B. The thing is that a virtual table pointer for some class B in C is located right under pointer to (B*)c subobject of C c object (so, the first vtable pointer is located right at the first bytes of the complete object). Therefore a pointer to C can be immediately dereferenced to get a proper virtual table only for aa() function call. For bb() function call the pointer cp should be adjusted to point to B-in-C subobject and only then a dispatching mechanism described above, for single-inheritance case, should be performed. The adjustment is by a fixed, known at compile-time offset.

But you can't immediately call the body of the final overrider for a function distpatched this way? Assume that cp actually contains object of more derived class:

If a pointer to B-in-C would be passed as "this" pointer to bb's implementation, the this->i, being implemented as *(this+offset_of_i) would result into breaking the object's border (i is at the first bytes of the object, while "B-in-C" is allocated after i). An adjustment to this pointer is needed for such caller. A special function that does small adjustment before jumping to an actual user's function is called "thunk". Pointers to these thunks are inserted in vtables instead of raw function pointers. In our example, a thunk emitted for bb-in-D virtual function would adjust pointer by an offset (known at compile time for non-virtual bases and taken from a special part of vtable for virtual bases).

Therefore, as compared to usual function call, we need up to three pointer loads (4 in case of calling a virtual base) with one cache miss and--sometimes--a jump. Since thunks may be consecutive, jumps may be replaced by mere additions:

    a2: /* Thunk for non-virtual base A2.  */
        this += offsetof (B, A1) - offsetof (B, A2)
        /* Fall through.  */
    a1: /* Thunk for non-virtual base A1.  */
        this += offsetof (B, C) - offsetof (B, A1)
        /* Fall through.  */
    f:  /* Non-adjusting entry point.  */

But I won't try to judge its performance: compiler authors surely know better, when to do this optimization. We'll assume that the result of optimization is not slower than the straightforward case.

Calling virtual functions via interface pointers

Interface pattern is implemented in C++ as "public virtual" classes. Therefore, calling virtual function through the pointer to interface base requires 3 pointer adjusts and a jump. We need 3 adjusts instead of 4, since compiler knows that the subobject pointer points to contains a virtual function declaration, so we can save one offset (the one needed to look up for a proper vtable pointer).

Conclusion

Summing it up, overhead for a virtual function call compared to a call of usual function is the following (all shifts are known at compile-time):

  1. for simple single inheritance -- one constant-shifted pointer dereference (1 cache miss);
  2. for calling through interfaces (virtual bases) -- 3 constant-shifted pointer dereferences and a jump (that may be substituted with several increments/decrements of a value in a register);
  3. more complex inheritance schemes may lead to 4 pointer derefs (2 cache misses).

Busted?

Read on | Comments (1) | Make a comment >>


More posts about c++ >>