Recent blog updates

Shown here are only posts related to languages. You can view all posts here.

The Pipe-Pipe-Equals

It is hard to come up with a google-friendly name for the ||= construct you see in some programming languages quite often, "pipe-pipe-equals" being the is the closest (other names include "double-pipe equals," "or-equal," or "double-or equals"). Why should we name this monster in the first place? While cryptic at first sight, it is a very convenient shorthand that, I believe, allows us to write cleaner code. Let's see how it works on this example:

Essentially this ||= construct is equivalent to other convenience operators, such as +=. x += 2; means x = x + 2. The "or" operator || usually denotes a weak logical "or" that doesn't try to compute the left-hand side if the right-hand side is true. Usually, meaningful values evaluate to true, while the only evaluating to false are NULL pointers/empty values, zeroes, boolean Falses, and, sometimes, empty strings.

Alternatively, you could set a default hash value, but ||= is used more widely, and also takes exactly one line... doesn't it? ;-)

The statement is used mostly to set a variable to a "default" value if it's "unset" at some point. This gives the most visible advantage when you initialized dictionary elements in a loop. Here's a piece that saves counts of each array element into a dictionary (Ruby):

In some languages (Perl, Javascript), you could not even bother with this, as += 1 on an unset value would result in its assignment to 1.

If you don't initialize output[x], you'll get a runtime error for trying to increment NULL. The advantage of ||= against other ways is that you don't repeat anything. You could have written the same piece as

Oh yes... we forgot about the most natural use of ||=, for booleans. Here's how we'd check if an array contains zeroes if we need to iterate over its elements for something else.

Pipe-pipe-equals is rarely used this "natural" way, though.

But here you have to type output[x] twice and add two more lines of code, which is a complete waste of screen space and, possibly, computing resources if the interpreter doesn't optimize the duplicated computation out. Let's take a look how pipe-pipe-equals works in different languages (we've already seen it in action in Ruby).

Perl

Perl was the first interpreted language I learned, and the first place I saw the pipe-pipe-equals operator in. It works as expected, but is used less often than the direct version of the weak logical "or" to specify default parameters of a function. Here's how you'd put a default to a hash bucket:

This works in a strict mode, too. Note that, in Perl, many things evaluate to false, including empty strings (I once even tried to emulate it in Ruby). To restrict this action to undefs only, use //= in Perl6 instead, which sinks to its right like the Pisa tower, and look as if you're trying to put a division while drunk.

Python

Python has no equivalent of this operator. You have to type the very[long(expression)] twice, and it will be computed twice. You have several different ways to do this:

The engineers more experienced in Python programming, though, assure me that this is not a big deal since you have a different way of supplying default arguments for a function, which seemingly covers half of the use cases for pipe-pipe-equals (this doesn't prevent Ruby from having both, though). Another half is covered by dictionary's dict.get(key, []) method, so that the code piece #1 can be written in a succinct manner. But I still miss it.

Bash (Linux shell)

Bash? Its language is so simplistic, and it looks creepy; how come it would have an equivalent of such a beautiful shortcut? Here it is:

This assigns y to variable if the former is unset or null (as per Bash manual). I mostly used it to set default parameters for environment variables user may or may not set before invoking the script.

C++

While |= is a syntactically correct expression (C++ does not have the "short-circuit" version of this expression), it doesn't do what we discussed here.

C++ is statically typed, so the result of the standard logical "or" is boolean. Retaining the nice semantics we find in a dynamically typed language would require it to be "either the type of the left-hand side or the type of the right-hand side". This is hard to pull in a pass-by-value statically typed language.

Pass-by-value semantics also means that not everything can be assigned a NULL, and not everything can be converted to boolean value. C++ has default arguments as well as Python, so the same reasoning could apply here. You'll have to be more verbose in C++. That's probably why only |= expression is available, which is only useful if its left-hand-side is bool (see sidebar above for similar usage.)

OCaml

Everything said above about C++ applies to OCaml as well. Moreover, OCaml, as a functional language, doesn't have a flawless support for mutation, and pipe-pipe-equals statement its inherently mutational. However, its matching operator would require us to use the very_long_variable twice. However, OCaml and other functional languages have a very interesting construct called "option". If something has "X option" type, it may contain either "nothing" or a value of x. Then, this value may be "unpacked" trough pattern matching:

let very_long_variable = match very_long_variable with None -> y | Some t -> t

here, t is not an shorthand for another very long expression; instead, it's just an identifier, written as is. The match...with allows us to "unpack" values of structured (algebraic) types with shorthands like this. Since this was too long, OCaml has made this a library function Option#default:

let very_long_variable = Option.default very_long_variable y

Anyway, OCaml programs are even more explicit than those in C++ and Python, so trying to tie pipe-pipe-equals into them is quite pointless.

Ruby

"Erm, we saw how it is in Ruby at the beginning," you might think. Well, I lied to you a bit. The thing is that, in Ruby, it is not strictly equivalent to an assignment to a result of logical "or". Which do you think x ||= y is equivalent to?

In Ruby, and only for ||= and &&=, it's the second. If you assign to something other than a local variables, what looks like an assignment is actually a method call (think properties,) and, if so, this assignment does not happen at all if the left-hand side of ||= is false. Which makes sense, but looks like a special case. Read more here.

Why You Might not Need This

Some argue that this operator is mostly useless, especially if their favourite language doesn't have it. Here are some arguments they list.

Redundancy

Indeed, this expression is redundant. You can do the same in a multiple different ways, All the examples I demonstrated above showed how to write essentially a simple if statement in a very short-hand form. The number of characters spared is probably not worth it to include support for this feature to a language designer's must-have checklist.

The statement discussed decreases the redundancy in code in return of broader language definition; each language seeks a balance between these, and often leaves the pipe-pipe-equals aside.

Default Function Arguments

The ability to specify default function argument (C++, Python, Ruby, but not Bash or Perl) covers many use-cases for ||=. However, this doesn't help fancy loops that fill complex structures, one of which we showed above. Nor helps it when you have to specify the default parameter anyway but the concrete value is not known at the time of coding, and is an optional field user may or may not fill.

Confusion

It is confusing. The pipe-pipe-equals requires explanations how it works. It becomes a special snowflake, different from its "mathematical" counterparts +=, if a complex language wants to make its very useful (see Ruby section above). While "confusion" as an excuse of not doing something is my pet peeve (and I tried to explain why), it indeed requires some effort to understand the mechanics. I'm sure that you'll love it once you understand it, and the way I learned about this operator is not by reading a textbook or "The Most Confusing and Obscure Programming Language Features Possible" newsletter, but by reading someone's code.

To Use or Not To Use?

When it comes to deciding whether or not to use pipe-pipe-equals in the "creepy" way we discussed throughout the post, the criticism somehow fades out. If the language supports this, it will be used. I haven't encountered any coding style document that bans this feature. In Ruby and Perl, it is considered as good of an idiom as many. So the answer on the question of whether or not you sohuld use pipe-pipe-equals (||=) in your programs is, definitely, "yes".

Read on | Comments (0) | Make a comment >>


Logging in Expressive Languages

I am not aware of any strict mathematical definition of expressiveness of a language. Informally, a language is more expressive if it allows you to achieve the same with less keystrokes. From this point of view, for instance, Perl or Python are more expressive than C. Of course, you can't always achieve with Python most things that C is capable of at low level, and therefore one may question whether Python is really more capable than C. But capability has little to do with expressiveness. In certain domains, one language may be more expressive than the other, and in other it could be just vice versa. (For instance, Python is more expressive in string operations while C is more expressive in low-level raw data representation.)

One may notice, however, that the majority of languages are Turing complete, and you basically may express the same with all of them. What is the difference in expressiveness then, as seen by a Practicioner rather than a Computer Scientist?

I usually use the following criterion. If you write one non-splittable expression instead of a chain of statements, the language is more expressive. Compare the following two programs that compute a sum of an array in C:

...and in Ruby:

..and, in C, we also should find a place to deallocate the array when it's no longer needed.

You may notice that Ruby can achieve with a single expression what C needs several operators and variable declarations for. Three Ruby features contribute to this, anonymous lambda functions, syntax sugar to pass them to functions (such as inject), and to avoid the really useless variable declarations. Isn't it always that beautiful?

In some cases it's not.

What does it take, to log?

Now let's try to add some logging to the programs above. Assume we actually want to hook into the loop, and make it print the square result it has calculated to a logging stream.

Let's take a look at the code again.

C:

Ruby:

Ewww! Why does our Ruby code looks like C? Where is our one-liner? Where is the expressiveness? How come we're explicitly introducing a temporary variable just like in C?

I spotted such patterns, "compute-log-return" in a lot of places in my practice. For i instance, the BLAST codebase I maintain, contains these here and there, because it's written in an expressive functional language (OCaml), and uses logging as a primary debugging mechanism. Because of it, the code that should look beautiful looks as an old prostitute trying to mask their weary skin with tons of makeup, and I'm not happy with this.

Trying to get asleep on a plane, I suddenly realized that this logging could be achieved with a better code. Since "assign-debugprint-return" is a recurring pattern, let's define a helper function

Then, the resultant Ruby code will look like this:

Now our Ruby code looks like Ruby again.

Of course, this is not limited to Ruby. Though less versatile, the sane function may be implemented for the C code as well.

And this brings us back to the original comparison with 1 line in Ruby and five--in C.

Discussion

while you may write such function for a less expressive language (as seen above), it will "stand out" in a large C code sheet and look like ugly duckling. What may benefit most are languages that mix functional and imperative coding styles, such as the aforementioned Ruby and OCaml. The reason why this helper might be useful is that it's relevant to the functional languages in two different ways.

First, functional language allow the programmer to define anonymous function inline, which makes them be frequently passed as parameters to other functions. In C, defining a separate function is, inmost cases, uglier than just adding a convenient logging shortcut.

Second, logging an intermediate result contradicts the functional level paradigm. In functional languages you define a computation as a whole rather than specify concrete sequence of actions. In case of logging, you have to violate a chain of functions with a side-effect operation. Of course, you can try to make it look nice, as we did, but the very reason we need the kludge is that we're doing something a bit wrong.

But it's not limited to functional languages. Most expressive languages that do not require from user a strict specification of type of every identifier can. For example, we might want to debug Perl's regular expression substitution with using a similar function:

(we couldn't use log because it conflicted with the mathematical logarithm function that bears the same name).

***

In any case, the "compute-log-return" function that encapsulated the debug print is a widely useful pattern. And it's especially useful when you're writing a program in an expressive language. Do not let the programming cruft conceal the actual idea of your code, use loggers that hide all the details.

Read on | Comments (0) | Make a comment >>


The Most Stupid Mistake You Can Make with Ruby

Programming languages have strict and mostly very compressed syntax. The origins of that are twofold.

First, actively used programming languages have a considerable history, and were started decades ago. Back then, screen space was limited, and, before that, there was little storage available for source code. Second, programmers have technically-oriented minds and like to endow abstract symbols with complex abstractions instead of using words to describe them. That's why, I guess, many programming languages heavily depend on correct usage of individual characters. Even as few characters as one may make a difference in program behavior as well as in whether program compiles.

Such caveats in different languages include:

  1. C++ templates. You can't write map<int,vector<int>> when you define a mapping from integers to arrays of integers, because C++ parsers thinks that >> as a bit-shift operator in an improper place. A correct program differs by a single space between the < signs.
  2. Makefile tabs. A rule body in the Unix Makefiles should be indented with a Tab. Spaces do not work. Being a software with more than 30 years of history, make had it fixed only a year ago.
  3. CSS delimiters. When you define a cascading style sheet, amd want to define the same block of style attributes for a certain class inside a certain tag, you write the selector as tag .class. It's just a space away from tag.class that defines the styles only for tag elements of class class.
  4. C-style equality in conditional statements. If you're a seasoned professional, you should have already forgotten about this. In many languages, including C, Ruby, Python, Javascript, C# and others, if (a = 1) is always true, since it's an assignment of 1 to a, followed by checking the a's value for truthfulness. The correct version of that has one extra equality sign: if (a == 1). More confusion is created by languages, where the first version is the legitimate way to compare values, such as Pascal.

Imagine how surprised I was when I realized that I made a variation of mistake #4 in my Ruby code today! Here's what it was about.

Ruby hash niceness

Named parameter is a way to specify function arguments at call by name (rather than by order, as in the standard function call notation).

Here you may find how the Named Parameter is implemented in various programming languages.

To emulate a Named Parameter Idiom, Ruby uses hashes and some syntax sugar. The last parameter of a function may be a hash, which maps from parameter names to values. Syntax sugar allows a programmer to write

This sugar is not unique to Ruby; Perl also supports it.

instead of

The :name notation specifies a symbolic constant, which effectively is an immutable string that is defined by only one ancillary character.

Such hashes and symbols seem as a very useful feature. It allows you to emulate DSL-s; here's an example of Ruby on Rails web framework routing configuration:

Until one day, after an hour of debugging you find yourself having written something like this:

See what's wrong here? Indeed, here's how the code of the function called might look like:

So, options[:attribute_check] should evaluate to false boolean value, but... :false is a totally different thing; it's an immutable string of five characters that evaluates to true instead! Just one colon that lurked into the code, and it made it behaving very wrong way.

Just like in a C-style typo, some expressions that are evaluated as true in boolean context look like those that are evaluated as false, and you should be careful with the borderline.

New named attribute definition style in Ruby

New named attribute passing style was not designed to address this problem. However, the abundance of colons in such a code makes it look worrying in case there is a mistake like the above:

You see that the mistake is easy to spot, because the conjunction between the name and the parameter value is so ugly, that it immediately draws attention. However, if you actually need to specify symbol as a value, then you'll have to look ugly with this style.

Moreover, you can't erase the space between the parameter name and value, because for this code:

Ruby parser will think that it's false method in an attribute_check object, as :: is a scope resolution operator, just like in C++. Space matters again, as in typo #1 desccribed above.

People say that this style resembles that of C# or JSON. So, maybe, it is a good idea to migrate to it. Only two things prevent me from doing this so far: it's not portable to previous version of Ruby, 1.8 (though it slowly becomes obsolete), and I find the old style look much more cute :-)

***

This was yet another typo that makes our programs behave differently than we expect them to. And again, it was just one character that breaks the expected behavior of if statements that implicitly convert the condition to boolean type. Unfortunately, while the new Ruby named parameter syntactic sugar could help, it sometimes looks even worse to me.

I hope this post will help you avoid the similar mistake if you code in Ruby.

I would like to end with a joke about a mythical C++ programmer leaving hist last commit after having been fired:

Happy debugging to you, indeed!

Read on | Comments (0) | Make a comment >>


Searching for the declarative language

I'm really tired of telling these dumb machines every instruction they should perform! I know that they only pretend to be dumb, and they can act clever if they're taught to.

After I implemented yet another graph traversing algorithm, after I wrote "if it doesn't exists, create, and only then proceed" for the million times, after I had troubles with parallelization of totally independent tasks just because I didn't write the program to be parallel, I felt that I'm done. Most programs I wrote had a similar scheme: loosely connected computations at different nodes with some dependencies between the results of them.

But over and over again I had to implement the exact sequence, while it really either doesn't matter, or is easily computable. I started looking for a language that takes this burden off me, for the language that allows me to declare my intent and build the sequence of actions on its own.

I discovered that there's a programming paradigm devoted explicitly to this: declare intent rather than steps. And, not least, I should be able to use it at work, to write production code with it! However, the languages I knew back then didn't support this paradigm. So I started looking, both for new languages and inside the things I already knew to spot declarativeness in them.

I was disappointed. Nearly nothing out there was really what I want. Here's what I've discovered so far.

Markov algorithms

What is Markov algorithm?

Markov algorithm is one of the basic forms of algorithm notations. It expresses a computations as a series of string rewrites. Given an input—a string—execution of Markov algorithm is a prioritized substitution of one substring with another, the substitutions being prespecified and immutable; the list of substitutions forms an algorithm. Here's a program that converts binary to unary:

|0 → 0||
1  → 0|
0  → 

You can see why this works on the Wikipedia page.

You can simulate a Turing machine with Markov algorithm. Just draw a "caret" on the string and express the algorithm as the caret movements. Inside the caret you may store and modify the "state" of a Turing machine. Here's how a Turing machine's rule "if a caret reads A and is in the fourth state, write B, change state to 10th and move right" would look like:

|4|A → B|10|

That is, Markov algorithms form a Turing-complete language.

This is a declarative language, about which almost everyone who studied computer science knows. (Well, at least, in Russia; due to national origin of its inventor it may be more widespread there). However, nearly no one could name it when asked about declarative languages. Sad that people don't establish connections between what they learn at the university and what they call "real life"...

Back to the topic, Markov algorithm is an ordered set of statements A → B, each of which really means "there can't be sequence A in the resultant string! But if there is, it should be replaced with B"

It's nevertheless questionable if Markov algorithm is declarative. Sure, it doesn't tell exactly how the computation should occur. Of course, the exact calculations can be easily inferred, but it is true for any declarative language.

The problem with it is that Markov algorithm is a mathematical abstraction. Although, following the notation of Markov algorithms, some useful languages were designed (one of them is the Refal language), they still resemble mathematical abstractions, and I can't see how I could use them in production.

Make

What I could—and do—use in production is Make, the language of Linux makefiles. It doesn't have a specific name, actually, so I'll call it "Make" here.

Basically, a structure of a Make program is a list of "rules". Each rule specifies a target (or a target pattern, like %.cpp), and a list of prerequisites. A rule means that in order to accomplish the target, you must

  1. get done with prerequisites
  2. execute commands specified within the rule.

The commands usually take prerequisites as input, but they don't have to. This creates an oriented graph of dependencies, which is walked until you reach the target. Sounds like a cool concept, which should have been backed up with some kind of mathematical theory and notation!..

In reality, it's not. Make evolved from (and, perhaps, still remains) a build system. And here go some gory details. The targets and prerequisites denote files in the filesystem, and the timestamp of these files (i.e. the time they were last modified) is used as a measure of whether you're "done" with prerequisites.

The rules are implemented in one of the shell languages you choose. Technically, they could be written in any language, but shell scripts are chosen because they're tied to work with files as input—and files are the primary objects Make works with. This fits building application perfectly, but you can go over this domain if you treat file system as just a key-value database.

However, the biggest limitation of Make is that it can't modify the dependency graph after the execution is started. There are techniques to overcome this restriction, but they're not generic, and are too tricky and fragile to be used in production.

Another thing is that Make is just...too old. Programming has changed a little since 1978, you know... Make is not flexible, has problems with debugging, and doesn't evolve much: its computational model is already exhausted.

Whenever

An esoteric language I was introduced to in a programmers.SE question is Whenever. Basically, a Whenever program consists of an unordered list of clauses. A clause can contain an operator. An operator is executed if the clause that contains it is in the to-do list, and if some conditions apply. Operators may add and/or remove clauses from the to-do list, or print text (and we know that printing text is no different from any useful work, actually).

Memory cells are implemented as possibility to have a clause several times on the to-do list and as expression that returns this number. The conditions mentioned above can refer to this expression.

Fairness is a property of a nondeterministic system, which can repeatedly and infinitely face a multiple choice, to not allow behavior when a certain item is never chosen past any moment of time. I wrote a big article about fairness, check it out for more details.

The language is called "Whenever" because its main feature is that it's not clear when a certain clause could be executed! Each clause in to-do list is considered for execution with uniform probability. (I think the equality of probability could be safely replaced with enforcing fairness). Sadly, the author was so excited with the concept of such "absence of urgency", that he overlooked the declarative nature of the language.

Here's a sample program, which is just a "while" loop. It should print A ten times:

1 defer (2) 3;
2 print ("A");
3 3;
4 defer (1 || N(3)<=10) -3#N(3),-5;
5 again (5) defer (1 || N(3)>10) 2,1;

Play with the language yourself! You can read more about the available commands, and download Java (sic!) interpreter at the Whenever language homepage. It seems that David Morgan-Mar is its author.

Whenever language's key difference from Make is that it has control over its own dependency graph. It can impose and drop dependencies as the clauses are executed. However, the weakness of this language is its esoteric nature coupled with absence of a developed mathematical model (it resembles several concepts at once, but not any particular one).

Still not what I want

So, my results of searching a declarative language, which could fit into production, are still not great. The languages I observed here are either too conceptual (Markov algorithms and Whenever), or too limited (Refal, Make). I believe this gap will be getting closer over time. Or, perhaps, I'm just searching in the wrong place? Maybe such languages as Haskell and Prolog can already do what I want, but I'm just not aware of it? And do I really know what I want?..

Well, this only means that there'll be another blog post when I get to it!

Read on | Comments (2) | Make a comment >>


More posts about languages >>