Recent blog updates

Shown here are only posts related to languages. You can view all posts here.

Logging in Expressive Languages

I am not aware of any strict mathematical definition of expressiveness of a language. Informally, a language is more expressive if it allows you to achieve the same with less keystrokes. From this point of view, for instance, Perl or Python are more expressive than C. Of course, you can't always achieve with Python most things that C is capable of at low level, and therefore one may question whether Python is really more capable than C. But capability has little to do with expressiveness. In certain domains, one language may be more expressive than the other, and in other it could be just vice versa. (For instance, Python is more expressive in string operations while C is more expressive in low-level raw data representation.)

One may notice, however, that the majority of languages are Turing complete, and you basically may express the same with all of them. What is the difference in expressiveness then, as seen by a Practicioner rather than a Computer Scientist?

I usually use the following criterion. If you write one non-splittable expression instead of a chain of statements, the language is more expressive. Compare the following two programs that compute a sum of an array in C:

...and in Ruby:

..and, in C, we also should find a place to deallocate the array when it's no longer needed.

You may notice that Ruby can achieve with a single expression what C needs several operators and variable declarations for. Three Ruby features contribute to this, anonymous lambda functions, syntax sugar to pass them to functions (such as inject), and to avoid the really useless variable declarations. Isn't it always that beautiful?

In some cases it's not.

What it takes, to log?

Now let's try to add some logging to the programs above. Assume we actually want to hook into the loop, and make it print the square result it has calculated to a logging stream.

Let's take a look at the code again.

C:

Ruby:

Ewww! Why does our Ruby code looks like C? Where is our one-liner? Where is the expressiveness? How come we're explicitly introducing a temporary variable just like in C?

I spotted such patterns, "compute-log-return" in a lot of places in my practice. For i instance, the BLAST codebase I maintain, contains these here and there, because it's written in an expressive functional language (OCaml), and uses logging as a primary debugging mechanism. Because of it, the code that should look beautiful looks as an old prostitute trying to mask their weary skin with tons of makeup, and I'm not happy with this.

Trying to get asleep on a plane, I suddenly realized that this logging could be achieved with a better code. Since "assign-debugprint-return" is a recurring pattern, let's define a helper function

Then, the resultant Ruby code will look like this:

Now our Ruby code looks like Ruby again.

Of course, this is not limited to Ruby. Though less versatile, the sane function may be implemented for the C code as well.

And this brings us back to the original comparison with 1 line in Ruby and five--in C.

Discussion

while you may write such function for a less expressive language (as seen above), it will "stand out" in a large C code sheet and look like ugly duckling. What may benefit most are languages that mix functional and imperative coding styles, such as the aforementioned Ruby and OCaml. The reason why this helper might be useful is that it's relevant to the functional languages in two different ways.

First, functional language allow the programmer to define anonymous function inline, which makes them be frequently passed as parameters to other functions. In C, defining a separate function is, inmost cases, uglier than just adding a convenient logging shortcut.

Second, logging an intermediate result contradicts the functional level paradigm. In functional languages you define a computation as a whole rather than specify concrete sequence of actions. In case of logging, you have to violate a chain of functions with a side-effect operation. Of course, you can try to make it look nice, as we did, but the very reason we need the kludge is that we're doing something a bit wrong.

But it's not limited to functional languages. Most expressive languages that do not require from user a strict specification of type of every identifier can. For example, we might want to debug Perl's regular expression substitution with using a similar function:

(we couldn't use log because it conflicted with the mathematical logarithm function that bears the same name).

***

In any case, the "compute-log-return" function that encapsulated the debug print is a widely useful pattern. And it's especially useful when you're writing a program in an expressive language. Do not let the programming cruft conceal the actual idea of your code, use loggers that hide all the details.

Read on | Comments (0) | Make a comment >>


The Most Stupid Mistake You Can Make with Ruby

Programming languages have strict and mostly very compressed syntax. The origins of that are twofold.

First, actively used programming languages have a considerable history, and were started decades ago. Back then, screen space was limited, and, before that, there was little storage available for source code. Second, programmers have technically-oriented minds and like to endow abstract symbols with complex abstractions instead of using words to describe them. That's why, I guess, many programming languages heavily depend on correct usage of individual characters. Even as few characters as one may make a difference in program behavior as well as in whether program compiles.

Such caveats in different languages include:

  1. C++ templates. You can't write map<int,vector<int>> when you define a mapping from integers to arrays of integers, because C++ parsers thinks that >> as a bit-shift operator in an improper place. A correct program differs by a single space between the < signs.
  2. Makefile tabs. A rule body in the Unix Makefiles should be indented with a Tab. Spaces do not work. Being a software with more than 30 years of history, make had it fixed only a year ago.
  3. CSS delimiters. When you define a cascading style sheet, amd want to define the same block of style attributes for a certain class inside a certain tag, you write the selector as tag .class. It's just a space away from tag.class that defines the styles only for tag elements of class class.
  4. C-style equality in conditional statements. If you're a seasoned professional, you should have already forgotten about this. In many languages, including C, Ruby, Python, Javascript, C# and others, if (a = 1) is always true, since it's an assignment of 1 to a, followed by checking the a's value for truthfulness. The correct version of that has one extra equality sign: if (a == 1). More confusion is created by languages, where the first version is the legitimate way to compare values, such as Pascal.

Imagine how surprised I was when I realized that I made a variation of mistake #4 in my Ruby code today! Here's what it was about.

Ruby hash niceness

Named parameter is a way to specify function arguments at call by name (rather than by order, as in the standard function call notation).

Here you may find how the Named Parameter is implemented in various programming languages.

To emulate a Named Parameter Idiom, Ruby uses hashes and some syntax sugar. The last parameter of a function may be a hash, which maps from parameter names to values. Syntax sugar allows a programmer to write

This sugar is not unique to Ruby; Perl also supports it.

instead of

The :name notation specifies a symbolic constant, which effectively is an immutable string that is defined by only one ancillary character.

Such hashes and symbols seem as a very useful feature. It allows you to emulate DSL-s; here's an example of Ruby on Rails web framework routing configuration:

Until one day, after an hour of debugging you find yourself having written something like this:

See what's wrong here? Indeed, here's how the code of the function called might look like:

So, options[:attribute_check] should evaluate to false boolean value, but... :false is a totally different thing; it's an immutable string of five characters that evaluates to true instead! Just one colon that lurked into the code, and it made it behaving very wrong way.

Just like in a C-style typo, some expressions that are evaluated as true in boolean context look like those that are evaluated as false, and you should be careful with the borderline.

New named attribute definition style in Ruby

New named attribute passing style was not designed to address this problem. However, the abundance of colons in such a code makes it look worrying in case there is a mistake like the above:

You see that the mistake is easy to spot, because the conjunction between the name and the parameter value is so ugly, that it immediately draws attention. However, if you actually need to specify symbol as a value, then you'll have to look ugly with this style.

Moreover, you can't erase the space between the parameter name and value, because for this code:

Ruby parser will think that it's false method in an attribute_check object, as :: is a scope resolution operator, just like in C++. Space matters again, as in typo #1 desccribed above.

People say that this style resembles that of C# or JSON. So, maybe, it is a good idea to migrate to it. Only two things prevent me from doing this so far: it's not portable to previous version of Ruby, 1.8 (though it slowly becomes obsolete), and I find the old style look much more cute :-)

***

This was yet another typo that makes our programs behave differently than we expect them to. And again, it was just one character that breaks the expected behavior of if statements that implicitly convert the condition to boolean type. Unfortunately, while the new Ruby named parameter syntactic sugar could help, it sometimes looks even worse to me.

I hope this post will help you avoid the similar mistake if you code in Ruby.

I would like to end with a joke about a mythical C++ programmer leaving hist last commit after having been fired:

Happy debugging to you, indeed!

Read on | Comments (0) | Make a comment >>


Searching for the declarative language

I'm really tired of telling these dumb machines every instruction they should perform! I know that they only pretend to be dumb, and they can act clever if they're taught to.

After I implemented yet another graph traversing algorithm, after I wrote "if it doesn't exists, create, and only then proceed" for the million times, after I had troubles with parallelization of totally independent tasks just because I didn't write the program to be parallel, I felt that I'm done. Most programs I wrote had a similar scheme: loosely connected computations at different nodes with some dependencies between the results of them.

But over and over again I had to implement the exact sequence, while it really either doesn't matter, or is easily computable. I started looking for a language that takes this burden off me, for the language that allows me to declare my intent and build the sequence of actions on its own.

I discovered that there's a programming paradigm devoted explicitly to this: declare intent rather than steps. And, not least, I should be able to use it at work, to write production code with it! However, the languages I knew back then didn't support this paradigm. So I started looking, both for new languages and inside the things I already knew to spot declarativeness in them.

I was disappointed. Nearly nothing out there was really what I want. Here's what I've discovered so far.

Markov algorithms

What is Markov algorithm?

Markov algorithm is one of the basic forms of algorithm notations. It expresses a computations as a series of string rewrites. Given an input—a string—execution of Markov algorithm is a prioritized substitution of one substring with another, the substitutions being prespecified and immutable; the list of substitutions forms an algorithm. Here's a program that converts binary to unary:

|0 → 0||
1  → 0|
0  → 

You can see why this works on the Wikipedia page.

You can simulate a Turing machine with Markov algorithm. Just draw a "caret" on the string and express the algorithm as the caret movements. Inside the caret you may store and modify the "state" of a Turing machine. Here's how a Turing machine's rule "if a caret reads A and is in the fourth state, write B, change state to 10th and move right" would look like:

|4|A → B|10|

That is, Markov algorithms form a Turing-complete language.

This is a declarative language, about which almost everyone who studied computer science knows. (Well, at least, in Russia; due to national origin of its inventor it may be more widespread there). However, nearly no one could name it when asked about declarative languages. Sad that people don't establish connections between what they learn at the university and what they call "real life"...

Back to the topic, Markov algorithm is an ordered set of statements A → B, each of which really means "there can't be sequence A in the resultant string! But if there is, it should be replaced with B"

It's nevertheless questionable if Markov algorithm is declarative. Sure, it doesn't tell exactly how the computation should occur. Of course, the exact calculations can be easily inferred, but it is true for any declarative language.

The problem with it is that Markov algorithm is a mathematical abstraction. Although, following the notation of Markov algorithms, some useful languages were designed (one of them is the Refal language), they still resemble mathematical abstractions, and I can't see how I could use them in production.

Make

What I could—and do—use in production is Make, the language of Linux makefiles. It doesn't have a specific name, actually, so I'll call it "Make" here.

Basically, a structure of a Make program is a list of "rules". Each rule specifies a target (or a target pattern, like %.cpp), and a list of prerequisites. A rule means that in order to accomplish the target, you must

  1. get done with prerequisites
  2. execute commands specified within the rule.

The commands usually take prerequisites as input, but they don't have to. This creates an oriented graph of dependencies, which is walked until you reach the target. Sounds like a cool concept, which should have been backed up with some kind of mathematical theory and notation!..

In reality, it's not. Make evolved from (and, perhaps, still remains) a build system. And here go some gory details. The targets and prerequisites denote files in the filesystem, and the timestamp of these files (i.e. the time they were last modified) is used as a measure of whether you're "done" with prerequisites.

The rules are implemented in one of the shell languages you choose. Technically, they could be written in any language, but shell scripts are chosen because they're tied to work with files as input—and files are the primary objects Make works with. This fits building application perfectly, but you can go over this domain if you treat file system as just a key-value database.

However, the biggest limitation of Make is that it can't modify the dependency graph after the execution is started. There are techniques to overcome this restriction, but they're not generic, and are too tricky and fragile to be used in production.

Another thing is that Make is just...too old. Programming has changed a little since 1978, you know... Make is not flexible, has problems with debugging, and doesn't evolve much: its computational model is already exhausted.

Whenever

An esoteric language I was introduced to in a programmers.SE question is Whenever. Basically, a Whenever program consists of an unordered list of clauses. A clause can contain an operator. An operator is executed if the clause that contains it is in the to-do list, and if some conditions apply. Operators may add and/or remove clauses from the to-do list, or print text (and we know that printing text is no different from any useful work, actually).

Memory cells are implemented as possibility to have a clause several times on the to-do list and as expression that returns this number. The conditions mentioned above can refer to this expression.

Fairness is a property of a nondeterministic system, which can repeatedly and infinitely face a multiple choice, to not allow behavior when a certain item is never chosen past any moment of time. I wrote a big article about fairness, check it out for more details.

The language is called "Whenever" because its main feature is that it's not clear when a certain clause could be executed! Each clause in to-do list is considered for execution with uniform probability. (I think the equality of probability could be safely replaced with enforcing fairness). Sadly, the author was so excited with the concept of such "absence of urgency", that he overlooked the declarative nature of the language.

Here's a sample program, which is just a "while" loop. It should print A ten times:

1 defer (2) 3;
2 print ("A");
3 3;
4 defer (1 || N(3)<=10) -3#N(3),-5;
5 again (5) defer (1 || N(3)>10) 2,1;

Play with the language yourself! You can read more about the available commands, and download Java (sic!) interpreter at the Whenever language homepage. It seems that David Morgan-Mar is its author.

Whenever language's key difference from Make is that it has control over its own dependency graph. It can impose and drop dependencies as the clauses are executed. However, the weakness of this language is its esoteric nature coupled with absence of a developed mathematical model (it resembles several concepts at once, but not any particular one).

Still not what I want

So, my results of searching a declarative language, which could fit into production, are still not great. The languages I observed here are either too conceptual (Markov algorithms and Whenever), or too limited (Refal, Make). I believe this gap will be getting closer over time. Or, perhaps, I'm just searching in the wrong place? Maybe such languages as Haskell and Prolog can already do what I want, but I'm just not aware of it? And do I really know what I want?..

Well, this only means that there'll be another blog post when I get to it!

Read on | Comments (0) | Make a comment >>


More posts about languages >>