Four years later, GNU Make 3.82 is released!

GNU Make is not one of those products whose website you check for updates every week. One reason is its leisurely development pace: version 3.80 was released in 2002, 3.81 in 2006, and 3.82 in 2010. So each new release is breaking news.

Why so conservative?

Why is that? Well, I can't know for sure, but let me introduce some facts about the product.

First, Make is actually a profound tool. It provides a lot of very basic features, which makes nearly every build scenario you can imagine feasible to implement as a makefile. Make is even Turing-complete. I think this has convinced most of the people who develop Make that it is already a perfect tool for its job.

And then we see CMake, which compiles into Make, as if Make were an assembly language of sorts...

What Make actually lacks is a "standard library", something that would make writing complex stuff in it more convenient. Apart from that, there is nearly nothing to add to Make. And nearly nothing is added indeed.

But is it really conservatism? What would characterize a conservative design? Perhaps a pursuit of stability and compatibility between releases?

However, this doesn't seem to be the case either. Some of the new changes (I'll comment on them later) introduce backward incompatibilities, i.e. some old makefiles can no longer be processed by Make. Some of them broke the build of one of the basic components of a GNU/Linux system, Glibc (the C runtime library)! The corresponding Glibc bug was filed only five days after the release of Make 3.82, not before. This means that this backward-incompatible change was not addressed, or checked, or even noted by anybody, least of all by the Make maintainers! Is this what you call conservative?

The only conclusion I can draw is that Make has reached the limit of its capabilities. It is maintained, you know, not by the most stupid people in the world. And if even they can't foster its progress, then who could? Perhaps no one, and that's the point?

So what's new?

Alright, let's look at the changes GNU Make 3.82 brings us. If you go to the official site of GNU Make, it kindly points you to the NEWS file, which you can only get by downloading and unpacking the release. But I'll relieve you of this burden and quote the file here. In full. Look:

Version 3.82

A complete list of bugs fixed in this version is available here:

http://sv.gnu.org/bugs/index.php?group=make&report_id=111&fix_release_id=104&set=custom

* Compiling GNU make now requires a conforming ISO C 1989 compiler and
  standard runtime library.

* WARNING: Future backward-incompatibility!
  Wildcards are not documented as returning sorted values, but up to and
  including this release the results have been sorted and some makefiles are
  apparently depending on that.  In the next release of GNU make, for
  performance reasons, we may remove that sorting.  If your makefiles
  require sorted results from wildcard expansions, use the $(sort ...)
  function to request it explicitly.

* WARNING: Backward-incompatibility!
  The POSIX standard for make was changed in the 2008 version in a
  fundamentally incompatible way: make is required to invoke the shell as if
  the '-e' flag were provided.  Because this would break many makefiles that
  have been written to conform to the original text of the standard, the
  default behavior of GNU make remains to invoke the shell with simply '-c'.
  However, any makefile specifying the .POSIX special target will follow the
  new POSIX standard and pass '-e' to the shell.  See also .SHELLFLAGS
  below.

* WARNING: Backward-incompatibility!
  The '$?' variable now contains all prerequisites that caused the target to
  be considered out of date, even if they do not exist (previously only
  existing targets were provided in $?).

* WARNING: Backward-incompatibility!
  As a result of parser enhancements, three backward-compatibility issues
  exist: first, a prerequisite containing an "=" cannot be escaped with a
  backslash any longer.  You must create a variable containing an "=" and
  use that variable in the prerequisite.  Second, variable names can no
  longer contain whitespace, unless you put the whitespace in a variable and
  use the variable.  Third, in previous versions of make it was sometimes
  not flagged as an error for explicit and pattern targets to appear in the
  same rule.  Now this is always reported as an error.

* WARNING: Backward-incompatibility!
  The pattern-specific variables and pattern rules are now applied in the
  shortest stem first order instead of the definition order (variables
  and rules with the same stem length are still applied in the definition
  order). This produces the usually-desired behavior where more specific
  patterns are preferred. To detect this feature search for 'shortest-stem'
  in the .FEATURES special variable.

* WARNING: Backward-incompatibility!
  The library search behavior has changed to be compatible with the standard
  linker behavior. Prior to this version for prerequisites specified using
  the -lfoo syntax make first searched for libfoo.so in the current
  directory, vpath directories, and system directories. If that didn't yield
  a match, make then searched for libfoo.a in these directories. Starting
  with this version make searches first for libfoo.so and then for libfoo.a
  in each of these directories in order.

* New command line option: --eval=STRING causes STRING to be evaluated as
  makefile syntax (akin to using the $(eval ...) function).  The evaluation
  is performed after all default rules and variables are defined, but before
  any makefiles are read.

* New special variable: .RECIPEPREFIX allows you to reset the recipe
  introduction character from the default (TAB) to something else.  The
  first character of this variable value is the new recipe introduction
  character.  If the variable is set to the empty string, TAB is used again.
  It can be set and reset at will; recipes will use the value active when
  they were first parsed.  To detect this feature check the value of
  $(.RECIPEPREFIX).

* New special variable: .SHELLFLAGS allows you to change the options passed
  to the shell when it invokes recipes.  By default the value will be "-c"
  (or "-ec" if .POSIX is set).

* New special target: .ONESHELL instructs make to invoke a single instance
  of the shell and provide it with the entire recipe, regardless of how many
  lines it contains.  As a special feature to allow more straightforward
  conversion of makefiles to use .ONESHELL, any recipe line control
  characters ('@', '+', or '-') will be removed from the second and
  subsequent recipe lines.  This happens _only_ if the SHELL value is deemed
  to be a standard POSIX-style shell.  If not, then no interior line control
  characters are removed (as they may be part of the scripting language used
  with the alternate SHELL).

* New variable modifier 'private': prefixing a variable assignment with the
  modifier 'private' suppresses inheritance of that variable by
  prerequisites.  This is most useful for target- and pattern-specific
  variables.

* New make directive: 'undefine' allows you to undefine a variable so that
  it appears as if it was never set. Both $(flavor) and $(origin) functions
  will return 'undefined' for such a variable. To detect this feature search
  for 'undefine' in the .FEATURES special variable.

* The parser for variable assignments has been enhanced to allow multiple
  modifiers ('export', 'override', 'private') on the same line as variables,
  including define/endef variables, and in any order.  Also, it is possible
  to create variables and targets named as these modifiers.

* The 'define' make directive now allows a variable assignment operator
  after the variable name, to allow for simple, conditional, or appending
  multi-line variable assignment.

Oh... My... God. Four years we've been waiting for this??? Three changes a year??

Now let's go over the most important changes.

A lot of backward incompatibilities

Gentoo is a Linux distribution. I use it on my home machine and find it highly convenient for a Linux user who likes to see how their system works. The pleasure of having such a system is of the same kind as watching a clock with a transparent dial tick, or keeping your PC without a cover.

One of its peculiarities is that, by default, every piece of software is built at the user's site. The package manager downloads sources and invokes ./configure; make; make install, or whatever is relevant for the package. Since most Linux software uses Make to build, incompatible changes in Make's behavior are painful for such a system. That's why I refer to the Gentoo bugzilla as a good reference for bugs introduced by these changes.

By the way, you can get Gentoo at its homepage.

The GNU naming convention is betrayed: a mere +1 to the minor version "81" breaks backward compatibility. One issue with Glibc was noted above. Are there more? Sure. Here's a Gentoo tracker bug that collects all the 3.82 regressions. Currently there are 40 breakages, and the list will most likely grow.

If this breakage weren't for nothing, I could even understand it. Unfortunately... well, read on.

Shortest-stem makes Make more declarative!

* WARNING: Backward-incompatibility!
  The pattern-specific variables and pattern rules are now applied in the
  shortest stem first order instead of the definition order ...

You can define a "generic" rule, and then declare "amendments" to it by specifying, for example, prefixes to the file name, or the folders your target should reside in. Now you can do this in arbitrary order, which makes the language more declarative. A good improvement, but, unfortunately, it introduces nothing that couldn't be done before.
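Here's a toy makefile of my own (file names invented, recipe lines start with a TAB) that shows what the new ordering buys you. The target debug_foo.o matches both patterns, but debug_%.o leaves the shorter stem ("foo" versus "debug_foo"), so since 3.82 it wins even though it is defined last; before 3.82 the rules were tried in definition order, so you had to write the more specific one first.

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<

debug_%.o: %.c
	$(CC) $(CFLAGS) -g -O0 -c -o $@ $<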

Linking behavior change

* WARNING: Backward-incompatibility!
  The library search behavior has changed to be compatible with the standard
  linker behavior. ...

That's the kind of change that contributes to the "standard library" behavior: more flexibility with basically the same computational model. I could cheer for this... if I had used it at least once.

You can use something else instead of TABs

* New special variable: .RECIPEPREFIX allows you to reset the recipe
  introduction character from the default (TAB) to something else. ...

First, it's too late: Make is already notorious for its stupid tabs. Second, what single character other than tab could be used for this purpose? It's not even a regexp!

Of course, you can specify a plain space character as the prefix. Then, if your recipe lines start with several spaces, the first one is recognized as the prefix, and the rest are passed along and ignored by the shell Make invokes. But then TABs won't work... So I doubt this option will see much use at all.
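Still, for completeness, here's what a makefile using it looks like (a toy example of mine; I picked ">" as the prefix):

.RECIPEPREFIX := >

hello:
> echo "look ma, no tabs"

Every recipe line now starts with ">" instead of a TAB; setting .RECIPEPREFIX back to the empty string brings the TABs back.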

One shell for the whole rule

* New special target: .ONESHELL instructs make to invoke a single instance
  of the shell and provide it with the entire recipe, regardless of how many
  lines it contains. ...

At first I thought: at last! I'm tired of writing multi-line rules with those \ characters at the ends of the lines (which is especially painful if you have loops or conditionals inside the shell commands). However, I noticed that this is a makefile-global setting. You can't just turn it on for certain rules; if you are going to turn it on, you have to rewrite the whole makefile, chaining commands with the shell's && operator to mimic the old per-line behavior.
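For the record, here's a sketch of what it enables (a toy makefile of mine, not from the release notes; recipe lines start with a TAB). The whole recipe goes to a single shell, so the cd sticks and the loop needs no trailing backslashes:

.ONESHELL:

show-objects:
	cd build
	for f in *.o; do
	  echo "found $$f"
	done

Without .ONESHELL, each of those lines would run in its own shell, and the cd would affect nothing.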

The rest

The rest is syntactic sugar and attempts to keep up with standards that evolve faster than Make does (and I couldn't imagine anything evolving more slowly than standards... until this release, of course).

Conclusion

So the new release of Make is just a fucking disappointment. Yes, we certainly need something new; everyone says so (here's a link to Tuomov's now-deleted rant about Make, for example; thanks to QDiesel for it). More precisely, we have needed it for several years. And I hope that in my search for a declarative language I'll find something that can supersede this obsolete Make tool.

Searching for the declarative language

I'm really tired of telling these dumb machines every instruction they should perform! I know that they only pretend to be dumb, and they can act clever if they're taught to.

After I implemented yet another graph traversal algorithm, after I wrote "if it doesn't exist, create it, and only then proceed" for the millionth time, after I had trouble parallelizing totally independent tasks just because I hadn't written the program to be parallel, I felt that I was done. Most programs I wrote had a similar scheme: loosely connected computations at different nodes with some dependencies between their results.

But over and over again I had to spell out the exact sequence, although it either doesn't really matter or is easily computable. I started looking for a language that takes this burden off me, a language that lets me declare my intent and builds the sequence of actions on its own.

I discovered that there's a programming paradigm devoted explicitly to this: declaring intent rather than steps. And, not least, I wanted to be able to use it at work, to write production code with it! However, the languages I knew back then didn't support this paradigm. So I started looking, both at new languages and inside the things I already knew, to spot declarativeness in them.

I was disappointed. Nearly nothing out there was really what I wanted. Here's what I've discovered so far.

Markov algorithms

What is a Markov algorithm?

A Markov algorithm is one of the basic notations for algorithms. It expresses a computation as a series of string rewrites. Given an input (a string), executing a Markov algorithm means repeatedly performing a prioritized substitution of one substring with another, the substitutions being prespecified and immutable; the list of substitutions forms the algorithm. Here's a program that converts binary to unary:

|0 → 0||
1  → 0|
0  → 

You can see why this works on the Wikipedia page.

You can simulate a Turing machine with a Markov algorithm. Just draw a "caret" on the string and express the algorithm as caret movements. Inside the caret you may store and modify the "state" of the Turing machine. Here's how the Turing machine rule "if the caret reads A and is in the fourth state, write B, change state to the tenth and move right" would look:

|4|A → B|10|

That is, Markov algorithms form a Turing-complete language.

This is a declarative language that almost everyone who has studied computer science knows about. (Well, at least in Russia; due to the nationality of its inventor it may be more widespread there.) However, nearly no one can name it when asked about declarative languages. It's sad that people don't make connections between what they learn at the university and what they call "real life"...

Back to the topic: a Markov algorithm is an ordered list of statements A → B, each of which really means "there must be no sequence A in the resulting string! And if there is, it should be replaced with B".

It's nevertheless questionable whether a Markov algorithm is declarative. Sure, it doesn't say exactly how the computation should proceed. Of course, the exact calculations can easily be inferred, but that is true of any declarative language.

The problem is that a Markov algorithm is a mathematical abstraction. Although some useful languages have been designed following the notation of Markov algorithms (one of them is Refal), they still resemble mathematical abstractions, and I can't see how I could use them in production.

Make

What I can, and do, use in production is Make, the language of Linux makefiles. It doesn't actually have a specific name, so I'll call it "Make" here.

Basically, a Make program is a list of "rules". Each rule specifies a target (or a target pattern, like %.cpp) and a list of prerequisites. A rule means that in order to accomplish the target, you must

  1. get done with the prerequisites;
  2. execute the commands specified within the rule.

The commands usually take the prerequisites as input, but they don't have to. This creates a directed graph of dependencies, which is walked until the target is reached. Sounds like a cool concept that should have been backed by some kind of mathematical theory and notation!..

In reality, it isn't. Make evolved from (and perhaps still remains) a build system. And here come some gory details. The targets and prerequisites denote files in the file system, and the timestamps of these files (i.e. the times they were last modified) are used to decide whether you're "done" with the prerequisites.

The rules are implemented in a shell language of your choice. Technically they could be written in any language, but shell scripts are the usual choice because they are geared towards working with files, and files are the primary objects Make operates on. This fits building applications perfectly, but you can go beyond that domain if you treat the file system as just a key-value database.
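For reference, here's a minimal makefile of my own illustrating that structure (file names invented, recipe lines start with a TAB):

# "program" is the target, the object files are its prerequisites,
# and the indented lines are the commands that rebuild the target.
program: main.o util.o
	cc -o program main.o util.o

%.o: %.c
	cc -c -o $@ $<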

However, the biggest limitation of Make is that it can't modify the dependency graph after execution has started. There are techniques to overcome this restriction, but they're not generic, and they are too tricky and fragile to be used in production.

Another thing is that Make is just... too old. Programming has changed a bit since 1978, you know... Make is not flexible, it is hard to debug, and it doesn't evolve much: its computational model is already exhausted.

Whenever

An esoteric language I was introduced to in a programmers.SE question is Whenever. Basically, a Whenever program consists of an unordered list of clauses. A clause can contain an operator. An operator is executed if the clause that contains it is on the to-do list and if certain conditions hold. Operators may add and/or remove clauses from the to-do list, or print text (and we know that printing text is really no different from any other useful work).

Memory cells are implemented through the possibility of having a clause appear on the to-do list several times, together with an expression that returns that count. The conditions mentioned above can refer to this expression.

Fairness is a property of a nondeterministic system that repeatedly and infinitely faces a choice among multiple options: the system must not exhibit behavior in which a certain option is never chosen past some moment of time. I wrote a big article about fairness; check it out for more details.

The language is called "Whenever" because its main feature is that it's not clear when a given clause will be executed! Each clause on the to-do list is considered for execution with uniform probability. (I think the uniform probability could safely be replaced with enforcing fairness.) Sadly, the author was so excited by the concept of such "absence of urgency" that he overlooked the declarative nature of the language.

Here's a sample program, which is essentially a "while" loop. It should print "A" ten times:

1 defer (2) 3;
2 print ("A");
3 3;
4 defer (1 || N(3)<=10) -3#N(3),-5;
5 again (5) defer (1 || N(3)>10) 2,1;

Play with the language yourself! You can read more about the available commands and download a Java (sic!) interpreter at the Whenever language homepage. It seems that David Morgan-Mar is its author.

Whenever's key difference from Make is that it has control over its own dependency graph: it can impose and drop dependencies as the clauses are executed. However, the weakness of this language is its esoteric nature, coupled with the absence of a developed mathematical model (it resembles several concepts at once, but no particular one).

Still not what I want

So the results of my search for a declarative language that could fit into production are still not great. The languages I surveyed here are either too conceptual (Markov algorithms and Whenever) or too limited (Refal, Make). I believe this gap will be closing over time. Or perhaps I'm just searching in the wrong place? Maybe languages such as Haskell and Prolog can already do what I want, and I'm just not aware of it? And do I really know what I want?..

Well, this only means that there'll be another blog post when I get to it!

Comments imported from the old website

How's the search two+ years later?

You're right, you're not really clear what you want, but I love this area.

Here are some things you don't specify:

  • Pattern match?

    How do you want to identify the things you want done? Can you produce some task description object and give it a simple ID? Is the description something you could hash? Does it need to include wildcards? Is it a predicate calculus expression with unification and maybe even ands and ors of subexpressions?

  • Persistent results?

    Are the tasks going to produce data that can be reused by multiple further tasks, or are they "eaten" by the task that asked for them?

  • Modifying dependencies???

    Are you sure you know what declarative means? :-) If so, how can the dependencies of a task change? In particular, how can they change without invalidating what's already been done with the graph? Are you just talking about something simple, like an OR, where finishing one task makes others unnecessary (at least for that OR)? Or a piece of code that, once it gets run, may change its mind about what subtasks it needs done?

  • Concurrency

    How much control over allocation of tasks to processors? Do you want the control and coordination to be distributed or centralized?

Those are just points that struck me as important to nail down.

One idea is just to set up a module to make dynamic programming (or caching) easier in whatever language. Then the language for goals is just to call the function that computes the result you want. I know there's a simple module like that in the Python Package Index. I don't mean that caching is all you need, but that using a function interface to each task-type lets you start with a single-threaded program and incrementally add caching, threading, fine-grained scheduling, etc..

The Linda "coordination language" is a language with requests and triggers based on pattern-matches to assertions in a database. Very clean, and it interfaces to multiple languages. In contrast you could look at the CONNIVER manual for more complex and hairy ideas.

There seem to be many dataflow languages. Looks like some research would be needed to find out which ones might be useful to you.

There is a make system called SCons that is written in Python and is extensible to new target types. Its way of expressing dependencies and actions is somewhat different from make's.

You can steal my tiny, Python, single-threaded clone of make, myke. It's a little messy but it's so small you can turn it into your kind of mess.

There is Flat Concurrent Prolog. What's "flat" about it is that it's designed to give you control over the parallelism, within the Prolog paradigm, rather than spawning an arbitrary tree of processes on an ordinary Prolog program. Here's one free implementation.

Here is a link to some old discussions of FCP vs. Linda vs. the E language, a distributed object-oriented language.

Pavel Shved on 31 May 2013 commented:

Hello, fellow programmer!

Thanks for your comment, I learned some very interesting things from it. So, here we go.

The search has stopped without succeeding for what I believe are good reasons. But let me start from the beginning.

First, the questions you ask are straight to the point. "Modifying dependencies" was a plea to remove the limitation of Make whereby the results of previous computations cannot affect which rules get applied afterwards; the usual way to combat it is to write an overcomplicated makefile that generates another makefile or calls itself recursively. Your other questions are perfectly valid as well.

I simply wanted to find a language/paradigm/something that addressed them--and many others--for me :-)

I skimmed Wikipedia pages of dataflow languages at the time I wrote this post; I remember I reasoned that they are completely different from what I wanted to achieve, but details have been forgotten.

At the time of writing, I also didn't know about Prolog, and in your comment I first encountered Linda! This looks very promising, especially if I consider more interesting matching implementations. I hope I'll find time to play with these things.

Now to the reason why I deliberately stopped searching. As I wrote more practical programs, I realized that the model I had in mind is rarely used--at least, in the work I've been doing daily.

  • Persistence of results of intermediate computations is rare, and for a good reason: most of them are rarely reused and should perish quickly. Where persistence makes sense, it can be handled with caching just fine.

  • Coordination usually constitutes a very small part of a product, and reusing readily available "imperative" coordination systems (message passing and shared memory of all flavors) is simply good enough. Improving a small part at great effort runs afoul of Amdahl's Law.

  • We would still need to solve the translation of outputs of one component into inputs of another. This is a problem that a separate language, as opposed to bindings, could solve, but that is even less practical.

So this search transformed from a practical problem into a theoretical issue, and hence lost my focus. While small, the coordination part could be made seamless, but I don't have enough vision to formulate the requirements at this point; I can only keep dreaming.

Cross-compiling ARM kernel on x86

ARM is one of the newest and most rapidly emerging architectures. The pace of its development and expansion promises bigger funding, so no wonder we're considering adapting our tools to it.

And one of the sub-tasks was to compile an ARM kernel on our x86 machines.

So I started compiling. Compile ARM binutils, compile glibc, compile gcc, recompile glibc, recompile gcc, yeah. It's all that easy, except that at each comma you should insert several hours of patching and googling.

And when I encountered yet another problem and, exhausted, was googling for a solution, I found this:

It could be failing because you don't have the ARM version of the headers.

Just use buildroot. You are just duplicating their (really hard) work.

mpapet said at linuxquestions.org

Confirmed. Buildroot is an amazing and easy-to-use cross-compiling tool. It built the ARM toolchain in just a couple of hours. It also supports other architectures, and a lot of other cross-development stuff I didn't dig into much.

Don't reinvent the wheel. Use wheels invented by others, and reinvent a bicycle on top of them!

Scrum: our three iterations

Yes, Scrum is now a 100% buzzword. We learned about it in our university studies, we heard about it from corporate software engineers and consultants at conferences, we see it in the enterprise (see v1), we read about it in job postings, and some GSoC projects even impose Scrum on one-man "teams"!

However, it was not until this summer that I learned it's actually an effective engineering technique. It revitalized the development of one of our projects (I'll refer to it below as "the Project").

Actually, the idea of adopting Scrum was mostly inspired by one of our customers. A couple of our specialists worked in collaboration with another company on its product. The customer used Scrum and held meetings in the room where our group sits. Our managers liked it, and we decided to try the framework ourselves.

So we adopted Scrum for our project as well. When the idea emerged, I made a blog post referring to it as the thought of a bored manager.

To make things less boring, I volunteered for the ScrumMaster role, and we ran three sprints (before taking a pause, since the majority of the team went on vacation).

I've learned some lessons, and I want to share my thoughts on what Scrum is and how we applied it.

How I understand what Scrum is

First, what is Scrum? I won't repeat all the definitions (you can find them on Wikipedia (frankly, quite a bad article) or here); I'll just give my perception of what it's really all about.

Our burndown chart

Here's our burndown chart for the 3rd iteration. It was created with the VersionOne online agile project management tool (which is free for small teams of up to 10 developers!).

(To read the rest of the article, you should know the basic definitions of Scrum.)

Scrum values:

  • permanent, formalized evaluation of the work done
  • early identification and discussion of problems
  • frequently changing local goals (as in any Agile technique)
  • orientation towards business value

The most important thing is the permanent evaluation. At the beginning of the post I mentioned that Scrum was used in a GSoC project; there, of course, it was hardly anything but a disciplinary measure.

Early discussion of problems is also a valuable feature. During the daily scrum, developers are supposed to speak about problems that prevent them from achieving their upcoming goals. This is a required part of the daily scrum, and it turns out to be a valuable way of keeping things on schedule.

Frequent change of local goals is a natural feature of any agile framework, and Scrum is no exception.

Orientation towards business value is provided by separating those who do the actual work (the programmers) from those who decide what the product should look like (the product owner). When we started doing Scrum, I noticed that nearly all the items in my TODO list were replaced by more valuable ones.

Scrum implementation

Scrum is not easy! The formal definitions say what should be done, but not how. I volunteered as ScrumMaster, but only in the 3rd sprint did I succeed in leading a proper sprint planning. Scrum looks more like an interface, within which you can vary the actual implementation.

I can't explain every single nuance of our implementation; we have only moderately settled into our own way of doing things. And even if I could, it would be material for a 100-page book. Books are what you should pay attention to. The book that influenced our Scrum the most is "Scrum and XP from the Trenches"; you can find the link in the "Acknowledgments" section below.

However, here are a couple of the most valuable notes.

Planning poker

Here's an image of a planning poker deck (taken from the site of the "Crisp" company, which sells them).

We printed similar cards and play them at every sprint planning. The reasoning behind using them is that an opinion, once expressed, affects what other people consider their own opinion. When people are forced to commit to their estimates independently, it usually leads to more productive discussions in which the whole team is involved. You can read more about this at the site I already linked.

It turned out that planning poker is really useful during sprint planning. Contrary to what you may think, it makes sprint plannings faster, as it imposes a formal pace on how the meeting flows.

Some people suggest that sitting at a table and walking through the backlog is not effective. However, for our 4-man project it is, in my opinion, the best way to hold the planning. As we play planning poker, I (as ScrumMaster) take notes on how to split stories into tasks, and after we have selected enough stories for the current iteration, we go through the notes and form the actual tasks.

Downsides and lessons of Scrum

Responsibility

The first, and quite non-obvious, consequence of adopting Scrum is that it makes developers even less responsible than they usually are :-). People start doing things not because they should, but because it was planned, because there's an action item, and you can't close it unless you complete it.

Motivation of developers

As I mentioned, Scrum teaches you to talk about upcoming problems, not about the ones already solved. However, this leads to underestimating the effort developers put in. If someone successfully solved a tricky problem, and thus completed an underestimated story on time, there's no way to notice it in the usual workflow. This makes developers less motivated. Overcoming this is a task for the managers; Scrum doesn't help here.

Inflexibility

Scrum silently assumes that estimating how long a task will take can be done within minutes. Only a 4-hour meeting is allocated to estimate all the stories and tasks (usually 10-20 stories).

This is probably true for most well-established commercial projects. However, our Project was partly a research one, and it was unclear how much time it would take to implement something that hadn't really been done by anyone before. For example, evaluating one of the features in our Project took a whole week!

We solved this problem by

  1. making sprints shorter (2 weeks long)
  2. introducing "research stories"

When the "product owner" required to implement such a story, which needed additional research for its estimation, we first asked him to add "research story". An outcome of a "research story" is not an implementation of the feature, but rather a report that contained evaluation of several ways to implement the feature. Of course, several approaches to the problem have to be prototyped, but the quality of such prototypes shouldn't make the developers spend too much time on them.

The report is analyzed upon completion of such a story, and the product owner selects one of the ways to implement the feature. Implementing this approach is then added to the backlog, to be addressed in an upcoming sprint (or the owner may decide that the feature is not worth it).

Of course, to increase the pace, we made the sprints short (2 weeks is commonly understood to be the minimum sprint length).

Lack of code quality criteria

Tasks and stories are required to be phrased in business terms. But what if there's a need to refactor code, set up a new logging system, or fix the architecture, with no obviously visible outcome?

The focus factor (also known as the "fuckup factor") is a multiplier for the team's available man-hours. If your focus factor is 70%, it means that the team can only promise to deliver the amount of stories that takes 70% of its total man-hours. For example, four developers in a two-week sprint have roughly 320 man-hours, so they commit to stories worth about 224 hours. The rest of the working hours are spent on "backend" work that doesn't map to concrete user stories.

70% is the "recommended" value for the focus factor, but the actual value varies between teams and projects. For our team it's 80%, because the Project doesn't have many users, so few bugs get filed.

The standard way to handle such work is to decrease the focus factor and let the developers decide how to spend the freed-up time. If there's a great need for refactoring, there is a limited, known amount of time to spend on it.

Prespecifying the focus factor also makes the team choose wisely what to refactor, as they know they can't spend all their time on it.

Who fixes the bugs?

Some bugs are planned for, but some are not! Some bugs get reported by critical customers, but--surprise!--all your working hours are already allocated to stories. Who fixes the bugs then?

The obvious solution is to set aside some time for fixing critical bugs within the "focus factor" mentioned above. Non-critical, lower-priority bugs can be turned into stories for the next sprint.

Scrum requires robots

For Scrum to work well, the team must be complete and uniform. Days off are local catastrophes, and a developer's sudden illness is a major catastrophe that may result in a failed sprint (we're experiencing this right now).

Obviously, Scrum would work better for robots than for humans. And when such a problem occurs, it takes decidedly non-formalized, "human" thinking to resolve it.

Acknowledgments

The most fruitful source of information on how to do Scrum was "Scrum and XP from the Trenches" (a free PDF version is available). Shyness aside, having read the Wikipedia definitions and this book, I became not the worst ScrumMaster in the world.

And, of course, experience matters. Our managers, Vladimir Rubanov and Alexey Khoroshilov, helped the team a lot in facilitating Scrum efficiently.

Conclusion

Another interesting thing I noticed is that working without any formalized process is more boring than working with one. Some time ago I guessed that it would be like having sex in a new position, and after I tried it (Scrum, not sex), I realized that the guess was spot on.

Though not ideal, Scrum turned out to be a very effective framework. It combines agile values with what commercial development wants, which is why it's the most popular agile framework. Its bureaucratic procedures (maintaining a backlog, holding scrums, drawing burndowns) are well worth the net positive effect on development speed and quality.

Limiting time and memory consumption of a program in Linux

Without further ado: the 14-kilobyte standalone Perl script that limits a program's CPU time and memory consumption on most Linux systems can be found in the pshved/timeout GitHub project. The script is compatible with older versions of Perl, and it has extra features such as hangup detection and resource usage reporting. The sources are licensed under Apache-2.0. An older version can also be found here.

So here's how this script came about.

Why do we need to enforce limits?

Most programs we use daily are actually pretty good. Years of development result in stable versions, with bugs found and fixed. Good software usually does its job in minimal and predictable time. And if the software is meant to run for a long period of time, its memory consumption over time is usually well managed... well, at least the developers aspire to do well by your RAM :-)

However, in certain domains, software is not like that. Software that is not yet as mature, or is solving a narrow research problem, can consume too much RAM, too much CPU, or just hang without making progress.

It's understandable, though. Such software usually attempts something really complex, like solving NP-complete problems with heuristics (and we know that's a good thing). For many such algorithms, neither time nor memory consumption has known theoretical limits, let alone practical ones. Usually these algorithms behave well on most inputs, and in those cases their resource usage stays reasonable. However, on certain inputs, which can't easily be told apart in advance, they blow right past any such limits.

This doesn't seem like a big problem if your aim is just to do some research and throw the code away (such code is sometimes even called "throwaway code"). However, if you're trying to ship something of production quality, you can't accept such immaturity. If the program works on a particular case for too long, it may block service to other customers. And if it consumes all the available memory, it may drag the whole machine into the swap hole until it's reaped by the OOM killer.

Now imagine it's 2009, you're a junior developer, cgroups are hot off the presses and aren't widely known, and you need to run research software at scale.

Practical Resource Limiting

So I needed to enforce time and memory limits on some black-box, non-adversarial program, terminating it upon a violation. These limits should be set manually (to allow tweaking, and to allow setting the memory limit lower than the available maximum). The most obvious command-line interface looks like this:

        timeout -t 600s -m 1000000k command arguments ...

The command is executed with its arguments and is constantly monitored. As soon as one of the limits is violated, a relevant message is printed to STDERR, and the program is terminated.

The biggest problem with setting these limits is that the processes we run may be "black boxes", i.e. we should be able to limit processes whose source code we have no access to. We can't install signal handlers in them, or affect their code in any other way.

Another hard problem is that these processes spawn children, which do a considerable fraction of the overall work. Some of these children are mature and well-written programs, while others are the very same "research software", which can eat all the memory with as great an appetite as their parents.

This means we should count the time and memory consumed by all the processes spawned by the one we are watching.

"But what's the problem?" you might ask, "Linux provides enough standard utils to solve this problem!" However, not all of them fulfill all the requirements to a good limiting tool. I'll enumerate them after the explanation of how the time limiting script is implemented.

What works

The following solution is the result of several months of experimenting with more complicated approaches.

It looks fairly dumb and simple: just poll each process in the tree spawned by the watched process for time and memory consumption several times per second, and kill the whole tree on a violation. However, it was no simple task to make this actually work correctly.

Tracking memory consumption is very simple. At any moment, every process exposes valid information about its virtual memory, and the total memory used is no greater than the sum of the current virtual memory sizes of all the controlled processes in the tree. If that sum exceeds the pre-specified limit, just terminate the controlled processes in the tree and print the relevant message.
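In Perl, the language the script is written in, the idea boils down to something like the following sketch. This is not the actual code of the timeout script; reading VmSize from /proc/<pid>/status is just one reasonable way to get a process's virtual memory size.

use strict;
use warnings;

# Sum the virtual memory (in kB) of the given PIDs via /proc/<pid>/status.
sub total_vm_kb {
    my @pids = @_;
    my $total = 0;
    for my $pid (@pids) {
        open(my $fh, '<', "/proc/$pid/status") or next;   # the process may have exited
        while (my $line = <$fh>) {
            $total += $1 if $line =~ /^VmSize:\s+(\d+)\s+kB/;
        }
        close $fh;
    }
    return $total;
}

# Usage: perl memcheck.pl LIMIT_KB PID [PID ...]
my ($limit_kb, @pids) = @ARGV;
my $used = total_vm_kb(@pids);
print STDERR "MEM limit exceeded: $used kB > $limit_kb kB\n" if $used > $limit_kb;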

However, it's not that simple for CPU consumption. Time consumption depends on the history of the process tree's evolution, not only on a snapshot at a specific moment. By examining the CPU time of each process currently in the tree, we learn how long those processes have been running, but not how much time was spent in processes that have already finished. The total time is greater than the sum of such snapshots. Even if we manually track which processes die, and keep a running sum of their times, we miss very short-lived processes that are born and finish between two polls (see Figure 1).

I found out that the Linux kernel maintains, for each process, the cumulative time of its reaped children (as in man 2 times). If a process has terminated, its total runtime is accounted for in its parent's cumulative children time; if the parent has also finished, the time is added to the grandparent's, et cetera. This way, the total CPU time of the already-finished descendants of the processes we watch is just the sum of their cumulative children times.

Thus we can learn the CPU time of all the processes currently running, and of all the processes that have already finished. All we need to do is sum these two figures and compare the result with the specified time limit. If the limit is violated, the message is printed and the controlled processes are terminated.

We can fetch these values accurately via /proc, even for black-box processes.
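Here's a rough Perl sketch of that accounting (my own illustration, not the real script). For every PID currently in the tree we add its own utime and stime plus the cumulative cutime and cstime of its already-reaped children, all read from /proc/<pid>/stat:

use strict;
use warnings;
use POSIX qw(sysconf _SC_CLK_TCK);

my $hz = sysconf(_SC_CLK_TCK) || 100;   # clock ticks per second

# Total CPU time (user+sys, running plus reaped children) of the given PIDs, in seconds.
sub cpu_seconds {
    my @pids = @_;
    my $ticks = 0;
    for my $pid (@pids) {
        open(my $fh, '<', "/proc/$pid/stat") or next;   # may have exited already
        my $stat = <$fh>;
        close $fh;
        # The command name is parenthesized and may contain spaces,
        # so strip everything up to the closing ')' before splitting.
        $stat =~ s/^.*\)\s+//;
        my @f = split /\s+/, $stat;
        # After the strip: $f[11] = utime, $f[12] = stime, $f[13] = cutime, $f[14] = cstime
        $ticks += $f[11] + $f[12] + $f[13] + $f[14];
    }
    return $ticks / $hz;
}

printf "CPU %.2f s\n", cpu_seconds(@ARGV);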

Implementation — the timeout script

Of course, bare text isn't very inspiring; it has to be accompanied by a script. And here it is, 14 kB of Perl code. The sources are licensed under Apache-2.0. They were part of a project we develop at work, but we decided to fork this particular script off as a separate project.

Seemingly stupid, it's the only thing that works. At first glance, anyone who knows something about signals, children, wait calls and all that machinery will say that the solution is quite dumb and inefficient. However, I couldn't find anything better, and it would be much appreciated if you could suggest improvements. Here's the list of things I tried.

What doesn't work

РЖД totally looks like PID!
We started talking about Linux processes, and I suddenly realized that the logo of the Russian Railways company (РЖД) totally looks like "pid" (Process IDentifier)! See for yourself!

Ulimit

Ulimit (the shell builtin, not the system call) is the first thing a Linux user would think of. You run it in the shell, set "soft" and "hard" limits for each resource, and the children invoked from this shell are limited in memory and time.

Its limits are imposed like this:

  • Time limit is set by the setrlimit system call. When the soft limit is reached, SIGXCPU is sent (which kills the process by default). This signal can be handled by the process; for example, it can decide to limit its own resource consumption upon receiving it. On hard limit violation, a plain SIGKILL does its dirty job without any possible impediments.
  • Memory limit is set via the same system call. It is enforced by making the violating memory allocation fail: malloc() gets ENOMEM from the kernel and returns NULL. This usually leads to an unhandled error of sorts, and the program (or its interpreter) is forced to terminate.

However, this doesn't meet all the requirements. The thing is that

  • the setrlimit() call only limits the process in which it's invoked. Children of the process inherit the limit, but their accounting "starts over", whereas we are interested in limiting the combined resource consumption of a parent and all its children.
  • you can't determine whether the reason the process terminated was a limit violation. When the memory limit is violated, we should print a message about it, but for that we would have to know all possible outcomes of an out-of-memory condition. Shell, Perl, Ruby, Python, C and other languages report lack of free memory in different, non-portable ways, and we can't possibly enumerate them all.

    However, setrlimit does notify the process when it exhausts its CPU time limit, by sending SIGXCPU. This could theoretically be handled if we had the sources of the controlled process, which we don't: all processes are "black boxes". Nor can we create a wrapper around the program and catch this signal in the wrapper: it would then be the wrapper's own time that is limited, not the target's.

Sleep && kill

Another "easiest" solution to track time consumption is to sleep for a time equal to the limit and kill the controlled process afterwards. This only limits real time, while it's more interesting to limit CPU+sys time.

This may sound like overkill, but believe me, you will have to measure CPU time. Hardware now develops in such a way that performance gains come mostly from parallelization. And parallelization means multiple threads and programs competing for a shared CPU; when that happens, the "astronomical" elapsed time becomes much greater than the CPU time, the time the process has actually been running for.

Coroner's way

A special wait-like call, wait3, can harvest information about a deceased child, including the CPU time it consumed. This would give precise information about the runtime of deceased processes.

However, we can't reap grandchildren, so such information would have to be harvested by every process in the hierarchy. Glibc does provide a way to have a process send SIGHUP to its parent when it dies, but that facility has to be set up by the process that spawns the children, and the processes we watch can't be modified. So this option can't be the solution.

One more thing: if a child process hangs in an infinite loop and never terminates, we can't wait for it, hence we can't account for it, and enforcing the limit will fail.

Thus this way of tracking doesn't come close to a solution for limiting time. And it won't let us track memory consumption at all: memory is an "instantaneous" property, not a cumulative one.

Using process groups to control children

One of the previous versions of the script used process groups to control the children, just like the shell does! It put the spawned process into a separate process group, and its children were assigned to that group automatically. It was easy to kill them all with a single kill command, which can send a signal to a whole group.

However, the issues raised by that "just like the shell does" part were too severe. If your black-box process contains a shell script somewhere inside (be it a system()-like call or a mere Bash script), it will unconditionally interfere with your process group setup, and all the processes spawned through such a shell will slip out of our control.

The current version of the script tracks process trees by default (but you may switch to tracking process groups). Of course, it will miss children that explicitly detach from their parents (such as spawned daemons). But in the only such case we noticed in our project (an ssh process spawned to serve a network mount), we did not actually want to track those children! So we chose trees over process groups without hesitation.

What could be improved

Of course, the sources are not ideal. They contain several problems that have already been identified but not fixed (per the YAGNI principle). Here's the list.

Terminating several processes in a group correctly

Let's first elaborate on the process group case. The Linux command that sends signals, kill, can send a signal to a whole process group, i.e. each process in the group receives the specified signal.

However, the processes in the group we track may have internal dependencies. Assume two processes are connected by a pipe, and the group receives SIGTERM. If there is an interval between the deliveries of the signal, then, after one process dies, the other may receive SIGPIPE. That signal may trigger an error report that looks like an internal error in the black box, whereas the problem was actually caused by the resource-controlling script.

I asked a question on Stack Overflow about this, and the only solution that came close was to pause the process group, then send the TERM signal, then resume execution of the processes in the group.
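A minimal sketch of the idea in Perl (not the actual code of the timeout script, which differs in details):

use strict;
use warnings;

my $pgid = shift or die "usage: $0 PGID\n";
kill 'STOP', -$pgid;   # freeze the whole group, so nobody observes a peer dying
kill 'TERM', -$pgid;   # the termination signal is now pending for every process
kill 'CONT', -$pgid;   # wake the group up; only now is TERM acted upon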

If there's a better solution, or a proof that after the kill syscall the next instruction executed in each process will be inside the signal handler, please let me know!

Terminating several processes in a tree correctly

Process trees are even trickier to terminate. Here's how we do it (a rough Perl sketch follows the list):

  1. Get a list of all PIDs in the controlled tree
  2. Stop all these processes by PID
  3. Repeat 1 and 2 for any processes spawned between getting the list of PIDs and sending the signals
  4. Repeat 3 until no new processes are spawned
  5. Send the TERM signal to all the processes in the tree
  6. Get the list of PIDs in the tree again
  7. Send the "continue" signal to them

If this solution can be simplified, please let me know.

Error due to insufficient frequency

The case where a grandchild is born and finishes between two measurements is mitigated by taking the cumulative children time into account. However, the controlled process is still free to go insane between two wakeups of the polling script. This is especially important on an overloaded system, where the timeout script has to compete for its CPU share with other processes. Perhaps making the controlled processes nicer (in the nice(1) sense) would help.

Perl quirks or lack of knowledge?

The actual source contains more tricks not disclosed here. For instance, Perl has issues with the ALRM signal, and I had to work around them in the way described in the article linked.

However, I'm not sure I worked around it correctly, especially in the part that terminates the processes after the script detects a limit violation. It involves the usual SIGTERM-sleep-SIGKILL sequence, but that sleep sometimes never ends, which occasionally leaves the controlled process hanging in its SIGTERM handler.

Conclusion

In this small article, I shared what I experienced while developing a script that limits the time and memory consumption of a "black-box" process and its children. The result is a "dumb" but carefully worked-out solution implemented in Perl, along with the reasoning behind why there aren't many options to choose from. Directions in which the script could be improved are listed as well, and I linked the GitHub project for the timeout script.

I hope it will help you in your own work. The Apache license allows you to use the script in commercial products in return for just acknowledging its use and its original copyright owners.

Comments imported from the old website

Pavel Shved on 03 October 2010 commented:

We recently encountered a problem where one of our Java programs wasn't waited for properly by waitpid: the call returned -1 when it shouldn't have. This doesn't happen on every call, only in 20-30% of invocations.

We're currently investigating this issue; I'll edit the post as soon as we figure out what's wrong.

Marian (id) on 27 January 2012 commented:

Could it be that line 171 of the time.pl script, print STDERR "${id_str}MEM $timeinfo->{total}\n";, should actually be print STDERR "${id_str}MEM $meminfo\n";? Otherwise you print out the execution time instead of the memory.

Pavel Shved on 27 January 2012 commented:

Marian, thank you for the bug report! It has already been fixed; you may check the latest version here or at GitHub.

OpenID as a public authentication mechanism

A lot of people use OpenID as a universal login to multiple sites. In fact, that's what it officially aims for:

OpenID is a safe, faster, and easier way to log in to web sites.

OpenID official site

Too boring to be the best use of such a cool system. However, many people see this as its primary use. (For example, Bill The Lizard, a Stack Overflow moderator, expressed it in his comment here, which he edited--with an abuse of moderators' powers, of course ;-).)

If the system is used as a universal login, then revealing your OpenID is not secure, since compromising it would have very unfortunate consequences: you might lose control over a lot of services at once. So the OpenID has to stay private and do its job of verifying your identity among the other records in a particular service's database.

Authentication via OpenID

According to Wikipedia, "authentication" is "confirming the identity of a person, tracing the origins of an artifact, ensuring that a product is what its packaging and labeling claims to be, or assuring that a computer program is a trusted one".

In simpler words, it's the act of verifying that it was you. That it was not George W. Bush, nor any other guy, but only you who logged in to a particular site and left particular content there.

In fact, "login" is also an authentication. But it's key feature is that it's private, while sometime you might want to publicly announce yourself as an author of the content (comment, post, etc).

And this "public authentication" is quite natural with OpenID. You log in to the site that supports it; the site makes a promise to display the very same id you logged in with near your real name. Given that this promise is kept, all the content you leave is signed with your OpenID.

Note that such a mechanism is not possible if OpenID is used merely as a "universal login". When it is used that way, the OpenID stays private and is not revealed to the public. But then it can't be used to verify the user profile you have on a given site. Within this scheme you can try to "sign" your content in one of the following ways:

  • Add your OpenID to a free-text field in your profile. Usually a profile on a social site (forum, social network, etc.) contains fields (named something like "userinfo") where you can put anything you want. So you could put your OpenID there to authenticate yourself. But that is not reliable, since anyone can put a link to your OpenID in his or her profile, and no one can determine which of these profiles is genuine.
  • Put a link to your profile on your OpenID page. OpenID providers usually promise to display a certain page when someone opens your OpenID as a web address. These pages usually also have a free-text field into which you can enter links to the profiles you own on other sites. But then anyone could put a link to your profile there, and it's not possible to determine which one is actually correct.
  • Do both of the above. But then your login is compromised anyway.
  • Do both of the above, but secretly use another OpenID to log in. But then you don't need the original OpenID at all!

The shortcoming of the "add it to your userinfo" approach described above is that you have to list, on your OpenID page, all the places where you left something. All blog comments, all profiles you own--maintaining such a list is tiresome. However, if every engine that supports OpenID revealed the IDs, doing this just wouldn't be necessary.

So, having analyzed the above ways of refraining from publishing your true OpenID, I concluded that OpenID should become more than just a means of identification. It could also be used for public authentication, and just displaying it would suffice.

OpenID and coldattic.info

This is how I use OpenID in my blog. When you comment, the engine displays your OpenID (I put a proper warning in the description of this blog, but I think I should make it more visible). And if you trust my engine, you can trust that all the comments left here were made by the very persons who own the OpenIDs shown.

Of course, this promise is not backed by anything, and I could display random OpenIDs in the comments to my blog. But I'm no villain. And anyway, you have to trust your OpenID provider--so why not trust a blog as well? :-)

Comments imported from the old website

Felix H. Dahlke on 10 September 2010 commented:

I mostly agree with you, but I don't think that keeping your OpenID (which is basically a username when used for authentication) secret is beneficial to security. I don't even think it's possible. Relying on a secret username sounds like "Security by Obscurity" to me - I prefer to rely on a strong password.

Pavel Shved on 16 November 2010 commented:

I actually don't see any big difference between "username" and "password". If we approach the issue from the point of view of what should be secret, there's really no difference: well, assume your username is secret and your password is not; what would change then? But keeping a username secret is treated as "obscurity", while keeping a password secret is not; why?

There's no answer, because that's the definition. By definition, the open part of your authentication token is the "username", which is kept open, and the secret part is the "password". Indeed, the password is sometimes referred to as "the secret".

This also means that, by definition, others are supposed to know your username--or your OpenID.

Performance metrics and parallelization

Some day you suddenly realize that a powerful PC is not cool anymore. Everyone works on clusters today! Clusters make our programs faster, and if your goal is to speed up a fairly long computation, they're of help.

Clusters of independent "computers" (in a broad sense) come at different scales: from several thousand tiny units on your GPU to thousands of separate machines. And after you discover that you have a cluster available--or just several free PCs no one uses--you also meet a problem.

Programs suffer from insufficient attention to designing them so that their parts can run concurrently. That's probably the right way to go, since the business value of early parallelization is not apparent. So programs grow and their workflow ossifies. And when you finally find a parallel medium to deploy the program onto, it's already too rigid.

You have to cut your workflow into smaller pieces that can be computed independently. But where should you cut?

Performance metrics

To answer the question "how to break it up?" you must first answer "why?"

"To make my programs fast!" you answer, but speed may be understood differently! Here are two different "speeds", which one you pick?

The terms "throughput" and "response time" were adapted by me from different domains, and I'm not really sure that they're applicable here. If you know better terms for the concepts described, please, let me know!

  1. Given a lot of time, and having run the program many times, maximize the number of fully completed tasks per time unit. I call this speed metric "throughput".
  2. Having run the program once, get the final answer as fast as possible. I call this metric "response time".

Different tasks need different metrics. You should explicitly think about which one is the most suitable.

These metrics are very different, and hence they call for different programs. Here's an extreme example. Assume you have N processors and your task can be divided into N independent sub-tasks of length t. The ideal distribution for response time looks like this:

The time needed to complete the task is t, and you can't make it any shorter.

However, here's a task distribution that is no less ideal for throughput:

This distribution is not suitable for response time: the ideal distribution (shown by the teal rectangle) yields nearly half the response time.

For throughput, however, it's not a problem, since we measure it over many runs of the program. And over many runs this schedule is packed quite tightly:

Now that we understand the difference, let's talk about the response time metric.

How to optimize response time

Each task of total length T has an unachievable ideal distribution across the N parallel machines. It was shown earlier as the teal rectangle; the time it takes to get a response from such a computation is t = T/N. We can't do it in less time.

Assume we created a task scheduling algorithm that completes the task in time r. How bad our scheduling is can be estimated as (r - t)/t. We assume that our task has first priority among the others (i.e. the algorithm tries to finish it as soon as possible), and that the chunks are independent. These assumptions are not so uncommon: most systems that first compute a long list of independent tasks and then execute it satisfy them.
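In symbols (just restating the definitions above in LaTeX, with T the total work, N the number of machines, t the ideal response time, and r the time our scheduler actually takes):

    % the unachievable ideal and the badness of an actual schedule finishing at time r
    t = \frac{T}{N}, \qquad \text{badness} = \frac{r - t}{t}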

Since the task has first priority, (r - t) < Δt, where Δt is the maximal length of an independent chunk (proof: if the inequality didn't hold, the last chunk to finish could have been scheduled earlier). Here's a picture that demonstrates what we're talking about:

So, again, the badness of our algorithm is bounded by Δt/t, and that's what we should minimize. There's just one straightforward way: to achieve the best response time, we should make the independent chunks as small as necessary. We can stop dividing the work once we reach some acceptable badness (10%, for example); it's neither obligatory nor sensible to make the chunks as small as possible.
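To make the arithmetic concrete, here is a minimal sketch in Python of how that bound turns into a chunk-size budget. The function name and the sample numbers are mine, purely for illustration; they are not taken from any real scheduler.

    def max_chunk_length(total_work, n_workers, target_badness):
        """Largest chunk length dt that keeps the badness bound dt/t below the target.

        The unachievable ideal response time is t = total_work / n_workers, and the
        badness is bounded by dt / t, so it suffices to pick dt <= target_badness * t.
        """
        ideal_response_time = total_work / n_workers
        return target_badness * ideal_response_time

    # Example: 200 hours of total work spread over 16 machines, aiming at 10% badness.
    print(max_chunk_length(total_work=200.0, n_workers=16, target_badness=0.10))
    # -> 1.25, i.e. no chunk should run longer than 1.25 hours.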

Conclusion

That's all I wanted to say. Perhaps it seems kind of obvious. But what's shown here is a way to devise a good plan of action by just setting a sane goal and doing a small bit of maths (and maths does help in reasoning about real life). Actually, in the meeting I didn't write formulæ, I just drew the pictures you see above--and we reached an agreement after that.

In further posts I'll dwell upon the other performance metrics.

Comments imported from the old website

David Cary (id) on 16 August 2010 commented:

There seems to be a subtle flaw in your argument. I'm not sure if it affects your conclusion.

Let's call the last subtask to execute L.

Concrete example: Starting from the "old schedule" in the last image, make a "new schedule" by moving the short subtask to the left of L to the top processor, leaving a gap behind, and then slide L left into that gap. Let's also assume that new schedule makes L take a few percent longer time to execute.

The result is a new schedule that has a slightly better response time than the original schedule.

However, the "badness" approximation deltaT/T seems to indicate that this new schedule is worse than the original schedule.

And so picking the one of these two schedules that minimizes deltaT/T makes response time worse.

Generalize: Certainly chopping subtasks up into smaller subtasks -- as long as each task still does enough work that the per-subtask overhead is insignificant -- doesn't hurt response time. But if you don't change what gets assigned to the critical processor -- the processor that executes the L task -- then it is not possible to improve the response time, no matter how tiny you chop up the subtasks on the other N-1 processors.

Good Tetris players know some good heuristic solutions to this problem.

David Cary (id) on 16 August 2010 commented:

You asked for another word for "response time". Some people use "latency" to mean more or less the same thing.

"It's the Latency, Stupid" by Stuart Cheshire, May 1996.

Pavel Shved on 16 August 2010 commented:

Yeah, there's definitely something wrong with my post, and your comment makes me start to understand it... First thing is that I was very inaccurate with the terms I used, and this whole mess indeed looks unprofessional to me now. I'll rewrite my post in a couple of weeks, and I thank you for an inspiring comment!

Here's a "small" fix to this flaw.

First, Δt/t is not a property of a schedule. It's a property of the partitioning of the whole work into chunks. It's therefore meaningless to say that "this schedule has a better Δt/t than that schedule". However, this doesn't affect the validity of your further reasoning.

It's true that, according to the metric, the first partitioning you describe should be "better" than the second. However, what does "better" mean? I didn't explain this in my post, though I should have. A partitioning is "better" than another if the maximum response time over all possible "sane" schedules is less than that of the other.

(A "sane" schedule is the one that doesn't make a processor stall in presence of tasks that haven't yet been started on any other processor.)

I.e. we can just implement a quick-and-dirty first-come-first-served algorithm for this NP-hard problem (and that's OK, since this component is not mission-critical). The metric assures us that when we hit a "bad" case, it will be less bad if the partitioning is "better".

Moreover, the reason is not just NP-hardness. In the problem we tried to solve, the tasks:

  • actually had some dependencies between them (not many, though);
  • had unknown lengths; any task could just terminate unpredictably. However, the lengths had a common bound: "no task could be longer than Δt".

I think that Δt/t provides quite a good, practical assessment of badness and, most importantly, guides us towards making a better partitioning. The post was more about practice than about theoretical issues, actually. However, I failed to demonstrate the connection between these formulæ and practice, let alone to follow my own earlier advice to refrain from using home-made quick-and-dirty heuristics for NP-hard problems...

Pavel Shved on 16 August 2010 commented:

Thank you for the term "latency" (and the title of the article sounds like a harsh reminder :-D).

Today I wrote my first unit test...

As you probably know, I work in the software engineering department of a scientific institution of the Russian Academy of Sciences, and my research and area of interest is software engineering. According to the tag cloud, the "software-engineering" tag has reached the popularity of the "fun" tag; that does demonstrate my inclination.

But it was only today that I wrote my first unit test.

And I loved it. Seven hours of ceaseless coding, brains not involved. A framework that runs the tests, a hack to the build system that assembles the test application, mock objects controlled by environment variables -- that's how a bash/perl application is unit-tested, and that part is totally uninteresting. Still, it was fun. I discovered two bugs in my application just by writing some unit tests. Otherwise they would have surfaced later and delayed the work.

Unit-testing and automated test systems development

My whole career has been devoted to developing automated testing systems. I was totally aware of why tests are needed--they were the source of my income, actually. But why had I never written unit tests?

The answer to this question is simple. We, developers of testing systems, get tests for free.

Usually, when you develop a testing system, you have a customer whose software will be its primary target. Most likely, this software has several versions. And these versions differ dramatically from a tester's viewpoint: the next version has fewer critical bugs to detect than the previous one.

And here's your test data: just load the stuff you are to test! There they are: a lot of regression tests for your tools.

The tests are essentially free: you don't even need to think of test cases; comparing the results with those from testing the previous version yields enough information to decide whether your release is OK. And if you add another level of indirection, such as "test each version under each major Linux distribution", then you're likely to satisfy any vendor with your level of QA.

So if you happen to purchase an automated testing system (or merely have one developed), ask the developers what kind of testing they did. If they somehow happen to have no regression tests, something's really wrong there, because getting such tests for free is an opportunity experienced testers wouldn't miss.
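For illustration, here is a minimal sketch in Python of such a "previous release as oracle" regression check. The tool name (./analyze), the cases/ directory of inputs harvested from older versions of the customer's software, and the golden/ directory of outputs recorded with the previous release are all hypothetical.

    import subprocess
    from pathlib import Path

    TOOL = ["./analyze"]     # hypothetical tool under test
    CASES = Path("cases")    # inputs collected from earlier versions of the target software
    GOLDEN = Path("golden")  # outputs recorded by the previous release of the tool

    def run_case(case: Path) -> str:
        """Run the tool on one input and return its output."""
        result = subprocess.run(TOOL + [str(case)],
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()

    def main() -> None:
        failures = [case.name for case in sorted(CASES.iterdir())
                    if run_case(case) != (GOLDEN / case.name).read_text().strip()]
        if failures:
            raise SystemExit("regressions in: " + ", ".join(failures))
        print("all cases match the previous release")

    if __name__ == "__main__":
        main()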

Availability of software development keeps increasing (SYRCoSE 2010 notes)

Recently I attended a conference for beginning scientists and engineers entitled Summer Young Researchers Colloquium on Software Engineering, and presented my paper On Reasoning about Finite Sets in Software Model Checking. My talk wasn't great--mainly because the research didn't yield the results it was expected to. However, this post is not about that.

Most of the talks weren't very inspiring. However, some of them revealed interesting ideas concerning the availability of programming.

Service-oriented development

The first talk, devoted to a service-oriented approach to programming, was given by the invited speaker Prof. Dr. Tiziana Margaria-Steffen from Potsdam University.

The main idea of her report was that software development is unnecessarily complex. Software engineering, the discipline that studies the craft of creating software, mostly addresses how software should be made, not what should come out at the end. However, if we switch focus to the services that the software should provide, we can make the development process different.

jABC's business logic editor screenshot

She then moved on to a system that demonstrates the success of such a focus shift, jABC. Its main purpose is to separate business logic from the implementation of concrete components, and to provide a nice graphical editor for the former (see side note).

Of course, you've heard about such systems many times. Of course, I couldn't resist trying to find out what's so special about this one. According to Tiziana, the system has been under development for over 15 years and still isn't dead. The reason is that the business logic is constrained to having absolutely no synchronization between concurrent processes; the only parallelism supported is fork-join parallelism: spawn a fixed number of threads and wait for them to finish.

But of course, expressiveness isn't the main feature of such systems. The main one is that they allow non-programmers to construct programs. In my opinion, however, if a person is capable of creating a runnable flowchart, he already has enough skill to do it without any "flowchart-driven development". Tiziana's answer to my question about that was that creating a program does indeed require a certain amount of skill; but if you only need to slightly modify a program already created by someone with enough skill, you don't have to possess that skill yourself.

After I pondered that, I agreed that it's possible. Perhaps the reason is that minor modifications require the skill of reading programs, not the skill of programming, which is necessary for creating them. And it is such "flowchart" systems that make this possible. And that means more people will be programming.

Not-so-natural language

Another report that interested me was the one given by Andrey Klebanov from SpB ITMO. It presented yet another system that verifies automata-based programs, but it had a minor detail which is, in my opinion, more profound than the rest of the work.

The specification for the checker is written in English, constrained by a specific grammar. The specification language, which defined some kind of temporal logic, had a grammar that consisted of natural-language sentences! So a requirement would look like this, literally: "After the red button is pressed, coffee will be made within a finite amount of time" (instead of a temporal logic formula like "Button -> ◊Coffee"). And, importantly, these requirements are processed by a machine.
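For readers who prefer the mathematical notation: such a requirement is conventionally written as a temporal logic formula along the following lines (the exact logic and syntax used by the system from the talk are my guess, not something stated there).

    % "After the red button is pressed, coffee will eventually be made."
    % \Box stands for "always" (globally), \Diamond for "eventually" (finally).
    \Box \bigl( \mathit{ButtonPressed} \rightarrow \Diamond\, \mathit{CoffeeMade} \bigr)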

Anyone who's familiar with maths and is a specialist in software engineering would question whether that's convenient. Such an enormous amount of unnecessary text spent on conjunctions and pronouns would make specification writers auto-generate the requirements from a more succinct format anyway.

Well, that's what availability means. And I think that it is a boon.

But later I realized that this system has an important benefit: it increases the availability of specification writing and lowers the educational bar for writing a machine-readable spec. If a highly qualified specialist writes such a spec, he or she will just convert it from a more succinct notation--but such a conversion would have been made anyway. Natural language aside, writing temporal logic requirements in mathematical notation requires certain skills, which not everyone possesses. So you could hire a less qualified programmer to do the specification-writing job. A "natural" language notation should make writing requirements in a formal way more available and cheaper, and increase the demand for verification tools.

Are the programmers still needed?

Of course, the increasing availability of development raises sad questions. Will we, the programmers, be needed by mankind in the future? Will we be the new Luddites?

I think we totally will. However, someone has to write and maintain the programs that will replace us. Someone has to do the relevant research. The May 2010 issue of Communications of the ACM presents an article about a (still research-stage) prototype of a genetic algorithm that fixes bugs in programs. Today it makes most of us smile. Soon the smile will change to a grin, and then--to panic.

However, I will not worry about that. It's natural, in the first place. Also, the first parts of the process to be taken over by robots and by low-qualified workers will be the most tedious aspects of programming. Fixing bugs by digging into stack traces, writing integration code, writing tests, and creating specifications for complex obsolete programs will be the first to be automated and outsourced. What will be left is the pure art of programming, that small piece of our work we all love it for. There'll be enough of that for the rest of my life.

And my kids will be lawyers.

Comments imported from the old website

silky (id) on 09 June 2010 commented:

Interestingly, I think the law profession would be more easily adaptable to a programming (automated) solution than programming. Or at least, around the same ease.

After all, if you can specify the "job" properly, (and have a sufficient database) I would imagine a program to be far more efficient at constructing arguments than a real lawyer.

Pavel Shved on 10 June 2010 commented:

Totally impossible. Lawyers talk to people. People are not XML data. Hence lawyers can't be automated.

Last year I had an interesting discussion with a bored Russian Migration Service representative at the "Softool" exhibition. We discussed the possibility of automated processing of human fingerprints. The conclusion was that the process could be automated--in an ideal world. But here we need to persuade judges and society that this man is guilty and that one is not. That can't be done by computers alone. People don't trust computers. But people do trust lawyers. Perhaps that's because lawyers are homo sapiens as well, and computers are not.

So by the time the lawyer profession is automated (by nice, shiny, smiling robots), my kids will already be dead.

Not to mention that specifying the job properly is the most difficult part of the process. As for theorem-proving tools, they showed some success several years ago, and I think there'll be a major breakthrough in less than ten years. Lawyers still don't have to worry, though :-).

silky (id) on 11 June 2010 commented:

You're clearly a smart guy but your logic in your first statement is not sound. Lawyers do talk to real people; that is a component of their work, but the primary component of being a 'lawyer' is being familiar with rules and constructing arguments. This is programmable. So while there may not strictly be a computerised lawyer, there may be software able to do the primary job of a lawyer. So you may end up with lawyer-bot-operators. Clearly less-skilled work than a true lawyer, and - in an ideal setup - significantly more accurate/productive.

Pavel Shved on 12 June 2010 commented:

I don't agree that analyzing existing data in order to construct arguments based on it is the primary job of a lawyer. I currently work with lawyers on some personal matters, and here's what I see them doing:

  1. Gathering data relevant to the case. My whole life (and the lives of others) just can't be formalized, and investigating what's important can hardly be automated.
  2. Checking if it's possible for me to do something that can affect these data.
  3. Deducing arguments and planning appearance in court. Construction and verification of this can be automated.
  4. Actually making appearance in court.
  5. Talking to people and persuading them to perform specific actions.

One out of five. At most, there can be a program that automates (or assists in doing) the third step. But it's not the most difficult part, is it?

But what I see in movies and in the media demonstrates that a successful appearance requires more than just a logical argument. The masterpiece of an attorney's work portrayed in the movie The People vs. Larry Flynt is less about the logical argument than about the way he spoke. In most cases arguments are much more boring and automatable, but there surely are a lot of cases that are won by something other than the argument.

silky (id) on 12 June 2010 commented:

One of five yes, but the most skilled One.

Let's not forget that most law is not anywhere near as interesting as movies portray.

Nevertheless, I'm going to assume both sides of the argument are now visible; we've both made our points and I'm sure the general theme of the comments is getting through.

Why is Git considered so complex?

Perhaps many of the ten readers who check my blog are aware that there exists such a version control system as Git. A lot has been said about Git in comparison to other tools (here are some StackOverflow threads). Mostly it breaks down to these points:

  • Git is powerful
  • Git is complicated to use
  • Git is badly documented
  • Git is incapable of checking out (locking) our Word documents!

It is fairly clear (especially from the last bullet) that no manager would choose Git as a revision-control tool. Plain old SVN or even CVS would seem a better choice. Perhaps one of the reasons is that managers learned those tools back when they were still developers. But I also think there is another reason.

Joel and pointers

Joel Spolsky is famous for some of his insightful and somewhat flattering speeches about how software development should be organized. One quote of his is of particular interest because, in addition to being flattering, it is also true.

I’ve come to realize that understanding pointers in C is not a skill, it’s an aptitude. <...> For some reason most people seem to be born without the part of the brain that understands pointers. Pointers require a complex form of doubly-indirected thinking that some people just can’t do, and it’s pretty crucial to good programming.

Joel Spolsky, The Guerrilla Guide to Interviewing blog post, v3.0

I can't say the same with the same level of confidence--I still don't have enough experience. But, to be fair, from a scientific point of view I haven't seen anything that contradicts it: the best programmers are those who understand pointers.

Git and pointers

And what does this have to do with Git? Well, everything.

The thing is that the structure of a Git repository is a directed acyclic (i.e. topologically sortable) graph with some pointers to its nodes. And this graph is surely not represented with an adjacency matrix, but with pointers as well. Here's what my local clone of a working repository looks like:

(the graph was created with the aid of this one-liner)

Each commit has a number of parents, which are pointers--the unpainted arrows along the drawn edges (the convention is that parents are below their children). Each tag is a fixed pointer to a commit, and each branch is a pointer that is meant to keep moving. The scary rebase thing is just changing a couple of pointers (and commit IDs).
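If you want to see these pointers with your own eyes: in a local repository, an unpacked ("loose") branch or tag is literally a small file containing a commit ID. Here's a minimal sketch in Python that prints them; it assumes you run it from the root of a working copy, and it ignores refs that Git has packed into .git/packed-refs.

    from pathlib import Path

    repo = Path(".git")  # metadata of the repository in the current directory

    # HEAD is a pointer too: usually a symbolic one that names a branch.
    print("HEAD ->", (repo / "HEAD").read_text().strip())

    # Each loose branch and tag is just a file whose content is a commit ID.
    for kind in ("heads", "tags"):
        refs_dir = repo / "refs" / kind
        if refs_dir.is_dir():
            for ref in sorted(p for p in refs_dir.rglob("*") if p.is_file()):
                print(kind[:-1], ref.relative_to(refs_dir), "->", ref.read_text().strip())

(The official way to ask the same questions is git symbolic-ref HEAD and git rev-parse <branch>; poking at the files just makes the "everything is a pointer" idea tangible.)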

Of course, you would most likely only understand all of this at reading speed if you already knew it. But I have also noticed that explaining it this way, though it takes longer, leads to better understanding and to tidier repositories with less trash committed to them. Many seemingly unrelated concepts of Git are better understood if you don't try to elide the complexity of the repository structure. After failing to explain this verbally, I drew these pictures to explain the difference between pull and pull --rebase (they were referred to in the summary you read in your feed, or may read at the top of this page):

And I think it worked: the developer who saw them started to produce much less cruft afterwards (unlike the other guy :-)).

So I think Git is a rare case where attempts to keep things simple lead to worse understanding. Instead, programmers' "aptitude" for understanding pointers should be put to use by giving a better, though more complex, explanation. Eventually it will be picked up.

Joel and Git

From this perspective, the fact that Joel actually likes Mercurial, not Git, and has even written a very relevant tutorial on it, would seem to render my claims unsound.

Mercurial is said to have the same architecture and to be based on the same principles as Git. However, its advantage (personally, I think it's a disadvantage) is that it smooths the sharp corners where the developer has to interact with these strange pointers.

Perhaps that's why Joel picked Mercurial as the basis for the product he promotes with that tutorial (yes, there's an ad in the corner; take a closer look ;-)). If it's simple, it gains better acceptance, since more people are capable of smoothly understanding how to work with it. As for understanding the principles--well, is that really necessary for software to be sold?..

Git and you

The summary is that if you're a developer who possesses the somewhat brain-damaging skill of understanding how these queer pointers work, learning Git will be much less hard than it's depicted. No matter what you do with it--even if you don't develop, but just use Git to back up config files on your machine--Git is a tool that speaks your language. And in return for a modest time investment, you'll get a powerful tool that smoothly brings the way you're used to thinking about things into your version control.

Comments imported from the old website

Pavel Shved on 12 May 2010 commented:

By the way, I followed Alik Kirillovich's advice about blogs and added summaries that are displayed in feeds and are separate from the content. Don't you think that they look stupid, unnecessary, repulsive? (I.e. I would appreciate some feedback :-))