Recent blog updates

Shown here are only posts related to stackoverflow. You can view all posts here.

Are women worse programmers than men?

The idea behind this blog post has always been to make peace between two things. One of these is my honest belief that sexism is unwise and disgusting, and all human beings are equally smart regardless of their gender. The other thing is dismal amount of women in programming profession I observed at that time and place. In 2010, a StackOverflow question (it provoked a flamewar and has been deleted, but you can view it if you have 10k+ rep there). finally gave me the idea how these two things are possible simultaneously, and I wrote a provocative and--as one of the commenters puts it--"needlessly argumentative" post about it. Here it is, slightly, rewritten.

Are women worse developers than men?

There are slightly more women than men among humans. However, there's just a small amount of women in software development, as observed and acknowledged by us daily. I observe this in the place I work, and I hear a lot from others that they observe the same at their workplaces. If you're a software developer, take a look who surrounds you: how much of them are women? Not much I guess, much less than 50%.

So does this mean that women are somehow inherently worse in programming? No.

I used to dismiss participation rates in countries that do not completely satisfy the definition of "free" (Qatar, Singapore, Malaisiya.) But women are disadvantaged in such countries, and forcing women into one of the toughest and well-paid STEM-related occupations doesn't really scream oppression.

A survey of participation of women in computing over the world demonstrates that it varies dramatically from country to country: from 6% in Denmark to 55% in Thailand (anecdotal evidence reports 60%+ participation in Qatar). If somehow women were worse at this, these rates would have much more similarity.

The thing, however, is that whether women are worse or better at it is completely irrelevant for most situations that occur in practice. That's right: even if you are somehow sure that women are inherently worse at programming, this does not warrant any kind of different attitude towards them in the workplace, and here's why.

Why shouldn't one just fire all women if one thinks they are inherently worse at programming?

Short answer: because conditional probability. Longer answer follows.

Assume you are doing an interview, and assume also that you are sure that women are worse programmers than men. Does that mean that the probability that a woman who comes to your interview is a good programmer is lower than the probability that a man in similar position is a good programmer? No, and the reason is that the group your interview candidates come from is no longer "all women" or "all men." Instead, these are "all women who chose their careers in computing" and "men who chose computing".

Conditional probability demo

Demonstration that the skill of job applicants or of those women who chose software development as their career had nothing to do with how good of a programmer an average woman is.

The probability that "a woman is a good programmer" (A) given that "a woman chose programming as her career" (B) is not equal to the probability that "a woman is a good programmer" (A). This probability can be both greater or less than this, according to conditional probability definition (see illustrations):

P(A given B) = P(A and B) / P(B)

The definition of conditional probability.

And something tells me, that this very particular group "women self-selected as professional programmers" will have more good programmers than general population. In any case, no matter how small the ratio of good programmers among all women you believe is, it tells you nothing about women in the profession. Any kind of unfair treatment of female peers can't be justified by reasoning about general population, and understanding of conditional probability gives us a mathematically solid proof why.

Why is there a debate about whether women are good at programming in the first place?

Because many people believe that participation of women in software engineering and the skills of women in general are correlated, while in reality these numbers have absolutely nothing to do with each other. The application of basic probability theory to this debate finally settled it down for me, but many are deluded by less relevant questions.

Some see low participation of women in engineering as a sign of their inferior skill, and therefore reason that women should not be hired, or should be not taken seriously. This line of reasoning is mathematically wrong, as shown in the previous section, even if it somehow turns out that women are indeed worse programmers, which I do not believe is true.

Others believe that women and men have equal programming abilities (which is a completely justified belief), and make a conclusion that they must be equally represented in the workplace regardless of other factors, thus interpreting the disparity we observe as deliberate diminishing of women's programming skills. This line of reasoning is as unjustified as the previous example because, again and again, skill and participation may have nothing to do with each other.

Errors in math observed in both sides make the debate much more heated than it should because everybody makes illogical conclusions from the wrong math, which receive justified critique. Instead of trying to eliminate bias in prior perceptions, which can't be eliminated completely as well as may have nothing to do with the debate, I would like to see more focus on fixing what is easier to fix, and is not mathematically irrelevant.

There still exists this vicious cycle when female programmers are considered uncommon, and this wards off women by itself, making them uncommon, again. People stop caring about women because there are none on their team, and there's simply nobody to care about; they forget--or even never learn!--that programming, like many things in this world, is a two-gender endeavour. Feminist websites like this have a lot of advice to offer to mitigate this.

***

Once when we were celebrating an event at work, I made up a toast. I said that women are not welcome only in the most awful things in the world, namely the war. And that I wanted to lift a glass for the women currently there who justify our domain, software development, as something that is worth being.

So, to you and to all of the women in the field!

Post Scriptum

I'm kind of tired of being used as an example of sexism in programming. You probably come here via a link on some website claiming that there's yet another chauvinist hampering participation of women in computing or whatnot. This is not true, and has never been true. The gist of both new and old versions of the article is that a lot of debate around "women in computing" revolves around badly applied mathematics. Refresh your knowledge about conditional probability, please.

By the way, none of the visitors by such links, totalling to 4000 throughout several years, have sent me a message or made any attempt to comment on this post and correct something. Not a single one! I can only explain this that a demonstration of sexism you can link to is much more welcome out there than a person who corrects other' math.

For those interested in getting the old version of the article, which you may need to point a finger to, I'll spare you from going to webarchive and getting it from there. The one you see here is an updated version.

Read on | Comments (5) | Make a comment >>


StackExchange 2.0: evil grin of "Software As A Service"

As everyone knows, StackOverflow founders, Joel and Jeff have recently received mysterious VC funding that should help to support StackExchange platform and the Trilogy sites.

Meta had a number of threads where people tried to guess, jokingly and seriously, what kind of dirty trick the Trilogy crowd will encounter. As far as I remember, none of the guesses were unobvious. But the dirt wasn't novel either.

Defense of the Ancients and the "Clan DCE" affair

When I was at my first year in the university (that was five years ago), and didn't have enough time and money to have a good entertainment, I played online computer games. One of them, which I sometimes play even now, was Defense of the Ancients (DotA), a Warcraft III modification (but virtually it was a separate game implemented as just a map for Warcraft). Five players on each side, two-hour battle and a lot of fun and adrenaline.

This game has one peculiarity. It requires ten people to pay continuous attention to the game that lasts two hours. This means that if a player leaves the game, it is virtually ruined. However, it doesn't affect the status of that player in any way, so he could join another game and keep ruining them, destroying all the fun.

No wonder that the most crucial long-term goal was to find or create a good community, where each member respects others and doesn't spoil anything. That was achieved by rating systems, such as "Dota-league", and private channels (such as "Clan DCE", which gathered around a professional dota clan).

I participated in "Clan DCE", was a player of moderate skills, but I never ruined games by leaving. In fact, I even sometimes was late for university studies because I had to keep playing. But as that community grew, it became full of incompetent leavers, and the clan heads made a decision.

The decision was, exile everyone except the players "with invites" (which was effectively a randomly selected 20% of the community) and don't allow anyone in without two vouches from insiders. So most of the people who invested their time into building the community were exiled and didn't have a chance to return. That was a tough lesson of my Internet life, and now I pick communities to join way more carefully. The lesson was that those who own the platform own the community. Literally own. And can do everything they wish, regardless of what other think, and people have little to do with it.

StackExchange 1.0

StackOverflow, as any kind of such a site, is only successful if it has a strong community. When I was joining StackOverflow, I paid special attention to how the community is formed. It became apparent that community leaders and platform owners are separate people (due to what I saw on meta). Later I realized that SO is a jewel in Jeff's crown, and it won't do any harm to it ever. So I relaxed and didn't think much about it.

StackExchange is a Software-as-a-Service (SaaS) "make-stackoverflow-like-site" platform. It was bought by about 200 people, and even if they fell suspicious about buying a SaaS, Jeff's and Joel's reputation, I suppose, made them not bother about it. These guys built the communities, published their own ads... and some succeeded! But then VC funding came into play.

StackExchange 2.0

StackExchange 2.0 has a very hypocritical introduction blog post. I'll summarize it briefly:

  1. StackExchange will not require any monthly payments!!!!1 LOVE US, GUYS! (and we hope you won't read the rest of blog post)
  2. Opening a StackExchange site will be more like a game, not just a pay-to-get process. LOVE US EVEN MORE! (we know you like games)
  3. All unsuccessful StackExchange sites will be purged.
  4. All your bases successful StackExchange sites are belong to us. We will now get revenue from the ads. No matter that this revenue only exist because the current owners created and parented the community -- it's all ours now.

And if the community owners will try to move to another website (and some alternative stackexchange clones have already announced migration services), not many people will follow them, I suppose. Because why bother?. A current site is stil running, and is still free, so why bother moving anywhere?

There already is a thread with critics on StackExchange 2.0.

The critics summarize to "you're cutting admins off", which is unfair, and leaves the sites headless, run by bureaucratic committees instead of committed individuals. And, of course, taking their ad revenue, and, most important, what they bred.

The "stolen sites are still free" is Robin Hood-ish of sorts. For me it's plain simple: if it's a Robin Hood stealing, it's still a theft. But many people don't think that way, and they have other values. So, competing with a free site with full-blown support by platform owners will be incomprehensibly hard for those who really were the creators of the communities.

I bet just a few sites will even try to migrate, and the communities, torn apart, will survive in those parts that didn't move anywhere. Though, only time will tell.

Conclusion

Perhaps, it's not that bad. A quick whois querying shows that domain names seem to belong to the real owners of the sites, so if they just migrate to open source platform, they'll still have everything. Bu that domain name stuff is really out of my competence.

But anyway, I think that VC funding won. They developed a neat model to steal stuff from the people who created it, and they seem to gain support for a scenario "we will steal best of you, and destroy others". In Real Life that's generally called "marauding". And Internet Life teaches the same lesson again. Those who own the platform own the community.

And another lesson: Software-as-a-Service model, even before it became popular, already shows its evil grin: it's not you who own the software and tell it what to do, it's software owners who tell you what to do. Sad that it happened with the most open community I ever encountered.

Read on | Comments (0) | Make a comment >>


Code stealing prevention advices

Recently I answered this question on StackOverflow, but I'm afraid they'll do something with my post, so I'll save it here.

The question was about code-stealing prevention features. It specifically mentioned if Visual Studio Team Edition contained some features to aid repressing the developers. Well, here's a couple of my advices:

My advice on how to prevent code-stealing by employees

Pokrovsky sobor

The first approach is to force programmers to only know interfaces of other components, so that each one can only steal a small part of the whole software. This approach can be borrowed from footwear production. One transnational corporation, to prevent stealing by employees, arranged its factories so that each factory produced only left or only right shoes. You could do the same with your code: some programmers only write lines with odd numbers, and the others--those with even numbers; provided that they can't see the work of each other! That's sometimes referred to as "pair programming".

Some organizations force employees to sign a non-compete agreement. That's the kind of agreement that prevents programmers to work for competitors after they leave your company. This technique is best combined with job postings like "Looking for senior programmer with 5 years of experience in the similar field".

To prevent your programmers from stealing, you can do harm to them as soon as they finish the software. The method proved itself as the most efficient, and has been used for centuries. For example, Russian Tzar Ivan The Terrible burned eyes of the architect that designed a beautiful church at the Red Square, so the one designed remains the most beautiful ever. You can do something like this to your architect. I heard, latest Visual Studio contains some features...

Nowadays, however, it's more humanistic to hire already blind and already dumb people that lost their hands, so that they can't look at your code to memorize it, can't tell anyone about your code and can't type it again. The advantage is that this will help you dealing with labor agency in your country, which watches that your don't discriminate disabled programmers.

And yes, this post is a sarcastic joke, which criticizes the idea of any code-stealing-prevention measures.

"Sorry, couldn't help posting it." said I to StackOverflow, but here I'm not sorry, since it's my blog.

My on-field NDA experience

By the way I signed a non-disclosure agreement once. (And now I realize that if I disclose the information about it I will violate the agreement. So, I'm going to skip some text with logical reasoning, and proceed directly to the conclusion). The conclusion is simple: code stealing protection measures are just useless.

Read on | Comments (0) | Make a comment >>


"NP-complete!" as a lame excuse

Some time ago I bumped into a usual Stackoverflow question. A guy asked for a C# algorithm that could pick elements from array so that their sum is equal to a specified number. A usual NP-complete knapsack problem. But the answer made me think about an interesting matter. Let me screenshot the answer completely:

At a first glance, the answer contains everything an ideal answer should contain: a correct information, a certain bit of succinctness, and, a key to success, an XKCD comic. No wonder it was so highly upvoted.

However it was totally unhelpful.

Complexity classes

In Computer Science a concept of complexity class is used to define a class of problems, for which there exists an algorithm that runs on specified kind of abstract computing machine and uses specified amount of machine-specific resource. For example, famous NP class is defined as "set of problems that can be (a) solved on non-deterministic Turing machine (b) with use of polynomial number of steps with respect to the length of input". P class is the same but its abstract machine is a deterministic Turing one. The famous question of whether each NP problem is also P is still open.

There is a lot of more tricky classes (PSPACE, for example, requires polynomial "memory"--maximal length of line of a Turing machine), which can even be parametrized (PCP(n,m) (probabilistically checkable proof), for example). The relationship between various classes is being studied, and aids the relevant research; here's a picture with some of them:

complexity classes relationship diagram

(pic from «Эффективные алгоритмы и сложность вычислений» book by Nikolai Kuzyurin and Stanislav Fomin (pdf, site; published under OPL license))

It clearly shows that the knowledge about complexity classes made its way into the minds of casual programmers. But instead of a challenge, it became a lame excuse.

What a casual programmer thinks when he encounters an NP-hard problem is, "Oh, it's NP-hard, so no known polynomial solution exists!". Some programmers even try to quickly make an algorithm that might solve something in specific cases, which they don't even realize. While what should be thought of is, "Oh, it's NP-hard, so no known polynomial solution exists! Let me derive constraints, and search for an approximate solution that fits it best!" Complexity theory statements and theorems should aid the solution, not substitute it!

Funny approximations for NP-hard problems

Honestly, I was among the programmers of the first type, as defined above. I used to being proud that I know that much of maths and was glad that I can blow the hopes of solving their problem. However a year ago I've taken an advanced complexity course (by guys from one of research groups in ISP RAS -- "group of Discrete Optimization Algorithms at Department of Mathematical Methods" (in Rus)) that actually revealed to me several amazing facts. Some complex NP-complete problems appear to have good approximations! Let me list several of them.

  • Edge-dominating set (wiki) -- assume we want to penetrate our adversary's network (which is an undirected graph). We should pick several channels (edges) to insert a virus into, and the hacking software will capture all information that passes through any of each channel's ends. We should cover the whole network with such "hacked" channels, so that each unhacked one is plugged into a node with a hacked channel. But at the same time, we should do it with minimum cost, i.e. minimum number of channels to hack. See figure A at the side for examples.

    Figure A

    Examples of edge-dominating sets from wiki page. Red edges are infiltrated channels. You may see that each non-hacked edge is touched by at least one hacked edge at joint router.

    The task of picking such a minimum set is NP-hard. Brute-force? No, there is a better idea. We don't necessarily need minimum (although it would be nice), but instead we can hack some extra channels--the budget is not fixed anyway. We could use some clever algorithm to pick such edges, but instead... let's just hack arbitrary channels that aren't adjacent to already hacked ones!

    What amazed me is that such a "solution"... will never exceed the real minimum number multiplied by two! "That's still too much", might the Greedy Manager have said, and he would be right. But the thing is that we even didn't try to solve anything: the dumbest algorithm yields a good, constant-factor approximation for a problem with yet unknown efficient exact solution. Hell, it's even easier to implement than a brute-force!

  • Knapsack problem (wiki) -- assume a burglar infiltrated a house. Unfortunately, he has a relatively small knapsack, into which he can't put everything of value. Possessing a discriminating eye, he can immediately determine how much the fence will pay for a thing. His aim is to maximize the cost of the stuff carried away, the overall weight being less than his knapsack may bear. But unfortunately he's also limited on time and has to do it quick.

    Complexity theory proves that such task is NP-complete. So the burglar might pack every combination of items into his sack until he finds out the best. He would have been caught if he did that, because the owner would have returned by the time he completes the enumeration. So he chooses another solution. He uses a "greedy algorithm" and pick first a thing with best price/weight ratio, among those that still fit his sack.

    A fool! He didn't know that it was the house of computer scientist, who knew that thieves are dumb and greedy, and who intentionally bought a piece of jewelery with best price/weight ratio in the house, but that piece doesn't allow any other thing to be put into the sack due to overweight. For arbitrary N, this trick can make the burglar take away N times less stuff than he could at most!

    But a simple adjustment to its algorithm can be made to limit the possible loss to "at least 1/2 of maximum value". The burglar should do the same, but at the end compare the value of picked best price/weight items with the value of the most expensive item he can carry away. If it turns out to yield more profit take this thing and run away.

    This is an example that a small hack can guarantee a good approximation.

  • Longest path problem (wiki) -- assume we want to make our competitor lose money, and thus we try to delay shipment of a valuable package. We can make a janitor agent into the dispatcher's room, so he can see the map of the area (an undirected graph) and use the special phone to guide the courier by a longest path between source and destination (without cycles, of course, since the courier will suspect something and report it). He must do it quickly, but, unfortunately, picking such a longest path is an NP-hard problem.

    We've seen approximations by constant factors before, but this task, unfortunately, can't have an approximating polynomial algorithm for each constant factor. Of course, that's true unless P = NP. But the fact is that its polynomial inapproximability by constant factor is proven unless P = NP. So we'll have to teach the janitor to yield worse approximations if we want him to succeed. And conclude that sometimes NP-hard problems just do not have simple hacks that solve them well.

There are more facts that concern randomized algorithms, which have a certain probability to hit a good solution; facts about more tricky, non-constant approximations, but they're far less fun and are more like boring math crunching, so I won't mention it here.

Conclusion

Albeit whether P=NP is still an open question, complexity theory contains many valuable achievements on solving everyday problems.

Do not make complexity theory just a source of lame excuses of not solving the problem or writing brute-force algorithms and garage-made heuristics that can guarantee noting. Some programmers call it all "clever cheats", but many hand-made "cheats" are of no help in approximations (as shown above with the first, most obvious "greedy" algorithm for the burglar). Instead of cheating, do not hesitate to perform a quick googling or consult experts in this area if you encounter an NP-complete/hard problem--a fast solutions with proven and limited weakness, and suitable for your specific task may exist, and may have been studied well.

Or, if you're curious enough, and possess certain bit of math education, you can study it on your own. English-speaking readers may check online pages of their favorite universities for book recommendations and, maybe, online lectures. I can't recommend anything in English since I haven't read such books. To a Russian-speaking reader I can recommend the book «Эффективные алгоритмы и сложность вычислений» by Nikolai Kuzyurin and Stanislav Fomin (pdf, site; published under OPL license), which we used during our studies at the university.

Read on | Comments (3) | Make a comment >>


More posts about stackoverflow >>