Recent blog updates

Shown here are only posts related to stackoverflow. You can view all posts here.

Are women worse programmers than men?

As a student of a university with 90% male alumni, and with regular attempts of women to abuse their "girl power", I participate in "female programmers" flamewars quite often.

In these flamewars I usually try to stick to the "logical" side, analyzing things without any bias toward political correctness or male sexism. However, this position is appreciated by neither women (they disregard everything but that "we are all equal" stuff) nor men (half of them think it's too offensive, and the other half think that I'm too soft). But now I have a place without moderation, where I can say everything I want.

FAQ

I'll start with a small FAQ, which explains the position I stand for in this post.

  • Do you think that women are worse programmers than men?
    Yes.
  • Does it mean that we should give preference to male humans when hiring?
    No.
  • Why is there a debate about whether women are good at programming in the first place?
    Because the answers to the previous two questions are different.
  • I know a famous programmer/computer scientist/developer Y, and she's a woman!
    So? I don't say that there are no good female developers, I just say that there are fewer of them.

So I'll try to elaborate on these questions.

Are women worse developers than men?

There are slightly more women than men among humans. However, there's just a small number of women in software development. I observe this in the place I work, and I hear from many others that they observe the same at their workplaces. If you're a software developer, take a look at who surrounds you: how many of them are women? Not many I guess, much less than 50%.

I consider this experimental data. Sociology? Psychiatry? Who cares?! I'm partly a physicist, and experimental data are superior: if there's a smaller number of female programmers in our more or less free world, then it means that they are just worse at it. Women either don't have the skill to be a programmer (and it's obvious that to enter the profession you should have a higher amount of skill) or don't have enough motivation. I don't really know what's happening there, but the result is obvious: men are more confident in becoming programmers. It's a fact.

Moreover, we usually observe that things are so bad, that for every female programmer (either "good" or "bad") there's usually at least one "good" male programmer in the office.

On this basis, it's not wrong to say that women, in general, are worse programmers than men, since few of them have the skills and a will strong enough to choose it as a career.

Let's formulate it in a more mathematical manner. It means that if you compare a random woman and a random man you met in the street, the man will, on average, show a better result than the woman. Or, if we discard the probability distribution cruft, that P(W and G) < P(M and G), where W and M are the events of a woman and a man, respectively, becoming a programmer, and G is the event of someone becoming a good programmer (P(x) being the probability). The above statement is true, just because P(W and G) <= P(W) is true by definition, and P(W) < P(M and G) is true by observation.

Why can't we then think that a woman we're going to hire is worse than a man we're going to hire?

That's simple. The above statement means that a random woman is worse than a random man when it comes to programming. But when you're making a hiring decision (or when you just evaluate your colleague), it's no longer a "random" woman.

Now it's a woman that explicitly chose one of her (possible) careers as software developer. So she already became a programmer; good or bad--we're yet to decide. And now her skills have to be evaluated in the same manner as men's, since there's no more bias.

What's the probability of her being a good programmer? It's a conditional probability: P(G | W), i.e. "someone is a good programmer given that it is a woman that became a programmer". The probability of a male being good is similar: P(G | M). So now we should compare P(G and W)/P(W) and P(G and M)/P(M). We showed above that P(W and G) < P(M and G), but it's also true that P(W) < P(M). So maths is of no help to us in determining which is bigger: P(G | W) or P(G | M).
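To see why the two inequalities alone settle nothing, here is a small sketch with made-up probabilities. The numbers are purely hypothetical (they are not measurements of anything); they are chosen only so that both P(W and G) < P(M and G) and P(W) < P(M) hold in each scenario.

```python
from fractions import Fraction


def conditional(joint, marginal):
    """Return P(G | X) = P(G and X) / P(X)."""
    return joint / marginal


# Hypothetical values, NOT data. They only satisfy the inequalities
# from the text: P(W and G) < P(M and G) and P(W) < P(M).
p_m, p_m_and_g = Fraction(10, 100), Fraction(4, 100)

# Scenario A: the conditional probability for W is the larger one.
p_w_a, p_w_and_g_a = Fraction(2, 100), Fraction(1, 100)

# Scenario B: the conditional probability for W is the smaller one.
p_w_b, p_w_and_g_b = Fraction(2, 100), Fraction(1, 200)

g_given_m = conditional(p_m_and_g, p_m)
g_given_w_a = conditional(p_w_and_g_a, p_w_a)
g_given_w_b = conditional(p_w_and_g_b, p_w_b)

print(g_given_m, g_given_w_a, g_given_w_b)
```

Both scenarios respect the same two inequalities, yet P(G | W) lands above P(G | M) in one and below it in the other, which is exactly why the comparison cannot be decided from them.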

I don't have the answer to this question. In my casual observations I didn't see any evidence that female programmers are worse or better than male programmers. And this post doesn't try to solve this contradiction.

What it shows is that even if a random woman is a worse programmer than a random man, this does not mean that women who chose programming as their career are worse programmers than male developers. This really depends on context.

So, programmer girls object to the improper treatment, programmer males think that girls are worse programmers, non-programmer girls just don't care, and all of them are right--in a proper context. But it's understandable why it's incorrect to make a professional assumption based on gender in the office. It's just based on nothing, and the observations made in the previous section are no longer true when it comes to programmer women.

Why is there a debate about whether women are good at programming in the first place?

Because not many people are familiar with the concept of conditional probability. And because it's very hard to believe both that women are worse than men, and that it's not the basis to consider your colleague Jane a worse developer.

But there's one more thing. I suppose that some women do not tend to avoid discrimination, trying to abuse it when it's in their best interest. And even if they don't try to abuse it, it sometimes looks like they treat men and women differently in the first place.

The most notable example would be a quote from the deleted SO question (you can only view it if you have 10k+ rep there):

I do agree men and women think differently

Ekaterina, female programmer who started the flamewar at StackOverflow

So if you agree that you think differently, why do you not agree when you're treated differently? And why do some of you not object to that treatment when it's convenient to some of you?

It also seems that some people just want to be discriminated against. An extreme case is when a female programmer puts her gender into her nickname or the domain name of her homepage... I encountered it in a discussion in the Google Summer of Code 2009 private mailing list (yes, we had a flamewar of this kind there as well). I can't cite the discussion (the list is private), but I think I can share that some people cited she.geek.nz, and the most ardent female custodian of equality had her homepage at... latvialinuxgirl.blogspot.com!

Well, your domain name is sometimes the first thing anyone knows about you on the Internet, and is it really necessary to show that you're a girl in it? And if you think that it is, then why do you think that your gender is important to your identity? And how does it square with your equality speeches?

These are rhetorical questions. I know the answer, and you do too. To reach equality, everyone should stop demonstrating gender as something that's important to their career and professional identity. For example, my website title aims at intergalactic political correctness--you can't know that I'm a human, let alone a male one.

I know a famous programmer/computer scientist/developer Y, and she's a woman!

Cool! I would like to re-cite a deleted StackOverflow answer by richj:

If you know zero of them you most likely know nothing about programming.

Once when we were celebrating an event at work, I made up a toast. I said that women are not welcome only in the most awful things in the world, namely the war. And that I wanted to lift a glass for the women currently there who justify our domain, software development, as something that is worth being.

So, to you, to all of the women cited above!



StackExchange 2.0: evil grin of "Software As A Service"

As everyone knows, the StackOverflow founders, Joel and Jeff, have recently received mysterious VC funding that should help to support the StackExchange platform and the Trilogy sites.

Meta had a number of threads where people tried to guess, jokingly and seriously, what kind of dirty trick the Trilogy crowd will encounter. As far as I remember, none of the guesses were unobvious. But the dirt wasn't novel either.

Defense of the Ancients and the "Clan DCE" affair

When I was in my first year at the university (that was five years ago), and didn't have enough time and money for good entertainment, I played online computer games. One of them, which I sometimes play even now, was Defense of the Ancients (DotA), a Warcraft III modification (but virtually it was a separate game implemented as just a map for Warcraft). Five players on each side, a two-hour battle and a lot of fun and adrenaline.

This game has one peculiarity. It requires ten people to pay continuous attention to the game that lasts two hours. This means that if a player leaves the game, it is virtually ruined. However, it doesn't affect the status of that player in any way, so he could join another game and keep ruining them, destroying all the fun.

No wonder that the most crucial long-term goal was to find or create a good community, where each member respects others and doesn't spoil anything. That was achieved by rating systems, such as "Dota-league", and private channels (such as "Clan DCE", which gathered around a professional dota clan).

I participated in "Clan DCE", was a player of moderate skills, but I never ruined games by leaving. In fact, I even sometimes was late for university studies because I had to keep playing. But as that community grew, it became full of incompetent leavers, and the clan heads made a decision.

The decision was to exile everyone except the players "with invites" (which was effectively a randomly selected 20% of the community) and not to allow anyone in without two vouches from insiders. So most of the people who invested their time into building the community were exiled and didn't have a chance to return. That was a tough lesson of my Internet life, and now I pick communities to join way more carefully. The lesson was that those who own the platform own the community. Literally own. And they can do everything they wish, regardless of what others think, and people can do little about it.

StackExchange 1.0

StackOverflow, like any site of this kind, is only successful if it has a strong community. When I was joining StackOverflow, I paid special attention to how the community is formed. It became apparent that community leaders and platform owners are separate people (judging by what I saw on meta). Later I realized that SO is a jewel in Jeff's crown, and he won't ever do any harm to it. So I relaxed and didn't think much about it.

StackExchange is a Software-as-a-Service (SaaS) "make-stackoverflow-like-site" platform. It was bought by about 200 people, and even if they felt suspicious about buying a SaaS, Jeff's and Joel's reputation, I suppose, made them not bother about it. These guys built the communities, published their own ads... and some succeeded! But then VC funding came into play.

StackExchange 2.0

StackExchange 2.0 has a very hypocritical introduction blog post. I'll summarize it briefly:

  1. StackExchange will not require any monthly payments!!!!1 LOVE US, GUYS! (and we hope you won't read the rest of blog post)
  2. Opening a StackExchange site will be more like a game, not just a pay-to-get process. LOVE US EVEN MORE! (we know you like games)
  3. All unsuccessful StackExchange sites will be purged.
  4. All your bases successful StackExchange sites are belong to us. We will now get revenue from the ads. No matter that this revenue only exists because the current owners created and parented the community -- it's all ours now.

And if the community owners try to move to another website (and some alternative stackexchange clones have already announced migration services), not many people will follow them, I suppose. The current site is still running, and is still free, so why bother moving anywhere?

There already is a thread criticizing StackExchange 2.0.

The criticism boils down to "you're cutting admins off", which is unfair, and leaves the sites headless, run by bureaucratic committees instead of committed individuals. And, of course, to taking their ad revenue and, most important, what they bred.

The "stolen sites are still free" argument is Robin Hood-ish of sorts. For me it's plain simple: if it's Robin Hood stealing, it's still theft. But many people don't think that way, and they have other values. So, competing with a free site with full-blown support by platform owners will be incredibly hard for those who really were the creators of the communities.

I bet just a few sites will even try to migrate, and the communities, torn apart, will survive in those parts that didn't move anywhere. Though, only time will tell.

Conclusion

Perhaps it's not that bad. A quick whois query shows that the domain names seem to belong to the real owners of the sites, so if they just migrate to an open source platform, they'll still have everything. But that domain name stuff is really out of my competence.

But anyway, I think that VC funding won. They developed a neat model to steal stuff from the people who created it, and they seem to gain support for a scenario "we will steal best of you, and destroy others". In Real Life that's generally called "marauding". And Internet Life teaches the same lesson again. Those who own the platform own the community.

And another lesson: Software-as-a-Service model, even before it became popular, already shows its evil grin: it's not you who own the software and tell it what to do, it's software owners who tell you what to do. Sad that it happened with the most open community I ever encountered.



Code stealing prevention advice

Recently I answered this question on StackOverflow, but I'm afraid they'll do something with my post, so I'll save it here.

The question was about code-stealing prevention features. It specifically asked whether Visual Studio Team Edition contained some features to aid in repressing the developers. Well, here's some advice of mine:

My advice on how to prevent code-stealing by employees

Pokrovsky sobor

The first approach is to force programmers to only know interfaces of other components, so that each one can only steal a small part of the whole software. This approach can be borrowed from footwear production. One transnational corporation, to prevent stealing by employees, arranged its factories so that each factory produced only left or only right shoes. You could do the same with your code: some programmers only write lines with odd numbers, and the others--those with even numbers; provided that they can't see the work of each other! That's sometimes referred to as "pair programming".

Some organizations force employees to sign a non-compete agreement. That's the kind of agreement that prevents programmers from working for competitors after they leave your company. This technique is best combined with job postings like "Looking for senior programmer with 5 years of experience in the similar field".

To prevent your programmers from stealing, you can do harm to them as soon as they finish the software. The method has proved itself the most efficient, and has been used for centuries. For example, Russian Tzar Ivan The Terrible burned out the eyes of the architect that designed a beautiful church at the Red Square, so that the one designed remains the most beautiful ever. You can do something like this to your architect. I heard the latest Visual Studio contains some features...

Nowadays, however, it's more humanistic to hire already blind and already dumb people that lost their hands, so that they can't look at your code to memorize it, can't tell anyone about your code and can't type it again. The advantage is that this will help you deal with the labor agency in your country, which watches that you don't discriminate against disabled programmers.

And yes, this post is a sarcastic joke, which criticizes the idea of any code-stealing-prevention measures.

"Sorry, couldn't help posting it." said I to StackOverflow, but here I'm not sorry, since it's my blog.

My on-field NDA experience

By the way I signed a non-disclosure agreement once. (And now I realize that if I disclose the information about it I will violate the agreement. So, I'm going to skip some text with logical reasoning, and proceed directly to the conclusion). The conclusion is simple: code stealing protection measures are just useless.



"NP-complete!" as a lame excuse

Some time ago I bumped into a usual Stackoverflow question. A guy asked for a C# algorithm that could pick elements from array so that their sum is equal to a specified number. A usual NP-complete knapsack problem. But the answer made me think about an interesting matter. Let me screenshot the answer completely:

At first glance, the answer contains everything an ideal answer should contain: correct information, a certain bit of succinctness, and, a key to success, an XKCD comic. No wonder it was so highly upvoted.

However it was totally unhelpful.

Complexity classes

In Computer Science, the concept of a complexity class is used to define a class of problems for which there exists an algorithm that runs on a specified kind of abstract computing machine and uses a specified amount of a machine-specific resource. For example, the famous NP class is defined as the "set of problems that can be (a) solved on a non-deterministic Turing machine (b) using a polynomial number of steps with respect to the length of the input". The P class is the same, but its abstract machine is a deterministic Turing machine. The famous question of whether each NP problem is also in P is still open.

There are many trickier classes (PSPACE, for example, requires polynomial "memory"--the maximal length of the Turing machine's tape used), which can even be parametrized (PCP(n,m), probabilistically checkable proofs, for example). The relationships between various classes are being studied and aid the relevant research; here's a picture with some of them:

complexity classes relationship diagram

(pic from the book «Эффективные алгоритмы и сложность вычислений» ("Efficient Algorithms and Computational Complexity") by Nikolai Kuzyurin and Stanislav Fomin (pdf, site; published under OPL license))

It clearly shows that the knowledge about complexity classes made its way into the minds of casual programmers. But instead of a challenge, it became a lame excuse.

What a casual programmer thinks when he encounters an NP-hard problem is, "Oh, it's NP-hard, so no known polynomial solution exists!". Some programmers even try to quickly make an algorithm that might solve something in specific cases, which they don't even realize. While what should be thought of is, "Oh, it's NP-hard, so no known polynomial solution exists! Let me derive constraints, and search for an approximate solution that fits it best!" Complexity theory statements and theorems should aid the solution, not substitute it!
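As an illustration of "deriving constraints", take the array question that opened this post. The sketch below assumes something the original question did not state: that the numbers are non-negative integers and the target is not astronomically large. Under that assumption, a textbook dynamic-programming approach finds an exact answer in time proportional to (number of elements) x (target), even though the general problem is NP-complete. The function name is made up for this example, and the code is in Python rather than the C# the asker wanted.

```python
def subset_with_sum(numbers, target):
    """Return a list of elements of `numbers` that sums to `target`, or None.

    Assumption (not from the original question): all numbers are
    non-negative integers. Each element is used at most once.
    """
    # parent[s] records how sum s was first reached:
    # (previous sum, index of the element added), or None for the empty sum.
    parent = {0: None}
    for i, x in enumerate(numbers):
        # Iterate over a snapshot so that element i is added at most once.
        for s in list(parent):
            t = s + x
            if t <= target and t not in parent:
                parent[t] = (s, i)
    if target not in parent:
        return None
    # Walk back from the target to the empty sum to recover the elements.
    picked = []
    s = target
    while parent[s] is not None:
        s, i = parent[s]
        picked.append(numbers[i])
    picked.reverse()
    return picked


print(subset_with_sum([3, 34, 4, 12, 5, 2], 9))
print(subset_with_sum([3, 34, 4, 12, 5, 2], 30))
```

The table never holds more than target + 1 entries, so the "NP-complete!" verdict only bites when the target is huge compared to the number of elements.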

Funny approximations for NP-hard problems

Honestly, I was among the programmers of the first type, as defined above. I used to be proud that I knew that much maths, and was glad that I could dash people's hopes of solving their problem. However, a year ago I took an advanced complexity course (by guys from one of the research groups in ISP RAS -- "group of Discrete Optimization Algorithms at Department of Mathematical Methods" (in Rus)) that actually revealed to me several amazing facts. Some complex NP-complete problems appear to have good approximations! Let me list several of them.

  • Edge-dominating set (wiki) -- assume we want to penetrate our adversary's network (which is an undirected graph). We should pick several channels (edges) to insert a virus into, and the hacking software will capture all information that passes through either end of each hacked channel. We should cover the whole network with such "hacked" channels, so that each unhacked one is plugged into a node with a hacked channel. But at the same time, we should do it with minimum cost, i.e. the minimum number of channels to hack. See figure A at the side for examples.

    Figure A

    Examples of edge-dominating sets from wiki page. Red edges are infiltrated channels. You may see that each non-hacked edge is touched by at least one hacked edge at joint router.

    The task of picking such a minimum set is NP-hard. Brute-force? No, there is a better idea. We don't necessarily need minimum (although it would be nice), but instead we can hack some extra channels--the budget is not fixed anyway. We could use some clever algorithm to pick such edges, but instead... let's just hack arbitrary channels that aren't adjacent to already hacked ones!

    What amazed me is that such a "solution"... will never exceed the real minimum number multiplied by two! "That's still too much", the Greedy Manager might have said, and he would be right. But the thing is that we didn't even try to solve anything: the dumbest algorithm yields a good, constant-factor approximation for a problem with no known efficient exact solution. Hell, it's even easier to implement than brute force!

  • Knapsack problem (wiki) -- assume a burglar infiltrated a house. Unfortunately, he has a relatively small knapsack, into which he can't put everything of value. Possessing a discriminating eye, he can immediately determine how much the fence will pay for a thing. His aim is to maximize the cost of the stuff carried away, the overall weight being less than his knapsack may bear. But unfortunately he's also limited on time and has to do it quick.

    Complexity theory proves that such a task is NP-complete. So the burglar might pack every combination of items into his sack until he finds the best one. He would have been caught if he did that, because the owner would have returned by the time he completed the enumeration. So he chooses another solution. He uses a "greedy algorithm" and always picks next the thing with the best price/weight ratio among those that still fit in his sack.

    A fool! He didn't know that it was the house of a computer scientist, who knew that thieves are dumb and greedy, and who intentionally bought a piece of jewelry with the best price/weight ratio in the house, but that piece doesn't allow any other thing to be put into the sack due to overweight. For arbitrary N, this trick can make the burglar take away N times less stuff than he could at most!

    But a simple adjustment to his algorithm can be made to limit the possible loss to "at least 1/2 of maximum value". The burglar should do the same, but at the end compare the value of the picked best price/weight items with the value of the most expensive item he can carry away. If the latter turns out to yield more profit, he should take this thing instead and run away.

    This is an example that a small hack can guarantee a good approximation.

  • Longest path problem (wiki) -- assume we want to make our competitor lose money, and thus we try to delay the shipment of a valuable package. We can plant an agent as a janitor in the dispatcher's room, so he can see the map of the area (an undirected graph) and use the special phone to guide the courier by a longest path between source and destination (without cycles, of course, since otherwise the courier will suspect something and report it). He must do it quickly, but, unfortunately, picking such a longest path is an NP-hard problem.

    We've seen approximations by constant factors before, but this task, unfortunately, can't be approximated within any constant factor by a polynomial algorithm. Of course, that's only proven under the assumption that P != NP. So we'll have to teach the janitor to yield worse approximations if we want him to succeed. And we conclude that sometimes NP-hard problems just do not have simple hacks that solve them well.
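The "hack arbitrary non-adjacent channels" rule from the edge-dominating set item above is what is usually called building a maximal matching. A minimal sketch follows, assuming the network is given as a plain list of (node, node) pairs without self-loops; the function and variable names are made up for this example.

```python
def hack_channels(edges):
    """Pick edges greedily so that no two picked edges share a node.

    The result is a maximal matching, which is also an edge-dominating set
    at most twice as large as the minimum one.
    Assumption: `edges` is a list of (u, v) pairs with u != v.
    """
    touched = set()   # nodes that already have a hacked channel
    hacked = []
    for u, v in edges:
        # Skip channels adjacent to an already hacked one.
        if u not in touched and v not in touched:
            hacked.append((u, v))
            touched.update((u, v))
    return hacked


# A chain of four routers: a - b - c - d.
network = [("a", "b"), ("b", "c"), ("c", "d")]
print(hack_channels(network))
```

On this chain the sketch hacks two channels while the true minimum is one (the middle channel alone touches both others), so the factor of two can actually be reached. Every skipped channel shares a router with a hacked one, which is exactly the domination property.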

There are more facts that concern randomized algorithms, which have a certain probability of hitting a good solution, and facts about trickier, non-constant approximations, but they're far less fun and are more like boring math crunching, so I won't mention them here.
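The burglar's adjusted rule from the knapsack item above also fits in a few lines. This is a sketch under assumptions of my own: items are (value, weight) pairs with positive weights, and the function name is hypothetical.

```python
def burglar_pick(items, capacity):
    """Greedy by value/weight ratio, then compare with the best single item.

    Returns a list of (value, weight) pairs whose total value is at least
    half of the optimum. Assumption: all weights are positive.
    """
    # Items heavier than the sack can never be taken at all.
    fitting = [item for item in items if item[1] <= capacity]

    # Plain greedy pass: best ratio first, take whatever still fits.
    by_ratio = sorted(fitting, key=lambda item: item[0] / item[1], reverse=True)
    greedy, room = [], capacity
    for value, weight in by_ratio:
        if weight <= room:
            greedy.append((value, weight))
            room -= weight
    greedy_value = sum(value for value, _ in greedy)

    # The adjustment: the most expensive single item may beat the greedy haul.
    best_single = max(fitting, key=lambda item: item[0], default=None)
    if best_single is not None and best_single[0] > greedy_value:
        return [best_single]
    return greedy


# The computer scientist's trap: a tiny jewel with the best ratio
# blocks the one item that is really worth taking.
trap = [(2, 1), (100, 100)]
print(burglar_pick(trap, 100))
```

Without the final comparison the burglar would leave with the jewel worth 2; with it he takes the item worth 100.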

Conclusion

Although whether P=NP is still an open question, complexity theory contains many valuable achievements in solving everyday problems.

Do not make complexity theory just a source of lame excuses for not solving the problem, or for writing brute-force algorithms and garage-made heuristics that can guarantee nothing. Some programmers call it all "clever cheats", but many hand-made "cheats" are of no help in approximations (as shown above with the first, most obvious "greedy" algorithm for the burglar). Instead of cheating, do not hesitate to perform a quick googling or consult experts in this area if you encounter an NP-complete/hard problem--a fast solution with a proven and limited weakness, suitable for your specific task, may exist, and may have been studied well.

Or, if you're curious enough, and possess a certain bit of math education, you can study it on your own. English-speaking readers may check the online pages of their favorite universities for book recommendations and, maybe, online lectures. I can't recommend anything in English since I haven't read such books. To a Russian-speaking reader I can recommend the book «Эффективные алгоритмы и сложность вычислений» ("Efficient Algorithms and Computational Complexity") by Nikolai Kuzyurin and Stanislav Fomin (pdf, site; published under OPL license), which we used during our studies at the university.


