28 Sep 2005 bwh   » (Master)

Modularization is the key to OSS success

Yesterday I argued that huge userbases aren't all they're cracked up to be. The key takeaway point being that if you have way more users than developers, the developers get overwhelmed. Users are valuable insomuch as they contribute patches, useful bug reports, documentation, and so on.

In Inkscape we've got a great userbase, they are very active in making contributions. We love our users.

In this post I'm going to switch around and talk about having too many developers.

Now, thankfully this isn't a problem with Inkscape. I think we could easily scale up double our number without problem. But it's something I've run into with past projects; it's a not too uncommon malady with many really annoying characteristics. Others have written at length on all these sociological conditions before so I'll try not to be *too* long winded...

Analysis Paralysis

With enough developers and not enough pressure to get something out the door, you often find the developers frittering away time arguing, "Well, what's the *best* approach?" Usually it's better to just get something good enough, release it into the wild, see how it fares, and fix it up in the field. At Inkscape we use the phrase, "Patch first, discuss later." Other projects have adopted similar proverbs.

Too Many Cooks

Another issue is when you have more workers than work to do. They start bunching up, with several people working on the same area of the code, but not communicating too well; inevitably there are cvs conflicts, heated arguments, bad feelings, cutlery being thrown about... not a pretty scene.

Perfect Ain't Good Enough

It's often been noted that folks passionate about doing OSS development often have a strong perfectionist streak. It's common to find people resisting the release of their code. It's human nature to want things to be "just right", but the time needed to get to 100% can be completely impractical. By the time its perfect, it may well be irrelevant.

I suspect (but couldn't prove) that the resistance to release increases with the number of developers in a project. When it's just you, by definition you release as soon as you damn well feel ready. With 2 people, now you have to check and see if the other guy's ready. Maybe he's not, so you start working on something else, and by the time he's ready, now you're not. With 10 people, this gets worse; now as a minimum you need to have someone to sort of coordinate the release and schedule things so people wrap up their work at roughly the same time.

One of the advantages that OSS has over commercial software, as I see it, is that we can be much more nimble and speedy. There's very little _real_ consequence in putting out a buggy piece of open source software; it's not like you're going to lose sales. Marketshare may be a concern for certain projects where there are a lot of different options - I've changed window managers and distros simply due to quality issues, for instance - but by and large users select based on overall quality, not the quality of one particular release. Reputation may be impacted, particularly if you are known to continually release problematic software; thankfully reputation is important enough among OSS developers to be a more than adequate motivation to put out quality work.

Churn

Another issue with too many developers is the repeated rewrite of portions of code, or "churn". You'll see a given feature or functionality get reimplemented multiple times. Asked why, you'll get a response such as, "It sucked." ;-)

In reality, the causes could be several. First, if it is a complex but necessary module, someone may want to rewrite it simply for the sheer challenge. I think this is why you see frequently see 14 different clocks or whatever. Coders get to a stage in their education where they need some particular challenge, and they think, "Boy, *I* could do that better." And off we go with the 15th clock.

Second, it can occur because the person that wrote the code in the first place has wandered off, and someone new has to maintain it but doesn't understand it. If you ask someone why they're rewriting a piece of code and they reply, "It was too hard to understand," then you know they're a novice. ;-) The result is predictable - the new code loses all the hard-earned domain knowledge gained from the first implementation, it takes four times longer than predicted, and the end result is just as problematic, just in new and more exciting ways.

Third, churn can occur when the developer is over their head; they'll focus on things that they understand, like the about box or rewriting the config file loader for the 14th time, because they really are uncertain how the internals should be done.

Fourth, it can happen when someone is just a *bit* too much of a perfectionist, coupled with a big ego, and a lack of something better to do. This is the classic "Not Invented Here" syndrome. It got imported wholesale from the commercial software world. ;-)

The best solution to churn is maturity. I doubt any programmer can turn into a good guru experienced programmer without having gone through at least some of these phases; the trick is to not get stuck, but to see the flaw in yourself, correct, and do better next time.

At a project level, there may be some techniques to help discourage churn. Keep a good roadmap, so folks have some perspective on what longer term objectives they should shoot for. Putting a strong focus on the importance of delivering regular, frequently releases probably also helps. And probably most important is having some senior guys in the project so simply set good examples for everyone else to emulate, and mentor the newbs to help them develop these skills themselves.

Modularization

Okay, finally to get around to my point... Given all the myriad issues that can occur from having too many developers, how can you scale up a project to have plenty of devs without it turning into flamewars and chaos?

The solution is very simple and embodied in the UNIX philosophy itself: Be small. Avoid trying to be the be-all and end-all of software programs, but instead focus on doing one particular, specific thing, and be the best at it. Or if you must do a lot of things, try to split it up into multiple modules, each of which does just one specific thing.

Recently on the Inkscape mailing list there's been a thread about the pros/cons of Inkscape's file format converter dependencies.

This is a feature Ted and I put together way, way back in Sodipodi days. It was originally only intended to be a quicky hack to be able to scripting with Sodipodi. I've always been impressed with how UNIX is able to do so much with a bunch of tiny little programs that read from stdin and write to stdout, by just chaining them together with pipes. So I figured if I could make Sodipodi do stdin/stdout stuff, it'd be reasonably easy to code up in C, and then I could just do whatever I needed in a shell script or perl or whatnot, treating it as just a filter.

Later, Ted figured out a clever way to add new extensions at runtime easily through simple little XML files. Quickly we found that there were ALL SORTS of things we could plug onto Inkscape this way. Lots and lots of programs have been written with this pipe metaphor in mind, and Inkscape was able to gain file format support for a range of different formats. We hooked to Imagemagik and got the whole slew of bitmap formats. We tied into scripts from xpdf and ghostscript. We even found that sketch had some good input/output conversion functions. It seemed inelegant to hook into a competitor, but it was butt-simple to do. We got Dia support the same way. Today we're encouraging Scribus to implement commandline import/export options so we can tie into their most excellent PDF capabilities.

Most notably, just recently Xara got into the game, by sponsoring Eric Wilhelm, who had gotten involved with Inkscape when we started looking at adding DXF support. Now Xara is funding him to create a converter between the XAR and SVG formats.

To me, the amazing thing here is how much power and capabilities we've gained from so little work. We don't spend time re-implementing a given converter in order to work with our wizz-bang patented DLL-based plugin system. We don't have to rewrite it because it's in Python and our program only supports InkScript. In many cases, we don't even need to allocate a volunteer to maintain the code, since often the project that created it is still active.

In my post yesterday I pointed out that value in open source projects comes from contributions to the project. Obviously, gaining a lot of developers can help drive up your number of contributions. But think about if you modularize your project, and adopt standardish interfaces that allow you to simply reuse existing code, isn't that effectively the same thing as gaining the labor of all those people who put that other program together?

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!