Advogato's Number: The Economics of Software Complexity

Posted 7 Mar 2000 at 23:30 UTC by advogato

This week, Advogato takes a look at a familiar feature of the software landscape, complexity. Why does it happen? Specifically, what are the economic forces that drive it? How does free software differ from proprietary in this regard?

Advogato has no formal training in economics, but has long been interested in the approach, perhaps due to the influence of a beloved uncle who was economic advisor to a high-powered family business. It's always seemed to me that you can use classic economics concepts to analyze the practice of software, but few economists have dared to tread there (a wave of the paw to Michael Masnick for pointing out an exception: Alan McAdams).

Most of us are familiar with the progression: a 1.0 release which is fairly simple, but usually fairly sparse in functionality. If the project is a failure, the 1.0 release is the last. If it's a success, the authors are still never happy with it, and put out a succession of new releases. And, predictably, each release is larger and more complex than the previous one.

This pattern is so routine that most of us take it for granted, both in the proprietary and free software worlds. But does it have to be this way? After all, when an author of a book finishes writing it, for the most part it's done. Why is software so different? Why can't we just start with version 6.0 and skip the hassle of all the preceding versions?

I think the concept of investment sheds some light on the question. The cost of undertaking a new software project is large, and much more so for a version 6.0 than for a version 1.0. In fact, complexity is probably the single best predictor of the cost of a software project. Further, software projects are inherently very risky. A fair number of them fail outright, and a much larger fraction deliver, but disappoint. Thus, an initial investment in a lower complexity project is a much lower risk. And since the success of earlier versions is a fairly good predictor of the success of future versions (second system syndrome notwithstanding), the larger investment required for later versions becomes a more reasonable risk.

But why are the new versions always more complex? Shouldn't it be possible to just make them better without necessarily increasing the complexity? In theory, this sounds nice, but in practice there always seems to be a reason to need more complexity.

Fred Brooks identified "accidental complexity" as one source in his essay, "No Silver Bullet". I personally prefer the term "needless complexity" to emphasize the idea that choices made by the authors do have some effect on the complexity.

However, not all complexity is needless. Many of the problems solved by software today are inherently fairly complex. In particular, integrating with other programs is a major source of complexity. You want programs to integrate; otherwise, you're much more likely to face a situation where things Just Don't Work. And, as these other programs grow in complexity themselves, the cost of the integration goes up. If you don't track the changes, you face bit rot.

Even in the area of integration, there are some choices that can make it easier or harder, such as paying attention early on to good standards. Bad standards (almost by definition) are one of the main sources of complexity.

Modern applications have generally moved from being command line based to GUI, which is quite a bit more complex. What we see here is conservation of misery - things get easier for users, but harder for the developers.

So, coming back to economics, some complexity is necessary to deliver software that meets the needs and desires of users, but some of the complexity is needless. Yet, we see both types in abundance. If the latter increases costs so much, why is it not rooted out at every turn?

In the proprietary software world, I think the major reason is to raise the barrier to competition. Since complexity is the major factor in cost, by raising the complexity required to implement a certain set of features, you make it much harder for your competitors. Since you have no control over the internal complexity of their software, you make the interfaces complex instead.

Ideally, you minimize the cost for yourself by adding the complexity incrementally, i.e., treating the work you've already done as free. Even though the total complexity of the next version is large, the relative cost is quite a bit lower. This is a strong economic force in favor of cruft.

Free software is better at resisting needless complexity, and there are few better examples than the sockets API for networking. It basically hasn't changed much since Bill Joy first implemented it in BSD about 20 years ago. Yet that simple API is what interfaces virtually all applications to the Internet.
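
To make the point concrete, here is a minimal sketch (mine, not from the original article) of a TCP client written against the classic BSD sockets calls: gethostbyname(), socket(), connect(), write(), read(), and close(). The host name and HTTP request are illustrative assumptions, and error handling is kept terse to show how few calls are actually involved.

    /* Minimal TCP client using only the classic sockets API.
     * The host "example.com" and the HTTP request are illustrative. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netdb.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        /* Name lookup, part of the API since the early BSD days. */
        struct hostent *host = gethostbyname("example.com");
        if (host == NULL) {
            fprintf(stderr, "lookup failed\n");
            return 1;
        }

        /* One call to create the endpoint, one to connect it. */
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_port = htons(80);
        memcpy(&addr.sin_addr, host->h_addr_list[0], host->h_length);
        if (fd < 0 || connect(fd, (struct sockaddr *) &addr, sizeof addr) < 0) {
            perror("connect");
            return 1;
        }

        /* From here on, it is ordinary reads and writes on a file descriptor. */
        const char *req = "GET / HTTP/1.0\r\nHost: example.com\r\n\r\n";
        write(fd, req, strlen(req));

        char buf[4096];
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            fwrite(buf, 1, (size_t) n, stdout);

        close(fd);
        return 0;
    }

That is roughly the whole story for a large fraction of networked applications, and it builds with nothing beyond libc on a Unix system.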

Not that there haven't been attempts to ratchet it up a version. Winsock 1.0 was a fairly straightforward adaptation of the sockets API for the Windows platform. So naturally there's now a Winsock 2.0 that includes all kinds of really complex stuff for quality of service and so on.

The sockets API has also gained competition from consortia. The well-loved Open Group has been pushing XTI for some time now. As far as I know, it still isn't implemented for Linux, and if it weren't for its inclusion in W. Richard Stevens' UNIX Network Programming books, it would probably be completely unknown.

For proprietary standards, there's usually a carrot and stick approach: if you want to use this cool new feature, you have to put up with this whole new API. But the free software world is pretty good at adapting something that already works.

Even in the proprietary world, the carrot has to be sweeter than the stick is sharp. How many people use Group 4 fax machines? Better yet, how many people use them over ISDN connections? On the other hand, companies such as Adobe are very skilled at taking a technology that is simple enough to be implemented by anybody (the original PostScript) and adding features (color, CJK font support, searching, links) to make it a very successful standard. Even though PDF is an open standard with relatively little intellectual property protection (the LZW patent being one notable exception), it is difficult for other people to handle the entire beast. Thus, today Adobe dominates the PDF marketplace with its Acrobat products.

Within free software, increasing the barrier to competition is not a motivation for added complexity, but the issue of incremental investment certainly is. A classic case is autoconf and make. This is a system with a lot more complexity than is really needed, but it's not hard to see how it got there. Instead of rethinking the build process from scratch, the designers of autoconf probably said to themselves, "we've already got a make tool, why reinvent the wheel?" So the incremental complexity may have been lower, even though this had very negative consequences for total complexity.

Standards bodies are also very bad about treating the complexity of existing standards as zero. It is very inexpensive (for the writers of the standard) to include a whole new specification by reference, even though it might be extremely painful for implementors. SVG is an extreme example of this source of complexity. The SVG specification itself is not all that complex, but it includes by reference XML, XMLns, XPath, XLink, XPointer, CSS2, XSLT, DOM, JavaScript, sRGB, ICC, Panose, PNG, JPEG, gzip, and probably one or two others I missed. Right, SMIL Animation.

The IETF, in its foresight, avoided this kind of problem by requiring standards to be based on working, interoperable implementations. In particular, they generally require two independent implementations, which makes it much harder to sweep incremental complexity under the rug by leveraging existing integration work. Thus, in the IETF, the cost of complexity matches much more closely what it would be in the real world. In Advogato's opinion, standards bodies such as the World Wide Web Consortium would do well to learn from this wisdom.

Most of the arguments I've put forth here seem just like common sense to me. However, I haven't seen them clearly articulated anywhere else, so I'll post them here and see what happens.


XTI, posted 8 Mar 2000 at 00:53 UTC by alan » (Master)

Opengroup have discovered sockets.

Finally

Prototyping, posted 8 Mar 2000 at 02:14 UTC by doylep » (Journeyer)

One could look at it this way: the issue is not so much why version 6.0 is so complex, but why the complexity increases. The answer: because version 1.0 was so simple. Just like other engineering disciplines, you start with a proof-of-concept, then build a prototype, and finally produce the real thing. The proof-of-concept is version 0.0, the prototype is 1.0, and from there it's a slow progression to a bona fide final product.

I don't look at it this way, but one could.

Re: Prototyping, posted 8 Mar 2000 at 02:54 UTC by macricht » (Journeyer)

The proof-of-concept is version 0.0, the prototype is 1.0, and from there it's a slow progression to a bona fide final product.

But you DO get to the final product, or else you've just wasted a wad of cash.

In engineering disciplines (at least from my years slogging around in chemical production facilities), you usually have a very well-defined final goal and vary very little from it during the construction phase; then when it's done, it's done (sans some maintenance).

I've never had the opportunity to design a production facility to produce HDA, and then decide to produce benzene, then car wax, then Topps baseball cards....all from the same unit.

One might argue that there is a difference between software and my chemical plant example. However, is there really a difference? Both were assembled to do a job (the unit makes goo, and the program pushes electrons), but one doesn't change much after it's put into the ground. Why should the other?

Does software need to grow more complex? In some cases, growing more complex to improve end-user functionality, as Advogato suggests, is necessary (to keep the parallel going, think of the chemical industry's push from continuous to batch processing to meet more dynamic demand). However, in many cases (as I learned when mucking with SVG and playing with parts of Excel 2000/W2K), I feel people have forgotten the KISS rule.

What I would like to see is more integration and cooperation between software to perform a task, instead of having one big MegaApp hogging resources and boggling my mind...

Complexity or Generality?, posted 8 Mar 2000 at 03:41 UTC by graydon » (Master)

I can think of 2 things contributing to this problem. The first is a broad cultural issue, which is that everyone thinks of computers as sort of embodying scientific growth and thus not confined to any individual problem domain. So writing software is sort of considered an ongoing progression, rather than a simple satisfaction of a set of goals.

But I think there's another issue, which is the tradeoff between generality and simplicity. At the core of most technical flamewars is a small disagreement over where on the spectrum of generality and simplicity a program should sit, and a lot of the time spent with a customer or in a design meeting goes to delineating exactly what level of abstraction to tackle a problem at. To a large extent I think that the progression through versions is a sort of gradual creep towards generality, even within a well-defined problem domain. Everyone recognizes that general solutions are "better" in a completely abstract sense because they are capable of handling more problems with fewer specialized cases, but overgeneralization quite simply kills a program before it ever sees the light of day.

So when you release 1.0, you have frequently specialized it a lot in order to make it on your budget, whether that budget is money or time or just your own interest in solving a local problem. You know it doesn't generally solve the whole problem domain. Even if by some miracle it does, it's probably specialized internally, and there's some factoring and tidying you can do. That's why you go back at it: to make it the more perfect, abstract, elegant, efficient, minimal piece of code you set out to make. It doesn't always get smaller, but sometimes it does. I've had it happen that 2.0 is half the size and twice the speed of 1.0, because of a key insight we found after 1.0 came out.

I also feel the author's comments here are not quite right about books: authors do release second and third editions, and they do change things in between them. More often in technical volumes, reference works, etc. A new encyclopedia comes out now and then. A new medical reference. A new textbook. Even my programming books are Nth edition for N > 1. Fiction occasionally gets re-released with new chapters too, a bit of a tidy-up, more author's notes, an epilogue, a sequel, etc. Likewise, musicians and even film makers re-release improvements: remixes, live recordings, extra studio sessions, director's cuts, etc.

Re: Prototyping, posted 8 Mar 2000 at 04:28 UTC by mjs » (Master)

Componentization and integration are all well and good, but as Raph was just saying, integration is one of the top sources of complexity. In Nautilus that's certainly the case.

