Advogato's Number: Conservation of Misery in Software Complexity

Posted 29 Mar 2000 at 07:53 UTC by advogato

Continuing on an economics kick, Advogato this week attempts to adapt the theory of Risk Homeostasis to software complexity. The theory is used to explain why XML hasn't made your life any simpler.

Part of the point of this essay is to demonstrate that economics does have some relevance to the understanding of software in general, and free software in particular. I'm writing this after having read "The Simple Economics of Open Source" by Josh Lerner and Jean Tirole (available from Prof. Lerner's publications page), which is basically shit. The paper is little more than a rehash of Eric Raymond's theory of ego-boo in more complex language, with numerous technical gaffes. I'll be happy to post a detailed critique if there's interest.

This week, though, I want to present an actually interesting result from economics and adapt it to the issue of software complexity. This is the theory of Risk Homeostasis advanced by Gerald Wilde. The theory, also known as "conservation of misery," predicts that making things safer (for example, adding anti-lock brakes and airbags) has much less impact than expected on the actual risk people incur, because people are more willing to take on additional risk when there is a perception of safety.

The movement to reduce the risks of things seems particularly strong in America, certainly more so than in France. In a widely quoted comment on this trend, Mary Shafer of NASA Ames has said, "Insisting on perfect safety is for people who don't have the balls to live in the real world."

The theory of risk homeostasis predicts that risk, viewed at a global level, is almost invariant with respect to changes in local risk. Wilde posits a cybernetic model with negative feedback to explain this, loosely analogous to a heating system controlled by a thermostat.
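The thermostat analogy can be made concrete with a toy simulation. This is an illustrative sketch, not Wilde's actual model: it simply assumes people adjust their behavior so that perceived risk converges back to a fixed target level, regardless of how safe the environment is made.

```python
# Toy negative-feedback model of risk homeostasis (invented for
# illustration; the constants and update rule are not Wilde's).

TARGET_RISK = 1.0   # the risk level people are comfortable with
GAIN = 0.5          # how aggressively behavior adjusts each step

def simulate(safety_factor, steps=50):
    """Return the final perceived risk after behavior adapts.

    safety_factor < 1.0 models a safety improvement (e.g. airbags)
    that scales down the risk produced by any given behavior.
    """
    behavior = 1.0  # riskiness of behavior (speed, following distance, ...)
    for _ in range(steps):
        perceived = behavior * safety_factor
        # Negative feedback, like a thermostat: feeling too safe
        # increases risky behavior, feeling unsafe decreases it.
        behavior += GAIN * (TARGET_RISK - perceived)
    return behavior * safety_factor
```

Run with and without a safety improvement, `simulate(0.5)` and `simulate(1.0)` both settle at the target risk of 1.0; only the underlying behavior differs, which is exactly the homeostasis claim.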

In this essay, I propose that an analogous model can be applied to software complexity. As argued in a previous essay, software complexity can be roughly factored into two components: complexity actually needed to solve problems for users of the software, and needless complexity, often arising from the need to remain compatible with previous versions of the software and with standards. One particular axis worth examining is generality. A highly specialized piece of software that meets a small, specific set of needs is likely to be much simpler than one which attempts to be all things to all people. Yet factoring software into small modules, while tempting, is not a cure-all, because of the difficulty of integration.

The theory of risk homeostasis adapted to software complexity predicts that if a piece of software successfully meets a set of needs with lower-than-average complexity, there is a strong temptation to add features, generalize, or otherwise make the software more complex. This is because of the lower cost of dealing with low-complexity software. Conversely, if there's a piece of code that's working but is a convoluted rat's-nest, there isn't anywhere near the same pressure to generalize it to other areas.

This theory predicts, among other things, that if a software component manages to buck the usual trend and become simpler than a previous version, it is unlikely that this simplification will prevail for long. In my opinion, a spectacular example of this effect is the evolution of SGML and XML.

SGML (or Standard Generalized Markup Language) has a long history and reputation for being a powerful representation for structured documents. Yet, it has also earned a richly deserved reputation for being too complex, due to lots of optional features, strange interactions with whitespace rules, and the dependence on a proper DTD even for low-level lexical parsing. It is sometimes said that any secretary can input SGML, but to get it back out again requires a consultant.

Thus, SGML acquired a strong niche in specific areas where structured documents were badly needed, and the cost of SGML's complexity could be tolerated. Perhaps aircraft maintenance manuals are the archetypal example of SGML's application space. However, the complexity kept SGML from being applied more broadly.

The XML project was born from the appreciation of the power of SGML and of structured documents in general, coupled with a realization that much of the complexity of SGML was needless and could in fact be eliminated by a standards body composed of a small number of smart, motivated people. The original goal was an extreme simplification, so much so that a BS-level programmer could implement the spec in a week. That didn't quite happen, but even so the XML spec was a dramatic improvement. XML gives you most of the goodies of SGML, but with much less complexity. XML was used successfully as the basis for many projects even early on, such as CML, or Chemical Markup Language.
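The payoff of that simplification is visible in how little code basic XML handling takes: well-formed XML parses without any DTD, so a stock library call suffices. A small sketch (the `<molecule>` markup below is invented, loosely in the spirit of CML):

```python
# Parsing a tiny XML document with the Python standard library.
# No DTD, no SGML-style tag minimization rules -- just a tree.
import xml.etree.ElementTree as ET

doc = """<molecule id="water">
  <atom element="H" count="2"/>
  <atom element="O" count="1"/>
</molecule>"""

root = ET.fromstring(doc)
atoms = [(a.get("element"), int(a.get("count")))
         for a in root.findall("atom")]
# atoms == [("H", 2), ("O", 1)]
```

This is the "local simplicity" the essay refers to: the core spec is small enough that parsers are cheap and plentiful.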

What happened then should be quite easy to predict (especially in hindsight). First, the applications of XML were generalized far beyond the original space of structured documents, all the way to becoming a generic tree-structured datatype for interchange. Second, many opportunities were seen to solve problems that the XML 1.0 spec did not directly address: the management of collision-free namespaces, hypertext linking, pointers into the middle of an XML document, schemas to make up for some of the limitations of DTDs, style sheets, and even a specification for breaking loose of the confines of tree structure to make XML a suitable interchange format for graph-structured data. The W3C has made a veritable industry of adding new features to XML (I've listed only about half of the extensions I know about), and at this point the complexity of implementing all of XML-land is probably comparable to that of implementing SGML. Keep in mind that SGML has been successfully implemented in free software, for example in SGMLtools.
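Namespaces alone illustrate how each extension adds back complexity: an element's name stops being a simple string and becomes a (URI, local-name) pair that every consumer must carry around. A sketch using the same stdlib parser as above (the URIs here are invented for illustration):

```python
# With namespaces, the same local name "item" can mean two different
# things, and lookups need an extra namespace map to disambiguate.
import xml.etree.ElementTree as ET

doc = """<doc xmlns:inv="http://example.org/inventory"
            xmlns:hr="http://example.org/people">
  <inv:item>widget</inv:item>
  <hr:item>programmer</hr:item>
</doc>"""

root = ET.fromstring(doc)
# Element tags are now URI-qualified, e.g.
# "{http://example.org/inventory}item", so a plain findall("item")
# no longer matches; a prefix-to-URI map must be threaded through.
ns = {"inv": "http://example.org/inventory"}
inventory_items = [e.text for e in root.findall("inv:item", ns)]
# inventory_items == ["widget"]
```

None of this is unreasonable on its own, but it is exactly the kind of per-extension cost that, summed across namespaces, linking, pointers, schemas, and style sheets, erodes the original simplicity.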

It remains an open question whether all (or even most) of the extensions to XML pull their weight by solving real user problems. However, just as the theory of risk homeostasis is based on perceived risk rather than actual risk, the adaptation to software complexity only requires the perception that the new complexity solves important problems for users.

So, what has this theory to teach us? First, that truly simple software is an elusive goal, probably as much so as perfect safety. This is troubling news for those who hope to build secure systems, as complexity seems to be one of the more difficult roadblocks for security.

However, the theory doesn't mean that simplifying software is fruitless, any more than adding airbags to cars is. While airbags may not reduce the number of fatalities, they do allow people to gain the benefits of riskier driving (namely, getting from point A to point B faster). Similarly, the widespread adoption of XML has allowed many more people to gain the advantages of SGML, and there is now much more hope for universal interchange of complex data formats than existed previously. It is merely the local simplicity of XML that has not reflected itself globally in the simplicity of XML-based software.

Truly Simple Software for Truly Simple Problems?, posted 29 Mar 2000 at 10:08 UTC by darius » (Journeyer)

Hmm... well, I think the quest for 'software simplicity' is flawed if you are writing software for a complex problem.

Unless your problem is special, it's going to be difficult to write a simple solution to a complex problem.

Also, I think it's hard to quantify software complexity without the ability to quantify the complexity of the problem... a rather nebulous thing IMHO :)

Simple is harder than complex, posted 29 Mar 2000 at 18:42 UTC by imp » (Master)

It is hard to write simple software. Why? Because to write simple software, one must understand the problem set. One must understand the relationships between components. One must understand where the common components really are. One needs a good knack for spotting the fission lines in the problem set. If you don't have all of these things, then you are unlikely to produce simple software. It takes a lot of thought.

I've had problems making coworkers understand this sometimes. They tend to think there is more need for special cases than there really is. They tend to undergeneralize because they cannot see the big picture. Often, making something easier is paradoxically much harder.

For example, I was recently reviewing some code that looked for a number of items. The code was given a limit on the number of items to look for. The programmer in question created a special value to mean unlimited (he chose -1). This led to a number of problems in his code: he had to check against unlimited as well as max units. However, a look at the underlying problem set showed that there were physical limits on the things one was looking for (this was a 20-slot ISA bus device driver enumeration routine). The software became much simpler when it defined the unspecified number as the maximum supported (in this case 100, since we thought that maybe some damn fool would daisy-chain these ISA buses together). It eliminated 2 or 3 tests in the code. The unspecified case was identical to hitting the maximum number of units, and the two could be folded together.
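The shape of that simplification can be sketched in a few lines (all names and the device model are invented; the original was C driver code):

```python
# Sketch of the sentinel-vs-maximum simplification described above.
MAX_UNITS = 100  # physical bound: no bus chain could ever exceed this

def probe_with_sentinel(devices, limit=-1):
    """Original shape: -1 means 'unlimited', checked on every pass."""
    found = []
    for dev in devices:
        # Extra test per iteration just to handle the sentinel.
        if limit != -1 and len(found) >= limit:
            break
        found.append(dev)
    return found

def probe(devices, limit=MAX_UNITS):
    """Simplified: 'unspecified' and 'hit the maximum' are one case."""
    return list(devices[:limit])
```

Because the physical maximum bounds every real input, the two functions behave identically for any limit the hardware can produce, and the special-case tests simply disappear.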

I could go on about orthogonal issues that affect complexity. I could talk about solving the wrong problem, solving the problem too completely, or an unwillingness to accept certain risks in order to make the code much simpler. There are some people who seem so risk-averse that they make the code riskier because it is more complex. A simpler solution would have been less risky overall, but they wanted to solve this or that edge case, which resulted in an overall increase in both the risk and the complexity.

XML as the New ASCII?, posted 29 Mar 2000 at 21:32 UTC by Ankh » (Master)

I don't think it's bad that people are building complex things on top of XML, although I happen not to like some of them!

If you tried to do a complete implementation of everything that used ASCII, including C++ compilers and sendmail configuration files and sock-knitting-machine drivers and TeX, you'd go insane.

I do have problems with some of the W3C standards, because they are written by vendors of complex software who start out by asking, "given the components we already own, what can we build?". I think that gives you things like a 2D graphics language that requires commercially licensed software and has an API that was designed to hide a linked-list data structure in a web browser.

I also regret that XML was released before XLink, XPointer, XPath, XSchema and namespaces. Namespaces in particular were rushed through for political reasons, and not completely worked out. If we'd had XSchema in place early on, the XML DTD syntax might never have happened; if we'd had XLink and friends, "system identifiers" could have been replaced uniformly by links, as could the silly (and non-DOM-compliant) processing instruction for associating documents with style sheets. In the same way, if SVG and CSS/XSL were integrated better, perhaps I'd be able to draw an HTML table with a partially transparent background and rounded corners, or render bullet lists as cloud diagrams.

The standards should be simpler, yes. XML is more complex than we'd wanted it to be. But there are more parsers available than we expected, so some of that complexity doesn't matter as much as we feared. And it's good enough to use.

I'm actually working on SGML aircraft manuals as it happens, where SGML features are used that were removed to make XML. I'm using absurdly complex software, too, to put revision bars in printed manuals at the right place, handle replacement pages, make links between part numbers and their definitions, all sorts of database and content management that SGML and XML enable.

We need more experience using structured text to interchange non-document objects, I think, so that we can learn when not to do so.

But don't blame XML itself for the complexity of things using it. I agree there's a problem, but I think it's elsewhere. Neither can we blame the commercial environment; the GNU "hello world" program with a built-in mail reader might have been a joke, but free software is generally known for being feature-rich. It's not because of cost in that case: it's emotional attachment. People like working on a particular piece of code, and so they add features to it.

Simplicity comes at a great cost of self-discipline and effort. It is valued most by the experienced. Therefore, it is no surprise that it is rare.

Conservation of Misery, a myth?, posted 29 Mar 2000 at 23:10 UTC by macricht » (Journeyer)

From my experiences in a field where risk is a very real consideration, I would tend to argue that risk is not a "conserved" quantity. It is, IMHO, something that you CAN reduce. However, what you end up doing is spending gobs of man-hours trying to find ways to reduce the risk (so it's conserving misery, from a different point of view). To make things worse, the more you do to mitigate risk by adding complexity to a design (e.g. more complex control systems), the more you run into not only Murphy's Law (there's now more stuff that can go berserk) BUT also a higher future maintenance cost.

The trick is to strike a careful balance between acceptable risk (or in software, non-usefulness) and the misery you (the programmer) want to experience. The best balance between the two is, in my experience, the simplest (or most elegant) solution. Finding that solution is a bit harder. I have also found that simplicity, at least from what I've learned, is a necessity. Unnecessary complexity increases your cost 90% of the time, impacting your financial bottom line in a bad way.

So what does this mean for software? We should be looking for ways to decrease misery (for programmers and end users) by looking harder at the problems we want to solve and searching for simpler solutions. In the long run, I believe that programmers and users would waste less time than with something bogged down with needless complexity.
