Encapsulation, Inheritance and the Platypus effect

Posted 7 May 2000 at 03:46 UTC by Talin

Why I'm no longer an OO fanatic: an essay on some of the pitfalls of object-oriented programming.

About four years ago, I was an object-orientation zealot. I had gotten over my initial "larval" phase of clumsy class hierarchies several years earlier, and had learned to bolt together frameworks and design patterns into a seamless, elegant architecture. I knew how to make efficient templates without code bloat, and I used them pervasively.

But then something changed. It started when I noticed that some other game companies (I was in the games industry at the time) began to abandon C++ and go back to C. Other companies were adopting the policy of using C++ as merely "a better C", abandoning C++'s advanced object-oriented features. I noticed that some of the higher-profile open source projects had also elected not to use C++.

Why the regression? After all, object orientation is supposed to make programmers more productive and their code more reusable and less bug-prone. But a lot of programmers will tell you that they've had bad experiences with large-scale object-oriented projects. Some of these are clearly the result of inexperience with object decomposition and analysis. Networks of "spaghetti objects" and other antipatterns have contributed to a number of high-profile project failures. But that's merely the result of applying the technique improperly, I thought.

However, I began to sense that some of the problems encountered are inherent in the object oriented paradigm, and are not merely a result of applying it improperly. I soon began to meet other programmers who felt the same way - who had gone heavily into object orientation, and then felt something lacking.

I also began to notice that there were some application domains where object orientation seemed to provide little if any benefit. For example, while I have experimented with object-oriented parser generators (see Antlr for a good example) I have yet to realize any positive result from "inheriting" one language or one production rule from another.

The platypus effect.

A number of programmers have described their class hierarchies as being "brittle". Class hierarchies are often used to represent taxonomies. In the "real world", the term "taxonomy" refers to the system of biological classification into phyla, genera, species and so forth. In the software domain, the term is sometimes used to refer to a hierarchical categorization of a diverse set of objects. An example would be the various flavors of widgets in a modern GUI environment.

However, real object collections aren't always hierarchical. In the biological world, for example, we see that scientists are constantly re-arranging the overall category structure for unicellular organisms. And as it turns out, the unicellular organisms can't be arranged into a strict tree-like hierarchy, because apparently there has been a lot of "cross-branching" as organisms have borrowed and incorporated genetic material from one another.

Similarly, in the software world, collections of objects which appear to be hierarchically arranged are often only superficially so. As one programmer put it: "You have your 'isa' hierarchy all thought out - let's say you have a 'mammals' class and a 'reptiles' class and so on - and you start to implement it, and along comes a platypus, a fur-bearing, egg-laying, duck-billed creature, which doesn't appear to fit in any of the classifications you've created. So what you often end up having to do is rethink your entire hierarchy, refactoring into a different set of basic categories, or maintaining several categorizations along different axes. A lot of your thinking ends up getting thrown out, as well as any implementation you've done up to that point."

In other words, this refactoring has caused exactly the kind of massive code ripple that proper object design was supposed to save us from.

Part of the problem of brittle design is overgeneralization. Good programmers like to factor out the common aspects of their code, incorporating widely-used functionality into a single subroutine or class. The inheritance mechanism encourages a high level of factoring, allowing common functionality to be placed into a superclass. But it also encourages a tendency to overfactor, to create a single mechanism which solves a number of apparently similar problems. These kinds of mechanisms tend to break when a platypus is encountered.

Object orientation is not reality.

Another problem that I have noticed is the widespread (and in my opinion false) belief that class hierarchies are somehow "natural". I've heard a number of experts proclaim that the reason for the widespread popularity of OO design is "that's how the world really works." But the piece of paper on my desk doesn't have discrete methods. If I decide, for example, to burn it for fuel, or fold it into a paper airplane, does that mean that there is a "burn" or "fly" operation that's somehow built into the paper, and that it inherits these operations from a superclass of "flat things"? Nonsense. There are almost infinitely many things I can do with a simple piece of paper, none of which may have been anticipated by the creator of that paper.

Object Orientation is more of a statement about how our minds work than it is about how the world works. One of my favorite Buckminster Fuller quotes is "There are no things", which means that the division of the world into discrete "things" is due to the way we parse our visual input stream.

Alternative Strategies

Fortunately, solving problems like these doesn't mean regressing all the way back to structured programming.

In this era of tight deadlines and "living on internet time", it's often useful to adopt a decomposition strategy that leads to a more plastic and flexible architecture. Some programmers have discovered ways of doing this while remaining within the object-oriented paradigm, by using strategies such as adopting "flattened" class hierarchies, or developing late binding dispatch and dynamic typing mechanisms. Others go beyond hierarchies, using "delegation" models, where an object's behavior is "assembled" from various components, rather than being "inherited" from a superclass.

But these techniques are not well supported by the C++ language, and they carry a run-time cost. Many programmers prefer to fall back on non-object-oriented methods of construction, or to create their own object-oriented semantics implemented in C.
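To make the delegation idea concrete, here is a minimal sketch (in modern C++ for brevity; the class names are mine, purely illustrative). Note that behavior is dispatched through a component pointer at run time - exactly the kind of cost mentioned above:

    #include <cstdio>
    #include <memory>
    #include <utility>

    // Behavior lives in pluggable components, not in a superclass.
    struct Locomotion {
        virtual ~Locomotion() = default;
        virtual void move() = 0;
    };
    struct Walk : Locomotion { void move() override { std::puts("walks"); } };
    struct Swim : Locomotion { void move() override { std::puts("swims"); } };

    // The object "assembles" its behavior from the components it holds.
    class Creature {
    public:
        explicit Creature(std::unique_ptr<Locomotion> l)
            : locomotion_(std::move(l)) {}
        void move() { locomotion_->move(); }  // run-time indirection: the cost
    private:
        std::unique_ptr<Locomotion> locomotion_;
    };

    int main() {
        // A platypus is assembled rather than shoehorned into a hierarchy.
        Creature platypus(std::make_unique<Swim>());
        platypus.move();  // prints "swims"
    }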

Encapsulation considered harmful?

Encapsulation has been the foundation of the object-oriented paradigm from the beginning. It's been the one principle which seemingly every computer scientist can agree on. However, while encapsulation is very good at hiding complex issues away from those that don't need to know about them, it's also good at hiding things that they often do need to know about.

In my 24 years as a professional programmer, I've noticed this: Nobody ever documents performance. I don't think I've ever seen in any code comment, manual or book the phrase "this routine is very fast, you can call it a lot if you need to" or anything like it. Generally, what is documented is semantics, a detailed description of what the subroutine or class does. But never have I seen a description of how fast it does it.

This is compounded by the fact that performance characteristics are often non-intuitive. I've seen many cases where a programmer will call a particular function pervasively throughout their code, thinking that the routine is fast, when in fact it is slow.

The C++ operator overloading feature is especially good at tricking programmers into mis-estimating performance impacts. For example, I've seen "smart pointer" classes which do mind-bogglingly large amounts of work (like acquiring/releasing a semaphore for thread-safe code) each time a pointer is assigned or a dereference is made. And I've seen other programmers who blithely use these smart pointers as if they were actually real C++ pointers, unaware that the cost of each assignment or dereference is orders of magnitude more expensive than the single machine instruction that they were expecting it to be. (I remember one incident where, after much cajoling and persuasion, I managed to get one programmer to sit down with me and step through his own code in the debugger. After seeing the legions of instructions that were invoked by a "simple pointer assignment", he said "oh my God" and was then very depressed.)
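To illustrate, here is a sketch of the kind of smart pointer being described (hypothetical, written in modern C++ for brevity; not any particular library's class):

    #include <mutex>

    // A "thread-safe" smart pointer of the kind described above: every
    // assignment and every dereference acquires and releases a lock, so an
    // innocent-looking *p costs far more than the single machine instruction
    // a raw pointer would.
    template <typename T>
    class LockedPtr {
    public:
        explicit LockedPtr(T* p = nullptr) : ptr_(p) {}

        LockedPtr& operator=(T* p) {
            std::lock_guard<std::mutex> guard(lock_);  // hidden cost of '='
            ptr_ = p;
            return *this;
        }
        T& operator*() {
            std::lock_guard<std::mutex> guard(lock_);  // hidden cost of '*'
            return *ptr_;
        }
        T* operator->() {
            std::lock_guard<std::mutex> guard(lock_);  // hidden cost of '->'
            return ptr_;
        }
    private:
        T* ptr_;
        std::mutex lock_;
    };

Step through "*p" in a debugger and the mutex machinery is all there, even though the source reads like a one-instruction pointer dereference.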

These problems are compounded by the fact that traditional profiling and "hotspot" optimization are often ineffective when faced with these kinds of systemic inefficiencies. Since the performance bottleneck is pervasive throughout the code, optimization cannot be confined to just a few subroutines.

Nothing is "private" to the debugger

Debugging is another situation in which the protections of encapsulation need to be violated on a regular basis. For example, in a debugging session the programmer must often "pierce the veil" of the encapsulated interface in order to trace the execution of the code and locate a suspected bug. As a result, the full complexity of the underlying implementation, which was formerly hidden, is now exposed to the programmer.

For example, the STL map class is both internally complex and filled with lots of opaque data types, making it nearly impossible to interpret in a debugger, partly because debuggers aren't well-equipped to deal with heavily templated code. Although it's unlikely that the STL containers will have serious bugs, there is still the need to look inside the container data structures to see if the objects are correct, or if a memory error has corrupted part of the container itself. Unfortunately in the case of STL, what is seen in the debugger is a great deal of seemingly obfuscated complexity.

My general point is this: Encapsulation cannot be relied upon to hide details, because no matter how generalized or bullet-proof a software component is, there will always be occasions in which it is necessary to look inside of it. If the false sense of security provided by encapsulation encourages programmers to make their underlying implementation more complex, it can seriously impair the debugging process.

The dual-duty problem

The Wright brothers weren't the first inventors to come up with a viable design for a heavier-than-air flying machine. Even Da Vinci's bird-like contraption would have worked had he been able to obtain a sufficient source of power. The Wright brothers were the first to succeed because they crossed a performance threshold: They built an airplane which was lighter, and an engine more powerful, than anybody had done before.

As mechanical systems, aircraft are very sensitive to performance issues. Rockets are even more so -- too heavy, too slow, too weak, and they simply won't work at all. In order to maximize performance, aerospace designers have developed techniques to increase the efficiency of aircraft systems. One of these techniques is to have some components of the aircraft serve more than one role. For example, the wings of the aircraft are the primary lift and load-bearing subsystems. They also contain the primary control surfaces. In addition, the wings usually contain the aircraft's fuel tanks. Running lights are placed on the tips of the wings, and engines are often mounted there as well. If each of these functions had to be performed by a separate component, the aircraft would not be efficient enough to fly.

Unfortunately, making dual-duty components often requires breaking encapsulation. A familiar example is the "divide" instruction, which on most CPUs returns both a quotient and a remainder. But the C "divide" operator returns only a quotient, while the "mod" operator returns only a remainder. This is sufficient for most needs, but there are a number of high-speed applications where both the quotient and the remainder are needed, and there isn't time to do the expensive division instruction twice. So the only course is to resort to an inline assembly block, or some other penetration of the high-level language barrier. In the same vein, there are a number of clever assembly-language algorithms which assume that the processor's carry bit is accessible to the programmer; these algorithms are difficult to express efficiently in high-level languages.
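As an aside, C's standard library does expose this particular pairing: div() in <stdlib.h> returns quotient and remainder from a single call, mirroring the hardware instruction (whether a compiler fuses an adjacent / and % into one divide was, at least historically, nothing you could count on):

    #include <cstdio>
    #include <cstdlib>

    int main() {
        // Written as two operators, this may compile to two divisions:
        int q = 17 / 5;  // quotient: 3
        int r = 17 % 5;  // remainder: 2

        // std::div computes both results in one call:
        std::div_t d = std::div(17, 5);
        std::printf("%d %d %d %d\n", q, r, d.quot, d.rem);  // 3 2 3 2
        return 0;
    }

But the general problem stands: for results like the carry bit, the high-level language simply has no name for the second output.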

There are examples of dual-duty at higher levels as well. One example of clever dual-duty design is the Boehm garbage collector for C and C++. In this collector, the memory allocator locates objects in memory in such a way that the address bits contained within a pointer can be interpreted so as to give hints about the alignment and size of an allocation. With this system, a pointer into the "middle" of an allocation can quickly be resolved into a pointer to the start of the allocation. However, this only works if the garbage collector has intimate knowledge of the operation of the memory allocator.
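A toy illustration of the trick (my own sketch, not Boehm's actual code): if the allocator hands out objects only from block-aligned pools of uniform object size, an interior pointer can be resolved with a mask and a modulo - but only because the collector knows exactly how the allocator lays out memory:

    #include <cstddef>
    #include <cstdint>

    // Assumed allocator layout: each pool block is kBlockSize-aligned and
    // begins with a header recording the uniform size of its objects.
    constexpr std::uintptr_t kBlockSize = 4096;

    struct BlockHeader {
        std::size_t object_size;  // written by the allocator
    };

    // Resolve a pointer into the "middle" of an allocation to its start.
    void* object_base(void* interior) {
        auto addr  = reinterpret_cast<std::uintptr_t>(interior);
        auto block = addr & ~(kBlockSize - 1);  // mask down to block start
        auto* hdr  = reinterpret_cast<BlockHeader*>(block);
        std::uintptr_t first  = block + sizeof(BlockHeader);
        std::uintptr_t offset = (addr - first) % hdr->object_size;
        return reinterpret_cast<void*>(addr - offset);  // allocation start
    }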

Best-fit vs. general-use solutions.

It's common practice to develop extensive in-house class libraries. However, when one examines the code in a typical class library, one finds a lot of excess generality and functionality which makes the code not only inefficient, but also cluttered, hard to read, and hard to debug.

For example, I have a personal C++ class library which I've slowly built over the years. One of the things it has is a hash table template class, similar to the one in STL but predating it. I've been using this regularly, adding to it, making it more efficient, for quite a long time. I recently discovered, however, that it only takes me about five minutes to code a hash table implementation without using the library; and in doing so, I can almost invariably customize the design of the hash table algorithm to better suit the problem at hand. For example, in particular cases I've found it useful to add reference counting, thread-safety, multiple keys or other dual-duty capabilities to the hash entries, while leaving out unneeded features such as growable head lists.
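For a sense of scale, here is roughly what such a "five-minute" table looks like (a sketch in modern C++; the deliberate limitations, like a fixed bucket count, are exactly the kind of per-problem customization described above):

    #include <cstddef>
    #include <functional>
    #include <optional>
    #include <string>
    #include <vector>

    // A deliberately minimal chained hash table: no iterators, no growable
    // bucket array, no allocator knobs - just what the problem at hand needs.
    class QuickTable {
    public:
        void put(const std::string& key, int value) {
            for (Entry& e : buckets_[index(key)])
                if (e.key == key) { e.value = value; return; }
            buckets_[index(key)].push_back({key, value});
        }
        std::optional<int> get(const std::string& key) const {
            for (const Entry& e : buckets_[index(key)])
                if (e.key == key) return e.value;
            return std::nullopt;
        }
    private:
        struct Entry { std::string key; int value; };
        static constexpr std::size_t kBuckets = 256;  // fixed; tune per problem
        static std::size_t index(const std::string& key) {
            return std::hash<std::string>{}(key) % kBuckets;
        }
        std::vector<Entry> buckets_[kBuckets];
    };

Adding reference counts or a second key means editing thirty lines you wrote yourself, not subclassing or re-templating a library.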

What I've learned from this is that the re-use of the idea of hash tables is far more important than the re-use of actual written code. The primary challenge with reusing code is knowing that there's something to re-use. It's true that without using a generalized library implementation, the customized hash table implementations in my code will not benefit from improvements made to the class library. But on the other hand, the simple, "one-off" hash tables are invariably less complicated and easier to debug in the context of a large application than the massive, kitchen-sink, "do it all" implementation.

Conclusion

Because programs are so complex, programmers often operate with incomplete information, making estimates and assumptions about other parts of the code that are sometimes unwarranted. A programmer who wishes to write code which fulfills certain requirements for speed, reliability, security, or debuggability can only use subroutines or classes written by another if they also fulfill those requirements. However, these non-semantic attributes are usually poorly documented, and there is little language support for them. As a result, the programmer must either analyze all of the code in the underlying call tree (impossible with proprietary code, and infeasible with large projects), or "guess" based on clues such as naming and usage.

Object orientation, used properly, has a great expressive power to convey this implicit information, but used improperly it has a great power to mislead. Encapsulation can further complicate the issue by discouraging the investigation and understanding of underlying implementations.


Thoughts, posted 7 May 2000 at 08:15 UTC by hp » (Master)

I don't think I would use the term "object orientation" to describe all the things you mention; in particular, template containers such as the STL map class are not particularly object-oriented (aside from object.method() syntax). I'd call that "generic programming", I guess, and generic programming is very much founded in guarantees about performance and algorithmic complexity. The SGI docs for STL, for example, make a point of documenting the performance of each operation (you said you'd never seen docs that mention performance; well, here you go). Or read this interview with the STL designer; it's clear he has these issues in mind.

Stroustrup also seems to enjoy pointing out that C++ isn't simply "object oriented", see his FAQ in particular "Is C++ an Object-oriented language?"

Regarding encapsulation: you point out that it can negatively impact performance, for example you might want a combined divide/modulus operator. Of course encapsulation impacts flexibility and performance; that's in some sense the point - there's a tradeoff between the flexibility to do the most efficient thing and access all the details, and portability/complexity. Assembly is completely unencapsulated and unabstract, but it's also far too low-level and complex to get anything done, and because it lets you do everything a given machine supports, assembly code can't possibly run on a different machine. The opposite extreme, say, tying a bunch of COM objects together with Visual Basic, lets tens of thousands of people who otherwise wouldn't be capable of programming at all write fairly complex database-and-GUI applications. On an intermediate scale, encapsulation is an excellent means of complexity control for any programmer (and makes library bugfixes possible - if libraries were fully exposed, then any change would break user code!).

Encapsulation is also the only way to feel reasonably certain a program is correct; for example look at GConfListeners, a totally opaque data type in GConf with only 6 entry points. I have test code and I'm very sure that those 6 operations work correctly, at least, to the extent one can have such a certainty. If I had 27 code sections all manually walking the tree that GConfListeners represents, I would really have no idea whether my code worked or not. Moreover, if someone else is reading the code they can instantly see the 6 things the type can be used for, which is invaluable. And I can fully change the implementation and fix any bug without breaking source or even binary compatibility. These things all minimize interactions between various modules, and keep implementation details simple enough to fit in my brain. I don't have the mental acumen for maximally flexible interfaces. ;-)

So yeah there's a tradeoff between flexible and encapsulated, but it's certainly a worthwhile one most of the time, and nothing keeps you from dropping down to assembly when it's occasionally a good idea. I don't see that as an indictment of encapsulation in the general case. Encapsulation should be the default behavior of any programmer.

Perhaps the root cause of the problems you are mentioning (hash tables with too many features, platypus classes that require a total hierarchy reorganization, etc.) is simply feature bloat (perhaps what you mean by "overgeneralization"). In general an abstract data type (call them "objects" if you must ;-)) should do one thing and do it well, with fixed guarantees about algorithmic complexity, and a clear target set of applications. If you start glomming ill-advised not-very-often-useful operations into a supposedly general-purpose data type, you get screwed very quickly.

Feature bloat maybe stems from too much emphasis on code reuse. If you're writing a hash table that needs custom features, don't junk up the general-purpose hash table; write a new hash table. People often seem to motivate encapsulation primarily as a way to recycle code; thus programmers will try to recycle code at the expense of the other huge benefit of encapsulation, i.e. control of complexity and the resulting code correctness. Cut-and-paste is evil, but not the only evil.

Anyway I guess I would say most code I've seen could use more encapsulation, but encapsulation to me means "carefully control your interfaces," not "encapsulate as much code as possible in each data type."

Roos, posted 7 May 2000 at 09:09 UTC by lkcl » (Master)

i particularly like the example of the tactical simulator where, for a demonstration to some new customers, the management asked the developers to add realistic items such as trees, rocks and kangaroos.

imagine the surprise of the prospective customers at the demonstration - and the pilots of the simulated helicopter gunship - when the kangaroos unslung their surface-to-air missiles and opened fire.

the developers had "reused" the "soldier" object, and forgotten to switch off "attack" capabilities.

i have successfully used object-orientated techniques to provide a very efficient implementation of a graphical windows OS. we built up from a series of basic types: window, to tab-window, to border-window. dials, switches, knobs, sliders. the 2d graph was built up from components starting as point, vector, then line, working up to self-configuring axes which could be passed in a "data-vector-to-on-screen-vector" object, e.g. a linear axis or a logarithmic axis (as an experiment i added a cubic axis), and it all worked fine, and i really enjoyed working with it.

yes, we created =, +=, *= etc etc operators for the point, vector etc objects, including an "intersect" overload on the operators -= and += to work out maximum and minimum window-coverage. doing it any other way would have been... counter-intuitive.
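(for flavour, the arithmetic overloads are about this simple - my sketch, not the original code:)

    // sketch: geometric types where overloaded operators read naturally
    struct Point {
        int x = 0, y = 0;
        Point& operator+=(const Point& p) { x += p.x; y += p.y; return *this; }
        Point& operator-=(const Point& p) { x -= p.x; y -= p.y; return *this; }
        Point& operator*=(int s)          { x *= s;   y *= s;   return *this; }
    };

    // usage: moving a window origin by a vector reads as plain arithmetic:
    //     Point origin{10, 20}, delta{5, -3};
    //     origin += delta;  // {15, 17}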

to hear of things like overloading [], -> etc - and i have even heard of overloading the ! operator on pointers [operator! { return this != NULL; } so that programmers can get lazy and write if (!pointer) instead of if (pointer != NULL)] - is just... ok, being polite: it's asking for trouble.

hm. i particularly like your point about the platypus. i would love to know how straight c would cope with the platypus, though.

Platypus problems, posted 7 May 2000 at 14:51 UTC by dancer » (Journeyer)

Well, speaking from experience (because, face it, experience represents our interactions with the world - what else am I going to speak from?) I've hardly ever seen a project go through its life-cycle without encountering a Platypus. The danged things wander in just before you ship, most of the time, or when people start speccing the next upgrade.

The Platypus certainly isn't unique to OOP. It happens any time you have to code a task or feature that doesn't fit the design model. You built a house, and they want it to be a diner as well. You built a hotel, and they want a bank.

Sure you can do it. Well, usually. It depends on how efficient your design is. If your design is a really efficient hotel, it's going to make a lousy airport terminal. Generic things (and here I both contest and support, in different ways, the original article) are often easier to adapt than something finely honed and tuned.

Mostly, though, in my experience (yeah, that old thing) most of the problems described are common enough to beginners in OOP. I'm not talking about beginners in a language-knowledge sense, or a methodology sense. I'm talking about beginners in the rabid-cynicism sense. The sense you get when you're asked to write a module to produce ad-banners or process log-files, and you just know instinctively that it'll end up transferring files by FSP, performing mimetrics through backpropagation neural networks, and calculating prime-numbers. No, really.

A programmer's skill that I find rarer and more precious than gold is the ability to foresee the future. Humans do it all the time. It works quite well out to about 15 or 20 seconds (unless your keys are still in the car). The programmer/designer I prize most highly is the one who has their ear to the wall (in a metaphorical sense) and can guess the top ten perversions that some tie-wearing maniac will want applied to the code before or after it ships, and allows for it.

That sort of thing only comes with a mess of experience (and usually bad ones), but if you keep your brain and your ears open, IMO, and crank your code out every day, day after day, it does come. Then the Platypus becomes a welcome friend, because it fits what you've done, and being able to do it makes you look good.

It doesn't have to be a conscious thing. I modify my designs without necessarily being aware that I do, but the allowances are there. And the duck-billed wonders aren't so much of a pain. But then, I've been in this game for a while.

Besides, a platypus can be beautiful. :)

Some OO observations, posted 7 May 2000 at 15:25 UTC by faassen » (Master)

Some of my OO observations:

  • Composition is more important than inheritance. You need inheritance to support some things you do with composition, but it's composition that's the key concept. Keep your inheritance hierarchies as flat as possible.

  • Interfaces are important. Implementation inheritance is useful and handy at times, but it's all about sharing interfaces. I don't even mean explicit interfaces here; see my comment about dynamic typing later.

  • OO doesn't really shine in C++. No, I take that back. OO really doesn't shine in C++. C++ is too complex. Only use C++ when you really really need it; i.e. you want the raw speed and the hardware access while still being able to use some OO abstraction techniques. You need that far less than you may think. C++ gives people the wrong idea about OO. I know, as C++ was my first OO language too.

  • OO really shines in a dynamically typed language. I myself like Python, but the classical example is Smalltalk. In a statically typed language it's usually much harder to make or change a class; you have to work a lot to get the interface right, it takes lots of time to change an interface, and so on. In a more dynamic language you can do rapid prototyping. For instance, you can easily change an interface or create a new one; you don't have to make interfaces explicit from the beginning. Making a new class shouldn't be that much harder than making a new function.

  • OO libraries are often better when they're thin layers above some procedural subsystem than when they're a thick layer. That's because it's hard to make a good generic OO library, and bad OO libraries probably hurt more than bad procedural libraries.
  • Collections are good. Use them. That is, lists and dictionaries/hash tables. Collections are even better in dynamically typed languages; you basically get the STL for free without any of the complexity. Generic code without templates or explicit interface constructs or multiple inheritance.

I wouldn't want to do without OO. I think the problems cited have less to do with OO than with specific approaches to OO of specific OO languages, though it's true OO has traditionally put more of a focus on inheritance than on composition, which is the wrong way around. I've become a lot more skeptical about statically typed bondage and discipline OO though, as you can notice. Static typing is really not as large a safety net as people consider it to be, in my opinion. Static typing is mostly nice for optimization, and potentially for explicitizing some interfaces. Perhaps though, I'm influenced too much by C++'s particular take on it, and there are statically typed OO languages that do it in a nicer way.

Regards,

Martijn

Documenting performance, posted 7 May 2000 at 17:40 UTC by dreier » (Journeyer)

You write: "Nobody ever documents performance." In fact, C++ is one of the only languages whose specification does document performance. The C++ standard specifies the complexity of operations on STL containers; for example, accessing an entry in a vector<> is required to be O(1), insert on a map<> must be O(log N), etc.

Anyway, the STL comes from "generic programming", not "object oriented programming." Your essay mistakenly conflates the two ideas. It's interesting to note that Alex Stepanov, one of the main designers of the STL, is actually violently opposed to OOP. Generic programming and OOP are completely separate concepts.

bjarne stroustrup, posted 7 May 2000 at 19:20 UTC by lkcl » (Master)

i forgot to mention.

1) this book got me the job with the graphic-os company.
2) chapter 12 mentions spending 40% of the time designing the data structures. It really, REALLY means 40%.
3) if you find a bug in stroustrup's book, he will send you a cheque for USD $1.00. a friend has one -- framed, not cashed.
4) i agree with the comments about getting OO designs right, and being able to rapid-prototype in c++: *sigh*, i wish :)

Quick-and-dirty complex, posted 7 May 2000 at 21:36 UTC by lalo » (Journeyer)

I think you got it wrong. In my opinion the best benefits of OOP and good OOA/OOD (and OO documentation with UML or something) are not so much in development as in maintainability. The code is a lot easier to fix and extend, and with reuse, you have fewer things to maintain to begin with.

Your example of having your own hash table template class shows that. If you develop a hash table class for a specific project, you can tweak it to your liking and perhaps even develop faster. Congratulations; now you have two different hash table implementations to maintain and document.

Of course that would not be a problem for code that is not meant to be maintained (the famous throw-away code). But then throw-away code is typically for prototyping, and you could do it faster in a higher-level language suited to the project and your style (either one of Python, Perl or Scheme/Lisp will serve most people).

Biological taxonomy not the same as class structures, posted 8 May 2000 at 01:07 UTC by dalke » (Journeyer)

The comparison to a platypus, and to taxonomy in general, must be considered as a somewhat broken analogy. The class hierarchy used in OO programming does not have the same meaning as taxonomy in biology. Alas, too many OO text books assert that this is true.

The full organism name for platypus, starting at Mammalia is

  Mammalia Monotremata Ornithorhynchidae Ornithorhynchus anatinus
(copied from http://freespace.virgin.net/g.agnew/details.html).

The Monotremata "comes from the fact that the echidnas and the platypus use the same opening for reproduction and eliminating waste products, which is an attribute that is found in reptiles". The Ornithorhynchidae means (if my Latin is right) "bird nosed"; i.e., "duck-billed platypus."

So the taxonomy for platypus is "egg-laying mammal." No problems there, other than the assumption that mammals don't lay eggs. Another example you could have used is the naked mole rat, which is "essentially cold blooded". Don't forget that birds and mammals are just warm-blooded reptiles. Then again, some reptiles are warm-blooded (the leather-back turtle comes to mind; no reference).

It gets even worse. As I recall from an article in Science a few weeks ago (here, I think, but I don't have on-line access), there are species of lizards (or snakes?) which have lost and gained their legs several times over time. And as you point out, some organisms get DNA from other species. Retroviruses add their DNA to humans.

This occurs because a taxonomy does not strictly describe a common set of attributes! The model is that all life derived from a common ancestor whose descendants, over time, branched out and became what we see now. The taxonomy describes the set of transformations needed to get from that ancestor to a given species. Naturally there are often shared characteristics, but that isn't required.

These transformations are purely human descriptions because every organism (excepting twins, asexual reproduction, and other forms of cloning - and even those, in some cases) is genetically unique. Children are not exactly identical to their parents.

As an extreme example, there are cell lines like HeLa, which started from cancerous human cells and underwent what is called an immortality event. These are now self-sustaining cells (in the lab) and have mutated over time. Genetically speaking, they are Homo sapiens, but they don't look like it.

Let's do this another way. In OO programming, Bertrand Meyer says that all derived classes must be usable wherever the base class can be used. Since taxonomies are not based on attributes or functional characteristics, membership is determined by definition. Thus, "give me an instance of a mammal" can't help but give you a mammal. Such a class definition is almost worthless since any taxonomy has that characteristic. (Oh, and in taxonomy, everything other than the organism name corresponds to a pure abstract class, since only the organism is instantiated.)

There are equivalents to taxonomy in programming, but they aren't class hierarchies. There's the joke that all C programs are derived from "Hello, World." That joke is partially true, as I've seen programs which were obviously derived from another program. Mapping back to biology, that suggests reproduction is closest to cut&paste programming.

Replies to Replies, posted 8 May 2000 at 03:29 UTC by Talin » (Journeyer)

I probably should point out that my essay isn't intended to call for the abandonment/abolishment of OO, abstract data types, or encapsulation. Rather, its purpose is to point out ways in which these things can go wrong, and to use that knowledge to avoid those situations. Alexis de Tocqueville, in his book Democracy in America, pointed out a lot of the ways that democracy can go wrong (a lot of which have come true in the last century), but he was in favor of democracy nonetheless.

I should also mention that there are some areas where inheritance really does shine. For example, graphical user interfaces (and in particular the Smalltalk Model-View-Controller paradigm) are one area in which the classifications and categorizations are relatively mature. There are differences in detail between one widget set and another, but the basic concepts are pretty much the same everywhere.

What I'm really arguing against here is a dogmatic adherence to certain design principles which are very popular, without consideration that every methodology has a cost.

hp: What I mean by "overgeneralization": The best example I can think of is the Java StringBuffer class. It turns out that whenever you concatenate two strings in Java using the '+' operator, it creates a hidden StringBuffer object (strings are normally immutable in Java, so a special object is needed to do mutable operations). However, because StringBuffer is also one of the standard Java "container" classes, its methods are declared as synchronized. This means that each time you concatenate two strings in Java, it has to do a thread-safe monitor lock - an expensive operation - despite the fact that there is no way any other thread could gain access to the object. Thus, because StringBuffer is attempting to be two different things (a general container class, and a way to implement the Java '+' operator on Strings), the Java language executes much slower than it really ought to. Why do I call this 'overgeneralization' rather than 'feature bloat'? Because the two different functions of StringBuffer are actually very similar conceptually, and it's only by hair-splitting that the class can be considered to be solving two distinct 'problems'.

lkcl: My feeling on operator overloading is this: If an operator is used a particular way in a math textbook, or other non-computer literature, then it's OK to implement that function as an operator in C++. So vector addition is OK, but using '+' to 'add' a record to a database is not.

lalo: While I agree there are maintenance benefits to re-use of common components, there are also benefits to global simplicity. While a simple interface is easier to understand than a complex one, the complexity of the implementation which lies beneath that interface will occasionally "leak through", in the form of coverage bugs or non-deterministic behavior. Usually the cost/benefit ratio comes down on the side of component re-use, but not always, in my opinion. Sometimes weaving the algorithm into the application and its data structures makes for an easier-to-understand application, rather than attempting to build "adaptor" code which talks to a component which is deliberately kept isolated.

Overloading and performance, posted 8 May 2000 at 04:20 UTC by lilo » (Master)

Talin wrote:

The C++ operator overloading feature is especially good at tricking programmers into mis-estimating performance impacts. For example, I've seen "smart pointer" classes which do mind-bogglingly large amounts of work (like acquiring/releasing a semaphore for thread-safe code) each time a pointer is assigned or a dereference is made. And I've seen other programmers who blithely use these smart pointers as if they were actually real C++ pointers, unaware that the cost of each assignment or dereference is orders of magnitude more expensive than the single machine instruction that they were expecting it to be.
IBM's PL/I language, fairly popular in the early 1970's, lost much of its user base for similar reasons. Operands within expressions were converted routinely from one type to another, in a flexible and intuitive fashion, binary to string to integer to floating point to what-have-you. This encouraged inexperienced coders to use whatever types they wanted and mingle them within expressions. Unfortunately, the overhead in doing so was pretty high, which meant PL/I code tended to run rather slowly....

Fragile Base Class problem, posted 8 May 2000 at 10:42 UTC by davidw » (Master)

This is not entirely on topic, but perhaps it's of interest to people who are interested in Object Orientation, its problems, possible solutions, implementations, etc... MMXX.sourceforge.net is:

An antidote for C++'s Fragile Base Class Problem. It allows C++ applications to be arbitrarily divided into binary code fragments (such as a main executable, shared libraries and/or plug-in code modules) with single-class granularity. Each component can then be substantially revised while maintaining binary backward compatibility with the components that were built prior to the revision, within limits determined by the particular design and revision strategies employed.

A "metadata/metacode gateway" (a.k.a. an "Interface Repository", a.k.a. "Super RTTI") that makes many C++ compile-time constructs visible a runtime. In particular, it makes class properties queryable and methods programmatically invocable. This is especially useful for gateway-type functions, whereby a portion of the program's interfaces are to be exposed to the outside.

humorous illustrations, posted 8 May 2000 at 12:46 UTC by apgarcia » (Journeyer)

I've seen other programmers who blithely use these smart pointers as if they were actually real C++ pointers, unaware that the cost of each assignment or dereference is orders of magnitude more expensive than the single machine instruction that they were expecting it to be. (I remember one incident where, after much cajoling and persuasion, I managed to get one programmer to sit down with me and step through his own code in the debugger. After seeing the legions of instructions that were invoked by a "simple pointer assignment," he said "oh my God" and was then very depressed.)
I about busted a gut on this one -- not to mention the SAM-hurling kangaroos, an absolute classic.

modularity is not all, posted 8 May 2000 at 19:05 UTC by dan » (Master)

This is an interesting article which says a lot of the things I've been thinking recently. I'd agree with some of the other comments about terminology, and go further in one case by saying that OO != C++ - that the language does not support dynamic typing and late binding is (if true; I don't do C++) an indictment of C++ rather than of OOP per se.

There are two things I really wanted to do here:

One, to echo the point that encapsulation is not everything; "do one thing and do it well" is often not enough. Talin gives the example of an aeroplane wing: I'm thinking of a laptop computer. In a desktop system you can often swap out components that underperform and replace them with faster, smaller, or more capacious items that have the same interface; once you introduce the size and battery-life constraints that a laptop must satisfy, this approach ceases to produce as good an end product. As an end user you pay a premium for this in cost and in product lifetime, but it meets your requirements now much more closely. Richard Gabriel talks about this, or something related, in his book "Patterns Of Software".

Of course, software is generally regarded as more malleable than other products, so we tend to place a higher priority on modularity. Still doesn't make it the only consideration, though.

The other is a comment I want to make on dancer's reply above, that a good programmer is able to "guess the top ten perversions that some tie-wearing maniac will want applied to the code before or after it ships, and allow for it". This requires some really clever management to get right: if said tie-wearing maniac grows to expect that you've absorbed the actual requirement by telepathy, he'll never get any better at stating requirements upfront. If you can predict what he's probably going to say, ask him to say it, then ask him to prioritise between that and his original feature set.

These thoughts aren't original and they're not that well expressed. If you want to see what happens when cleverer people pursue this further, though, look at the Extreme Programming methodology. In particular, You Aren't Gonna Need It and the Planning Game. The summary would be "write only what the customer has asked for, create extensive tests and use them as requirements specifications, and don't be afraid to rip it all out and start again when the requirements change - the tests will stop you introducing regression bugs". But that sounds rather more negative than is really good: read their words directly rather than just mine.

erratum: "You Aren't Gonna Need It" and ""., posted 8 May 2000 at 19:07 UTC by dan » (Master)

The second link was to Planning Game

Yeah yeah, I should preview.

kangaroo soldier, posted 8 May 2000 at 21:02 UTC by apgarcia » (Journeyer)

didn't bob marley write a song about the kangaroos that lkcl mentioned?

kangaroo soldier
in the heart of australia
fighting helicopters
fighting for marsupials...

Make it as simple as possible and refactor when needed, posted 8 May 2000 at 23:36 UTC by pcburns » (Journeyer)

After reading Martin Fowler's book on refactoring, I have become convinced that the best way to deal with the platypus problem is to write the simplest possible code that solves the problem. By simplest I mean that you should not add extra features to deal with things that you expect to be added down the track. Instead, write your code so that it is easy to maintain, so that when you come back to it later, it is easy to add the new features.

Of course this strategy completely ignores optimisation. Then again didn't somebody say that premature optimisation is the root of all evil?

On the other hand, if you keep adding new features for too long without a big refactoring, it will turn into a dog. I guess there is a need for balance, and some practice at refactoring - I don't think you can avoid it.

Hard times, posted 9 May 2000 at 00:14 UTC by pvg » (Journeyer)

The article describes a number of real programming issues but I don't think they are directly related to OOP. In other words, I agree with the validity of the stated problems but I find the conclusions questionable - a straw man version of OOP is constructed and neatly burned down. Let's take a look at some of the main points

Axiom: Modelling and design are hard

True enough. It is not easy to design and implement systems that are general, flexible, fast, extensible and accommodating of a broad range of unknown future requirements. This is hard with or without OOP.

Corollary 1: It's hard to live off inheritance alone

Design is indeed even harder if one constrains oneself to using a single approach for all modelling - not everything fits in a strict inheritance hierarchy. While inheritance is often overemphasised in introductory OO texts, it is not (nor is it intended to be) a design panacea. It is also not the one thing that defines an OO design - the author mentions composition and other ways to achieve delegation - these are perfectly valid OO techniques and there is no OO law that states reuse through inheritance or design by inheritance are holy grails to be pursued at all costs.

Corollary 2: Achieving durable abstraction is hard

The benefits of modularity (encapsulation, data hiding, and so on) and the complexities of realizing them are fundamental issues that have been discussed since well before OOP became popular. See, for instance, the works of Parnas and Dijkstra. OOP attempts to provide some basic tools to express and enforce encapsulation - coming up with a good decomposition is still the (hard) job of the programmer.

Corollary 3: C++ is hard

C++ is a large, complex, powerful language. Among its design goals is providing OO facilities while maintaining the performance and down-to-the-metal freedom of C. This is a tall order, and while C++ delivers, it does so at a price - effective use of the language requires what amounts to 40 years of ferocious training in the Arctic with Doc Savage. This is necessary if you want to or have to get the most out of C++ - but C++ is not the only road to OOP.

But it ain't real

Reality might be spikes in my visual cortex; it might be break-dancing wave-functions. While OOP's roots are indeed in simulation, the approach has shown itself to be useful in modelling a variety of systems. There is no strict requirement to always directly map classes to concepts. It would be naive and restrictive to interpret and evaluate OOP on the quality of its representation of human perception. It doesn't have to be real - it just has to be useful.

Take a look at Dylan, posted 15 May 2000 at 16:09 UTC by andreas » (Master)

I agree that C++ makes programming hard. But it's not OO that's spoiled, it's C++ itself. I will not go into the gory details of messy syntax, missing bounds checking, etc., but let's look at the points where C++ is weak in OO:

  • Not everything is an object. A vector of ints works differently than a vector of my own objects?
  • Classes are not first-class objects. I can't pass around classes in variables and create instances of them.
  • Functions are not first-class objects. This also means no local functions to pass around, and no closures. Think about closures as tying data to a function pointer (a C++ sketch of this workaround follows the list).
  • Not all functions are virtual. If the designer of the base class didn't think of making the function you want to override virtual, you're screwed.
  • You cannot add methods to classes that you don't own. But you want to do that from time to time, since it prevents you from needless subclassing.
  • No multiple dispatch. C++ only dispatches on the first argument at runtime. There are cases where you want to look at all arguments. The Visitor pattern is a kludge to get that effect for two arguments, but there's no way to do it for more than two.
  • Classes are heavily overloaded. In C++ classes serve multiple purposes: they represent the type of a data storage, they are the unit of abstraction, the unit of access control, and they are a container for the methods declared on that object. This leads to brittle hierarchies and ugly hacks like "friend" classes.

Dylan addresses all these issues, and a couple more besides. I heavily recommend it to anyone dissatisfied with C++.
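For instance, the closure point above can only be approximated in C++ by hand-rolling a function object that carries its own data (a sketch with illustrative names):

    // A closure approximated by hand: the "captured" data is carried in a
    // struct, and operator() plays the role of the function pointer.
    struct Adder {
        int captured;  // the data tied to the function
        int operator()(int x) const { return x + captured; }
    };

    // usage:
    //     Adder add5{5};
    //     int nine = add5(4);  // 9 - but you wrote the plumbing yourself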

Christopher Alexander on a related matter, posted 17 May 2000 at 09:55 UTC by kervalen » (Observer)

Putting on my architecture student hat for a moment, your remarks parallel Christopher Alexander's remarks on pattern languages, 10 or 20 years after he introduced the concept. For those of you who aren't familiar with Alexander's work, he is a mathematician turned building designer and aesthetic philosopher who invented the concept of pattern languages. You can read more about his work here. He invented pattern languages in the hope of replicating some of the extraordinary qualities of some vernacular architecture; there are strong parallels between his thinking and the theories of open-source quality erected by Eric Raymond.

But a decade or so later, he abandoned the approach. I cannot lay hands on the quote at the moment, but the gist of it was that he found that following a well-thought-out pattern language did not produce buildings with the particular kind of timeless beauty that he sought, his "quality without a name." He commented that use of pattern languages was very attractive to designers, and that using them they often produced buildings more like those of Charles W. Moore (postmodern) than Mies van der Rohe (glass boxes), but that the buildings were not what he had hoped for. As I recall, most of your problems with OOP parallel Alexander's remarks on pattern languages.

Alexander has embarked on an attempt to understand that quality more deeply, in search of a deeper and subtler aesthetic. This is perhaps analogous to the quest for an understanding of what makes quality software. The result of this work is going to be published this year (it is hoped) by Oxford University Press as three volumes under the title The Nature of Order. I have seen preprints of parts of it; it is a very subtle work and I think it is likely to be of value. I wonder if the ideas set out in The Nature of Order will have as much influence on software design as the concept of pattern languages.
