Documentation tools

Posted 3 Nov 2000 at 15:15 UTC by Excalibor

We all know good programmers document the APIs of their libraries and document their programs. But this is a tedious, error-prone activity that takes time away from hacking the code. Documentation tools came to help us, but are they the ultimate tool? What comes after them?

If you are a bit like me, you hate joining a big project and finding that the code is uncommented, the library APIs undocumented and the project organization overtly poor.

Documenting, for example, the API of a library shouldn't be that difficult, but it is time-consuming. Some programmers don't do it because it takes time from hacking, others because they are being lazy or a bit irresponsible (IMHO). Some don't know an easier way and take the pains of hand-writing the documentation, inevitably producing documents that are instantly outdated. Worse, some don't document a dime.

Documentation tools came to our aid. They look for special comments in the code, analyze the structure of the program and other things, and produce a document with all those functions, methods, classes, structs and basically everything you care for: readable, browsable, organized...

The first documentation tool I ever saw was a little Lisp program that looked for the "" doc strings in the (defun)s and generated a list of defuns, their parameters and their doc strings. Then came Javadoc, Doc++, Cocoon, doxygen, KDoc, and many others (and I am surely missing a lot of them in between).

However, documenting an API or a program is a long way from generating a useful programmer's guide, an end-user manual and everything in between.

If you include examples to show other programmers how to use your methods/functions, you have to worry about updating those examples as the API changes, and you are not following the DRY principle: Don't Repeat Yourself (you see, I try to be a pragmatic programmer :-). Besides, that (usually) doesn't help with the end-user documentation, which is just as time-consuming and as important as the API itself...

So, even though documentation tools are really useful, they aren't the last word. Which features do you think such a last-word documentation tool should have? What should its philosophy be?

My guesses:

  • It must be generated from the code itself, with the help of as few comments as necessary (e.g. explanations should go into the comment, but the signature itself should be generated from the code)
  • Must generate examples from the code itself, maybe from test code or from whatever source is using the API, so they are always up to date with the working code
  • Must generate a handful of formats (at least HTML/XML) to be useful and portable
  • Must be as unobtrusive as possible, so that someone reading the code doesn't have to wade through three screenfuls of comments before reaching the code being documented
  • It should support a wide range of languages/programming styles or paradigms to fit as many projects as possible, so standardization is feasible and programmers don't have to learn a lot of tools (not that it's difficult, but it takes time from hacking :)
  • Free Software
  • Configurable to some extent (don't ask me how, but some config file or CLI options or something along those lines)
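
To make the first guess concrete, here is a minimal sketch in Python (my choice of language for illustration, not something the list prescribes). The signature is read from the code itself and only the explanation comes from the comment. The names describe and scale are made up for this example.

```python
import inspect


def describe(func):
    """Build one reference entry: signature from the code, text from the docstring."""
    signature = inspect.signature(func)              # generated from the code itself
    summary = inspect.getdoc(func) or "(undocumented)"
    return f"{func.__name__}{signature}\n    {summary}"


# A hypothetical library function, used only to exercise describe().
def scale(value, factor=2):
    """Multiply value by factor."""
    return value * factor


if __name__ == "__main__":
    print(describe(scale))
```

If the parameter list of scale changes, the generated entry changes with it, and only the one-line explanation has to be maintained by hand.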


Programs change. Docs do not., posted 3 Nov 2000 at 15:31 UTC by kbob » (Master)

The most important feature the Ultimate Documentation Tool would have is that it would automatically delete documentation that has become obsolete.


documentation: a tutorial and a reference, posted 3 Nov 2000 at 17:42 UTC by stefan » (Master)

I think documentation falls into two distinct parts. A tutorial is something which illustrates the use of a program/library with examples. It is best written by someone with some distance from the development process, as this helps to get a clearer view of what is important. Programmers tend to get used to a lot of their own assumptions, i.e. to them these obviously don't need documentation. Writing a tutorial is a highly creative process; there is not much room for automation.

Then there is a reference manual for the API. This has to follow the development process itself closely. Combining documentation with the API declaration (types, operations, etc.) and using a tool to extract that is a good idea. However, I found that most of the available documentation tools lack a lot of things. The simplest approach is to extract specially formatted comments. However, in lots of cases I want more. For one, I don't want to rely on special comments. Even without comments I want to be able to inspect the sources, for example to generate a dependency graph (for source files, for the type hierarchy (inheritance), for variables (containment and association)). Also, for generating references, different output formats like DocBook or HTML (in various layouts) should be supported. Generating UML diagrams (at least static diagrams) would be very useful as well. For all this, the tool would need to have an internal representation of a (possibly annotated) Abstract Syntax Tree, which can then be postprocessed into various forms.
Since no other tool I was aware of met this requirement, I started the Synopsis project, which is aimed at exactly this. You plug in various parsers (so you can combine source code from different languages, such as IDL and C++, into your AST) and various formatters.
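
As a rough illustration of inspecting sources without any special comments, the following Python sketch derives the inheritance edges of a type hierarchy purely from a syntax tree. It is not how Synopsis works; it uses Python's standard ast module as a stand-in for a pluggable parser, and the sample classes and the function name inheritance_edges are invented for the example.

```python
import ast

# Uncommented sample source; a real tool would read files from disk.
SOURCE = """
class Shape: pass
class Circle(Shape): pass
class Square(Shape): pass
"""


def inheritance_edges(source):
    """Return sorted (derived, base) pairs for every class found in the source."""
    edges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            for base in node.bases:
                if isinstance(base, ast.Name):   # only simple names such as Shape
                    edges.append((node.name, base.id))
    return sorted(edges)


if __name__ == "__main__":
    for derived, base in inheritance_edges(SOURCE):
        print(f"{derived} -> {base}")
```

The edge list is the kind of intermediate form a formatter could turn into HTML, DocBook or a static diagram.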

docs..., posted 3 Nov 2000 at 17:42 UTC by Malx » (Journeyer)

Look at the mailing list archive of the Software Carpentry project.
You need a dependency tool that can work with blocks of text instead of files, and a preprocessor whose rules can be inserted into any text, just as the C language has.

The main thing - to make documentation independent of natural language :)

PERCEPS, posted 3 Nov 2000 at 20:05 UTC by sej » (Master)

Last year I surveyed the documentation tools you listed above, and most of them fell short in terms of code-obtrusiveness and back-end flexibility. But there is this one Perl script, PERCEPS, which understands C and C++ syntax, uses the simplest embedding/extracting strategy I've seen, and generates a variety of documentation formats using back-end templates.

It's rather slow on a sizeable class library, but you gotta love a document extractor that assumes trailing // comments after a variable or method apply to that member (with an option for leading comments as well). It generates customizable web pages with hyperlinks between classes, and you get an index of these classes as a by-product. Here's an example of its use:

Re: PERCEPS, posted 3 Nov 2000 at 20:30 UTC by stefan » (Master)

sej: yes, I had a lot of hope for this tool around a year and a half ago. The maintainer announced that he had plans to rewrite it in python ('pyceps'), and even created a sourceforge project for it. Unfortunately, that's all that happened. I discussed the general layout of things (modular design etc.) with him, and we pretty much agreed. Since nothing happened, I started 'Synopsis' on my own.
PERCEPS lacks a lot of features (notably, it has trouble with templates), and the fact that it isn't even maintained, let alone developed, doesn't help either.
But yeah, among the existing tools it comes closest to what I'm looking for in terms of flexibility.

Re: PERCEPS, posted 3 Nov 2000 at 21:40 UTC by sej » (Master)

Stefan: the IUE project put PERCEPS to work, and distributed a modified version that probably handled templates with their source. Contact me if you want more contacts. -Scott

Autodocumenting tools a starting point, posted 4 Nov 2000 at 09:45 UTC by jennv » (Journeyer)

I think that, no matter how good, autodocumenting tools can only supplement, not replace, good handwritten documentation.

For an audience of programmers, autodocumentation can produce correct and up-to-date dependency trees and function lists - and, if the comments in the code are correct, this may be all that's needed for those. But autodocumentation can't (AFAIK) produce an overview of what the program is meant to do, what design decisions the programmer made (why an array rather than a linked list? Why sorted on retrieval, or why sorted on entry?), what changes the programmer wants to make later, what hooks he left in place for later use...

For an audience of end-users, I don't think autodocumentation will ever work properly. A user manual is a design effort - there are questions to be answered like: 'is it a reference, a tutorial, a walkthrough of a process, a handbook...' . What level of knowledge do they have? What type of language do they understand best?

For some audiences, it's best to use latinate words - utilise, comprehend. Others prefer simpler words - use, understand. Some need to be told how to open a CD-ROM drive. Others are impatient with low-level detail.

I suspect that the effort required (or should I use 'needed'?) to write an autodocumenter with that amount of configurability would be better spent making better autodocumenters for programmers - and teaching programmers to include their design decisions and program overviews in useful places for the autodocumenters to pick up.

The Good DOCtor, posted 4 Nov 2000 at 14:11 UTC by japhy » (Journeyer)

Before we can wield supreme executive power via a documentation tool, we need to make sure programmers know how to write effective documentation, and make sure that the end-users know where it is, how to get it, and how to grep it. However, in this vein, the uber-tool should be able to keep programmer-to-programmer documentation and programmer-to-user documentation separate.

But I digress. Back to the pre-wonder-tool stage. In Perl, there are basically two types of comments -- inline comments, that start with a # and go to end-of-line, and embedded documentation called "POD" which stands for "plain old documentation". As one of the POD-people, I know the usefulness of this feature, because it allows for simple extraction from a program source. It's also a very convenient format for turning into HTML, (n|t)roff, or simple text output. It's simple and kinda powerful, given the right path to power. But it is more for documenting the functionality (that is, interface) of the program/module/thingy, than for explaining what a line of code does. It's there with the intention that someone will have read the documentation, and knows that perldoc program will return any embedded POD it finds in a happy readable format.
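
To show why POD is easy to extract, here is a small sketch. It is written in Python only for illustration and is not how perldoc is implemented. It relies on the rule that a POD block starts at a line beginning with "=" followed by a letter and runs up to a line beginning with "=cut". The function name extract_pod and the sample script are made up.

```python
import re

# A tiny, made-up Perl script with one POD block.
SAMPLE = """\
#!/usr/bin/perl
# an inline comment, which POD tools ignore
print "hello\\n";

=head1 NAME

hello - print a greeting

=cut

print "bye\\n";
"""


def extract_pod(perl_source):
    """Collect the lines that sit inside POD blocks of a Perl source string."""
    pod_lines = []
    in_pod = False
    for line in perl_source.splitlines():
        if line.startswith("=cut"):
            in_pod = False                  # back to ordinary Perl code
        elif re.match(r"=[a-zA-Z]", line):
            in_pod = True                   # a POD command opens or continues a block
            pod_lines.append(line)
        elif in_pod:
            pod_lines.append(line)
    return pod_lines


if __name__ == "__main__":
    print("\n".join(extract_pod(SAMPLE)))
```

The inline # comment never reaches the output, which matches the split described above between interface documentation and line-level comments.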

So in-code comments are needed, since we all do some nice krufty things, and the future maintainers of our code (which might end up being us) might want to know what's going on. I'll put a comment or two around an especially idiomatic chunk of code I write, so that if I end up showing it to someone, they'll see Black Magick, but they'll also see The Way (or at least instructions on finding it).

Maybe I'll think of something else later to say about the actual tool, rather than babble about the people involved.

Documentation made easier, posted 6 Nov 2000 at 12:47 UTC by Excalibor » (Observer)

Hi all, thanks for your input...

One thing I've thought up during the weekend is that such a state-of-the-art tool would accept a kind of literate-programming style of documenting... I am not sure how it would be implemented though, but it would allow better integration of readability for end-users and maintenance for both users and clients of the code...

I mean, especially in the examples area, some code snippets could be marked as examples which would be constructed from the latest sources every time, thus making it possible to have functional code and useful examples... Again, I only have some intuitions on how this could work...
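
One way this intuition might work, sketched in Python: snippets in working test code are bracketed by marker comments, and the tool copies whatever currently sits between them into the docs. The marker syntax ("# example: NAME" and "# end example") and the function name extract_examples are assumptions made up for this sketch, not an existing convention.

```python
import textwrap

# Made-up test code containing one marked example.
TEST_SOURCE = """\
def test_scale():
    # example: scaling
    result = scale(21)
    assert result == 42
    # end example
"""


def extract_examples(source):
    """Map each example name to the dedented code between its two markers."""
    examples = {}
    name = None
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("# example:"):
            name = stripped[len("# example:"):].strip()
            examples[name] = []
        elif stripped == "# end example":
            name = None
        elif name is not None:
            examples[name].append(line)
    return {key: textwrap.dedent("\n".join(body)) for key, body in examples.items()}


if __name__ == "__main__":
    for title, code in extract_examples(TEST_SOURCE).items():
        print(f"Example '{title}':\n{code}")
```

Because the snippet lives inside a test that has to pass, a published example cannot silently drift away from the API.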

ASTs and doc generators, posted 9 Nov 2000 at 09:06 UTC by taj » (Master)


I'm the author of kdoc, yet another doc generator. While your parser-AST-generator idea is the approach I've taken with kdoc (and is being refined over time), I toyed with the "everything into the same AST" idea for a bit but no longer believe it's such a good idea. Instead I believe that a "mapping" approach would be a better idea. I'll try to explain...

Take for example an IDL module and corresponding implementations in C++. There may not be a one-to-one mapping between the IDL interfaces and the implementations and then there is also the problem of different ORBs having different styles of generated code.

To get around this I've been thinking that parsing the IDLs and the C++ sources into parallel ASTs is a somewhat better idea, with a "mapping" type defined for specific domains, e.g. a mapping that defines the correlation between IDL interfaces and the corresponding skeleton, impl etc. classes as generated by OmniORB, mico etc.
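
A very rough sketch of what such a mapping type could look like, written in Python for brevity. It is not kdoc code. Plain lists and sets of names stand in for the two parallel ASTs, and the naming rules used here (a "POA_" prefix for skeletons, an "_impl" suffix for implementations) are assumptions for illustration; a real mapping would encode whatever a given ORB actually generates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    """Links one IDL interface to the C++ classes that correspond to it."""
    interface: str
    skeleton: Optional[str]
    implementation: Optional[str]


def map_interfaces(idl_interfaces, cpp_classes,
                   skeleton_prefix="POA_", impl_suffix="_impl"):
    """Correlate names from the IDL tree with names from the C++ tree."""
    mappings = []
    for name in idl_interfaces:
        skeleton = skeleton_prefix + name
        impl = name + impl_suffix
        mappings.append(Mapping(
            interface=name,
            skeleton=skeleton if skeleton in cpp_classes else None,
            implementation=impl if impl in cpp_classes else None,
        ))
    return mappings


if __name__ == "__main__":
    idl = ["Account", "Bank"]                            # from the IDL AST
    cpp = {"POA_Account", "Account_impl", "POA_Bank"}    # from the C++ AST
    for mapping in map_interfaces(idl, cpp):
        print(mapping)
```

Keeping the two trees separate means an interface without an implementation simply shows up as a gap in the mapping, instead of forcing both languages into one tree.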

I am currently refining the AST and reference system in kdoc and will attempt to implement this approach in a few weeks.
