Documentation tools
Posted 3 Nov 2000 at 15:15 UTC by Excalibor
We all know good programmers document the APIs of their libraries and
document their programs. But this is a tedious, error-prone activity
that takes time away from hacking the code. Documentation tools have come
to help us, but are they the ultimate answer? What comes after them?
If you are a bit like me, you hate joining a big project only to find
that the code is uncommented, the library APIs are undocumented, and the
project organization is plainly poor.
While documenting, say, the API of a library shouldn't be that
difficult, it is time-consuming. Some programmers don't do it because it
takes time away from hacking, others because they are lazy or a bit
irresponsible (IMHO), and some because they don't know an easier way:
they take the pains of hand-writing the documentation, inevitably
producing documents that are instantly outdated or, worse, they don't
document a thing.
Documentation tools came to the rescue. They look for special comments
in the code, analyze the structure of the program, and produce a
document covering all those functions, methods, classes, structs, and
basically everything you care about: readable, browsable, organized...
The first documentation tool I ever saw was a little Lisp program
that searched for the "" doc strings in the (defun)s and generated a list
of defuns, their parameters, and their doc strings. Then came Javadoc,
Doc++, Cocoon, Doxygen, KDoc, and many others (and I am surely missing a
lot of them in between).
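That little extractor is easy to imagine in any language with accessible doc strings. Here is a rough sketch of the same idea in Python, using the standard ast module; the sample code being scanned is invented for illustration:

```python
import ast

def list_docstrings(source):
    """Collect each function's name, parameters, and doc string."""
    entries = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            params = [a.arg for a in node.args.args]
            entries.append((node.name, params, ast.get_docstring(node)))
    return entries

code = '''
def add(a, b):
    "Return the sum of a and b."
    return a + b
'''
print(list_docstrings(code))
```

Thirty lines, and you already have the raw material for a reference list; everything beyond this is formatting and cross-referencing.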
However, documenting an API or a program is a far cry from producing a
useful programmer's guide, an end-user manual, and everything in
between.
If you add examples to show other programmers how to use your
methods/functions, you have to worry about updating those examples as
the API changes, and you are not following the DRY principle: Don't
Repeat Yourself (you see, I try to be a pragmatic programmer :-)
Besides, that doesn't (usually) help with the end-user documentation,
which is just as time-consuming and just as important as the API itself...
So, even though documentation tools are really useful, they aren't the
last word. What features do you think such a last-word documentation
tool should have? What should its philosophy be?
My guesses:
- It must be generated from the code itself, with the help of as
few comments as necessary (e.g. explanations should go into the comments,
but the signature itself should be generated from the code)
- Must generate examples from the code itself, maybe from test code or
from whatever source is using the API, so they are always up to date
with the working code
- Must generate a handful of formats (at least HTML/XML) to be useful
and portable
- Must be as unobtrusive as possible, so that someone reading the
code doesn't have to sort through three screenfuls of comments before
reaching the code being documented
- It should support a wide range of languages and programming styles or
paradigms to fit as many projects as possible, so that standardization
is feasible and programmers don't have to learn a lot of tools (not that
it's difficult, but it takes time from hacking :)
- Free Software
- Configurable to some extent (don't ask me how, but some config file or
CLI options or something along those lines)
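As a sketch of the first point, keeping the signature generated from the code while only the explanation is hand-written, here is what a minimal extractor might look like in Python; the connect function is a made-up example, and a real tool would of course do much more:

```python
import inspect

def document(func):
    """Emit a reference entry: the signature is pulled from the code
    itself, only the explanation comes from the doc string."""
    sig = inspect.signature(func)           # generated from code, never typed by hand
    doc = inspect.getdoc(func) or "(undocumented)"
    return f"{func.__name__}{sig}\n    {doc}"

def connect(host, port=80, timeout=None):
    """Open a connection to host."""
    pass

print(document(connect))
# -> connect(host, port=80, timeout=None)
#        Open a connection to host.
```

Change the default port in the code and the documentation changes with it: no possibility of the signature drifting out of date.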
Comments?
The most important feature the Ultimate Documentation Tool would have
is that it would automatically delete documentation that has become
obsolete.
K<bob>
I think documentation falls into two distinct parts. A tutorial is
something which illustrates the use of a program/library with examples.
It is best written by someone with some distance to the development
process, as this helps to get a clearer view of what is important.
Programmers tend to get used to a lot of their own assumptions, so they
obviously don't need the documentation themselves. Writing a tutorial is
a highly creative process; there is not much room for automation.
Then there is a reference manual to the API. This has to follow closely
the developmental process itself. Combining documentation with the API
declaration (types, operations, etc.) and using a tool to extract that
is a good idea. However, I found that most of the available
documentation tools lack a lot. The simplest approach is to
extract specially formatted comments, but in many cases I want more.
For one, I don't want to rely on special comments. Even without comments I
want to be able to inspect the sources, for example to generate
dependency graphs (for source files, for the type hierarchy
(inheritance), for variables (containment and association)). Also, for
generating references, different output formats like DocBook or HTML (in
various flavors) should be supported. Generating UML diagrams (at least
static diagrams) would be very useful as well. For all this,
the tool needs an internal representation of a (possibly
annotated) Abstract Syntax Tree,
which can then be postprocessed into various forms.
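To make the AST idea concrete, here is a tiny Python sketch that derives an inheritance graph straight from the parse tree, with no special comments required; the class names are invented, and a real tool like the one described would keep the full annotated tree around for many such passes:

```python
import ast

def inheritance_graph(source):
    """Map each class to its base classes, read straight from the parse tree."""
    graph = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            # Only simple names are handled in this sketch
            graph[node.name] = [b.id for b in node.bases
                                if isinstance(b, ast.Name)]
    return graph

code = '''
class Drawable: pass
class Shape: pass
class Circle(Shape): pass
class Sprite(Circle, Drawable): pass
'''
print(inheritance_graph(code))
```

From the same tree one could just as well emit a DocBook reference, an HTML page, or a static UML diagram: the parse happens once, the formatters are plugged in afterwards.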
Since no other tool I was aware of met this requirement, I started the
Synopsis project, which is aimed at exactly
this. You plug in various parsers (so you can combine source code
from different languages, such as IDL and C++, into your AST), and
various formatters.
docs..., posted 3 Nov 2000 at 17:42 UTC by Malx »
(Journeyer)
Look at the mailing list archive of the
Software Carpentry
project.
You need a dependency tool which can work with blocks of text instead of
files. And a preprocessor whose rules can be inserted into any text, just
like the C language has.
The main thing: make the documentation independent of natural language
:)
PERCEPS, posted 3 Nov 2000 at 20:05 UTC by sej »
(Master)
Last year I surveyed the documentation tools you listed above, and
most of them fell short in terms of code-obtrusiveness and back-end
flexibility. But there is this one Perl script, PERCEPS,
which understands C and C++
syntax, uses the simplest embedding/extracting strategy I've seen, and
generates a variety of documentation formats using back-end templates.
It's rather slow on a sizeable class library, but you've got to love a
document extractor that assumes trailing // comments after a variable or
method apply to that member (with an option for leading comments as well).
It generates customizable web pages with hyperlinks between classes, and
you get an index of these classes as a by-product. Here's an example of
its use:
http://www.ivtools.org/ivtools/doc/classes/
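The trailing-comment strategy is simple enough to sketch. Here is a rough Python approximation of the idea (not PERCEPS itself, and a crude regex stands in for a real C++ parser; the sample header is invented):

```python
import re

def trailing_comments(cpp_source):
    """Associate a trailing // comment with the member declared on that line."""
    docs = {}
    for line in cpp_source.splitlines():
        # Match "<declaration>; // <comment>" on a single line
        m = re.match(r'\s*(.+?);\s*//\s*(.*)', line)
        if m:
            docs[m.group(1).strip()] = m.group(2).strip()
    return docs

header = '''
class Point {
    int x;        // horizontal coordinate
    int y;        // vertical coordinate
    void move(int dx, int dy);  // translate by (dx, dy)
};
'''
print(trailing_comments(header))
```

The appeal is exactly what sej describes: the comment sits where a programmer would write it anyway, so the source stays readable with or without the tool.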
Re: PERCEPS, posted 3 Nov 2000 at 20:30 UTC by stefan »
(Master)
sej: yes, I had a lot of hope for this tool around a year and a half
ago. The maintainer announced that he had plans to rewrite it in Python
('pyceps'), and even created a SourceForge project for it.
Unfortunately, that's all that happened. I discussed the general layout
of things (modular design etc.) with him, and we pretty much
agreed. Since nothing happened, I started 'Synopsis' on my own.
PERCEPS lacks a lot of features (and notably has trouble with templates),
and the fact that it isn't even maintained, let alone developed,
doesn't help either.
But yeah, among the existing tools it comes closest to what I'm looking
for in terms of flexibility.
Re: PERCEPS, posted 3 Nov 2000 at 21:40 UTC by sej »
(Master)
Stefan: the IUE
project put PERCEPS to work, and distributed a modified version that
probably handled templates with their source. Contact me if you want
more contacts. -Scott
I think that, no matter how good, autodocumenting tools can only supplement, not replace, good handwritten documentation.
For an audience of programmers, autodocumentation can produce correct and up-to-date dependency trees and function lists, and, if
the comments in the code are correct, this may be all that's needed. But autodocumentation can't (AFAIK) produce an overview of
what the program is meant to do, what design decisions the programmer made (why an array rather than a linked list? why sorted on
retrieval, or why sorted on entry?), what changes the programmer wants to make later, what hooks he left in place for later use...
For an audience of end-users, I don't think autodocumentation will ever work properly. A user manual is a design effort; there are
questions to be answered like: is it a reference, a tutorial, a walkthrough of a process, a handbook...? What level of knowledge do the
users have? What kind of language do they understand best?
For some audiences, it's best to use Latinate words: utilise, comprehend. Others prefer simpler words: use, understand. Some need to
be told how to open a CD-ROM drive. Others are impatient with low-level detail.
I suspect that the effort required (or should I say 'needed'?) to write an autodocumenter with that amount of configurability would be better
spent making better autodocumenters for programmers, and teaching programmers to put their design decisions and program
overviews in places where the autodocumenters can pick them up.
The Good DOCtor, posted 4 Nov 2000 at 14:11 UTC by japhy »
(Journeyer)
Before we can wield supreme executive power via a documentation tool, we need to make sure programmers know how to write
effective documentation, and make sure that the end-users know where it is, how to get it, and how to grep it. In this vein,
the uber-tool should be able to keep programmer-to-programmer documentation and programmer-to-user documentation separate.
But I digress. Back to the pre-wonder-tool stage. In Perl, there are basically two types of comments: inline comments, which start with
a # and go to the end of the line, and embedded documentation called "POD", which stands for "plain old documentation". As one of
the POD people, I know the usefulness of this feature, because it allows simple extraction from the program source. It's also a very
convenient format for turning into HTML, (n|t)roff, or plain text output. It's simple and kinda powerful, given the right path to power. But
it is more for documenting the functionality (that is, the interface) of the program/module/thingy than for explaining what a line of code does.
It's there with the intention that someone will have read the documentation and knows that the perldoc program will return
any embedded POD it finds in a happy, readable format.
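The extraction side of POD is almost trivially simple, which is part of its charm. Here is a rough sketch of the idea in Python (real POD has more rules, e.g. directives must start at a paragraph boundary, so this is only an approximation; the sample script is invented):

```python
def extract_pod(perl_source):
    """Pull out POD blocks: a line starting with '=word' opens a block,
    '=cut' closes it; everything in between is documentation."""
    pod, in_pod = [], False
    for line in perl_source.splitlines():
        if line.startswith('=cut'):
            in_pod = False
        elif line.startswith('='):
            in_pod = True
            pod.append(line)
        elif in_pod:
            pod.append(line)
    return '\n'.join(pod)

script = '''#!/usr/bin/perl
=head1 NAME

greet - say hello

=cut
print "hello\\n";
'''
print(extract_pod(script))
```

Once the blocks are out, rendering them as HTML, (n|t)roff, or plain text is a separate, pluggable concern, which is exactly how the pod2html/pod2man family of tools is organized.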
So in-code comments are still needed, since we all do some nice crufty things, and the future maintainers of our code (who might end up
being us) might want to know what's going on. I'll put a comment or two around an especially idiomatic chunk of code I write, so that if
I end up showing it to someone, they'll see Black Magick, but they'll also see The Way (or at least instructions on finding it).
Maybe I'll think of something else later to say about the actual tool, rather than babble about the people involved.
Hi all, thanks for your input...
One thing I've thought up over the weekend is that such a
state-of-the-art tool would accept a kind of literate-programming style
of documenting... I am not sure how it would be implemented, but
it would allow better integration of readability for end-users with
maintainability for both users and clients of the code...
I mean, especially in the examples area, some code snippets could be
marked as examples that would be rebuilt from the latest sources
every time, making it possible to have functional code and useful
examples... Again, I only have some intuitions about how this could work...
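One existing take on this intuition is doctest-style documentation, where the examples are executable and are checked against the live code. A minimal Python illustration (the factorial function is made up):

```python
import doctest

def factorial(n):
    """Compute n!.

    The examples below live next to the code and are executed against it,
    so they cannot silently drift out of date:

    >>> factorial(0)
    1
    >>> factorial(4)
    24
    """
    return 1 if n == 0 else n * factorial(n - 1)

if __name__ == "__main__":
    # Re-run every embedded example; a changed API makes this fail loudly.
    print(doctest.testmod().failed)
```

Because the examples are tests, they are "construed from the latest sources every time" in exactly the sense described above: if the API changes and the example no longer works, the documentation build itself breaks.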
Stefan,
I'm the author of kdoc, yet another doc generator. The
parser-AST-generator idea is the approach I've taken with kdoc (and it is being refined over time), but I toyed with the "everything into the same AST" idea for a bit and no longer believe it's such a good idea. Instead I believe that a "mapping" approach would be better.
I'll try to explain...
Take for example an IDL module and the corresponding implementations in C++. There may not be a one-to-one mapping between the IDL interfaces and the implementations, and then there is also the problem of different ORBs having different styles of generated code.
To get around this, I've been thinking that parsing the IDL and the C++ sources into parallel ASTs is a somewhat better idea, with a "mapping" type defined for specific domains, e.g. a mapping that defines the correlation between IDL interfaces and the corresponding skeleton, impl, etc. classes as generated by OmniORB, mico, etc.
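A toy sketch of what such a mapping might look like, in Python. The two ASTs are reduced to plain dictionaries for brevity, and the naming convention ('_sk_' skeletons, '_impl' implementations) is invented for illustration, not taken from any real ORB:

```python
# Two parallel ASTs, one per language, kept separate rather than merged.
idl_ast = {"interfaces": ["Account", "Bank"]}
cpp_ast = {"classes": ["_sk_Account", "Account_impl",
                       "_sk_Bank", "Bank_impl"]}

class OrbMapping:
    """Correlate an IDL interface with the C++ classes an ORB generates
    for it, according to that ORB's (here: made-up) naming convention."""
    def related_classes(self, interface, cpp_ast):
        wanted = ("_sk_" + interface, interface + "_impl")
        return [c for c in cpp_ast["classes"] if c in wanted]

mapping = OrbMapping()
for iface in idl_ast["interfaces"]:
    print(iface, "->", mapping.related_classes(iface, cpp_ast))
```

The point of keeping the ASTs parallel is that each stays a faithful model of its own language; only the mapping object has to know about a particular ORB, so supporting another ORB means writing another small mapping, not reshaping the trees.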
I am currently refining the AST and reference system in kdoc and will attempt to implement this approach in a few weeks.