Source code as XML versus editing source code in structured form

Posted 19 Dec 1999 at 10:36 UTC by Radagast

Since we announced Conglomerate, we've had a lot of feedback, much of it from people who envision uses for Conglomerate that we hadn't even thought of. A particularly common question is about editing source code as XML.

The essence of these questions usually goes something like this:

-Now that we can edit XML so nicely in a structured environment, why don't we create an XML DTD for the structure of a programming language like C? Then we could leave plain text code editors behind and get on-the-fly syntax checking as we code.

At this point most experienced coders are on the floor in spasms, and rightly so: this sort of approach would probably be cumbersome, and few people would use it. But it got us thinking. Isn't there a good idea in here somewhere? Probably there is. Conglomerate wasn't really made for XML specifically; it was made for visualizing and editing documents that have structure, and C code most definitely has structure. What's more, that structure, or grammar, can be described easily and uniformly in EBNF (Extended Backus-Naur Form), which is incidentally what XML and SGML DTDs are based on too.

Of course, the visual representation of the code would probably differ considerably from the presentation Conglomerate uses for XML structure today. In particular, most of the fancy borders and delimiters would have to go, and the display would probably look very close to a text editor with syntax highlighting.

So what, then, would be the advantage? Well, Conglomerate with source code editing capabilities would have a full C parser built in. This means that rather than using regular expressions to recognise the different parts of the code (the way text editors do today for syntax highlighting, for instance), Conglomerate would represent the code internally as a syntax tree.
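
As a rough illustration only (a minimal Python sketch, not Conglomerate code), here is a recursive-descent parser for a hypothetical toy grammar, a tiny expression subset rather than real C, that builds a syntax tree. A small walker then classifies each identifier by its position in the tree instead of by what the surrounding text looks like. All names (`parse`, `Node`, `classify`) are invented for the example.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy grammar in EBNF (a tiny expression subset, not real C):
#   stmt = IDENT "=" expr ";"
#   expr = term { "+" term }
#   term = NUMBER | IDENT [ "(" [ expr { "," expr } ] ")" ]

TOKEN_RE = re.compile(r"(?P<NUM>\d+)|(?P<IDENT>[A-Za-z_]\w*)|(?P<PUNCT>\S)")


@dataclass
class Node:
    kind: str                          # "assign", "add", "call", "var" or "num"
    text: str = ""
    children: list = field(default_factory=list)


def tokenize(src):
    """Split source into (kind, text) pairs; whitespace is skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(src)]


class Parser:
    def __init__(self, src):
        self.toks = tokenize(src)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else ("EOF", "")

    def take(self, kind=None, text=None):
        """Consume the next token if it matches, else raise SyntaxError."""
        k, t = self.peek()
        if (kind and k != kind) or (text and t != text):
            raise SyntaxError(f"expected {text or kind}, got {t or 'end of input'!r}")
        self.pos += 1
        return t

    def stmt(self):
        target = Node("var", self.take(kind="IDENT"))
        self.take(text="=")
        value = self.expr()
        self.take(text=";")
        return Node("assign", "=", [target, value])

    def expr(self):
        node = self.term()
        while self.peek()[1] == "+":
            self.take(text="+")
            node = Node("add", "+", [node, self.term()])
        return node

    def term(self):
        if self.peek()[0] == "NUM":
            return Node("num", self.take(kind="NUM"))
        name = self.take(kind="IDENT")
        if self.peek()[1] != "(":
            return Node("var", name)           # bare name: a variable
        self.take(text="(")
        args = []
        if self.peek()[1] != ")":
            args.append(self.expr())
            while self.peek()[1] == ",":
                self.take(text=",")
                args.append(self.expr())
        self.take(text=")")
        return Node("call", name, args)        # name followed by "(": a call


def parse(src):
    parser = Parser(src)
    tree = parser.stmt()
    if parser.peek()[0] != "EOF":
        raise SyntaxError(f"unexpected {parser.peek()[1]!r}")
    return tree


def classify(node, out=None):
    """Walk the tree, recording each identifier's role for the highlighter."""
    if out is None:
        out = []
    if node.kind == "call":
        out.append((node.text, "function"))
    elif node.kind == "var":
        out.append((node.text, "variable"))
    for child in node.children:
        classify(child, out)
    return out


print(classify(parse("total = max(a, 3) + b;")))
# [('total', 'variable'), ('max', 'function'), ('a', 'variable'), ('b', 'variable')]
```

The parser decides that `max` is a function because it heads an argument list in the tree, and that `total`, `a` and `b` are variables; a highlighter built on this never has to guess from the surrounding text.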

The advantages of this are many. First of all, syntax highlighting would be perfect: it would be extremely fast and would always recognise each part of the code for what it is. It would also be easy to keep the code in a consistent state at all times. Need to rename a function globally? Swap the order of its parameters? Without breaking anything? Not a problem. Likewise, places where you forgot a brace, etc., could be highlighted without trying a compile. Adding LXR-like cross-referencing of functions, defines, and variables would take minimal work.
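
Even before building a full tree, a lexer that understands comments and string literals gets part of the way there. The Python sketch below uses a deliberately simplified, hypothetical C lexer (it ignores the preprocessor, trigraphs and the like) and applies one token stream to two of the jobs above: renaming an identifier without touching strings, comments, or longer names that merely contain it, and reporting unmatched braces without compiling. A real implementation would work on the syntax tree, so that, for example, a local variable that happens to share a function's name is left alone; this token-level version cannot make that distinction.

```python
import re

# Hypothetical, simplified C lexer: it separates code from comments and
# string/char literals, but ignores the preprocessor, trigraphs, etc.
TOKEN_RE = re.compile(r"""
      (?P<comment>/\*.*?\*/|//[^\n]*)
    | (?P<string>"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*')
    | (?P<number>\d[\w.]*)
    | (?P<ident>[A-Za-z_]\w*)
    | (?P<other>.)
""", re.VERBOSE | re.DOTALL)


def tokens(src):
    """Yield (kind, text, line) for every lexeme, whitespace included."""
    line = 1
    for m in TOKEN_RE.finditer(src):
        yield m.lastgroup, m.group(), line
        line += m.group().count("\n")


def rename_identifier(src, old, new):
    """Rename `old` only where it appears as a whole identifier in code."""
    return "".join(new if kind == "ident" and text == old else text
                   for kind, text, _ in tokens(src))


def unbalanced_braces(src):
    """Return the line numbers of braces that have no partner."""
    open_lines, stray = [], []
    for kind, text, line in tokens(src):
        if kind != "other":
            continue                   # braces inside comments/strings don't count
        if text == "{":
            open_lines.append(line)
        elif text == "}":
            if open_lines:
                open_lines.pop()
            else:
                stray.append(line)     # a "}" with nothing left to close
    return sorted(stray + open_lines)  # leftovers in open_lines are unclosed "{"s


src = 'int foo(void) { return 1; } /* foo */ char *s = "foo"; int foo_count = foo();'
print(rename_identifier(src, "foo", "bar"))
# int bar(void) { return 1; } /* foo */ char *s = "foo"; int foo_count = bar();
```

A brace stack can only say which opening brace was left over, not where the missing closing brace belonged: if an `if` block inside a function is never closed, the function's own `{` is the one reported. A parser that knows the statement structure can make a better guess.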

Also, since the Conglomerate server will have CVS-like capabilities, a new set of features becomes obvious. The CVS manual states explicitly that CVS merges changes only at the textual level, not at the syntactic level. That is, nothing in CVS keeps you from changing the name or parameters of a function in one source file, committing, and breaking everything else. If the server knows about code structure, however, it's comparatively easy to add warnings for, and even semi-automatic fixing of, this sort of problem.
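
A sketch of what such a server-side check might look like, in Python and under loud assumptions: the server is assumed to have already parsed the changed file before and after the commit and located every call site in the rest of the tree (that step is left out), declarations are reduced to a parameter count, and variadic functions are ignored. The function names are hypothetical. It warns about call sites a commit would break and makes a naive rename guess as a starting point for semi-automatic fixing.

```python
# Hypothetical data model (assumed, not Conglomerate's):
#   old, new: dict mapping function name -> number of parameters,
#             for the changed file before and after the commit
#   calls:    list of (file, line, callee, number_of_arguments) tuples


def check_commit(old, new, calls):
    """Warn about call sites broken by a changed or removed declaration."""
    warnings = []
    for file, line, callee, nargs in calls:
        if callee not in old:
            continue  # not declared in the changed file, nothing to check
        if callee not in new:
            warnings.append(f"{file}:{line}: {callee}() was removed or renamed")
        elif new[callee] != nargs:
            warnings.append(f"{file}:{line}: {callee}() now takes "
                            f"{new[callee]} argument(s), called with {nargs}")
    return warnings


def rename_candidates(old, new):
    """Naive guess for semi-automatic fixing: pair each removed function with
    the single added function of the same arity, if there is exactly one."""
    removed = {name: n for name, n in old.items() if name not in new}
    added = {name: n for name, n in new.items() if name not in old}
    guesses = {}
    for name, arity in removed.items():
        matches = [other for other, n in added.items() if n == arity]
        if len(matches) == 1:
            guesses[name] = matches[0]
    return guesses


old = {"draw_box": 4, "free_box": 1}
new = {"draw_box": 5, "box_free": 1}
calls = [("ui.c", 12, "draw_box", 4), ("ui.c", 30, "free_box", 1),
         ("ui.c", 31, "printf", 2)]
for warning in check_commit(old, new, calls):
    print(warning)
print(rename_candidates(old, new))
```

The rename guess is only a heuristic (one removed and one added function with the same arity), so any fix built on it should be offered to the person committing rather than applied silently.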

Obviously, there's only so much time. We'd like to know how much interest there is in this type of functionality, and to hear more ideas for what it could be used for. If there's sufficient interest (and it doesn't take much; we want something good to edit in ourselves), we will implement this sooner or later, but with everything else going into Conglomerate, it probably won't happen until late spring or early summer 2000 at the earliest. If you'd like to work on this, let us know.

An idea whose time has come, posted 21 Dec 1999 at 16:29 UTC by andrei » (Master)

You just made my day. I've long been looking for a source code editing environment with features like the ones you describe. My only hesitation is that a lot of source code editing happens in terminals, using Emacs or vim. Perhaps there could be some sort of Conglomerate engine running that Emacs/vim could communicate with, exchanging information about highlighting, search/replace, etc.?


Running as an engine, posted 8 Dec 1999 at 04:33 UTC by Radagast » (Journeyer)

Yes, we're aiming to modularize as much of the work we're doing as possible. All the engine stuff (both for the XML and future other structures, like source code) is going into Flux, the general-purpose toolkit library we use for all our projects. It does quite a few things already, and more features are coming. So it should be quite possible to link your editor of choice to Flux, and use the frontend you like.

On the other hand, Conglomerate is getting a scripting host too, which means that it'll be extensible and configurable almost to the level of Emacs. The choice is yours.

consistency desirable?, posted 27 Dec 1999 at 00:26 UTC by shillo » (Journeyer)

One problem I see with this approach: Conglomerate should allow temporary drops in consistency. At least with my coding style, I would mind if the editor bothered me about cut-and-pastes that temporarily break structure. For example, deleting the braces from an if statement would become a rather involved operation.

While I suppose it -might- be possible to get used to an editor that doesn't allow transient breakage, I'm not sure such an editor would be fun to use in the long term.

Breaking structure, posted 27 Dec 1999 at 07:05 UTC by Radagast » (Journeyer)

Yes, we've thought about that. Most likely, the syntax highlighting would change to show you exactly where things start to break. The main challenge here would be making the parser resynchronise with the code as soon as possible after the problem, so that a single error doesn't throw off the syntax highlighting for several lines. That should be doable, though.
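
The resynchronisation described above corresponds to what compiler texts call panic-mode error recovery: on an error, the parser discards tokens until it reaches a synchronising token, then resumes normally. A minimal Python sketch, assuming a hypothetical one-rule grammar (`name = value ;`) and using only `;` as the synchronising token (a real C parser would also use tokens such as `}`); this is not Conglomerate's actual parser:

```python
import re


class ParseError(Exception):
    pass


def tokenize(src):
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", src)


def is_ident(tok):
    return re.fullmatch(r"[A-Za-z_]\w*", tok) is not None


def parse_statement(toks, i):
    """Hypothetical one-rule grammar: stmt = IDENT "=" (IDENT | NUMBER) ";".
    Returns the index just past the statement, or raises ParseError."""
    def expect(ok, what):
        nonlocal i
        got = toks[i] if i < len(toks) else None
        if got is None or not ok(got):
            raise ParseError(f"expected {what}, got {got or 'end of input'!r}")
        i += 1

    expect(is_ident, "identifier")
    expect(lambda t: t == "=", "'='")
    expect(lambda t: is_ident(t) or t.isdigit(), "identifier or number")
    expect(lambda t: t == ";", "';'")
    return i


def parse_with_recovery(src):
    """Parse every statement; after an error, skip past the next ';' and go on.
    Returns (statement_text, error_or_None) pairs for the highlighter."""
    toks = tokenize(src)
    results, i = [], 0
    while i < len(toks):
        start = i
        try:
            i = parse_statement(toks, i)
            results.append((" ".join(toks[start:i]), None))
        except ParseError as err:
            # Panic-mode recovery: discard tokens up to and including the
            # next ";" (the synchronising token), then resume normal parsing.
            while i < len(toks) and toks[i] != ";":
                i += 1
            i += 1
            results.append((" ".join(toks[start:i]), str(err)))
    return results


for text, error in parse_with_recovery("a = 1; b = = 2; c = d;"):
    print(f"{text:12} {error or 'ok'}")
```

For `a = 1; b = = 2; c = d;` only the middle statement is reported as broken; the statement after it parses normally, which is what keeps one mistake from spoiling the highlighting for the following lines.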

But the exact behaviour of the editor when you enter something that won't parse will be scriptable: anything from changing the syntax highlighting, to refusing the change, to trying to fix it automatically.
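
One way such a hook could look, purely as a hypothetical sketch in Python: none of these names come from Conglomerate or Flux, and a brace-balance check stands in for a real parse. Each policy receives the buffer text before and after an edit and returns the text the editor should keep plus a status for the display, so switching behaviour is just a matter of choosing a different policy.

```python
# Hypothetical hook API: none of these names come from Conglomerate or Flux.
# Each policy gets the buffer text before and after an edit and returns the
# text the editor should keep, plus a status for the display.


def parses(text):
    """Stand-in for a real parse check: braces must balance."""
    depth = 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def highlight(before, after):
    return after, ("ok" if parses(after) else "error-highlighted")


def refuse(before, after):
    return (after, "ok") if parses(after) else (before, "refused")


def autofix(before, after):
    if parses(after):
        return after, "ok"
    missing = after.count("{") - after.count("}")
    if missing > 0:   # this toy fixer only knows how to close open braces
        fixed = after + "}" * missing
        if parses(fixed):
            return fixed, "fixed"
    return highlight(before, after)   # fall back to just marking the error


POLICIES = {"highlight": highlight, "refuse": refuse, "autofix": autofix}


def apply_edit(before, after, policy="highlight"):
    return POLICIES[policy](before, after)


before, after = "if (x) { y(); }", "if (x) { y();"
for name in POLICIES:
    print(name, apply_edit(before, after, name))
```

For example, deleting the closing brace from `if (x) { y(); }` is kept and highlighted under `highlight`, undone under `refuse`, and repaired to `if (x) { y();}` under `autofix`.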
