10 Oct 2007 tagishandy   » (Journeyer)

CPAN v DTD (spoiler: CPAN loses)

I need to parse a DTD. Specifically I need to parse this DTD:

http://www.w3.org/TR/html4/strict.dtd

It’s quite a well-known one. Certainly there’ll be a module on CPAN that can parse it. Let’s have a look.

XML::DTD

Looks promising, comprehensive. Unfortunately it fails with an error that I eventually track down to a misspelled method name. So much for test coverage. Fix that and it throws a bunch of warnings that cause a rapid loss of confidence. No matter, let’s try…

XML::DTDParser

It’s a “quick and dirty DTD parser”. Hmm. “I’m too lazy to document the structure”. Nice.

“Since version 1.6 this module supports my ‘extensions’ to DTDs. If the DTD contains a comment in form…”

Maybe I’ll come back to XML::DTDParser…

SGML::DTDParse

SGML? That’s got to be good, right? SGML is the daddy. Every fule no that. Unfortunately it doesn’t really seem to have much of a Perl interface. It’s all about translating DTDs to XML. I might be able to use that. I’m getting desperate.

I’ll take a quick look at the test suite for a confidence boost. Here’s one:

    # Before `make install' is performed this script should be
    # `make test'. After `make install' it should work as `perl

    #########################                                  

    # change 'tests => 1' to 'tests => last_test_to_print';    

    use Test::More tests => 1;
    BEGIN { use_ok('SGML::DTDParse') };                        

    #########################                                  

    # Insert your test code below, the Test::More module is use
    # its man page ( perldoc Test::More ) for help writing this

(I’ve cut the right-hand side of the test off so it fits my stupid page layout. Don’t worry: you’re not missing anything good.)

The other test is pretty similar. I’m not that confident now.

XML::ParseDTD

Running out of options. Let’s look at XML::ParseDTD. From the documentation it appears to rock. The test results say “2 PASS, 2 FAIL”. 50/50. So at least it’s got some tests, right? Damn right! Here they are in their entirety:

    #!/usr/bin/env perl -w

    use strict;
    use Test::Simple tests => 2;
    use XML::ParseDTD;

    my $dtd = new XML::ParseDTD('http://www.w3.org/TR/xhtml1/D
    ok( defined $dtd, 'new() returned something' );
    ok( $dtd->isa('XML::ParseDTD'), 'it\'s the right class' );

(Again I’ve cut the right-hand side of the test off. Again you’re not missing anything good.)

I’m momentarily impressed that it managed to score two failures with that. I’m about to find out how.

Never mind, the DTD URI in the test looks a lot like the DTD I need to parse. I’m getting close. I can feel it.

Unfortunately it has a dependency on Cache::SharedMemoryCache (why?), which in turn depends on IPC::ShareLite, which doesn’t install on my PowerBook. So now I need to fix or avoid IPC::ShareLite.

See kids: the great thing about CPAN is how much time it saves!

Syndicated 2007-06-01 16:41:57 from Hexten
