22 Sep 2005 prozac   » (Journeyer)

All I need is a Perl script and I can change the world

Adobe's PDF Software sucks. Their Business Model sucks. And their so-called Portable Document Format is a fraud in that it is not portable at all except when using their horrible software.

But I have to deal with Adobe. Everybody uses PDF documents.

Perhaps there are other ways around what I am going to complain about here, but I am complaining because I have not been able to find a solution other than buying the Adobe SDK for thousands of dollars and writing my own C++ application. And I have spent months looking at what is available for download; I have investigated everything I could find.

What Was Wanted

I had several pages of PAPER FORMS that our company's clients fill out. They were created in Microsoft Word years ago -- text with underlines and little square boxes.

We want interactive online fillable PDFs of these forms.

Adobe Acrobat is -- IMHO -- absolutely the stupidest way of creating PDF forms. In fact, using a GUI at all to create PDF forms is stupid.

Here's why:

Say I want a typical name and address form. This is what you do with a GUI to create the name field using Adobe Acrobat:

  1. Select the Text Field tool icon.
  2. Click in the approximate location and drag to the approximate size.
  3. A "Properties Dialog Box" pops up.
  4. Enter the field name and other attributes.
  5. Close the dialog box.

We now have the Name field. Repeat these steps for all fields (and buttons and checkboxes etc.). With six or seven fields, no big deal. But I left something out.

You cannot, in Adobe Acrobat, place simple text. That is, we want a form like:

	Name: ____________________
	Address: _________________

Since you can not place the "Name:" text with Acrobat you must use a word processor -- a compatible word processor -- to PRE-create the form with general text that goes around all the form fields. THEN you create a basic PDF and edit THAT to ADD the form fields. Again, no big deal. Or is it?

What if you have ten forms? With twenty, thirty fields per page?

What if you have hundreds? THOUSANDS?

That is a lot of people manually editing PDF files.

And guess what? All those fields you used the latest pretty GUI to click and drag? You have to manually place and size each and every one of them! Sure, there are tricks like copy and paste, nudge, align horizontally, etc. But still it is all such... drudgery! Computers are supposed to free us from work. Instead computers (software) degrade us by simply replacing pencil and paper with mouse and screen.

There Must Be A Better Way

There is a better way. Adobe's Form Designer software is a step in a better direction, but it too is still the GUI paradigm -- clicking and dragging pixels on a screen. The only help Form Designer's GUI offers with the drudgery is its "advanced property dialog boxes", which is hardly a solution.

But, one thing Form Designer does is to save the document data not in a proprietary format but in XML!

Where a PDF is 500K, the same form in XDP (Form Designer's XML format) is 50K. Not only that, I can use any text editor to modify the XML. I can use any scripting language to process the XML. XML is that cool. What Perl did to programming, XML is doing (has done?) to documents.

More on why XML is so nice later, but now I need to make a side step...

But I still need PDFs for our company's clients. So, while still learning Adobe Form Designer, I "saved as" a PDF. And it turns out that -- right back to the sucky Adobe Business Model -- form fields in PDFs created by Adobe Form Designer cannot be edited in Adobe Acrobat!

Why is this a big deal?

Form Designer does a really stupid thing with its form identifiers. I name a field "name" and, when the form is converted to a PDF, Form Designer names it "F[0].P1[0].name[0]".

Why is that a big deal?

Well, for me, our office needs a way to get an online filled-in PDF from our website to our office. Since our webserver is a LAMP server, I had designed a rather simple and easy-to-use PHP script to gather the submitted PDF forms -- as FDF files -- and then e-mail each FDF file to the office, where I can save it in the client's "folder". Now all submitted forms are stored as FDF files; when an office minion using Windows opens one, she sees the form exactly as the client filled it out online. She can then print it, input it into the database, respond to it, etc.
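For anyone who has not looked inside one, an FDF file is little more than a list of field names and their values. The PHP script itself is not shown here, so what follows is only a rough Python sketch of a minimal FDF file; the function names are made up for illustration, and it assumes plain ASCII field names and values.

```python
def fdf_escape(value: str) -> str:
    """Backslash-escape the characters that are special inside a PDF literal string."""
    return value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_fdf(fields: dict) -> str:
    """Return a minimal FDF document with one /T (name) and /V (value) pair per field.

    Assumption: names and values are plain ASCII; other encodings are not handled.
    """
    entries = "\n".join(
        f"<< /T ({fdf_escape(name)}) /V ({fdf_escape(value)}) >>"
        for name, value in fields.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        "<< /FDF << /Fields [\n"
        f"{entries}\n"
        "] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )


if __name__ == "__main__":
    print(build_fdf({"name": "Ann Example", "address": "1 Main St"}))
```

The point to notice is that everything hangs on the /T field names: any script that looks for a field called "name" stops working the moment the PDF calls it something else.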

But Form Designer's form-sub-form "F[0].P1[0]" garbage breaks my PHP form handling system. To accommodate it, not only would I have to re-design it all, I would also have to delve into the inner -- and undocumented -- workings of Form Designer's new format. I already went through that process learning PDF and FDF. Now I have to start from scratch again!

Also, since the later versions of Adobe software are so damn expensive, we do not want to convert all our staff to the new versions -- the old versions are just fine for other documents. And, of course, newer PDFs are not editable by older Adobe software.

What I Did

That Form Designer used XML was a boon because I was fairly quickly able to figure out what they were doing. So I came up with my own file format to describe and lay out my forms (by converting the Microsoft Word document to text and doing some more text editing) -- this is all a very old way of doing things.

I then wrote a Perl script to read my form file, providing some basic input parameters like page size and margins, line height, etc., which converted it to Form Designer's XML. And I did a pretty good job for a quick hack. I had to use Form Designer to tweak the pages a bit, but I bypassed 99% of the click 'n drag drudgery!
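To give a feel for the idea (not the actual script, which was Perl and is not reproduced here), this is a small Python sketch. It assumes a made-up template format of "Label: ____" lines, and the XML element and attribute names are invented for illustration; they are NOT Form Designer's real XDP schema.

```python
import re
import xml.etree.ElementTree as ET

# One form line: a label, a colon, then a run of underscores marking the blank.
LINE = re.compile(r"^\s*(?P<label>[^:]+):\s*(?P<blank>_+)\s*$")


def layout_to_xml(template, margin=72, line_height=24, char_width=7):
    """Convert 'Label: ____' lines into positioned text and field elements.

    Units are points; char_width is a crude fixed-width guess.
    Element and attribute names are hypothetical, not the XDP schema.
    """
    form = ET.Element("form")
    for row, line in enumerate(template.splitlines()):
        match = LINE.match(line)
        if not match:
            continue  # blank or free-text lines are ignored in this sketch
        label = match.group("label").strip()
        y = margin + row * line_height
        text = ET.SubElement(form, "text", x=str(margin), y=str(y))
        text.text = label + ":"
        # The field starts just past the label and is as wide as the blank.
        field_x = margin + (len(label) + 2) * char_width
        ET.SubElement(
            form, "field",
            name=label.lower().replace(" ", "_"),
            x=str(field_x), y=str(y),
            w=str(len(match.group("blank")) * char_width),
            h=str(line_height - 4),
        )
    return ET.tostring(form, encoding="unicode")


if __name__ == "__main__":
    print(layout_to_xml("Name: " + "_" * 20 + "\nAddress: " + "_" * 17))
```

Every position and size falls out of three numbers (margin, line height, character width), so re-flowing a whole form means changing a parameter instead of dragging thirty boxes.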

I then created the PDFs and used the command-line program PDFTK to convert them into an earlier version of PDF, which was much more Perl friendly (less binary data and more line feeds).

I then created another Perl script to rename the field identifiers back to normal names (also converting them from Unicode).
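The renaming itself is simple string work. Here is a Python sketch of just the name mapping (the real script was Perl and also had to rewrite the names inside the PDF file, which is not shown). The function names are made up, and treating non-UTF-16 names as Latin-1 is a simplifying assumption.

```python
import re

# Matches one path segment such as "name[0]": a name plus a numeric index.
SEGMENT = re.compile(r"^(?P<name>.+)\[\d+\]$")


def decode_field_name(raw: bytes) -> str:
    """Decode a field name: UTF-16BE if it starts with the FE FF byte-order mark.

    Anything else is treated as Latin-1, which is a simplification.
    """
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be")
    return raw.decode("latin-1")


def short_field_name(qualified: str) -> str:
    """Reduce 'F[0].P1[0].name[0]' to 'name': keep the last segment, drop its index."""
    leaf = qualified.split(".")[-1]
    match = SEGMENT.match(leaf)
    return match.group("name") if match else leaf


if __name__ == "__main__":
    raw = b"\xfe\xff" + "F[0].P1[0].name[0]".encode("utf-16-be")
    print(short_field_name(decode_field_name(raw)))
```

One caveat with this approach: if two sub-forms each contain a field with the same short name, the renamed fields will collide, so the short names have to be unique across the whole form.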

I now have PDFs that remain compatible with all Adobe versions, my PHP form handling software, and I can create PDFs from easily editable text templates.

With my Perl scripts processing my data and templates I bypass those terrible Adobe GUIs! I save so much time!

Afterword

I think that the best thing that could happen to the "Office Paradigm" is the death of the GUI.

Yeah sure, you want to do some things with the mouse, but to click 'n drag hundreds of tiny little boxes all around the screen for hours on end to create a simple form is absurd when all one needs to do is to enter a few columns of data and then have the computer do the rest!

Use the GUI to design the template, the images, the "look and feel". But start with a text file or a spreadsheet and then process the data and merge that data with a template and THEN create the resulting PDF or what have you.

Keeping all your text and images only in the end result is such a waste.

It will be a good day when all I have is text and images and rules and templates to create my PDFs, my brochures, my advertising, my letters and my websites.

All I need is a Perl script and I can change the world.
