Older blog entries for prozac (starting at number 36)

Inefficiency By Design

One of my (many) weaknesses when it comes to programming is the overwhelming desire to produce the most efficient code possible.

One might think that could hardly be a weakness, but that desire causes me to spend an inordinate amount of time re-writing and re-designing. It borders on obsession: I probably spend, and I am not exaggerating, 10 times longer than I would otherwise.

(Of course, this would be all okay if I always ended up with excellent code, but that is not always the case.)

I mention this obsession with efficiency--size, speed, whatever may be the case--because it probably is the cause behind my extreme dislike (and lack of understanding) of people/groups who purposefully create inefficiency.

The World Wide Web offers two such examples: Source Forge and PHP Classes (sourceforge.net and phpclasses.org).

These two Websites could be criticized just for their "look & feel" but that would be very subjective of me (my own visual design skills are pretty lame). What bugs me more about them are the programmed-in inefficiencies, their inefficiencies by design.

"Inefficiencies by design?" How could that be, you ask? Why would anyone do that? Simple answer: to serve advertisements.

There is nothing wrong with ads per se, but coupling them with designed inefficiencies, created solely to maximize ad viewing at the expense of web users, is... well, what is it?

Source Forge, the "world's largest Open Source software development web site" has up to five ads per page. It takes about three "clicks" to browse to a project hosted at Source Forge, which is typical of most any development website (can't get much more efficient than that!).

But when you click on a project file available for viewing or for downloading, you can get up to three more pages before you get to the file download page. (You can enable cookies to lessen the number of pages viewed by one.)

Like Source Forge, PHP Classes is a repository for developers to distribute code. Getting to the page where one can browse their content takes a few clicks, but this is to choose a mirror site. Once you get in and start to click around you are subjected to three or four ads per page. And once you see some code you like, you are subjected to several more ads, only to be told that "You need to be a subscriber and log in to access this file".

It is this process, this "bait and serve", that is so annoying.

This process could just be bad design, but if you look at the code and see the results, the conclusion that it was designed this way on purpose kind of stands out. The pages actually refresh themselves over and over as you navigate through all this.

Making things even worse, although much of the code made available there is free software, and there are links to Freshmeat and to authors' homepages, all the code archives that I have looked at are available for download only via PHP Classes. (For example, the typical Freshmeat entry has links to directly downloadable archives, but the entries referring to PHP Classes do not.)

Certainly, I can be considered as "picking nits" here. And I freely admit that I am no Web Guru or Programming Expert or anything. But software and websites can be designed so much better. I mean, it's not like things have to be this way.

The Broken Office Paradigm

In this post I describe the current paradigm of office productivity software and why it is broken.

Background

I have been the IT Manager/Network Administrator for a small business for two years. The business, a Mergers & Acquisitions brokerage, like many small businesses, generates many documents. These documents are printed, mailed, faxed, e-mailed and posted to the Web.

The types of documents vary from standard letters and faxes to advertisements, brochures, product/service announcements, etc. We have a company logo and a corporate "style" which we use when designing documents. These documents are collectively called our "literature".

We use Personal Computers and standard "office productivity software" such as word processor, spreadsheet, graphics manipulation, and an HTML editor to create these documents. Very typical small office stuff.

I have come to the conclusion that the office document paradigm is seriously broken for it causes excessive waste of time and resources.

For complete disclosure, we use Microsoft Office (Word, Excel, Outlook, Frontpage) and Adobe Acrobat and Photoshop. At times we use other similar Windows software such as from Lotus and Macromedia. We have a small computer network (a dozen) all running versions of Microsoft Windows, along with a few networked printers, a copier and a fax machine.

All of this is what I figure the majority of small businesses use. And it is a complete waste of my time.

And I cringe at the thought of millions of businesses all wasting their time--and paper--dealing with the problems that result from all this.

The problems are many. I will take on the many issues one at a time. Most are technical but some are human issues.

First, let me dispel the notion of a "paper-less office" ever being achieved under this paradigm. In fact, we generate (and waste) more paper than I would ever have imagined. Part of this excess of paper generation is that the non-technically inclined are just so used to paper (as far as I can tell) that they cannot deal with information unless they can hold it in their hands. Many in the office distribute inter-office memos by writing them in a word processor, printing them, and putting them on people's desks and in their "in-boxes". The first thing some people do with received e-mails is to print them out!

(One of the worst examples of this is when someone in the office wants a new advertisement, mailing or letter generated, they write the text they want in Word, print it out, and then hand that paper to the IT department for processing.)

This techno-illiteracy, it seems to me, can not be solved without time consuming training on the computers and software people use, even though they use them every day. (This may, of course, be an isolated case limited to the people in the office I work in.//1) Part of the reason for techno-illiteracy is the extreme, and I do mean extreme, complexity of the Microsoft Windows menu/dialog-box interface--but that is a separate issue.

Perhaps to grasp the difficulties of office document life, I should just examine the life of a typical office document.

The act of saving a document makes no sense to the inexperienced computer user; I see it all the time. Discussing this is a bit off topic, but consider: you are presented with a dialog-box with a "save in" location which is the current folder, but you do not see the entire folder path. With a linear, branching folder hierarchy, knowing the full folder path is crucial to understanding where you are in the Windows computer file system. And in this "current folder" there is a list of other documents and other sub-folders.

If simply clicking the "save" button were enough there would be no issues, but saving documents in a single location is too confusing when one has thousands of documents to deal with. Hence, we put documents in folders. But folders live in "paths" and traversing back-and-forth along these paths is difficult when all one sees is only one part of the path.//2

Having successfully created a new document in a word processor, it now exists as a file somewhere on some file system on some computer in the office. With the standard set of "office productivity software" there is no easy way of locating documents--one has to know the full name and the full path of the document to find it--you have to know where it is located to locate it. All the programs I have used have a small "recent file history", and there is a Windows "search" interface--that is it.

Broken piece number one: Non-intuitive, overly complex GUI designs.

Once we have documents, and have learned the ways of saving and opening them, we start to do things with them. Most of the documents are to be printed; letters for example, letters on "letterhead". To print a document on letterhead (a really nice paper with our logo, printed by a real printer) we have to format it to fit the margins. This process generally goes well, for that is what word processors are designed for. You can even set the word processor up to default to the right settings for new documents. This works well. Until you need to change something global.

If your office is like ours you have many documents. And when you change something global, like the size of your logo, which affects, say, the top margin, then with a standard office word processor you have to manually change every document you have to update it to the new margin settings. Sometimes word processors can store "styles" in a common template, but not some things like margins and paper size.

Broken piece number two: Outdated software design of embedding sheet parameters within each document.

Microsoft Office documents are, of course, of a proprietary format. This means that the documents I create can not be shared with anyone who does not have the same program that I used to create the document; sometimes even the exact same version. (This is known as "planned obsolescence".) This problem is so widespread that we all know about it. It is quite ironic, to me anyway, that the main "solution" to this is a well advertised, widespread format calling itself a "portable document format" that is itself a proprietary format which requires everyone to have the same program and sometimes even the exact same version to read.//3

So, let's say I want to send someone one of my Microsoft Word documents as a "portable document". I load the document in Microsoft Word and I "print" it using a specially purchased "driver" which converts it to a PDF. I now have one of these "portable documents." Of course, I then have hundreds of these portable documents, one for each of the Word documents I originally created. And, of course, when I change my logo, I have to manually locate, edit and "print" each document anew.

Why don't you just change to a new word processor you ask? One that saves as PDF? You can't. PDFs do not work that way. Adobe only provides printer drivers. Adobe Acrobat, Adobe's PDF editor, is extremely limited in what it can do. It has no real word processing capabilities, it is basically for "touch ups".//4

Adobe's conversion capabilities are also a bit limited; it does not convert Microsoft Word forms for example, which is a dirty shame. Adobe Acrobat needs to be used for creating PDF forms.//5

Broken piece number three: Vast gap in format conversion software.

Another related problem our office has is that some of the literature we create is printed on pre-printed paper--that is, paper which has our color company logo on it. Having paper pre-printed is cheaper than buying and maintaining a high-end printer ourselves. But this "pre-printing" does not have a digital equivalent. So we have to import images into our documents to duplicate the pre-printed paper and then we make the PDF document. What that means is that we have to have two versions of the original document--one without images and one with.

Our office basically has to maintain three or four versions of most of our documents, and we have to manually update each and every one of them.

Broken piece number four: Multiple document formats means multiple programs and editing processes per document.

Then along came the Internet.

The Internet changed everything. Well, it just added to the quagmire. We have, of course, a company Website. And, of course, our literature needs to get "published" on the Web. This means, of course, simply yet another document format--another computer program and another way of converting and editing our existing documents. The result is a fourth version of every piece of our literature that gets to the Web. And another manual edit every time there is a global change.

Broken piece number five: The edit/convert/maintain process grows geometrically with each new format and always requires another proprietary program.

Well, that is how document handling using Microsoft Windows "office productivity software" is currently "done".

A Solution

There are solutions, I think. (Hopefully I will be writing more about them at a later date.) For now let me offer one. It is not viable for me right now, for it will take a long time to convert to, and in the meantime our office must deal with our current set of documents, but it is one that I want to work on.

But before I talk of a solution, let me say that the solution is not simply to replace Microsoft Office with StarOffice or OpenOffice. Those do not change the paradigm one bit. They may cost less, but they do the same thing and in the same manner.

Nor is the solution one great-big-does-everything-program like integrated "solutions" such as Lotus SmartSuite or Microsoft Works.

I think the solution is Open Source or Free software using HTML and XML along with programs running on an (internal) Apache server written in a language such as PHP.

All documents would be created and written in ASCII and stored on the server (the editing process would include some sort of WYSIWYG interface). I think that HTML along with CSS and XML will be able to provide a way to format documents just as does any advanced word processor.

I don't think there needs to be a database other than a database of information about the documents; for indexing, searching and reporting.

This way will present other issues (many I have not even thought of). You can not highlight a line and make it bold when you are editing an HTML form and text is in a TEXTAREA tag. And you don't want to have to mark your text with some really weird syntax just to italicize a few words in a paragraph.

But here is something I just went through, and if I describe it, it may help in conveying my meaning.

Someone gave me a Word document of a bunch of paragraphs of text and wanted the text up on their website. The first line and last line of each paragraph were to be bold. There were about a hundred paragraphs.

Microsoft Word's export to HTML feature is horrible. And I was not about to go through the process of highlighting a line of text with the mouse, pressing Control-B, highlighting another line of text with the mouse, pressing Control-B, highlighting another line of text, pressing Control-B, ...

No need to. I saved the text as ASCII and wrote a PHP script (I could just as easily have used Perl) to do the formatting for me; the script generated output in HTML. Voilà! I now have the text along with a small piece of code that does the formatting they wanted. Perhaps they want italics instead of bold? Minor change to the PHP and I re-run the script. Voilà!
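
A rough sketch of that kind of script follows. It is in Python rather than PHP and is not the original script: the function name `paragraphs_to_html`, the blank-line paragraph separator, and the `<p>`/`<br>` output markup are all assumptions made for illustration.

```python
import html
import re


def paragraphs_to_html(text, tag="b"):
    """Wrap the first and last line of every paragraph in <tag>...</tag>."""
    blocks = []
    # Assumption: paragraphs in the ASCII file are separated by blank lines.
    for para in re.split(r"\n\s*\n", text.strip()):
        lines = [html.escape(ln.strip()) for ln in para.splitlines() if ln.strip()]
        if not lines:
            continue
        last = len(lines) - 1
        marked = [
            f"<{tag}>{ln}</{tag}>" if i in (0, last) else ln
            for i, ln in enumerate(lines)
        ]
        blocks.append("<p>" + "<br>\n".join(marked) + "</p>")
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = "First line\nmiddle line\nLast line\n\nAnother paragraph"
    print(paragraphs_to_html(sample))
```

Switching from bold to italics is then a one-argument change, `paragraphs_to_html(sample, tag="i")`, and the text itself is never touched.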

That is the paradigm I am thinking of. Text and an algorithm to display/convert that text. Perhaps there would be a template describing the attributes of layout. Text, a template, and an algorithm. We have the text, assign it to a category, the category has a layout, and we run a script to convert the text using the layout to create either a letter, a fax, a memo, or a brochure.
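
To make "text, a template, and an algorithm" a little more concrete, here is a minimal sketch in Python. The `LAYOUTS` table, the `PAGE` template, and the margin values are all invented for the example; the point is only that sheet parameters such as margins live in one shared place instead of inside every document.

```python
import html
from string import Template

# Hypothetical layouts: one shared definition per document category.
LAYOUTS = {
    "letter": {"top_margin": "2.0in", "side_margin": "1.0in", "title": "Letter"},
    "fax": {"top_margin": "1.0in", "side_margin": "0.75in", "title": "Fax"},
}

# Hypothetical page template; $name placeholders are filled in by render().
PAGE = Template(
    "<html><head><title>$title</title>\n"
    "<style>body { margin: $top_margin $side_margin; }</style></head>\n"
    "<body>\n$body\n</body></html>"
)


def render(text, category):
    """Convert plain text to an HTML page using the layout of its category."""
    layout = LAYOUTS[category]
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return PAGE.substitute(layout, body=body)


if __name__ == "__main__":
    print(render("Dear Sir,\n\nThank you for your inquiry.", "letter"))
```

If the logo grows and the top margin has to change, only `LAYOUTS["letter"]["top_margin"]` is edited and every letter is re-rendered by the script, rather than opened and fixed by hand.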

We can do anything with Free Software. But we will always be limited by the current Broken Office Paradigm.

Notes:

1. Some of the users in our office show absolutely no initiative when it comes to computers.

2. I can write much more about some of the problems relating to GUI designs, but wanted to keep this short. Maybe later...

3. I mean Adobe PDF of course. Perhaps PDF readers are freely available enough now, but the format remains proprietary. Perhaps there are some third-party programming libraries, but still, it is a very difficult process to manipulate PDF documents, especially to convert to and from PDF documents. Try it with forms and see.

4. Perhaps there is a word processor out there that saves in PDF form. I am not about to look. The process of trying demo versions is too time consuming. Besides, converting to yet another set of programs to edit documents is a long and expensive process.

5. There are some other conversion programs out there, but again, the process of finding them is long and can be expensive.

A "Username/Password" Cookie?

Most Websites these days have "forums" or allow for user "comments" etc. And most of them do not allow posting without "logging" in or becoming a "member".

I dislike the thought of constantly having to create yet another online account every time I feel that a site is interesting enough to want to participate. It is a complicated process: I go to a site, feel that I want to post a comment, get a "you must register" message, adjust my Browser settings to allow cookies for this site, sometimes I have to allow the site to use Javascript, go through a form or two providing name and e-mail address, sometimes more, submit form and wait for e-mail confirmation, get e-mail, now I can log in and "participate" in the "community" by posting my silly little comment.

But all this can be automated.

And all proposals that I have seen seem like HUGE CONGLOMERATIONS of PARADIGMS of OBJECT ORIENTATION and other high-falutin computer science precepts.

Well, what if there were something like how COOKIEs are stored and transferred that allowed for the transfer of some kind of USERNAME and PASSWORD that sites can read from our computers?

I see an ASCII file format like an LSM:

Username: Jones
E-Mail: jones@adventure.org
Password: xn00Hg&6lklj(08jhss896

Here the password is a one-way hash. When the Browser initially negotiates with a Website, that hash value is what is transferred, and the correctness of the password is then checked via standard HTTP Authentication.

I mean, it can be as simple as this, can't it?
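
Here is a small sketch of how a client might read such a file and turn it into an HTTP Basic `Authorization` header. It is written in Python and only illustrates the file format; it is not a worked-out protocol. The function names are made up, and SHA-256 is an assumption standing in for the unspecified one-way hash.

```python
import base64
import hashlib


def parse_credentials(text):
    """Parse 'Key: value' lines into a dict with lowercase keys."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip().lower()] = value.strip()
    return fields


def hash_password(plain):
    # Assumption: SHA-256 stands in for the unspecified one-way hash.
    return hashlib.sha256(plain.encode("utf-8")).hexdigest()


def basic_auth_header(fields):
    """Build an HTTP Basic 'Authorization' value from username and stored hash."""
    token = f"{fields['username']}:{fields['password']}".encode("utf-8")
    return "Basic " + base64.b64encode(token).decode("ascii")


if __name__ == "__main__":
    stored = (
        "Username: Jones\n"
        "E-Mail: jones@adventure.org\n"
        "Password: " + hash_password("not-my-real-password") + "\n"
    )
    creds = parse_credentials(stored)
    print("Authorization:", basic_auth_header(creds))
```

One design point the sketch makes visible: since the site only ever sees the stored hash, the hash itself acts as the password, so the file would need the same care as any saved password.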

PHPyu

No name change. The secondary names I thought of are already in use. (It is very important to check for names in use or even similar.)

Redhat

I just installed Redhat v8.0. Nice. But what's with the use of "Wizards"???

I think Redhat/Linux developers should use the name "Guru", as in "Internet Connection Guru". Much better and not so blatantly Microsoft.

I just figured it was common practice to add structure members to the end of the structure so as to not break existing code initializations.

Alas, the Linux 2.4 file_operations device driver structure has a new member, struct module *owner, inserted at the beginning of the struct! Ugh!

I wasted several hours debugging/porting a 2.2 driver that had all of the function pointer members off by one!

(Partly due to a stupidism on my part: I ignored some compiler warnings about it--shame on me. Sometimes my brain just does not work like other people's.)

[I know about the abbreviated GCC way of initializing structs but this was existing code written by someone else.]
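
The off-by-one effect is easy to reproduce outside the kernel. The sketch below is only a Python analogy, not driver code: the two named tuples are trimmed-down stand-ins for the 2.2 and 2.4 structures, the `my_*` handlers are hypothetical, and the `None` defaults imitate C zero-filling members that have no initializer.

```python
from collections import namedtuple

# Trimmed-down stand-ins for the 2.2 and 2.4 layouts (the real structs have
# many more members). Defaults of None imitate C zero-filling omitted members.
FileOps22 = namedtuple("FileOps22", ["llseek", "read", "write"],
                       defaults=(None, None, None))
FileOps24 = namedtuple("FileOps24", ["owner", "llseek", "read", "write"],
                       defaults=(None, None, None, None))


def my_llseek(): return "llseek"
def my_read(): return "read"
def my_write(): return "write"


# Positional initialization written against the old layout...
old = FileOps22(my_llseek, my_read, my_write)

# ...silently shifts every handler by one slot under the new layout.
shifted = FileOps24(my_llseek, my_read, my_write)

# Initialization by member name does not care about member order.
named = FileOps24(llseek=my_llseek, read=my_read, write=my_write)

if __name__ == "__main__":
    print("positional:", shifted.read.__name__)   # the write handler
    print("by name:   ", named.read.__name__)     # the read handler
```

Initializing by name, as in the last case, is what makes a struct initializer immune to members being added in the middle.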

I think it would be cool if more news and magazine Websites (those which have new content each day in the form of articles) had syndication like Advogato: http://www.advogato.org/rss/articles.xml

Imagine if one could go to, say, CNN.com, and get its site in RSS/XML--just the content and no HTML, images, etc. If I had an RSS/XML client I could download content so much faster and view it in my own way, in my own style. I could download and archive for later viewing. The Net would be less congested (perhaps).
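
As a sketch of how little such a client needs, the Python below pulls titles and links out of a feed using only the standard library. The feed text is a made-up sample in the common RSS `channel`/`item` layout (an assumption, not any site's real feed), and it is parsed from a string so the example runs without a network connection.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed; a real client would download this text from the site.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <link>http://example.com/1</link>
      <description>Short summary of the first story.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/2</link>
      <description>Short summary of the second story.</description>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return a list of dicts with title, link and description for each item."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


if __name__ == "__main__":
    # "View in my own way, in my own style": here, a plain-text listing.
    for entry in read_items(SAMPLE_FEED):
        print(f"* {entry['title']}\n  {entry['link']}")
```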

Of course, there would have to be some sort of money-making scheme to get it all to work...

An Adaptive Sorting Technique for the Web

Web-sites such as news.google.com can provide a more personalized presentation to visitors by applying a simple sorting technique.

Long lists of information--principally of Web-links--can easily be sorted by probable interest so that links a user may be most interested in gravitate toward the top of the page.

Google's news page is a good example to explain how to apply this technique. (http://news.google.com/)

Google's news page lists news article links from many disparate yet related sources--from traditional news sources such as Time and CNN to online news such as Salon.com and Slashdot.org. The list is sorted in (what appears to be) an arbitrary manner.

There are two levels to Google's sorting approach: links are categorized first--Political, Entertainment, Science, Sports, etc.--and each category is then sorted by publication time, latest first. What Google has done (intentionally or not) is basically apply a weighting factor to each link: Category and Time. By further weighting with keywords--something Google already has the capability for--Google can simply maintain a selection history for each unique user, and use it to sort by this weighting factor.

For example, if we were to look at my viewing history, one would find that I rarely view articles categorized as Sports and Entertainment, and mostly view articles categorized as Science and Politics. If there were a list of keywords attached to each article-link Google would have a measure of what kind of articles I mostly view.

Google can then sort the articles list to my probable liking--articles most likely to interest me at the top.
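
A minimal sketch of the idea in Python follows. It says nothing about how Google actually works; the article fields, the function names, and the simple count-based score are all assumptions chosen to keep the example short.

```python
from collections import Counter


def build_profile(history):
    """Count the categories and keywords of articles the user has opened."""
    profile = Counter()
    for article in history:
        profile[("category", article["category"])] += 1
        for kw in article["keywords"]:
            profile[("keyword", kw)] += 1
    return profile


def score(article, profile):
    """Weight an article by how often its category and keywords were viewed."""
    total = profile[("category", article["category"])]
    total += sum(profile[("keyword", kw)] for kw in article["keywords"])
    return total


def personalize(articles, profile):
    # sorted() is stable, so articles with equal scores keep the site's
    # original order (for example, newest first).
    return sorted(articles, key=lambda a: score(a, profile), reverse=True)


if __name__ == "__main__":
    history = [
        {"category": "Science", "keywords": ["mars", "nasa"]},
        {"category": "Politics", "keywords": ["election"]},
        {"category": "Science", "keywords": ["nasa"]},
    ]
    todays_list = [
        {"title": "Cup final", "category": "Sports", "keywords": ["football"]},
        {"title": "Rover lands", "category": "Science", "keywords": ["mars", "nasa"]},
        {"title": "Vote count", "category": "Politics", "keywords": ["election"]},
    ]
    for article in personalize(todays_list, build_profile(history)):
        print(article["title"])
```

Because the sort is stable, a visitor with no history sees the list exactly as the site ordered it, and the page drifts toward personal interest only as selections accumulate.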

"The artist learns what to leave out."
   -- Ray Bradbury

Sometimes it pays to re-write. I finally do not feel apprehensive about releasing more code -- I have re-written, taking many things out, PHPPyu. I feel good about its quality and its usefulness.

(PHPPyu is a mini Web Portal written in PHP with Blog/BBS like features.)

1 Nov 2002 (updated 1 Nov 2002 at 15:15 UTC)

Added BBS features to PHPPyu: couple of forums, mail to users, finger feature, etc. Kind of like the Waffle BBS (if you have ever seen it). New and different -- different from any other Web BBS I have seen.

People actually posted to the story. Cool.

Still no passwords for user accounts. I wonder if it will work.
