30 Aug 2010 zanee   » (Journeyer)

Relational Databases are killing content management.

Why LAMP is wrong for content management

LAMP. Linux/Apache/Mysql/PHP,Perl,Python is THE opensource software solution stack for programmers/administrators doing almost anything, specifically rapid application development on the web. There are numerous frameworks that pull all the pieces together in hopes that large and parcel anyone can build robust and usable web applications and websites. This includes content management. For the most part this works well, even through the hurdles of getting all of the specific components to work together nicely. Unfortunately the masses have taken the context of building anything to mean that LAMP is the ONLY solution to building applications on the web.  So the standard de facto has been to look at a problem and automatically assume LAMP as the underlying technology for the space. Blog? LAMP. Twitter like site? LAMP. Website for my company? LAMP.  Simple web page with a small contact page? LAMP.  To be fair, this works well for many context spaces. Especially because the LAMP system is highly modular one can literally pull one piece of the overall solution set and replace it with something completely different. However as any engineer and architect will tell you. When building a bridge, it helps to know what is going to travel over it, under it, through it and unfortunately, into it. If not you end up with something like this.

The collapse of the Ironworkers Memorial while under construction on June 17 1958

The collapse of the Ironworkers Memorial while under construction on June 17 1958. Collapsed during construction due to miscalculation of weight bearing capacity of a temporary arm.

This is generally what content management tends to resemble when it meets the architecture of LAMP. The software solution set is a complete failure for content management applied as a standard de facto solution primarily because of the M in LAMP. In this case I mean Mysql but the problem in actuality is any RDBMS or Relational Database Management System. It's not the only problem but it's one of the most critical components to getting the idea of content management right.

Lets start by taking apart the LAMP acronym.

Linux; As far as the operating system goes. Linux is a tried and true system which has been under development for nearly two decades. It has consistently pushed the envelope in regards to utilizing the underlying hardware to provide a robust and capable operating system that can scale from the server to the desktop.

Apache; As far as web serving goes. Apache is again, tried and true and has been under development for nearly just as long with many lessons learned. It is the standard de-facto server for serving static/dynamic content with an extensible and modular system which makes it capable of providing support for many different applications and setups.

Mysql; The relational database management system that had the honor from many as being the first RDBMS they used and the cut throat bare bones data solution. In early Mysql releases there was no idea of ACID Atomicity, Consistency, Isolation, Durability in the project. This overhead was seen as generally a problem that could be solved higher in the application space but for the most part everyone using Mysql during that time had no need for such compliancy. This made Mysql extremely fast and of course, with the exception of Postgres (which did concentrate on these things) it was opensource and freely available. Things have changed much since those days, Mysql has generally become ACID aware and is surprisingly owned by Oracle.

Perl/PHP/Python; For the most part Perl as a language was dominant in the 90's. Since then it's fallen behind in regards to the web application space. There are numerous reasons for this but one of the major reasons is that many new programmers have written Perl code that is difficult to read and maintain. The language wasn't originally intended for large object-oriented projects (Ruby which has syntax largely like Perl was written primarily with object orientation at its core and the Ruby language has a vibrant community and a popular web app framework called Ruby on Rails or ROR) and maintenance has generally become a nightmare. This isn't a blackmark against the language past it allowing poor practices that have seemingly been embedded with the programmer. As those new and junior programmers become more literate their Perl code improves but those old projects and web applications they've written or help write don't fare as well. There tends to be a group-think backlash against the language because of that. "It was written in Perl? Ugh.. nightmare".

PHP tends to have the same exact problem, low cost of entry, except the community around PHP is wholly vibrant and releases often. PHP as a language itself has had many of the same problems as Perl in regards to building a large software project. Most of which have been remedied with time, namespace support, halfway usable object orientation support etc. It still lacks many commercial features that you can't readily use without purchasing them or their frameworks.

Python on the other hand, has a much higher cost in regards to learning curve than the previous two. However it's still a very easy language to learn and enforces many good practices out-of-the-box that the other two languages don't. Proper formatting, a useful object oriented system, the idea of namespaces, unit tests and code structure. These are all important concepts in architecting software (building your bridge) and it's a viable language.

Ruby isn't a P but needs mention as it's also a dominant language used. It's highly object oriented and as stated above has syntax largely like Perl. It's essentially referred to some as "Perl done right"  and it's primary author has stated that "I wanted a scripting language that was more powerful than Perl, and more object-oriented than Python. That's why I decided to design my own language".

So what is the problem with Relational Database systems and content management?

Relational Database systems are from an era where object-oriented programming didn't readily exist large and parcel. The concept of an object quite frankly was  foreign. Most programming had been functional and procedural, no one had any idea how useful it would become. The general crowd-think was more concerned with "records"; it had worked well since the 1940's and many companies had spent large sums of money setting up internal systems. A nice line printer spilling out data onto dead tree's with a list of users, phone numbers, etc. It was easy, pulling out all of this information. However as time passed and with the advent of the internet the concept of object-oritentation became more important. Java came to dominance because of this. We needed to do more than just relate information but also update it real time, expose it to other systems in-house and to our partner systems. Records in a flat table were simply not enough, we wanted to have a representation of a user and all of the attributes important to us updated dynamically as they were updated by our staff, customers or both.

Computer languages started to reflect this fact. Most languages started receiving object orientation methodologies. C got it's OO super set in C++ and then there was Objective-C which got it's clothing from Smalltalk and obviously Java. Which sparked a new commercial trend and is probably the most dominant object oriented program in use today.

Cube vs Cylinder

Square vs Circle, Cube vs Cylinder, Oil vs Water.

Unfortunately, as the way we wrote programs changed, the way we stored the data our programs created or needed did not. Programmers began writing Object Relationship Mappers ORM's to map the objects they created in their programs with the way they stored their data. So one would design a user object with a name, address and phone number in their program and then have to create a table for this in the relational database and then have to map between the two. Obviously for time sensitive applications the overhead of conversion became an issue. More importantly though a whole system to manage the consistency between the two became an issue. If one got out of sync with the other it would cause no small amount of trouble for critical applications.

Enter Object Oriented databases or OODB's. A programmer could create the object in his program and store it into an object database. Early versions were considered slow and inefficient however OODB's tended to hold more data than relational database systems, were generally faster than relational databases as there is less to lookup and no overhead or extra systems to manage between a relational system and an object system. As well as being more secure it seemed like an easy win however there was no uptake. Most commercial organizations were and still are used to the idea of a "record" and Oracle as a company is simply a good salesman; they came up with Oracle Object Relational support. OR in more simple terms a superset to SQL to make Oracles relational database behave more object like. Also the sheer force of SQL and the relational database eco-system made it hard to see through the clouds. In-fact, most people didn't and don't even bother looking. There was no real advantageous reason to go with an object oriented system if you had an object relationship mapper. Summarily, no one took up OODB's except for organizations in the know, primarily scientific and engineering houses who had large amounts of data they needed to warehouse and work on. Every one else stuck with RDBMS, until it started hurting them. They would eventually retool by either finding and consulting with Oracle or one of the commercial Object Oriented Database providers. It's a testament to Oracle's success that they are the ONLY database game in town. Really, what other commercial database company can you refer to off the top of your head? I'll wait... Right, so that leads to the present day where there is more talk of Nosql databases (object databases, graph databases, high perform key/value stores etc things like HadOOP, CouchDB, MongoDB, Redis, Neo4j, Allegrograph) but not much has changed in the last two decades. This time around things seem much different and the database playing field is bound to go through transformation with web semantics and html5 database standards. We can only wait and see.

In the interim the previous decades were simply unfortunate for database stores and summarily the content management space as it is highly object oriented. Your customers want to manage content. Videos, Users, Large lists, Blogs, News, Images etc. All of which are objects that need to be stored somewhere, and for the most part that is occurring in a relational database. Which means one has to overcome the problems above and 9 times out of 10 it requires a lot of time and engineering that simply isn't done properly. Hence choosing the standard de facto in LAMP, the bridge eventually collapses. A collapse maybe downtime, or loss of records, or constant maintenance, security threats all of which can be lessened by building the correct bridge for the problem space. In content management that is an object database; or some combination of nosql/relational and object data depending on application.

How do I change the M in LAMP to an O for object database or something similar?

Well, if you plan on managing content there is ZODB or the Zope Object Database for Python which is part of the overall package for the Plone Content Management system. There are also DyBase, db4o, Twig etc. Past that your options are currently limited without an object relationship mapper but it pays to understand the problem space so you can architect your content management solution appropriately even should you need to keep your data in a RDBMS. Whether it be for something as simple as management of blog data or a large list of data it pays to know that you have a bridge that can withstand all of the nets elements. Hopefully, next time you are talking with your consultant, client, design company or web team and you hear LAMP you have a better idea of what it entails and how to apply your business needs and process using LAMP if you need to. LAMP isn't always the answer and it certainly isn't always the answer for content management solutions.


Syndicated 2010-08-30 19:47:00 from Christopher Warner » Advogato

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!