System R
One of the classes I'm taking at Berkeley this fall is
CS262a, which is the first part of their graduate-level
introductory "systems" class -- looking at great papers and
common threads among operating systems, networking,
databases, and the like. One of the first papers we're going
to discuss is "A History and Evaluation of System R", which
describes the seminal DBMS built by a team of 15 PhDs at IBM
Research from 1974 to ~1980. The paper is a great read,
especially if you're interested in database internals. (If
you're going to read the paper, I suggest Joe Hellerstein's
annotated version, which contains a number of insightful comments.)
A few comments of my own:
- The scope of the project goals and the completeness of
the implementation are remarkable, considering the time
period and the lack of other production-quality RDBMS
implementations at the time. System R included a cost-based
query optimizer, joins, subqueries, updateable views, log-based crash
recovery, granular locking, authentication and
authorization, a relational system catalog, prepared
queries, and other sophisticated features. In fact, System R
even had the ability to automatically invalidate and replan
prepared queries when the objects they depend on changed, a
feature Postgres didn't add until 8.3 (and we still don't have
native support for updateable views).
- People often complain that SQL is a poorly-designed
language. In many respects that may be true, but it's not
because the design of the language itself was neglected:
even in 1975, the System R team gave "considerable thought
... to the human factors aspects of the SQL language, and an
experimental study was conducted on the learnability and
usability of SQL." While the goal of having secretaries and
other non-technical staff writing SQL queries was perhaps
not achieved, SQL wasn't a hackishly-designed language, even
if it sometimes feels that way :)
- The initial System R prototype supported subqueries, but
not joins. That seems an unusual order in which to implement
features, although it does make some sense (Hellerstein points out
that neglecting joins makes the optimizer search strategy
much simpler).
- One interesting design choice is that System R generated
machine code from the query plan, rather than having the
executor walk the plan tree at runtime. While this design
sounded exotic to me at first glance, it actually makes
sense: on the hardware of the time, queries were much more
likely to be CPU bound than they are today.
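
To make the contrast in that last point concrete, here's a rough sketch in Python. It is not how System R worked internally: the plan node classes (Scan, Filter, Project) and both functions are hypothetical names I made up for illustration, and generating Python source with compile/exec is only a stand-in for emitting real machine code. The idea is just that interpret() re-examines the plan tree while the query runs, whereas compile_plan() pays the translation cost once and returns a function specialized to that one plan.

```python
from dataclasses import dataclass

# Hypothetical, minimal plan nodes; not System R's actual plan format.
@dataclass
class Scan:
    table: str

@dataclass
class Filter:
    child: object
    column: str
    op: str          # one of ">", "<", "=="
    value: object

@dataclass
class Project:
    child: object
    columns: list

OPS = {
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    "==": lambda a, b: a == b,
}

def interpret(node, db):
    """Walk the plan tree at run time, dispatching on node type as we go."""
    if isinstance(node, Scan):
        yield from db[node.table]
    elif isinstance(node, Filter):
        for row in interpret(node.child, db):
            if OPS[node.op](row[node.column], node.value):
                yield row
    elif isinstance(node, Project):
        for row in interpret(node.child, db):
            yield {c: row[c] for c in node.columns}
    else:
        raise TypeError(f"unknown plan node: {node!r}")

def compile_plan(node):
    """Translate a linear Scan/Filter/Project plan into Python source once,
    then compile that source into a function specialized to the plan."""
    conditions = []
    columns = None
    while not isinstance(node, Scan):
        if isinstance(node, Filter):
            if node.op not in OPS:
                raise ValueError(f"unsupported operator: {node.op}")
            conditions.append(f"row[{node.column!r}] {node.op} {node.value!r}")
        elif isinstance(node, Project):
            if columns is None:      # the outermost projection decides the output
                columns = node.columns
        else:
            raise TypeError(f"unknown plan node: {node!r}")
        node = node.child
    predicate = " and ".join(conditions) or "True"
    if columns is None:
        output = "row"
    else:
        output = "{" + ", ".join(f"{c!r}: row[{c!r}]" for c in columns) + "}"
    source = (
        "def run(db):\n"
        f"    for row in db[{node.table!r}]:\n"
        f"        if {predicate}:\n"
        f"            yield {output}\n"
    )
    namespace = {}
    exec(compile(source, "<plan>", "exec"), namespace)
    return namespace["run"]
```

Calling compile_plan(plan) once and then invoking the returned function on each execution is loosely analogous to a prepared query: the per-node dispatch happens at compile time rather than for every row.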
The notes from the 1995 System R reunion are also an
interesting read, if you'd like to learn more about the
politics and history of the project.