OBJECT PREVALENCE
Posted 23 Dec 2001 at 03:46 UTC by KlausWuestefeld 
Transparent Persistence, Fault-Tolerance and Load-Balancing for
Java Systems.
Orders of magnitude FASTER and SIMPLER than a traditional
DBMS. No pre or post-processing required, no weird proprietary VM
required, no base-class inheritance or clumsy interface definition
required: just PLAIN JAVA CODE.
How is this possible?
Question: RAM is getting cheaper every day. Researchers are
announcing major breakthroughs in memory technology. Even today,
servers with multi-gigabyte RAM are commonplace. For many systems, it's
already feasible to keep all business objects in RAM. Why can't I
simply do that and forget all the database hassle?
Answer: You can, actually.
Are you crazy? What if there's a system crash?
To avoid losing data, every night your system server saves a
snapshot of all business objects to a file using plain object
serialization.
What about the changes that have occurred since the last snapshot was
taken? Won't the system lose those in a crash?
No.
How come?
All commands received from the system's clients are converted into
serializable objects by the server. Before being applied to the
business objects, each command is serialized and written to a log file.
During crash recovery, first, the system retrieves its last saved state
from the snapshot file. Then, it reads the commands from the log files
created since the snapshot was taken. These commands are simply applied
to the business objects exactly as if they had just come from the
system's clients. The system is then back in the state it was just
before the crash and is ready to run.
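As a rough illustration of the log-then-apply cycle and of crash recovery described above, here is a minimal sketch. All class and method names (Bank, Command, Deposit, Server, recover) are hypothetical and chosen for this example only; this is not Prevayler's actual API.

```java
import java.io.*;
import java.util.*;

// Hypothetical sketch of command logging and recovery; not Prevayler's API.
class PrevalenceSketch {

    /** The business objects: here, just named account balances. */
    static class Bank implements Serializable {
        final Map<String, Long> balances = new HashMap<>();
    }

    /** A client request converted into a serializable object. */
    interface Command extends Serializable {
        void executeOn(Bank bank);
    }

    static class Deposit implements Command {
        final String account;
        final long amount;
        Deposit(String account, long amount) { this.account = account; this.amount = amount; }
        public void executeOn(Bank bank) { bank.balances.merge(account, amount, Long::sum); }
    }

    /** Writes every command to the log before applying it to the business objects. */
    static class Server implements Closeable {
        final Bank bank;
        private final FileOutputStream file;
        private final ObjectOutputStream log;

        Server(Bank bank, File logFile) throws IOException {
            this.bank = bank;
            this.file = new FileOutputStream(logFile);
            this.log = new ObjectOutputStream(file);
        }

        synchronized void execute(Command command) throws IOException {
            log.writeObject(command);  // 1. serialize the command to the log
            log.reset();               //    discard the stream's back-references
            log.flush();
            file.getFD().sync();       // 2. force the bytes to disk
            command.executeOn(bank);   // 3. only then apply the command
        }

        public void close() throws IOException { log.close(); }
    }

    static void writeSnapshot(Bank bank, File snapshot) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(snapshot))) {
            out.writeObject(bank);
        }
    }

    /** Crash recovery: load the last snapshot, then replay the commands logged since. */
    static Bank recover(File snapshot, File logFile) throws IOException, ClassNotFoundException {
        Bank bank;
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(snapshot))) {
            bank = (Bank) in.readObject();
        }
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(logFile))) {
            while (true) {
                ((Command) in.readObject()).executeOn(bank);
            }
        } catch (EOFException endOfLog) {
            // End of the log reached: every logged command has been replayed.
        }
        return bank;
    }

    public static void main(String[] args) throws Exception {
        File snapshot = File.createTempFile("bank", ".snapshot");
        File logFile = File.createTempFile("bank", ".log");
        writeSnapshot(new Bank(), snapshot);             // the "nightly" snapshot
        try (Server server = new Server(new Bank(), logFile)) {
            server.execute(new Deposit("alice", 100));
            server.execute(new Deposit("alice", 50));
        }                                                // pretend the server crashes here
        System.out.println(recover(snapshot, logFile).balances);
    }
}
```

The sketch assumes the log was started right after the snapshot was taken; a real implementation would also have to cope with a log entry that was only partially written when the crash happened.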
Does that mean my business objects have to be deterministic?
Yes. They must always produce the same state given the same
commands.
Doesn't the system have to stop or enter read-only mode in
order to produce a consistent snapshot?
No. That is a fundamental problem with transparent or orthogonal
persistence projects like PJama (http://www.dcs.gla.ac.uk/pjava/) but
it can be solved simply by having all system commands queued and routed
through a single place. This enables the system to have a replica of
the business logic on another virtual machine. All commands applied to
the "hot" system are also read by the replica and applied in the exact
same order. At backup time, the replica stops reading the commands and
its snapshot is safely taken. After that, the replica continues reading
the command queue and gets back in sync with the "hot"
system.
Doesn't that replica give me fault-tolerance as a bonus?
Yes it does. I have
mentioned one but you can have several replicas. If the "hot" system
crashes, any other replica can be elected and take over. Of course, you
must be able to afford a machine for every replica you
want.
Does this whole scheme have a name?
Yes. It is called system
prevalence. It encompasses transparent persistence, fault-tolerance and
load-balancing.
If all my objects stay in RAM, will I be able to use SQL-based
tools to query my objects' attributes?
No. You will be able to use
object-based tools. The good news is you will no longer be breaking
your objects' encapsulation.
What about transactions? Don't I need transactions?
No. The prevalence design
gives you all transactional properties without the need for explicit
transaction semantics in your code.
How is that?
DBMSs tend to support only
a few basic operations: INSERT, UPDATE and DELETE, for example. Because
of this limitation, you must use transaction semantics (begin - commit)
to delimit the operations in every business transaction for the benefit
of your DBMS. In the prevalent design, every transaction is represented
as a serializable object which is atomically written to the queue (a
simple log file) and processed by the system. An object, or object
graph, is enough to encapsulate the complexity of any business
transaction.
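To make the idea concrete, here is a minimal sketch of one business transaction expressed as a single serializable object. The Shop and PlaceOrder names are hypothetical and the logging step is only indicated by a comment; this is an illustration under those assumptions, not Prevayler's API.

```java
import java.io.Serializable;
import java.util.*;

// Hypothetical sketch: one business transaction as one serializable object.
class TransactionSketch {

    static class Shop implements Serializable {
        final Map<String, Integer> stock = new HashMap<>();
        final List<String> orders = new ArrayList<>();
    }

    interface Command extends Serializable {
        void executeOn(Shop shop);
    }

    /** Touches two collections, yet is logged and executed as a single unit. */
    static class PlaceOrder implements Command {
        final String customer;
        final String item;
        final int quantity;

        PlaceOrder(String customer, String item, int quantity) {
            this.customer = customer;
            this.item = item;
            this.quantity = quantity;
        }

        public void executeOn(Shop shop) {
            shop.stock.merge(item, -quantity, Integer::sum);          // an UPDATE, in SQL terms
            shop.orders.add(customer + ":" + item + "x" + quantity);  // and an INSERT
        }
    }

    /** Commands run one at a time, so no other command sees a half-applied order. */
    static synchronized void execute(Shop shop, Command command) {
        // A real system would serialize the command to its log here, before running it.
        command.executeOn(shop);
    }

    public static void main(String[] args) {
        Shop shop = new Shop();
        shop.stock.put("book", 10);
        execute(shop, new PlaceOrder("ana", "book", 3));
        System.out.println(shop.stock + " " + shop.orders);
    }
}
```

No begin/commit appears anywhere: the boundaries of the transaction are simply the boundaries of the executeOn method.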
What about business rules involving dates and time? Won't all
those replicas get out of sync?
No. If you ask the use-case
gurus, they will tell you: "The clock is an external actor to the
system." This means that clock ticks are commands to the business
objects and are sequentially applied to all replicas, just like all
other commands.
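A small sketch of what "the clock is a command" could look like. The names (SubscriptionSystem, ClockTick, Subscribe) are made up for this example. The point is that the business objects never read the machine's clock themselves, so every replica that applies the same commands ends up with the same notion of "now".

```java
import java.io.Serializable;
import java.util.*;

// Hypothetical sketch; these classes are illustrative and not part of Prevayler.
class ClockSketch {

    /** Business objects keep their own "now", which only commands can advance. */
    static class SubscriptionSystem implements Serializable {
        long now;  // milliseconds, set by ClockTick commands
        final Map<String, Long> expiry = new HashMap<>();

        boolean isActive(String subscriber) {
            Long expiresAt = expiry.get(subscriber);
            return expiresAt != null && expiresAt > now;  // never System.currentTimeMillis()
        }
    }

    interface Command extends Serializable {
        void executeOn(SubscriptionSystem system);
    }

    /** The time is captured once, when the tick is created, and travels with the command. */
    static class ClockTick implements Command {
        final long millis;
        ClockTick(long millis) { this.millis = millis; }
        public void executeOn(SubscriptionSystem system) { system.now = millis; }
    }

    static class Subscribe implements Command {
        final String subscriber;
        final long durationMillis;
        Subscribe(String subscriber, long durationMillis) {
            this.subscriber = subscriber;
            this.durationMillis = durationMillis;
        }
        public void executeOn(SubscriptionSystem system) {
            system.expiry.put(subscriber, system.now + durationMillis);
        }
    }

    /** Applies a command sequence to a fresh system, as a replica or a recovery would. */
    static SubscriptionSystem replay(List<Command> commands) {
        SubscriptionSystem system = new SubscriptionSystem();
        for (Command command : commands) {
            command.executeOn(system);
        }
        return system;
    }

    public static void main(String[] args) {
        List<Command> log = Arrays.asList(
                new ClockTick(1_000), new Subscribe("ana", 500), new ClockTick(1_400));
        SubscriptionSystem hot = replay(log);
        SubscriptionSystem replica = replay(log);  // could run later, on another machine
        System.out.println(hot.isActive("ana") + " " + replica.isActive("ana"));
    }
}
```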
Is object prevalence faster than using a database?
The objects are always in
RAM, already in their native form. No disk access or data marshalling
is required. No persistence hooks placed by preprocessors or
postprocessors are required in your code. No "isDirty" flag. No
restrictions. You can use whatever algorithms and data-structures your
language can support. Things don't get much faster than
that.
Besides being deterministic and serializable, what are the
coding standards or restrictions my business classes have to obey?
None whatsoever. To issue
commands to your business objects, though, each command must be
represented as a serializable object. Typically, you will have one
class for each use-case in your system.
How scalable is object prevalence?
The persistence processes
run completely in parallel with the business logic. While one command
is being processed by the system, the next one is already being written
to the log. Multiple log files can be used to increase throughput. The
periodic writing of the snapshot file by the replica does not disturb
the "hot" system in the slightest. Of course, tests must be carried out
to determine the actual scalability of any given implementation but, in
most cases, overall system scalability is bound by the scalability of
the business classes themselves.
Can't I use all those replicas to speed things up?
All replicas have to
process all commands issued to the system. There is no great
performance gain, therefore, in adding replicas to command-intensive
systems. In query-intensive systems such as most Web applications, on
the other hand, every new replica will boost the system because queries
are transparently balanced between all available replicas. To enable
that, though, just like your commands, each query to your business
logic must also be represented as a serializable object.
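The following sketch shows one possible shape for this, with the replicas simulated inside a single VM. The Cluster, Query and PriceOf names are assumptions made for the example, and round-robin is just one balancing policy among many. In a real deployment the serialized commands and queries would travel to replicas on other machines.

```java
import java.io.Serializable;
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: every replica runs every command, each query runs on one replica.
class QueryBalancingSketch {

    static class Catalog implements Serializable {
        final Map<String, Integer> prices = new HashMap<>();
    }

    interface Command extends Serializable {
        void executeOn(Catalog catalog);
    }

    /** A query reads the business objects and returns a result; it must not change them. */
    interface Query<R> extends Serializable {
        R evaluateOn(Catalog catalog);
    }

    static class SetPrice implements Command {
        final String item;
        final int price;
        SetPrice(String item, int price) { this.item = item; this.price = price; }
        public void executeOn(Catalog catalog) { catalog.prices.put(item, price); }
    }

    static class PriceOf implements Query<Integer> {
        final String item;
        PriceOf(String item) { this.item = item; }
        public Integer evaluateOn(Catalog catalog) { return catalog.prices.get(item); }
    }

    static class Cluster {
        private final List<Catalog> replicas = new ArrayList<>();
        private final AtomicInteger next = new AtomicInteger();
        final int[] queriesServed;  // per replica, kept only to make the balancing visible

        Cluster(int size) {
            for (int i = 0; i < size; i++) {
                replicas.add(new Catalog());
            }
            queriesServed = new int[size];
        }

        /** Every replica has to process every command, in the same order. */
        synchronized void execute(Command command) {
            for (Catalog replica : replicas) {
                synchronized (replica) {
                    command.executeOn(replica);
                }
            }
        }

        /** Each query is evaluated by a single replica, chosen round-robin. */
        <R> R evaluate(Query<R> query) {
            int index = Math.floorMod(next.getAndIncrement(), replicas.size());
            Catalog replica = replicas.get(index);
            synchronized (replica) {
                queriesServed[index]++;
                return query.evaluateOn(replica);
            }
        }
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster(3);
        cluster.execute(new SetPrice("book", 20));
        for (int i = 0; i < 6; i++) {
            System.out.println(cluster.evaluate(new PriceOf("book")));
        }
        System.out.println(Arrays.toString(cluster.queriesServed));
    }
}
```

Adding a replica to this sketch does nothing for execute, which still visits every replica, but it spreads the evaluate calls over one more copy of the business objects.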
Isn't representing every system query as a serializable object
a real pain?
That's only necessary if
you want transparent load-balancing, mind you. Besides, the queries for
most distributed applications arrive in a serializable form anyway.
Take Web applications for example: aren't HTTP request strings
serializable already?
Does prevalence only work in Java?
No. You can use any
language for which you are able to find or build a serialization
mechanism. In languages where you can directly access the system's
memory and if the business objects are held in a specific memory
segment, you can also write that segment out to the snapshot file
instead of using serialization.
Is there a Java implementation I can use?
Yes. You will find Prevayler - The Open-Source
Prevalence Layer, an example application and more information at
http://www.prevayler.org.
It does not yet implement automatic load-balancing but it does
implement transparent business object persistence and replication is in
the oven.
Is Prevayler reliable?
Prevayler's robustness
comes from its simplicity. It is orders of magnitude simpler than the
simplest RDBMS. Although I wouldn't use Prevayler to control a nuclear
plant just yet, its open-source license gives the whole
software development community the ability to scrutinize, optimize and
extend Prevayler. The real questions you should bear in mind are: "How
robust is my Java Virtual Machine?" and "How robust is my own code?"
Remember: you will no longer be writing feeble client code. You will
now have the means to actually write server code. It's the way object
orientation was intended all along; but it's certainly not for
wimps.
You said Prevayler is open-source software. Do you mean it's
free?
That's right. It's licensed
under the Lesser General Public License.
But what if I'm emotionally attached to my database?
For many applications,
prevalence is a much faster, much cheaper and much simpler way of
preserving your objects for future generations. Of course, there will
be all sorts of excuses to hang on to "ye olde database", but at least
now there is an option.
---------------------------------------------------------------------
ABOUT THE AUTHOR
KlausWuestefeld enjoys writing good software
and helping other people do the same. He has been doing so for 17 years
now. He can be contacted at klaus@objective.com.br.
---------------------------------------------------------------------
"PREVAYLER" and "OPEN-SOURCE PREVALENCE LAYER" are trademarks of
Klaus Wuestefeld.
Copyright (C) 2001 Klaus Wuestefeld.
Unmodified, verbatim copies of this text including this copyright
notice can be freely made.
Interesting but..., posted 23 Dec 2001 at 07:08 UTC by ncm »
(Master)
There are still quite a few things we really need transactions for:
- When you make the first of a series of changes to objects in the
database, you typically break one or more database invariants until you
get the last change entered. Other processes looking at the database
had better either wait, or had better see the state it had before you
started. To get much concurrency, you need to snapshot the state
before the first change.
- If you get halfway through a series of changes and crash, the system had
better come back up without the changes you made, because you're not
going to be equipped to continue where you left off.
- If you get halfway through a series of changes and discover some
condition that keeps you from finishing, you had better be able to
just drop the changes and pick up with the original snapshot.
- If N processes make a series of conflicting changes concurrently,
(N-1) of them had better be told that their changes have failed,
and that they must try again.
There's a reason that databases are written by career professionals.
A simple object database can be really useful, but that doesn't make
it a substitute for the real thing. That's part of why so many
"object database" companies failed some ten years back.
Transactions, posted 23 Dec 2001 at 09:03 UTC by Pseudonym »
(Journeyer)
Actually, transactions are not so important in the external
interface of an OODBMS. In an RDBMS, a manipulation typically involves
several SQL statements (e.g. insert, update, remove) each of which can
act on only one table at a time. So if a transaction needs to
manipulate more than one table, you need to ensure that the set of
statements is atomic by issuing a transaction.
In an OODBMS, where manipulation methods can operate on more than
one class, the need is reduced somewhat. Internally, you can just
queue up the command logs until the method is complete, then write them
out together. Then the problem becomes entirely one of
synchronisation. It's not quite ACID, but it'll do for most business
applications.
You (ncm) are right, however, in that this solution, while no doubt
excellent for many purposes (e.g. if you're happy with the robustness
and performance of MySQL, you'll probably be happy with this, too),
won't scale to many critical applications. For example, it would be
quite hard to handle replication in any sane manner.
Klaus, as a matter of interest, how did you manage to get Java to
force flushing to disk?
Scalability ?, posted 23 Dec 2001 at 12:03 UTC by jneves »
(Journeyer)
Is it just me, or is Prevayler, as it stands, not useful on anything but a
uniprocessor machine? You process requests one at a time, which
means that two different requests can't be processed at the
same time on different processors. And when you have several replicas,
you need some coordination
between all of them to ensure the order of the requests. Or am I
missing something here?
Thank you (Klaus Wuestefeld) for your nice write-up.
I mostly agree with the points made by the other repliers.
Let me discuss/ask some further points:
Distributed systems? If a system-prevalence-deployed application
contacts other services or other servers, you have a synchronization
problem. How do you handle that? I guess that you end up doing a
2PC-like synchronization between your prevalence servers.
Fine-grained tx-model versus all or nothing? With big BO systems there
are actually lots of small transactions. The system prevalence paradigm
doesn't give you fine-grained application-side control, or does it?
Note though that you can adapt (extended) 2PC transactions to work
efficiently RAM-based while retaining persistent storage properties
(by using RAM-based subtransactions and files or an RDBMS in the root
transaction).
Scalability? Having all "commands queued and routed through a single
place" doesn't scale very well. Consider one of these big 64-processor
multigigabyte machines using a gigabit card: you wouldn't want all
requests to be serialized through a single bottleneck which involves
IO. With fine-grained distributed transactions you don't need this
"single place" or even a single server. I appreciate the "do it in
background" approach, though, as an advance over requiring requests to
be queued while saving the state. It's quite necessary for 24/7
systems.
In my opinion the complexity of 2PC systems comes from shortcomings of
the commercial products (BEA WLE, WebSphere, Oracle etc.). They impose
big, clumsy, quite old-fashioned development schemes where the
developer is restricted and has to keep track of many conditions. This
partly stems from the pain with underspecified and often incorrectly
implemented XA interfaces (e.g. writing multithreaded programs with
XA adapters from the main RDBMSs is a disaster).
I think that system prevalence would help in implementing web
applications which are located on single systems. It is a simple enough
paradigm to be used and understood by companies which often fail or are
very slow with 2PC transaction systems. Handling of error conditions
(pointed out by ncm) might still be a big problem.
just my 2 (soon to be) eurocent and best wishes!
holger
THANKS A LOT for the FEEDBACK!
This is the first forum outside of my working group to actually get the
idea and give me some positive feedback.
I am just leaving on a trip right now (my wife is calling me ;) for
Christmas and will be back on Wednesday. Then, I will address all
concerns: ACID properties, error-condition recovery, scalability, the
works...
Just a note on scalability and concurrency to think about over
Christmas:
Suppose we have a subscriber management system that receives a file
from a bank with 100000 (one-hundred-thousand) payment records. A
prevalent server running on a regular desktop machine can handle a
command/transaction for this in less than a millisecond and be ready
for the next command.
Merry Christmas! See you soon.
I used to be a professional SmallTalk programmer, I also was a
professional Lisp programmer. Both of those languages use the concept
of a saved memory image as part of their normal development environment.
The simplicity of "just saving the system state" is a double-edged sword.
The downside is that it is often hard to specify a particular system
state that you might want to use for testing or debugging. If you ever
get an object into a "bad state", it can be very hard to find out how
it got into that state. In contrast, the impedance mismatch between OO
systems and RDBMSs provides a natural boundary and conceptual
bottleneck for testing and debugging. It is relatively easy to compare
2 database dumps to see what is different, or to populate the database
with test data, or to see which INSERT statement introduced a
particular row into the database. You could have test data consisting
of a long set of commands, but that "algebraic" approach to testing
does not scale well, and allows defects in mutators to mask defects in
accessors.
One thing that I learned while trying to actually sell ST-80 systems to
other divisions in a large company is that IS organizations see a
standard RDBMS as an integration point. If your system uses an RDBMS,
they can plan capacity on a shared database machine, they can generate
ad-hoc reports, and they can use standard tools for disk backups and such
on the database machine only. Also, in the event that your system
eventually dies (is no longer maintained, or the license is not
extended or whatever) they will at least have the data in a format that
they can get out of your system's tables and into some other system.
Lastly, upgrades were always a pain in image-based tools. Very
incremental changes (like adding an instance variable to a class) can
be handled by the serialization system. Any reorganization beyond that
would require custom coding. In contrast, you can do small and
mid-sized reorganizations a lot easier in SQL.
I'll take the opposite tack for variety:
If you're going this far, why bother with a disk at all? Just attach a
battery to your RAM. If you want reliability, keep replicas. If a
replica is lost, "clone" another one by freezing its message queue and
copying the frozen image; the two clones can then "catch up" with the
queued messages in parallel.
Copy-on-write VM tricks may soften the need to entirely freeze a replica
during checkpointing.
I suspect the points raised in most of the comments can be fixed.
(After all, suppose we were looking the other way. Compared to modern
programming languages, databases and middleware systems have lots of
horrible misfeatures, starting with bad syntax and ending with
fundamentally broken models of (non-)encapsulation and (non-)reuse and
(non-)genericity; the complaints in the other direction seem relatively
trivial by comparison. How can any self-respecting software engineer
stand to use today's RDBMS systems without feeling dirty all over?)
jrobbins's notes are the most interesting. It's worth
noting that these are basically software engineering problems having to
do with how to maintain long-running systems, not issues with the
physical architecture proposed here. Is an RDBMS the best way to solve
those software engineering problems? It's hard to believe. Are these
problems worth solving for other domains? You betcha. I'd love to be
able to upgrade my applications without restarting them. (Thanks to
Debian, I can mostly upgrade my operating system without restarting it
-- something users of e.g. Windows may have difficulty imagining.)
Data persistence is definitely not a new idea. In fact, if I remember
correctly, persistent storage (ferrite cores) actually predates volatile
storage. I guess it somehow faded away, only to emerge recently under
the guise of persistent OSs such as EROS, persistent architectures such as
Prevayler which we now discuss, and so on.
It's hard to see how relational algebra or persistence compare with each
other. After all, relational algebra was supposed to be simple
anyway -- data are nothing more than just lots of mathematical
relations, right? We now know however that this `simple' idea is fraught
with practical problems.
Will the same happen for persistence? Maybe, or maybe not. As jrobbins mentioned, changing the `shape' of
objects is a problem, and there are probably many other problems.
I might be taking a bit of a simplistic view on the subject, but couldn't
a lot of the issues raised by jrobbins, relating to
testing and to having data in a useful format if a system is retired, be
addressed by XML serialization? If we are going to be able to serialize
all the commands and business objects, why not have an option or feature
to dump this information to an XML file? Then, when tracking states, you
could do a dump at each command and compare the XML output to see where
things are going wrong.
XML serialization also has the advantage of being self-describing, rather
than sitting in a group of tables in binary format on a database server. I mean,
what happens if your RDBMS company goes bust and you can't get at the
data because of a licence timeout, for example...
Obviously XML serialization will impose another overhead on the
system, but if implemented correctly you could serialize in binary
format to boost performance and then, should you need to restore the
state for investigative/testing/export purposes, load the objects through
an object-to-XML engine and look at the output.
Yes, you are right: in RAM a desktop machine may be able to process your
100,000 records in less than a second or something (I don't think that's
representative of anything though), but I do not think that makes the
system necessarily more consistent or bullet-proof. What happens if you
(or any of your client applications) run into a deadlock within a
millisecond? How consistent will the rest of the system and data be
without an ACID paradigm to rely on? Correct me, if I'm just not getting
the point, but I believe such an issue is not addressed in this approach.
trademarks?, posted 25 Dec 2001 at 18:45 UTC by dalke »
(Journeyer)
Minor point, but '"PREVAYLER" and "OPEN-SOURCE PREVALENCE LAYER" are
trademarks of Klaus Wuestefeld'? I'm curious about trademarking a few
things of my own, so I checked the USPTO. Neither mark is listed.
Given the email address of ".br", are they only trademarked in Brazil?
I agree that the Prevayler implementation, as it is today, is robust,
fast and scalable enough for most applications.
In the company where I work there are 7 people working on two projects
using Prevayler to be released in January. I am also glad to help any
other early Prevayler adopters.
I would like to share some thoughts, though, on the use of
prevalence
"in the large" to make sure that we are not missing out on some very
interesting possibilities.
First, I will give a few very quick, specific and UNJUSTIFIED answers,
and then, in a separate comment, I will give a more complete
explanation in an attempt to clarify all concerns so far...
There are still quite a few things we really need transactions
for: -- ncm
I apologize. Prevayler does have transactions.
Although a prevalent system can define transactions (commands) and
provide them for a client to use, there is NO TRANSACTION
SCHEME the client can use to arbitrarily define new TYPES of
transactions (new atomic sets of business operations) whenever it
fancies. The last thing we need is another transaction scheme allowing
clients to bring business logic into their own hands.
I realize the article is confusing in this respect. I have corrected the "official" version of the article to make this clear.
When you make the first of a series of changes to objects in the
database, you typically break one or more database invariants until you
get the last change entered. Other processes looking at the database
had better either wait, or had better see the state it had before you
started.
Yes. In the prevalence scheme, the other processes shall wait.
To get much concurrency, you need to snapshot the state before
the first change.
Hmmm. What if the waiting time for each transaction is only a few
microseconds? (I shall explain...)
If you get halfway through a series of changes and crash, the
system had better come back up without the changes you made, because
you're not going to be equipped to continue where you left off.
Yes. The article already covers this well, though. Are there any doubts?
If you get halfway through a series of changes and discover some
condition that keeps you from finishing, you had better be able to just
drop the changes and pick up with the original snapshot.
"You" (the system server, I presume) will never be halfway through a
series of changes and discover some condition that keeps "you" from
finishing. (I shall explain...)
If N processes make a series of conflicting changes concurrently,
(N-1) of them had better be told that their changes have failed, and
that they must try again.
There are no concurrent changes in a prevalent scheme. All changes are
sequenced.
There's a reason that databases are written by career
professionals.
Yes. Databases are way too complex. ;)
A simple object database can be really useful, but that doesn't
make it a substitute for the real thing. That's part of why so many
"object database" companies failed some ten years back.
Prevalence is a persistence scheme, and, like OODBMSs, Prevayler will
guarantee a logically crash-free object space for your business
objects. Prevayler is not an object database manager, as I see it,
though. It does not provide any sort of language for data storage or
retrieval (ODBMSs normally provide some OQLish thing). Database
managers are also worried, among other things, about how they will
store chunks of data from RAM to disk and how they will retrieve those
chunks later. When you have enough RAM for all your system data, you
need no longer worry about that.
When you have enough RAM (the prevalence hypothesis) and a crash-free
object space, many database career professionals' assumptions no longer
hold.
Interesting but ... one has to free one's mind. New possibilities are
waiting.
(e.g. if you're happy with the robustness and performance of
MySQL, you'll probably be happy with this, too)
Of course you will be happy! Prevayler is much more robust* and much
faster** than MySQL. ;)
* Robustness, as I understand it, is related to failure. The fewer
failures something presents, the more robust it is - as simple as that.
Prevayler's robustness is bounded by the robustness of the VM and its
serialization algorithm. Prevayler is so simple (564 lines including
comments, javadoc and blank lines) you could probably write a formal
proof for it.
** I have tried both but please don't take my word. Try them out too.
"Since Prevayler is also simpler to use, what is the advantage of
MySQL?"
Some people like SQL and the relational model. MySQL is a relational
database manager with an SQL interface. Prevayler is not.
Klaus, as a matter of interest, how did you manage to get Java to
force flushing to disk?
FileOutputStream.getFD().sync()
Thank you (Klaus Wuestefeld) for your nice write-up.
You are welcome.
Let me discuss/ask some further points:
Distributed systems? If a system-prevalence-deployed application
contacts other services or other servers, you have a synchronization
problem. How do you handle that? I guess that you end up doing a
2PC-like synchronization between your prevalence servers.
I didn't understand the question very well.
Fine-grained tx-model versus all or nothing? With big BO
systems there are actually lots of small transactions. The system
prevalence paradigm doesn't give you fine-grained application-side
control, or does it?
No it doesn't. I believe that to be inefficient and unnecessary. Maybe
we could discuss an example where you think it might be necessary.
Note though that you can adapt (extended) 2PC-transactions to
efficiently work RAM-based while retaining persistent storage
properties (by using RAM-based subtransactions and files or RDBMS in
the root transaction).
Yes. I know. Three years ago, I wrote an object-relational persistence
layer for Java that had nested transactions in RAM and an optional*
RDBMS in the root transaction.
* You could run everything in RAM if you wanted. That was good for
presentations, developing without database configuration hassle and
running test scripts very fast.
Scalability? Having all "commands queued and routed through a
single place" doesn't scale very well. Consider one of
these big 64-processor multigigabyte machines using a gigabit card: you
wouldn't want all requests to be serialized through a single bottleneck
which involves IO.
Make sure you let the people using ORACLE (and its redo log files) know
about that. ;)
With fine grained distributed transactions you don't need this
"single place" or even a single server.
Sounds interesting. Could you elaborate and give an example?
I appreciate the "do it in background" approach, though, as an
advance over requiring requests to be queued while saving the state. It's
quite necessary for 24/7 systems.
Was it clear to you, from the article, that your prevalent system DOES
NOT have to stop in order to save its state?
I used to be a professional SmallTalk programmer, ...
Me too, for 5 years. :)
The simplicity of "just saving the system state" is a double-edged
sword. The downside is that it is often hard to specify a particular
system state that you might want to use for testing or debugging. If
you ever get an object into a "bad state", it can be very hard to find
out how it got into that state.
In the prevalent scheme, with some daily system snapshots, you can
retrieve the system's state before it "got bad"; and with the command
logs you can actually replay your commands one-by-one until you get to
the rotten one. Of course, I am supposing you have a decent "object
encapsulation breaker" FOR DEBUGGING PURPOSES ONLY.
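Here is a minimal sketch of that replay-until-it-breaks idea. It assumes you have already restored a known-good state from a snapshot and loaded the logged commands into a list; the names are hypothetical, and the invariant predicate plays the role of the debugging-only "encapsulation breaker".

```java
import java.io.Serializable;
import java.util.*;
import java.util.function.Predicate;

// Hypothetical sketch: replaying a command log to find the command that corrupts the state.
class ReplayDebugSketch {

    static class Account implements Serializable {
        long balance;
    }

    interface Command extends Serializable {
        void executeOn(Account account);
    }

    static class Deposit implements Command {
        final long amount;
        Deposit(long amount) { this.amount = amount; }
        public void executeOn(Account account) { account.balance += amount; }
    }

    static class Withdraw implements Command {
        final long amount;
        Withdraw(long amount) { this.amount = amount; }
        public void executeOn(Account account) { account.balance -= amount; }  // no check: the bug
    }

    /**
     * Replays the log one command at a time on a state restored from a good snapshot.
     * Returns the index of the first command after which the invariant fails, or -1.
     */
    static int firstBadCommand(Account restored, List<Command> log, Predicate<Account> invariant) {
        for (int i = 0; i < log.size(); i++) {
            log.get(i).executeOn(restored);
            if (!invariant.test(restored)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Command> log = Arrays.asList(
                new Deposit(100), new Withdraw(30), new Withdraw(200), new Deposit(500));
        int bad = firstBadCommand(new Account(), log, account -> account.balance >= 0);
        System.out.println("First rotten command: #" + bad);
    }
}
```

Note that the final state in this example looks healthy again (the last deposit hides the overdraft), which is exactly why stepping through the log command by command is useful.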
I know there aren't many of those around (compared to SQL-based tools)
but that is more of a cultural problem, I believe. As you say, people
are used to rows and columns. They like to break their systems'
encapsulation with SQL tools and, at the same time, they like to
complain: "Where are all the benefits object orientation has promised
us?". ;)
What can you do? I expect things like Prevayler to gradually break this
vicious circle.
Lastly, upgrades were always a pain in image-based tools. Very
incremental changes (like adding an instance variable to a class) can
be handled by the serialization system. Any reorganization beyond that
would require custom coding. In contrast, you can do small and
mid-sized reorganizations a lot easier in SQL.
My team and I would always do our migrations in Smalltalk (I wrote an
object-relational persistence layer for Smalltalk 6 years ago). We
would only use SQL or PL as a last resort and for performance reasons.
With all your objects in RAM, that is a different story... ;)
You can certainly go for RAM all the way and have several replicas, if
you can afford it.
I could not agree more with egnor.
Just a comment on the "Copy-on-write VM tricks" to "soften the need to
entirely freeze a replica during checkpointing.":
It is a bit complicated dealing with executing threads because your
memory might never be in a consistent state at any given moment in
time. The orthogonal persistence guys (like the guys mentioned in the
article) have not figured out how to solve this problem.
With prevalence, the problem simply doesn't exist.
There is a colleague of mine fiddling with several XML-serialization
libraries because he wants to include that in Prevayler.
The point about speed is that, if every transaction is extremely fast,
you do not have to handle concurrent transactions. That makes life MUCH
easier. I am not only talking about sheer RAM processing speed
increase, mind you. I am talking about a design change. I shall explain
it in one of the following comments.
The ACID properties do remain.
"PREVAYLER" and "OPEN-SOURCE PREVALENCE LAYER" are trademarks of Klaus
Wuestefeld in the same way that "Linux" is a trademark of Linus
Torvalds.
They are not REGISTERED trademarks though. Much like a copyright, you
do not have to register it to be entitled to a trademark.
Of course, the suits will always tell you that it is better to register.
How fast does serialization run on your machine?
import java.io.File;
import java.io.FileOutputStream;
import java.io.ObjectOutputStream;

public class SerializationThroughput {

    public static void main(String[] args) {
        // try-with-resources closes both streams, even if an exception is thrown.
        try (FileOutputStream fos = new FileOutputStream(new File("tmp.tmp"));
             ObjectOutputStream oos = new ObjectOutputStream(fos)) {

            Thread.sleep(5000); // Wait for any disk activity to stop.

            long t0 = System.currentTimeMillis();
            int max = 10000;
            for (int i = 1; i <= max; i++) {
                oos.writeObject(Integer.valueOf(i));
                oos.reset();        // Discard the stream's record of objects already written.
                oos.flush();        // Push buffered bytes down to the file stream.
                fos.getFD().sync(); // Forces flushing to disk. :)
            }

            long elapsedMillis = System.currentTimeMillis() - t0;
            System.out.println("This machine can serialize "
                    + max * 1000L / elapsedMillis + " Integers per second.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
My 450MHz K6-II running Windows 98 with a 3-year-old IDE hard drive gives
me the following result:
"This machine can serialize 576 Integers per second."
Does anyone give me more? :)
OK, here we go:
I shall leave automatic load-balancing aside for now and concentrate on
the concerns we already have.
Atomicity and Crash-Recovery
This is already covered in the article.
Consistency and Error-Conditions
Every command is executed on its own. The business system must either
check for inconsistencies before it starts executing any command or be
able to undo whatever changes were done if it runs into an
inconsistency. In my designs I prefer the first approach. The demo
application included with Prevayler has good
examples.
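As a rough illustration of the "check first, then execute" approach, here is a minimal sketch. The PersonRegistry and RenamePerson classes are made up for this example and are not part of Prevayler or its demo application.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical business system: people's names keyed by id.
class PersonRegistry implements Serializable {
    private final Map<Integer, String> namesById = new HashMap<>();

    boolean hasPerson(int id) { return namesById.containsKey(id); }
    boolean hasName(String name) { return namesById.containsValue(name); }
    String nameOf(int id) { return namesById.get(id); }
    void put(int id, String name) { namesById.put(id, name); }
}

// Hypothetical command: every check runs before the first change.
class RenamePerson implements Serializable {
    private final int personId;
    private final String newName;

    RenamePerson(int personId, String newName) {
        this.personId = personId;
        this.newName = newName;
    }

    void executeOn(PersonRegistry registry) {
        // Validation phase: nothing has been modified yet.
        if (newName == null || newName.trim().isEmpty())
            throw new IllegalArgumentException("Invalid name");
        if (!registry.hasPerson(personId))
            throw new IllegalArgumentException("Unknown person: " + personId);
        if (registry.hasName(newName))
            throw new IllegalArgumentException("Duplicate name: " + newName);

        // Mutation phase: reached only if all checks passed.
        registry.put(personId, newName);
    }
}
```

Because the exception is thrown before any change is made, a rejected command leaves the registry exactly as it was and there is nothing to undo.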
Isolation
While a client is preparing a command to be executed, no other client
can see what that command is all about.
Durability
The snapshots and command logs guarantee your persistence. If you use
replicas, as described in the article, your system shall not only
persist, it shall prevail.
Scalability and Performance
Suppose we have a multi-threaded system in which all threads do all of
the three following things:
1) Client stuff - Waiting for an HTTP request; Waiting for an
RMI request; Reading a file; Preparing a command to be executed;
Writing a file; Generating HTML; Painting a GUI screen; etc...
2) Prevayler stuff - Logging a command to a file. (This is the
only thing Prevayler does on the hot system during execution. The
snapshot is taken by the replica and has no impact here.)
3) Business stuff - Processing a command; Evaluating a query.
For simplicity, Prevayler's implementation, today, will synchronize
"Logging a command" and "Processing a command" in a single go. That is
not necessary though. The only conditions we have to meet are:
- All commands are logged.
- All commands are executed after they are logged.
- All commands are executed in the same order as they are logged.
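The simplest way to meet these three conditions is the "single go" mentioned above: one lock around both steps. The sketch below only illustrates that idea and is not Prevayler's actual code. LoggedCommand, AppendName and LogThenExecute are names invented for the example, and the "system" is just a list of strings.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical command type: serializable, applied to a list of names.
interface LoggedCommand extends Serializable {
    void executeOn(List<String> system);
}

class AppendName implements LoggedCommand {
    private final String name;
    AppendName(String name) { this.name = name; }
    public void executeOn(List<String> system) { system.add(name); }
}

class LogThenExecute {
    private final ObjectOutputStream log;
    private final List<String> system;
    private int loggedCount = 0;

    LogThenExecute(OutputStream out, List<String> system) {
        try {
            this.log = new ObjectOutputStream(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.system = system;
    }

    // One lock covers both steps, so commands run in log order.
    synchronized void execute(LoggedCommand command) {
        try {
            log.writeObject(command); // 1. log the command...
            log.reset();
            log.flush();              // (a real file would also need FileDescriptor.sync())
        } catch (IOException e) {
            // A command that could not be logged is never executed.
            throw new UncheckedIOException(e);
        }
        loggedCount++;
        command.executeOn(system);    // 2. ...and only then execute it.
    }

    synchronized int loggedCount() { return loggedCount; }
}
```

The price of this simplicity is that logging and processing cannot overlap.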
Using two producer-consumer queues would already alleviate that a
little. The main problems, though, are still:
- It might take a long time to serialize certain large commands and
Prevayler doesn't serialize and log more than one command at a time.
- The business system cannot process more than one command at a time.
The first problem is easy to solve. 4096 (or more) "slave" log files
could be used to serialize and log up to 4096 (or more) SIMULTANEOUS
COMMANDS. There must only be a "master" log file indicating in
which "slave" log file the last command was serialized (it is not even
necessary that the first command that started being logged be the first
one to finish). In terms of scalability and throughput, this is as much
as you can get even in an RDBMS like ORACLE because of its redo log
files.
Take a look at the "Serialization Throughput Test" above, to see how
well your machine would do as a "master logger". :)
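One possible reading of the master/slave idea, sketched with in-memory buffers standing in for the log files. MasterSlaveLog and its methods are invented for this example. The real design is not specified beyond the description above, so treat every detail here as an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class MasterSlaveLog {
    // Stand-ins for the slave log files.
    private final ByteArrayOutputStream[] slaveBytes;
    private final ObjectOutputStream[] slaves;
    // Stand-in for the master log file: slave indices in commit order.
    private final List<Integer> master = new ArrayList<>();

    MasterSlaveLog(int slaveCount) {
        slaveBytes = new ByteArrayOutputStream[slaveCount];
        slaves = new ObjectOutputStream[slaveCount];
        try {
            for (int i = 0; i < slaveCount; i++) {
                slaveBytes[i] = new ByteArrayOutputStream();
                slaves[i] = new ObjectOutputStream(slaveBytes[i]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the command's position in the global order.
    int log(int slaveIndex, Serializable command) {
        ObjectOutputStream slave = slaves[slaveIndex];
        // Slow part: each slave has its own lock, so large commands
        // can be serialized to different slaves at the same time.
        synchronized (slave) {
            try {
                slave.writeObject(command);
                slave.reset();
                slave.flush();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            // Fast part: one tiny master entry fixes the global order.
            // The slave lock is still held, so records within a slave
            // appear in the same relative order as in the master.
            synchronized (master) {
                master.add(slaveIndex);
                return master.size() - 1;
            }
        }
    }

    List<Integer> masterEntries() {
        synchronized (master) { return new ArrayList<>(master); }
    }
}
```

The commands would then be executed in the order of the returned positions, whichever slave finished serializing first.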
All these performance enhancements are already scheduled for future
Prevayler releases. If anyone is considering using Prevayler on a
project for a system that actually needs them already, I will be glad
to implement them sooner (or integrate someone else's implementation)
and help out on the project design.
All other thread activities, including query evaluation, mind you, can
already be processed in parallel. So, you can have as many
processors as your VM, OS and hardware will support.
On to the second problem: "The business system cannot process more than
one command at a time.".
To overcome that, then, we will establish a simple rule: "The business
system cannot take more than a few MICROSECONDS to run any
single command."
"Oh no! I knew it! This guy is crazy!", some might think,
"How can I possibly process 100000 payment records in only a few
microseconds?".
For 99% of your commands, like changing a person's name, you check for
inconsistencies (invalid name, duplicate name, etc), and then you just
execute it normally. With your objects in RAM, that will only take a
few microseconds anyway.
For 1% of your commands (the hairy ones), like processing a batch
payment with 100000 payments, lazy evaluation is the key: your
system simply doesn't process the command. Instead, it just keeps
the command in the "batch payments" list for future evaluation.
The command will be processed bit-by-bit whenever a query is evaluated
regarding that command. It is important to note that, while the client
is building the command, the command is internally preparing its
structure to be kept in the system without further processing. Remember:
a prevalent command is much more than an atomic set of operations. It
is a full-fledged object and can be responsible for much of the
system's business intelligence! The batch payment command, for
example, would keep all payment records internally in a HashMap with
contract id as the key.
Suppose you then query the payment status of any given contract. The
contract will see "When was the last time I updated my payment
status?". It will then look at the "batch payments" list (there are two
or three batch payments a month): "Were there any batch payments since
my last update?". If there were, the contract updates itself
accordingly (one HashMap lookup per batch). Then, the contract simply
returns its payment status. This all takes only a few microseconds too.
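A minimal sketch of that lazy evaluation, just to make the mechanics concrete. BatchPayment, BillingSystem and Contract are hypothetical classes written for this example. Amounts are kept in cents, and thread-safety of concurrent queries is left out.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical batch command: indexes its payments by contract id
// while the client is still building it.
class BatchPayment implements Serializable {
    private final Map<Integer, Long> amountByContractId = new HashMap<>();

    void add(int contractId, long amountInCents) {
        amountByContractId.merge(contractId, amountInCents, Long::sum);
    }

    Long amountFor(int contractId) { return amountByContractId.get(contractId); }
}

class BillingSystem implements Serializable {
    // The "batch payments" list, in arrival order.
    private final List<BatchPayment> batchPayments = new ArrayList<>();

    // Executing the command only stores it: no payment is processed here.
    void accept(BatchPayment batch) { batchPayments.add(batch); }

    int batchCount() { return batchPayments.size(); }
    BatchPayment batch(int index) { return batchPayments.get(index); }
}

class Contract implements Serializable {
    private final int id;
    private long paidInCents = 0;
    private int batchesSeen = 0; // "When was the last time I updated?"

    Contract(int id) { this.id = id; }

    // Query: catch up on unseen batches (one HashMap lookup each),
    // then answer from the contract's own state.
    long paidInCents(BillingSystem system) {
        while (batchesSeen < system.batchCount()) {
            Long amount = system.batch(batchesSeen++).amountFor(id);
            if (amount != null) paidInCents += amount;
        }
        return paidInCents;
    }
}
```

Accepting a batch costs one list append no matter how many payments it holds, and each contract pays only for the batches it has not looked at yet.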
You could have a query, though, that actually depends on the processing
of ALL the payments (e.g. "Total Monthly Revenue"). In this case, the
query AND ONLY THIS QUERY will take about 2 seconds* to execute. All
the rest of the system continues working at full speed and with full
availability.
*Today, my company has an ORACLE-based billing system running on big
Solaris boxes that takes 62.5 machine hours to process 100000
payment records. We estimate that doing it all in RAM would take no
more than 2 seconds (on my desktop machine, mind you).
Are there any more doubts or are all your systems already prevalent? ;)
Re: Trademarks, posted 26 Dec 2001 at 20:14 UTC by dalke (Journeyer)
They are not REGISTERED trademarks though. Much like a copyright, you
do not have to register it to be entitled to a trademark.
Ahh, thank you. The USPTO link for that is:
http://www.uspto.gov/web/offices/tac/tmfaq.htm#Basic001.
Do I need to register my trademark? No..
Also,
What are the benefits of federal trademark registration?
- Constructive notice nationwide of the trademark owner's claim.
- Evidence of ownership of the trademark.
- Jurisdiction of federal courts may be invoked.
- Registration can be used as a basis for obtaining registration in
foreign countries.
- Registration may be filed with U.S. Customs Service to prevent
importation of infringing foreign goods.
"PREVAYLER" and "OPEN-SOURCE PREVALENCE LAYER" are trademarks of
Klaus Wuestefeld in the same way that "Linux" is a trademark of Linus
Torvalds.
Umm, except that Linus owns the registered trademark
on Linux, serial number 74560867 at uspto.gov. There was a big
hoorah about this some five years ago when someone other than Linus
registered the term for himself. Some of the links about the
topic are mentioned at
http://www.linux10.org/history/.
Of course, the suits will always tell you that it is better to
register.
Most "suits" would say that if you have the $325/10 years and
don't want to go through the hassle of defending your mark if
your work becomes popular, then it's worth it.
I'm using Prevayler in a beta system I'm developing, and I think the
main problem when you present this kind of system is that there are no
studies saying whether it is right or not.
Of course a lot of people thought about this before Klaus, but has anyone
really made a serious study of which actions (procedures) are most
commonly performed in each category of application?
What is the best application category for Prevayler?
Does anybody know what the REAL consistency of the systems on the
market is?
Don't you think that inconsistency, in 99% of the cases, is just the
result of bad code at the top layer? Can't we just make a fault-tolerant
system and keep the system working, no matter how bad a coder the guy is?
The new Java implementations (1.3 and 1.4) have new classes that allow
high-speed messaging pipes between applications. Can you imagine a better
use for these pipes?
I agree that XML serialization is a good thing, mainly for debugging
purposes and for its atomicity, but how can you compress it? And if you
compress it, why keep it as XML?
I think that just a better serialization scheme should do the trick,
with compression, cryptography, and a hierarchical system that would
allow easy translation to XML. Externalize methods do the job. Any
volunteers?
One easy question. Is it a framework? Is there a planned plugin
structure? Will everything be done through interfaces? No class
registration or similar approaches?
[]s, gandhi.
One easy question. Is it a framework?
Not at present.
Is there a planned plugin structure?
No. Can there be a plugin structure in the future? Yes.
There is no design trait in Prevayler based on predictions for the
future. Prevayler's design, at any point in time, will be the simplest design that we
can achieve and that satisfies all CURRENT requirements. The goal is
anticlimactic simplicity.
Don't worry. Thanks to simplicity, the day you write the first plug-in
for Prevayler, we will easily find a way to "plug it in". The day you
write your third Prevayler plug-in, there will certainly be a "plug-in
structure" in place.
That is the beauty of open-source and that is the beauty of simple
design.
Anyone interested in knowing more about prevalence or in further
discussing the subject (but not necessarily having Advogato
certification) take a look at the Prevayler Forum.
See you there, Klaus.
Askemos has a similar take on persistence. Just not "all in memory"
but "always saved to file" - after each transaction in any of your
objects.
I generalized the throughput test to write records of various size. For small records the time is dominated by the
flush; for large ones, transfer time. I found the knee of this classic curve to be at about 300 Integers (3k bytes)
on a Windows platform and 100 Integers on Linux. All but one machine I tested showed other behaviour that I
cannot explain. I've written a short note with graphs
and the revised test source code.
I designed and implemented a RAM-based, transactional database in
Java years ago for Ganymede, and I can
attest that keeping everything in memory works splendidly. Add a
transaction log for recovery, and you're cooking with gas.
At least, that is, for reasonably small datasets. The big open
question for Ganymede, and for any memory-resident Java database
systems, is how big a cost Garbage Collection becomes when you scale
up. Using the operating system's native VM subsystem to handle disk
paging works fine, but when the Garbage Collector has to sweep through
everything periodically in order to clean up garbage, that sweep
presumably has to do a good bit of paging to take care of things.
Do you have any insight into how serious a problem this is? Ganymede
works fantastically well for us at the scale we need it to, but I've
always imagined (but not tested) that putting a gigabyte of directory
data into it would probably not work so terribly well.
I ran a few tests creating huge arrays of Integers and serializing them
to stress the limits of some VMs. Every time we increased the size of
the array to a point where the system started paging, we simply had to
abort the test after a few hours because we couldn't stand waiting any
longer. 55 million was the max we reached without paging, running on an
HP-UX machine (Thanks to the guys at HP/PortoAlegre/Brazil).
The prevalence hypothesis, though, is that you have enough RAM for all
your data so, even when the garbage collector kicks in, your system
shouldn't have to page to disk.
Even if you have enough RAM, the garbage collector can be a nuisance in
many large systems and a real show-stopper for time-sensitive critical
systems. I am not an expert but it seems that most VMs use a mix of
generational garbage collection and traditional mark-and-sweep. I
really would like to see some three-colouring going on sometime soon (if
you know anything about this, please post here).
A very popular VM's heap size won't even reach 1GB. (It will allow you
to set the parameter but will shamelessly ignore it if it is above a
certain limit). It seems that VMs like that one are targeted only at
feeble client code.
I believe that projects using Prevayler will
actually raise the bar for VM robustness, heap size and garbage
collection performance.
Nine years have passed since the comment above :) and we have seen
great GC improvements.
The new Java G1 garbage collector, for example, allows you to choose to
minimize two of these three things:
--- Use of RAM
--- CPU consumption
--- GC Pause Times
So you can configure your GC to produce pauses of 100 millis max, for
example. That was previously an issue with very large heap sizes.