Older blog entries for PaulWay (starting at number 2)

Further thoughts on the Linear FS:

It could easily be emulated in another OS by the creation of one large file which housed the FS structure. While this file may not be contiguous in its native OS, it would at least be a way to experiment with the FS without having to use a partition editor when you wanted to examine the LinFS structure workings (during debugging, eg).

Taking inspiration from the Unix FS method of storing the first ten inodes in a table within the directory structure itself, for really small files it might even be possible to store the contents of the file in the directory for the ultimate in zero redirection. Of course, you wouldn't want them to be very big, or changing very often, but it's an idea. See what happens.

Life goes on. Went to Netizen's disintegration party - felt like, well, a straight boy at a goth club, really. You talk the same language, but there's this undertone of "you're probably really shocked that I've got piercings and weird sexual habits, eh?". Which I'm not, of course :-) But mostly just the feeling of being someone who's amongst good friends who can't be bothered meeting new people when they're having fun with the old people. Hey, I've been in the other position, so I don't blame them at all :-)

Ho hum. Job search goes on.

20 Sep 2000 (updated 21 Sep 2000 at 00:33 UTC) »

Life goes on. If anyone working in Melbourne (Australya :-) has a position for an experienced Windows NT (yeah, I know) admin with SQL and internet experience, I'd appreciate it. paulway@nospam.earthling.net is my address - minus my request for no spam, of course.

Any advogates out there who own an empeg car player?

A lot of my thoughts on file systems come from an underlying idea that a lot of the relatively common things we want to do with computers are actually quite tricky - writing to any part of a file being a prime example - simply because we're still in the habit of thinking in terms of 20 year old operating systems.

I'm not suggesting removing all paradigms that remind us of the physical representation of our objects. I personally think the idea of having memory and disk as one single space makes about as much sense as trying to go down to the chemist in a 747 or trying to get to another continent by scooter. But we can start looking at the way we use our computing resources and see if we can make it work better for us. I think the use of a memory cache that exists in the 'free memory' and scales dynamically as memory is required and freed for other applications is an excellent example. The overhead of managing this is now less than the performance loss it requires, and the benefits are larger than the fixed-size cache implementation.

I can't think of any other good examples of computing operation paradigms that need shifting, though. :-)

This diary entry is sort of summarising all the thoughts I've had about a new file system methodology. I think it's new because I've never heard of anything like it; I think it has a lot of good features that both make things easier for programs, users and even the OS itself; I think it makes the best use of a fully multi-tasking OS without causing too much hassle on non-multi-tasking systems (i.e. this functionality is optional); and in the days where someone can actually write the code to do this and not be employed by a huge powerful corporation (and also not have to justify its development and testing on anything but eir own basis) it's time to get it out of my mental closet.

The essential difference between this FS and all others is that this treats a disk as a single contiguous array of sequential bytes. No sectors. Each object stored on the disk, be it a file or a directory or something else, is stored as a series of chunks, where each chunk starts at a point on the disk and consists of a contiguous stream of bytes from there. All these numbers are 64-bit, to cope with modern storage requirements. Disks are numbered starting from byte 0 and working up, in some sort of established order through the sectors, cylinders and heads, to provide a linear contiguous byte array up to the last byte on the disk.

Let's start by looking at how a file is arranged. The file entry in the directory contains the file name, date, total size, and the usual attributes. Then follows a list of the chunks in a file, consisting of ([Start Byte],[Offset on disk],[Length]) tuples. [Start Byte] is the number of the first byte in this chunk, [Offset on disk] is the physical disk location of the start of this chunk, and [Length] is obviously the length of this chunk. The total length should always add up to the total length of the file.

When writing a file, ideally one simply finds a chunk of free space and writes the entire file contiguously into it. If, for some reason, you're writing a file that is larger than the largest single chunk of free space, you just write multiple chunks, using up each chunk of space before moving on to the next. I haven't done much research into which allocation methodology is best - best fit, worst fit, random fit, whatever - but that can be tested and possibly even configured at time of use. Reading a file is fairly obvious.

The immediate advantages are that there's no wasted space at the end of the last sector in a file. Small files take up exactly their amount of space, no more. Files are mostly contiguous and fragmentation is kept to a minimum by virtue of the fact that it's relatively easy to find and use a contiguous block of storage. We waste no space in large FATs or inode lists (where most of those are listing sequential sectors anyway).

Some things we can do with this that are more difficult in other FS methodologies:

  • We can write sparse files (e.g. a 100MB file with only the last byte actually storing any data) simply by pointing the Start Byte at the location of the data we want to store.
  • We can insert an arbitrarily large amount of data anywhere in the file by writing that chunk and then manipulating the chunk list appropriately. For instance, we can write data into the start of a file by writing that chunk and then adding the new offset to the Start Bytes of all the other chunks in sequence.
    These two concepts would lead to a radically new and easier way to write and read all sorts of files that have had to done in roundabout ways - such as writing to the start or middle of a file without having to rewrite the entire file.
  • There are multiple ways to defragment a file. For instance, we can examine the amount of free space immediately after each chunk, and if that free space can take the next chunk we join the two chunks and shorten the chunk list.
  • We can even be defragmenting a file while it's in use, either by only defragmenting the chunks of a file that aren't in use at the time or by making sure that the new file is written out as a new, single, contiguous chunk.
  • Supporting compressed files is much easier because we break the file up into arbitrarily large segments and compress each as a chunk. When we want to rewrite part of a file we can write that chunk afresh and re-offsetting the existing chunks. We don't have to touch any chunk other than the one which is modified.
  • I'd like to build hooks into the operating system that allows it to gradually defragment files in quiet time.
I'm still thinking on the workings of directories, but the basic plan here is that there should be little or no need to greatly expand a file's number of chunks in one go. My preference is to have each directory entry given a 16-bit length and then allocate a reasonable amount of extra space for future expansions of the chunk list. But, for instance, Read Only files could have their directory entries contracted down to minimum space because we <U>know</U> there's going to be no other blocks inserted into the file (at this point in time, anyway) and thus no extra chunks added.

I might leave it there for one day and see what comment (if any) that invites.

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!