19 Sep 2000 PaulWay   » (Apprentice)

This diary entry is sort of summarising all the thoughts I've had about a new file system methodology. I think it's new because I've never heard of anything like it; I think it has a lot of good features that both make things easier for programs, users and even the OS itself; I think it makes the best use of a fully multi-tasking OS without causing too much hassle on non-multi-tasking systems (i.e. this functionality is optional); and in the days where someone can actually write the code to do this and not be employed by a huge powerful corporation (and also not have to justify its development and testing on anything but eir own basis) it's time to get it out of my mental closet.

The essential difference between this FS and all others is that this treats a disk as a single contiguous array of sequential bytes. No sectors. Each object stored on the disk, be it a file or a directory or something else, is stored as a series of chunks, where each chunk starts at a point on the disk and consists of a contiguous stream of bytes from there. All these numbers are 64-bit, to cope with modern storage requirements. Disks are numbered starting from byte 0 and working up, in some sort of established order through the sectors, cylinders and heads, to provide a linear contiguous byte array up to the last byte on the disk.

Let's start by looking at how a file is arranged. The file entry in the directory contains the file name, date, total size, and the usual attributes. Then follows a list of the chunks in a file, consisting of ([Start Byte],[Offset on disk],[Length]) tuples. [Start Byte] is the number of the first byte in this chunk, [Offset on disk] is the physical disk location of the start of this chunk, and [Length] is obviously the length of this chunk. The total length should always add up to the total length of the file.

When writing a file, ideally one simply finds a chunk of free space and writes the entire file contiguously into it. If, for some reason, you're writing a file that is larger than the largest single chunk of free space, you just write multiple chunks, using up each chunk of space before moving on to the next. I haven't done much research into which allocation methodology is best - best fit, worst fit, random fit, whatever - but that can be tested and possibly even configured at time of use. Reading a file is fairly obvious.

The immediate advantages are that there's no wasted space at the end of the last sector in a file. Small files take up exactly their amount of space, no more. Files are mostly contiguous and fragmentation is kept to a minimum by virtue of the fact that it's relatively easy to find and use a contiguous block of storage. We waste no space in large FATs or inode lists (where most of those are listing sequential sectors anyway).

Some things we can do with this that are more difficult in other FS methodologies:

  • We can write sparse files (e.g. a 100MB file with only the last byte actually storing any data) simply by pointing the Start Byte at the location of the data we want to store.
  • We can insert an arbitrarily large amount of data anywhere in the file by writing that chunk and then manipulating the chunk list appropriately. For instance, we can write data into the start of a file by writing that chunk and then adding the new offset to the Start Bytes of all the other chunks in sequence.
    These two concepts would lead to a radically new and easier way to write and read all sorts of files that have had to done in roundabout ways - such as writing to the start or middle of a file without having to rewrite the entire file.
  • There are multiple ways to defragment a file. For instance, we can examine the amount of free space immediately after each chunk, and if that free space can take the next chunk we join the two chunks and shorten the chunk list.
  • We can even be defragmenting a file while it's in use, either by only defragmenting the chunks of a file that aren't in use at the time or by making sure that the new file is written out as a new, single, contiguous chunk.
  • Supporting compressed files is much easier because we break the file up into arbitrarily large segments and compress each as a chunk. When we want to rewrite part of a file we can write that chunk afresh and re-offsetting the existing chunks. We don't have to touch any chunk other than the one which is modified.
  • I'd like to build hooks into the operating system that allows it to gradually defragment files in quiet time.
I'm still thinking on the workings of directories, but the basic plan here is that there should be little or no need to greatly expand a file's number of chunks in one go. My preference is to have each directory entry given a 16-bit length and then allocate a reasonable amount of extra space for future expansions of the chunk list. But, for instance, Read Only files could have their directory entries contracted down to minimum space because we <U>know</U> there's going to be no other blocks inserted into the file (at this point in time, anyway) and thus no extra chunks added.

I might leave it there for one day and see what comment (if any) that invites.

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!