Are files obsolete?

Posted 26 Aug 2000 at 04:30 UTC by dbryson

Shouldn't files (and perhaps 'file systems', for that matter) be sent to the scrap heap (bone yard, or whatever) like other archaic computational devices and methodologies such as vacuum tubes, core memory, batch processing, and punched cards?

Ever since I learnt the concept of "virtual memory", some 15 years ago, I have questioned why files (and file systems) continue to exist. Are they really needed anymore? Are they really useful? Can't we do better?

More recently, I was directed to an article, The Anti-Mac Interface, which discusses, among other things, problems with using metaphors like the "folder", the "desktop", the "trashcan", etc. This got me thinking about files again. In my experience, new users get nothing from trying to think of saving documents (or whatever) in files, nor from storing them in "folders". They can save their document, giving it a name and selecting a folder, and still not be able to find it again (without help). In fact, I think it does nothing even for those who understand the whole file/folder thing. They understand how it works completely separately from how real/physical "files" are stored in real/physical "folders" in real/physical "file cabinets". This whole system might have been useful when computers were first used, but not today.

Why not dump the whole idea of files and file systems and instead treat a computer's hard drive space as "memory" and its actual RAM as a cache for this memory? Why not just use virtual memory to treat the entire hard drive as "memory"? Why not just skip the steps of translating documents to/from files and just store them in "memory"? There would be no need to make sure we "save" our work when exiting a program; it would just be in "memory" until we delete it. This would make things drastically simpler for programmers (and OS designers, for that matter), who would no longer have to worry about files, locking, etc. Just leave the documents in the linked-list or tree structure (or whatever) used by the program internally and probably attach them to some sort of global document database so that they can be easily found later.
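
To make that concrete, here is a minimal sketch (in Python) of what programming against such a system might feel like. The names and the use of the standard shelve module are my own illustrative assumptions, not a real API; shelve simply stands in for "memory that happens to survive a reboot".

    import shelve

    # "registry" stands in for the global document database mentioned above;
    # the name and the shelve-backed persistence are illustrative assumptions.
    registry = shelve.open("document-registry")

    class Letter:
        def __init__(self, title, body):
            self.title = title
            self.body = body

    # Create a document and attach it to the global database. There is no
    # explicit "save as file" step: the object simply persists until deleted.
    doc = Letter("Dear Bob", "Thanks for the vacuum tubes.")
    registry[doc.title] = doc

    # Later (possibly after a reboot), find it again by asking the database.
    found = registry["Dear Bob"]
    print(found.body)

    registry.close()

(shelve itself is built on top of a file, of course, so this only sketches the programmer-visible surface, not the underlying storage.)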

My current design for such a system uses a somewhat Java-like object system implemented as a stack-based virtual machine with an address space capable of encompassing not only huge hard drives, but the entire internet (the internet as virtual memory, anyone?).

I wonder what others think of this and how it might be implemented.


Most definitely, posted 26 Aug 2000 at 05:02 UTC by fatjim » (Journeyer)

"Files" as they exist right now are indeed a very old concept. A number of projects and researchings have worked on replacements. Your idea sounds like a very low-level persistent object system, so it is a lot like NewtonScript, Self, (my very meager understanding of) smalltalk, and someothers.

UI and HCI people have been working to replace or get rid of files for a long time. I've seen researchers trying to apply concepts from real life to get good filing systems - things like spatial data storage (VR, because your mind is wired for physical location memory) and a very neat "time knob" which adjusts your desktop back in time to see what you were working on five minutes ago or five months ago. My father has been using computers for 10 or so years at work and still doesn't understand directories/folders.

Your project sounds interesting.. have you begun implementing it?

I don't get it., posted 26 Aug 2000 at 18:50 UTC by Acapnotic » (Master)

I'm failing to appreciate the difference between "document" and "file" in your discussion. To me, the phrase "attach [the document] to some sort of global document database" is indistinguishable from "move the file to the filesystem".

I do see that you would like a different interface to the filesystem (as fatjim says, a persistent object system) as opposed to the current read/write/seek, but that doesn't keep "document" and "file" from being synonyms in my mind.

(Although there are cases -- such as your average web page -- where one document is made up of many files, and cases where one file may contain many documents, but I don't think that's at all relevant to this article.)

File Systems, posted 26 Aug 2000 at 19:55 UTC by nymia » (Master)

Here are some of my random thoughts worth two cents.

I don't think throwing out the file system would be the right move since file systems are a basic building block for operating systems.

From what I see, what you are after is a layer on top of it where there are facilities that abstract the file system, like namespaces and SQL. Along that line, it is possible to build a database-like file system that supports transactions by default. With that, a file loaded in main memory can be written onto a rollback segment located in secondary disk storage, making it possible to protect files from system crashes.

When you finally have a namespace+SQL storage system, you can then remove the navigation facilities (like cd, mkdir, etc.) from the shell. However, removing the navigation facilities would be a bad idea, because in the database world, set operations like SQL also depend on navigation operations like movefirst, movenext, etc.
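
A rough illustration of the namespace+SQL idea (a sketch only; SQLite and the schema below are stand-ins I picked for the example, not a proposal): documents get attributes instead of paths, writes happen inside transactions, and "navigation" becomes a query.

    import sqlite3

    # SQLite is only a stand-in for a transactional, database-like file
    # system; the table and column names are invented for illustration.
    db = sqlite3.connect("storage.db")
    db.execute("""CREATE TABLE IF NOT EXISTS documents
                  (name TEXT, namespace TEXT, kind TEXT, body BLOB)""")

    # "Saving" is an insert inside a transaction, so a crash rolls the
    # write back instead of leaving a half-written file behind.
    with db:
        db.execute("INSERT INTO documents VALUES (?, ?, ?, ?)",
                   ("q4-report", "work/finance", "spreadsheet", b"..."))

    # "Navigation" becomes a query over attributes rather than cd and ls.
    for (name,) in db.execute(
            "SELECT name FROM documents WHERE namespace LIKE 'work/%'"):
        print(name)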

Persistent object networks, posted 26 Aug 2000 at 20:31 UTC by Radagast » (Journeyer)

I've been working on a followup to the article I wrote about anti-mac tendencies in free software, specifically about the use of pseudonatural languages in interfaces. While researching this, it's becoming more and more obvious to me that the traditional hierarchical filesystem is a pretty deficient model for a rich interface. Or rather, it's a model pretty well suited to the current interfaces, and vice versa. One has to wonder if these two models didn't evolve in parallel and reinforce each other. The effect of relative fitness and mutually reinforcing metaphors in user interface design is interesting, because it can lead to systems that are internally very elegant and where the interface maps very well to the underlying structure, but are totally opaque and non-obvious to the users.

Anyway, it seems that an abstract and persistent object network is a better underlying data model for a future interface with very high information density. How it maps to the physical storage isn't terribly interesting, really; your idea of mmap()ing the whole disk into the processor addressing space might be one way of doing it, but I'm sure you don't even want to deal with it at that level. Better to have a kernel or similarly low-level facility that takes care of storage, caching, and so on for you, in which you can create and manipulate objects much like you create and manipulate files today.
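
For anyone who hasn't played with it, the mmap() mechanism mentioned in passing looks roughly like this (a toy Python sketch; an ordinary 1 MiB file stands in for "the whole disk", and a real system would map raw storage and manage it in the kernel):

    import mmap, os

    path = "store.img"
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.truncate(1024 * 1024)    # a small file standing in for the disk

    with open(path, "r+b") as f:
        mem = mmap.mmap(f.fileno(), 0)  # map the whole file into memory
        mem[0:5] = b"hello"             # writing to "memory" ...
        mem.flush()                     # ... which the OS pages back to storage
        mem.close()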

Such a storage model would be non-hierarchical, a large space of objects with a multitude of links between them (links would also have attributes, to the point of being a special class of objects in themselves, so they can describe the relationship between the other objects.) It'd be useful to be able to have objects that point to network resources, but are otherwise similar to local objects. Objects should also have rich metadata facilities.
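
A minimal sketch of such an object space, under names I've invented for illustration: plain objects carrying metadata, plus links that are themselves objects with attributes describing the relationship, including links to network resources.

    class Node:
        def __init__(self, **metadata):
            self.metadata = metadata   # rich, free-form metadata
            self.links = []            # outgoing links, not a parent folder

    class Link:
        def __init__(self, source, target, **attributes):
            self.source, self.target = source, target
            self.attributes = attributes     # e.g. {"relation": "cites"}
            source.links.append(self)

    paper = Node(title="Anti-Mac Interface", kind="article")
    notes = Node(title="My notes", kind="text")
    remote = Node(url="http://example.org/lifestreams", kind="network-resource")

    Link(notes, paper, relation="comments-on")
    Link(notes, remote, relation="references")

    # "Finding" things means following or querying links, not walking a tree.
    for link in notes.links:
        print(link.attributes["relation"], "->", link.target.metadata)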

These sorts of models are complex to code on the low level, because graphs are much harder to map efficiently to linear storage than hierarchical models, and much harder to search, for instance. However, with a good low-level persistent object storage, it wouldn't be terribly hard to implement a very rich data model, and a subsequently very rich interface, on top of it.

I'm going to post my article about pseudo-natural language in a few days, hopefully. I might probe slightly deeper into this area as well in that article.

file and fayl, posted 26 Aug 2000 at 20:40 UTC by Zaitcev » (Master)

    Just leave the documents in the linked-list or tree structure (or whatever) used by the program internally and probably attach them to some sort of global document database so that they can be easily found later.

Somehow I get a feeling that this proposition was not very well thought out.

Also, I think that a high proficiency in English makes the author miss the point. For example, when I started to work with computers, the word "fayl" had only recently been introduced in my native language. It was just a meaningless bunch of sounds for me. I bugged people asking what a "fayl" is and they answered "computers store data in fayls" or "a fayl is just a bunch of records" or "a fayl is everything that has a common name".

A file is a useful concept in computing regardless of its connection with the folder-like "file" that is familiar to older-generation Americans. Try mentally renaming it to "plof".

Relevant links, posted 26 Aug 2000 at 20:41 UTC by Sunir » (Journeyer)

Here are some pages you may be interested in:

  • Eros OS -- All data are objects in RAM; all memory is "checkpointed" (i.e. written) to disk every five minutes.
  • Lifestreams -- "A lifestream is a time-ordered stream of documents that functions as a diary of your electronic life."
  • SqueakNOS -- Making the Squeak Smalltalk environment the actual operating system.

a filesystem is a poor man's db, posted 26 Aug 2000 at 21:49 UTC by jmg » (Master)

We have a filesystem which is simply a database. What you are really after is the ability to reduce all the components of an operating system into a single shell, where you don't have all the support files and all you have is one large file.

In some ways this already exists. Look at Word on the Macintosh. It is entirely self-sufficient. You can copy Word's main executable to another machine and run it. It will then install all the necessary support items in the local computer's database so that it can easily access them (and the user can modify them).

What needs to happen is more the hiding of the archaic UI from the user. They should have multiple interfaces to view the available programs (be it a shell or a GUI), and of course a separate name space for their own files (like Unix's home directories). There is nothing new in trying to get rid of the file system, but we will end up replacing it with something else that is just like a file system.

What needs to happen is for smarter programs to be developed that are allowed to "save" state so you can come back without anything else. Something similar to having a program dump a core file, and then be resumed from that same core file. Before you can do this, you need to teach the operating system about persistent objects. This will be difficult, because you will need to learn how to resume a program to free a lock on a file or similar stuff. Then you enter into resource deadlocks while this happens. A new OS that allows this would definitely be an interesting project.

Some clarification (I hope), posted 26 Aug 2000 at 22:34 UTC by dbryson » (Apprentice)

Since several people (within just a few replies) have expressed confusion as to what exactly I am talking about, I hope to explain better here. In my original posting I wanted to be vague so as not to lead your thinking too much.

First off, my use of "document" vs. "file". A "document" is whatever is being created or edited with a program (like a letter, book, picture, 3D model, recipe, etc.) and a "file" is the physical storage of this document in a file in the file system of the computer. It is hard for those of us (all of us here, I would say) who have been working with files and file systems as long as we have been working with computers to separate the two, but there is a difference.

What I was trying to do is get people thinking about an OS/computer without files and file systems. How would you implement it and how would it work?

In my imagination I see a system that uses virtual memory to utilize hard disk space (and other machines on a network) as memory and doesn't have any "secondary" storage, per se. It just has memory. If the computer has a 10GB hard drive, it has 10GB of memory. Just take a moment and think about what it would be like to program without files. As some have pointed out, this is similar to Smalltalk and some Lisp and Forth systems, although they are very limited in the amount of memory they can use and they still use files.

What do you imagine?

Object spaces, posted 26 Aug 2000 at 23:30 UTC by tetron » (Journeyer)

There is a basic problem with files, one which taken from a historical perspective is quite obvious, but is the source of many of the current problems of basic data interface we have today: files are noninteractive. Let me repeat this. FILES ARE NONINTERACTIVE. You can open, read and write to a file. That's it. There is no facility supporting the concept of rich interaction with files; they are simply passive data. Objects, by contrast, encapsulate execution units with data. The classic example is an image object - instead of calling some other function to draw it, you simply tell the image to "display itself." In a pure object space, documents would become objects which export an interface - an HTML document object, for example, would know how to tell you about its structure, what it links to and embeds, what style sheets it uses and so on, without the querying application needing to actually know how to parse HTML. It would also, of course, know how to render itself and thus be easily embedded in a browser or any other application object (which would be a document itself, oriented towards providing UI manipulation tools instead of storing data.)

Basically, this is extending the concept of component architectures (which are primarily execution-oriented) to the data space as well. This has been done before, the most notable example probably being the Symbolics LISP machines of the 1980s. These systems were based entirely around Lisp, from applications to OS services, and the Lisp objects existed in one big garbage-collected persistent object store. Apparently the debugger could even trace execution into the operating system, because the source code was essentially all there (in symbolic format.) I've heard of a few projects to build similar Lisp-based OSes for modern PCs, but I don't think any of them are usable.

On a slightly different tangent, the essentially static nature of the web (web pages == files) has become a big problem these days. Serious interactivity - anything for which simple form-based pages are inappropriate, such as an Application Service Provider (ASP) trying to present a spreadsheet - relies on plugins, Java applets, or other technologies that don't quite "fit" with the technological design of the web. If instead we could interact with web pages, sending messages to them and having them respond in real time, I think some of the promises of the Internet as an interactive medium would become much more of a reality. The same thing, then, could happen with files!

One additional feature I would also like to see would be explicit, polymorphic document typing. The HTML document object might export various interfaces, including one which tells you about the HTML structure, one which supplies the plain text, and a third which does the rendering - these three interfaces are entirely orthogonal, as an ASCII text document can supply itself as text but doesn't have the explicit structure of HTML, and a JPEG image can render itself but lacks a textual or structural representation.
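
A small sketch of that polymorphic typing, with names I've made up for illustration: each document type exports only the interfaces that make sense for it, and a querying application simply asks which interfaces are there.

    class PlainText:
        def __init__(self, text):
            self.text = text
        def as_text(self):
            return self.text            # textual interface: yes
        # no structure() and no render(): plain text has neither

    class HtmlDocument(PlainText):
        def structure(self):
            return ["<h1>", "<p>", "<a>"]   # structural interface
        def render(self):
            return "pixels..."              # rendering interface

    class JpegImage:
        def render(self):
            return "pixels..."          # can render itself
        # no as_text() and no structure(): an image has neither

    def describe(doc):
        # The querying application asks for interfaces rather than
        # knowing how to parse HTML or decode JPEG itself.
        for iface in ("as_text", "structure", "render"):
            print(iface, "supported" if hasattr(doc, iface) else "not supported")

    describe(HtmlDocument("<h1>hi</h1>"))
    describe(JpegImage())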

I believe this can all be done in a platform-independent, portable way, supporting a variety of programming languages. Someone just has to write the code...

DB + functions + logic, posted 27 Aug 2000 at 00:43 UTC by mettw » (Observer)

I've had a similar idea myself. There are a number of things I'd like to see filesystems implement:

  • Database File systems are a pretty crappy form of record organisation. A true database that would let you get at files (records) through file type etc. would be nice.
  • Better magic Determining file type by file extension or magic is a horrible hack. Ideally a filesystem should store the file type.
  • versioning Pure functional languages are supposed to be free of side-effects, but the file system effectively introduces updateable variables (and hence side-effects) into a program. If every file were versionable then this wouldn't be a problem. (A toy sketch of this follows the list.)
  • Horn clauses These are a computable subset of first-order logic that allow you to do some extremely powerful data mining. I'm not sure how this would be implemented, but it should be possible.
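
Here is the toy sketch promised in the versioning bullet (Python, invented names; a real filesystem would store deltas rather than full copies): every write creates a new immutable version instead of updating in place, which is what keeps reads free of side-effects.

    class VersionedStore:
        def __init__(self):
            self._versions = {}    # name -> list of immutable versions

        def write(self, name, data):
            self._versions.setdefault(name, []).append(data)
            return len(self._versions[name]) - 1   # new version number

        def read(self, name, version=-1):
            return self._versions[name][version]   # default: latest

    store = VersionedStore()
    store.write("report.txt", "draft one")
    store.write("report.txt", "draft two")
    print(store.read("report.txt"))        # "draft two"
    print(store.read("report.txt", 0))     # the old version is still readable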

In defense of filesystems..., posted 27 Aug 2000 at 06:09 UTC by egnor » (Journeyer)

Nobody so far has mentioned the "stability" angle. In a modern OS, the separation between "persistent data" on the filesystem and "transient data" in virtual memory is quite artificial (the filesystem can be buffered in RAM, virtual memory can be paged to disk). It's natural to ask why they aren't combined into one big happy object graph...

What many people forget, however, is that software is flawed. Yes, your cherished Linux box may have an uptime of years ("and I only had to shut it down because my cat ate the power cord!"), but individual processes and entire subsystems (e.g. the desktop) crash or need resetting much more frequently. You don't think much about it when you do "kill" and "restart" something, but what would this mean in a filesystem-less world? If everything is a live object, what do you do when your system crashes? Throw it out and start from scratch?

The very simplicity ("stupidity", if you will) of the filesystem means that it is a much more durable space. As people have observed, files are "dead objects"; you can read them and write them, but you can't interact with them -- which means they probably won't fail, either. You can depend on them to exist and to reliably encapsulate the entire state needed to start the system. If you tell some spiffy Smalltalk object to copy itself, who knows whether it will really make a copy, and who knows if the copying process had any side effects -- but with "cp", you know where you stand.

Now, none of this means we can't improve on the filesystem, but do keep these issues in mind. Protection domains and simple guarantees are useful things to have. Memory protection and safe languages alone are not sufficient (yes, they prevent you from overwriting someone else's memory, but can your state be backed up without invoking a method that could never return?).

Finally, to drive home the point that none of this is exactly novel, I'd like to mention the phrase "orthogonal persistence" and mention the Grasshopper Operating System.

Are you all so clueless?, posted 27 Aug 2000 at 08:08 UTC by kjk » (Journeyer)

Every single poster here (except jmg) seems to be completely missing the real message in The Anti-Mac Interface. The real problem is (and dbryson mentions this in his article): the average user does not understand the concept of the file and doesn't give a damn about it. He's forced to learn the concept by us programmers because we're so familiar with it that we can't really imagine how difficult this concept is to grasp for an average person. So far so good. But the rest of the thread is a typical engineering reflex: let's fix the problem by developing an oh-so-much-better filesystem. Filesystems are not the problem. They are good enough to build anything you might want to build. The problem is how we are going to hide the fact that information is stored in files/databases/universal-memory-mapped-file-system. You're all trying to solve the wrong problem. Instead of dreaming about OSes without files, try to dream about programs that will be easy for people to use.

tetron: the real problem is that YOU CAN'T HAVE SEX WITH FILES. Let me repeat... or maybe not, you get the idea.

mettw:

  • database you can build a database on top of a filesystem. What more do you want? What would be the benefit of integrating the database with the filesystem (code bloat and harder maintenance don't count)?
  • better magic The BeOS filesystem already does that (it stores the MIME type of the file in metadata; Gnome also implements metadata), but this is more a problem of enforcing the usage of such metadata in applications across the system, not an insurmountable deficiency in a filesystem. You can just as well implement this on top of a filesystem (like Gnome).
  • versioning in theory a neat idea, but what about very big files? I change one byte in a 1 GB file. Without rather complicated code I end up with two 1 GB files. But the real problem is: how would you present (enforced) versioning to users? How should the user select one version from among 100? How do you decide when to delete the oldest versions to save space? Do you choose an arbitrary number or do you ask users "how many revisions do you want to keep?" And the user goes: "ugh?". In short, you'll only create new problems. Besides, if versioning is really necessary you can always have an application-specific implementation on top of a filesystem.
  • Horn clauses and what for? Do you think an average user will ever be able to use anything like that? For specialized applications you can always create an application-specific implementation on top of a filesystem that more likely than not will have better performance than a generic system could achieve.

The interface vs. the implementation, posted 27 Aug 2000 at 08:28 UTC by jdub » (Master)

In addition to Sunir's excellent links (all of which I'd recommend myself) I suggest having a good read of Alan Cooper's About Face. Make sure you sit down... and read it with an open mind for users! :) Notably, the concept of "saving" gets a big kick in the bum.

I was dreaming about an interface with a non-filesystem-centric design a few years ago. After some research, I found Lifestreams, which was the idea in code - and in user testing. The trouble is that you're replacing one dimension of your interface (space) with another (time), but you need a further dimension: relationships between information.

The other trouble is backwards compatibility, but I think this is almost solved. When your "applications" become "information interfaces" then those interfaces can be housed in a system that organises the information. Think Bonobo, COM and helpful related technologies like GConf. Take away this silly domination of windows. They're useful, but shouldn't form the base.

Taking away this complicated, archaic system of files, you find a problem with your interface design. That's another backwards compatibility problem which has to be solved before a move can be made.

I look forward to reading Radagast's articles on user interface design - who has the guts to make radical changes to interfaces, other than the user-oriented (as opposed to customer-oriented) Free Software community?

(Thanks to Radagast for correcting my use of the word "document" -> it should be "information", as we're talking about the entire system, which needs to provide for a much deeper granularity than just "documents")

Re: Are you all so clueless?, posted 27 Aug 2000 at 10:11 UTC by mettw » (Observer)

kjk, there's nothing wrong with loving yourself - I have an extremely high opinion of myself too. But if you suffer from this particular affliction there are a few rules you need to observe. (1) Don't expect everyone else to have the same high opinion of yourself. The Australian term for such people is `wanker' (derivative of `wank' - Col. To masturbate). (2) Don't go around declaring everyone to be an idiot unless you really know what you are talking about. This just results in someone pointing out that the idiot is yourself, which in turn causes your aggression levels to rise. (3) People who dismiss new ideas out of hand are always remembered as stodgy old fools who couldn't see what the more creative person could.

Filesystems are fine, and you still have problems to solve...., posted 27 Aug 2000 at 10:28 UTC by jlbec » (Master)

A filesystem is a form of hierarchical database. A filesystem can store many things that simple databases can. You can easily use a Berkeley DB as a filesystem, and you could write code that used a filesystem with a Berkeley DB interface.
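
For instance, here is roughly what that looks like in miniature (a sketch using Python's generic dbm module as a stand-in for Berkeley DB; the keys and paths are made up): the "filesystem" is just a mapping from path-like names to blocks of bytes.

    import dbm

    # dbm stands in for Berkeley DB here: a persistent key/value store.
    fs = dbm.open("toy-fs", "c")

    fs[b"/home/john/notes.txt"] = b"remember the milk"     # write a "file"
    print(fs[b"/home/john/notes.txt"])                     # read it back

    # A directory listing degenerates into a scan over the keys.
    for key in fs.keys():
        if key.startswith(b"/home/john/"):
            print(key)

    fs.close()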

The issue at hand has nothing to do with the filesystem. As they point out in the Anti-Mac Interface, it doesn't matter what the backend is, we need to provide a better interface to the user. Folders and files work decently well, but maybe we can do better.

The fact remains, the user is going to have to find the "object" they want to work on. Maybe it is a pain in the current system, but any new system has to solve this problem as well.

If you have a persistent object storage thingy, how does that help the user? The user still has to find the correct "object." To a journalist with 1000+ articles on his laptop, you can bet they would want some sort of organization. How does the human mind like to organize things? By hierarchical groups, of course. Before computers, that journalist would store articles and associated research in separate folders in a file cabinet.

I'm not saying you won't find anything better. But you have to think of the problem, not the implementation. If the user wants the spreadsheet for the first quarter of 1999, it doesn't matter if the spreadsheet is in the directory "/home/john/data/1999/1q", the directory "C:\Program Files\Excel\sheets", the Oracle database "select * from 1999 where quarter=1", or some fancy persistent object. The challenge is not these frivolous backend details. Rather, it is creating a front-end that not only simplifies the user's interaction with them, but makes finding them as easy as now or better.

This is, in fact, something the Macintosh has done better than most for years, even from 1984 or so. The Finder doesn't let you see much more than folders, so organization is decently apparent. Obviously, it isn't perfect, or we wouldn't be having this discussion. But it does pretty well. Better than Windows or Unix/Linux, which force the user to see a lot more of reality when hunting files. This has its downside, of course, as when the Unix/Linux type wants to see his/her actual system layout and work with it. On Unix/Linux/Windows, you can. On the Macintosh, it's harder. There is always this trade-off.

I would love to see a new and better approach to the viewing of "documents"/"objects"/"bodies of information". But don't believe that files, databases, or any other structure are the problem here. The problem is how it appears to the user, and any of the structures will do fine for a backend. The appearance does not have to match the backend.

(Oracle has recently released a product called the Internet File System [IFS]. It simply stores "objects"/"documents"/"files" in the Oracle RDBMS. It allows you to access them with HTTP, NFS, FTP, IMAP, SMB, and maybe even some other protocols. It also does on-the-fly document translation, so that you can save foo.doc, then get foo.txt, and it will do the doc->txt translation for you automatically. This is still tied to the semantics of a filesystem from the user's point of view, but some of the ideas are there, if in their infancy.)

Distributed Shared Memory, One True Address Space, etc., posted 27 Aug 2000 at 18:06 UTC by Toby » (Master)

A little while back, when I was working at ETHZ in Switzerland, some dudes in funky clothes did a presentation on Distributed Shared Memory. They were using the 64-bit Alpha architecture to try to have a single address space shared between a number of machines. It sounds a little like what you are proposing, except you only seem to take into account one machine at a time.

Well, everything went well with the presentation, until question time. After fielding and answering a number of questions, someone in the back piped up with the "show stopper": How do you guys intend to handle backup?

files are not obsolete, but they are deficient., posted 28 Aug 2000 at 04:44 UTC by tetron » (Journeyer)

kjk: I'll ignore the ad hominem implications of your criticism, as you obviously didn't understand the point I am trying to make.

Files, currently, are the lowest common denominator. They are a basic mapping of keys (names) to blocks of bytes. You can read and change this block of bytes, but the logic to give meaning to those bytes must reside in each program separately. When a new file format is introduced, every program must be changed to accommodate this new format. If you're lucky, all your programs might use the same library to handle this particular file, so you only need to update that. But honestly, how often does that happen? Take imlib. Among its other features, it loads a bunch of different kinds of image files. However, the design of imlib is really based around displaying stuff in X, so it's not at all useful for someone writing a console application, or an application with no output at all. Centralizing handling of specific file types is, at best, ad hoc, and this is bad. We know what kinds of system policies will support code reuse, and we should use them.

I believe the current distinction between a program in core and the file on disk really stems from perfectly reasonable engineering decisions made in the 1960s, when external storage meant punched cards, magnetic tape and washing-machine-sized hard disks. The concept of swapping and virtual memory was not even particularly well developed - I/O on these devices was, timewise, a very expensive operation. Programs manipulate data - of course, it's so simple! Data just needs to sit there.

When you're batch processing payrolls or scientific number crunching, of course it makes sense. These were not interactive systems. In particular, relatively few programs had to deal with extremely diverse inputs (compilers would be a notable exception there.) Programs or systems that humans interacted with, even indirectly, such as through an API, were relatively uncommon - most code was written from scratch. The point I'm trying to make is that the concept of a "file" as distinct from what's in memory made sense back when these systems were being developed.

Now, fast forward to 2000. Program size and complexity have multiplied a thousandfold, and interactive systems are the norm. Object-oriented programming is the current favored paradigm, and an outgrowth of this is a move towards interface/component-based programming. Why not build a system of pure components? Some components exist to store data, others exist to provide user interfaces, but we can break down the arbitrary wall between data and code, and between memory and disk. Of course this is already possible with current systems, sort of, but it's not easy.

This is a revolutionary, not evolutionary, step in system design, even though it does follow from current computing trends. What really needs to happen is less bickering about what new systems should be like and more actually implementing ideas to determine empirically if they're actually any good!

The distinction between filesystems and magic happy object thingies, posted 28 Aug 2000 at 05:54 UTC by witten » (Journeyer)

jlbec : It's not just the end-user's experience that is important. If that were the case, then I would completely agree with you that it doesn't matter whether we use traditional filesystems or persistent object stores or Oracle databases or whatever else. But what's just as important as the end-user's experience is the programmer's experience. If the programmer's job is easier and less painful, then the programmer will generally produce better code, faster.

So here's how the magic object store would ease a developer's life, and why the actual storage mechanism matters. Because the hard disk is treated as nothing more than overgrown swap space, the data can presumably be written out to disk automatically without any extra work by the application developer. The benefits of this are pretty huge. A very large percentage of the code in your standard application is spent explicitly serializing data to disk, reading data back in, parsing file formats, etc. If all of this rigmarole can be eliminated, coders can spend that much more time worrying about the code in their application that actually does something useful. I think it would be really nice to deal with data in its native in-memory structures, rather than duplicating that data on disk in some sort of convoluted file format.
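
To sketch the contrast (Python, with pickle standing in for the transparent persistence layer; a real object store would do this implicitly and incrementally rather than via an explicit dump call): the application only defines its natural in-memory structures, and no hand-written file format or parser is involved.

    import pickle

    class Song:
        def __init__(self, title, tags):
            self.title = title
            self.tags = tags           # ordinary in-memory structures

    library = [Song("Intro", ["ambient"]),
               Song("Outro", ["ambient", "long"])]

    # The whole object graph goes out and comes back as-is; there is no
    # application-specific serialization code to write or maintain.
    blob = pickle.dumps(library)
    restored = pickle.loads(blob)
    print(restored[1].tags)            # ['ambient', 'long']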

And yes, there are lots of issues with such an approach, as others have pointed out: dealing with crashes, recovery, etc. But I just wanted to point out one of the main benefits of doing away with filesystems: We get to code more of the fun stuff: the actual do something part of the application.

The 'memory' and 'process' metaphor applied to persistence, posted 28 Aug 2000 at 12:38 UTC by Bram » (Master)

I'm going to talk about how one could improve the programming interface of files - Improving over files as a user interface concept is, thankfully, a separate issue.

As other people have suggested, it would be valuable to have a hard drive interface API with calls similar to malloc() and free(). This raises the question of how you keep applications from leaking the entire hard drive. A logical solution would be to extend the memory metaphor to include persistent 'processes'.

When you kill a 'process', all hard drive space it allocated is freed up again. That solves the memory leak problem, but the 'process' concept helps with a lot more. Process permissioning is already well understood, while hard drive permissioning is a mess. Starting and killing a process is simple and reliable, while installing and uninstalling programs barely works.

What I would really like to happen is that when an end user 'installs' an 'application', the programmer thinks of it as 'spawning' a 'process', and the only headache is that the 'process' needs to make a distinction between persistent and non-persistent 'memory'.

Reliability problems could be dealt with in much the same way they are now - your first 'process' spawns another 'process' which does most of the real work, and when the second process starts to go haywire you kill it and have the first 'process' spawn a new one. This is essentially what's done now with killing applications, only more unified in what concepts are being used.
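
A toy sketch of the persistent 'process' idea, with entirely invented names (a real design would live in the kernel and deal with permissions and crash recovery): persistent allocations are owned by a process-like object, and killing it reclaims everything it ever allocated.

    class PersistentProcess:
        def __init__(self, name, store):
            self.name, self.store = name, store
            self.store.setdefault(name, {})

        def palloc(self, key, value):
            # like malloc(), but the allocation survives reboots
            self.store[self.name][key] = value

        def kill(self):
            # killing the process frees all of its persistent memory,
            # so "uninstalling" cannot leak the hard drive
            del self.store[self.name]

    disk = {}                                  # stands in for the drive
    app = PersistentProcess("spreadsheet", disk)
    app.palloc("q1-figures", [1, 2, 3])
    print(disk)                                # data owned by the process
    app.kill()
    print(disk)                                # everything reclaimed: {}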

If you've never had to deal with direct file manipulation, be glad - it's only fun if you really enjoy pain.

Confusion in user interfaces, posted 28 Aug 2000 at 18:17 UTC by imp » (Master)

The big reason that this discussion keeps coming up over the years (I recall having it in 1985 in college) is that people misunderstand the basic concepts of a file. When you were writing something in the pre-computer world, you'd jot it down on paper, then have one of the girls in the pool type it up (in the pre-computer age, there was rampant sexism). They would type up the paper and would put all the pages into a folder and hand it back to you (possibly multiple copies to multiple people). It seems to me that part of the problem here is that somebody STUPIDLY thought that a FOLDER should be used to describe collections of objects rather than what it traditionally held (the pages of one document). Also, FILE and FOLDER are the same thing to many people: an oversized hunk of paper that goes around a collection of pages that make up a document (of course there are exceptions: a file of bills, but I digress).

If the original interfaces had talked about rooms (mapping to a disk or logical volume) and filing drawers (mapping to directories), then people would be less confused. Of course, mapping these concepts from the real world, which supports only a limited number of hierarchical levels, to one that supports a huge number can be hard, but users don't need to know or do that. They will be happy keeping things in a shallow, wide hierarchy.

The real problem here isn't that files are obsolete. It is that they are insufficient to describe a massively hypertexted environment. When you have pointers to other things in a complex way, you need some way to refer to them. It is the responsibility of the thing that is collecting them to deal with that. When I save a Word document, say, I don't care if it goes to 1 file or 100000 files, just so long as I can get it back by the same token I gave it.

One problem with this approach is that you move knowledge of collections out of the file system (where directories handle aggregation) into the applications. The more that you have there, the more every damn tool that deals with moving information around needs to cope. With file systems and directories, if you save your complex document in one directory tree, it becomes very easy to package it up and move it around from one place to another. If you get object database back ends involved, or any other complex mechanism to try to aggregate the files, it gets complex in a hurry.

Why reinvent another complex system when you have one that already does these tasks? A filesystem does its job well, and the layering that it enforces is a good thing. Another area where layering is a good thing: networking. Does every FTP client need to know TCP? No, the lower layers of the stack just deal with it for you and give the FTP client bits to work with. A file system is no different from that.

Now, a GUI interface to all of this could be interesting. I think that's where the research needs to happen. Arguing that the filesystem is obsolete because programmers provide such a horrible interface is bad.

No distinction between in-memory and on-disk formats == bad idea!, posted 29 Aug 2000 at 00:36 UTC by dto » (Journeyer)

I think it would be really nice to deal with data in its native in-memory structures, rather than duplicating that data on disk in some sort of convoluted file format.

That is very backward---UNIX learned this lesson in the 70's by using text-based formats (at a time when the space/speed penalty was much worse than it is now!)

When you just mirror the contents of memory to a file, you immediately render the document data nonportable even across minor program versions. You may save programmer time by using the same structure as in memory, but this is highly brittle because the data file's format is now entirely dependent on tiny and non-obvious parts of your application code--structure sizes, layouts, data types, etcetera.

The result is that effectively one and only one application can read your file. We all know how much Microsoft has exploited this idea, as it is nearly impossible to extract everything from MS Word files (especially when you have embedded things into the document. COM documents are essentially on-disk copies of C structs.)

The magic object store would be fantastic in a world where there was only one kind of computer and application but in the real world people need to save and exchange files with each other, convert between different formats, and work in heterogeneous environments. Files may be a lowest-common-denominator solution to this problem, but that's what's needed.

memory vs disk, posted 29 Aug 2000 at 04:04 UTC by tetron » (Journeyer)

dto: you're right, on-disk formats which are essentially literal dumps of in-memory data are a terribly bad idea. However, there is no reason that standardized serialization formats cannot be adopted, so that when objects are stored on disk they are in a portable format. The problem then becomes, however, that you need much tighter integration between the language and memory-management system responsible for doing disk <--> memory mappings. This is not really possible in statically-compiled languages like C.

imp: the high-level interface is a direct reflection of the low-level design of the system. If the low-level design is lacking, it becomes far more difficult to present a truly powerful and flexible abstraction. In fifteen years of GUI development, no one has managed to break out of the MacOS desktop Finder paradigm for presenting file management. Even Nautilus, at the bleeding edge of GNOME development, still consists of a window with some icons representing files and folders in it. Where's the revolution?

On dumbly dumping data to disk, posted 29 Aug 2000 at 04:06 UTC by witten » (Journeyer)

dto :

When you just mirror the contents of memory to a file, you immediately render the document data nonportable even across minor program versions. You may save programmer time by using the same structure as in memory, but this is highly brittle because the data file's format is now entirely dependent on tiny and non-obvious parts of your application code--structure sizes, layouts, data types, etcetera.

Perhaps I gave off the wrong impression or didn't supply adequate detail, but I did not mean to suggest that it would be sufficient to perform what amounts to a coredump and pass that off as a file format, like Microsoft Word does. For robust object storage, you'd have to go through all the standard data serialization procedures that are common with persistent object stores: pointer pickling, endian conversions, storage of object layout information, etc. This addresses all of the problems you speak of, and yes, it has been done in the real world. (However, C++ is not what I would call the most elegant language in which to implement object persistence.) And the benefit of doing data storage this way, as I already mentioned, is that it's completely implicit as far as the application programmer is concerned. The OS or the persistence library or whatever is responsible for all the transparent storage. Joe Coder just goes about his merry business playing with all the objects in their native in-memory format, while magic is worked behind the scenes in order to make objects happy and portable on disk.
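
For anyone unfamiliar with the "endian conversions" part of that grunt work, it is the sort of mechanical thing the persistence layer would settle once for everyone. A tiny Python illustration, with a record layout I invented for the example:

    import struct

    # The store picks one on-disk byte order and sticks to it; "<" below
    # means little-endian regardless of the byte order of the host CPU.
    record = struct.pack("<IhI", 1999, 1, 640000)   # year, quarter, total
    print(record.hex())

    year, quarter, total = struct.unpack("<IhI", record)
    print(year, quarter, total)

The point is simply that this layer can be automated; agreeing on what the fields mean is the part humans still have to do, as dto argues below.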

And by the way, text files are not exactly the robust and stable data warehouses you make them out to be. :) Ever download and install a new version of a Unix program that isn't quite compatible with its old data file format? I don't really enjoy going in and hand-massaging the various fields of these files to make them compliant with a different file format. I would think that properly-described object store data would prove much more robust, as the data would be truly structured, and thus more easily passed from one version of an application to another.

Fancy tricks do not a solution make, posted 29 Aug 2000 at 08:10 UTC by jlbec » (Master)

Coredumps of the current process state don't make good file formats. We know that. An effective, endian/alignment/size-resistant format could be devised. Object storage, RDBMS storage, and other solutions can possibly make that transparent.

You'll still have the same file format problem.

In the old days, you had BMP, GIF, and JPEG. BMP was huge, but it contained all the data. GIF was smaller, and still contained all the data, though it had that nagging license issue. JPEG was tiny, but you didn't get all the data, just enough.

In today's super-large disk/memory/broadband world, a 600K PNG isn't as much to worry about (though ask the people on 56Kbps, and they'll still be pissed at the poor web designer). But video and audio have the same issue. We've all accepted MP3 and Ogg Vorbis as good-as-new solutions to audio compression, but they do not contain all the data that the WAV does. Do a few rounds of CD -> ogg -> CD -> etc. and you'll see lossage. Someone is going to want or need the data in the WAV or the movie format or whatever comes along, and someone else is going to care about space, bandwidth, or some reverse concern.

If you store a GIF in an object system, you may be able to do fancier things as a programmer than you would storing it in a file. But that GIF is going to be a choice of size/lossage/etc that it always was. You can store PNG, BMP, JPEG, GIF, or XWD files in that object system, and you've gained nothing over the current mess (wealth) of formats. The concerns and the tradeoffs will still be there.

I can see where new paradigms for the programmer can help. But the real change, the real revolution, will be the shift for the user. HTTP wasn't revolutionary for the backend. You can do the same with all local storage and file:// URLs. It was revolutionary because of the way the front-end was put together for the user. It created a whole new level of interaction. This is the goal we really need to be searching for. This is the goal of the Anti-Mac Interface (as I read it, at least). Once the new paradigm is discovered, you'll likely be able to do it with files, databases, objects, whatever. It's all 1's, 0's, and xor gates, remember.

Perhaps the problem is lack of Meta Data?, posted 29 Aug 2000 at 16:33 UTC by cmacd » (Journeyer)

As I understand the question, there are folks who find that having to put their documents in files, and then retrieve them later, is a confusing concept. I wonder if at least part of the problem is the lack of metadata expressed by the average filesystem.

While both Windows and Linux allow a long file name, and so we can be far more descriptive than we used to be in the days of MS-DOS and the 8.3 format (itself an improvement on other systems with as few as six-character file names), it is quite often hard for someone to pick a name that will allow easy retrieval of the information later.

The system of nested folders (or directories) can be of assistance, or hindrance, in this connection.

For example, let's say you have a document concerning the hiring of a new Java programmer for your Kanata office in the 4th quarter of this year. Does it go in $HOME/hr, $HOME/kanata, or $HOME/java, or even $HOME/2000/Q4?

I suspect that everyone here has at least a few directories that have sub-directories that have the same name as another subdirectory elsewhere in your file system.

The traditional way to deal with this in the office setting was to create something called a records system. You might have:

  • 8000 Human Resources
    • 8000-10 Human Resources - Ottawa
    • 8000-11 Human Resources - Nepean
    • 8000-12 Human Resources - Kanata
      • 8000-12-1 Human Resources - Kanata - Programming staff
        • 8000-12-1(J) Human Resources - Kanata - Programming staff - Java
          • 8000-12-1(J)-00357 Human Resources - Kanata - Programming staff - Java - Fred Smith

All the paperwork connected with the mythical Mr. Smith would end up in a nice case made of manila, with that number on it, and somewhere there would be a book that listed Mr. Smith and the fact that details of his existence were also under

  • 5000-12 Payroll -Kanata
  • 6754-60-00357 Learning Plan - Fred Smith
  • 9845-000129-6 Project file - xyz.com web site.
  • .....

All the paperwork was controlled by someone who was paid to do just that, and so for the officer who was deciding if Mr. Smith should get a raise, it would be a matter of asking "what have we got on Fred's performance?"

Without such a formal system, the information grows to the point where it is difficult to find without using the File|Search tool.

The hard part is that by the time you are ready for formal organization, the system is far out of hand. AND the user will resist having such a system imposed on them. The answer may require some use of AI techniques to index all the user's documents and allow related ones to pop to the surface.

Even having a chance to create a link to the file by another name in another folder, at the time the user creates the file, may help delay this problem.
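
A tiny sketch of the metadata idea (Python; the tags and document names are invented): instead of choosing one folder, the hiring document is tagged with every facet it belongs to, and retrieval becomes a query over the tags rather than a guess about where it was filed.

    from collections import defaultdict

    index = defaultdict(set)

    def add_document(doc_id, *tags):
        for tag in tags:
            index[tag].add(doc_id)

    add_document("hiring-fred-smith.doc", "hr", "kanata", "java", "2000-Q4")
    add_document("q4-budget.xls", "finance", "kanata", "2000-Q4")

    # "Which Kanata documents from Q4 touch on HR?" -- an intersection,
    # not a hunt through $HOME/hr vs. $HOME/kanata vs. $HOME/2000/Q4.
    print(index["kanata"] & index["2000-Q4"] & index["hr"])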

text formats weren't the point of the post, posted 1 Sep 2000 at 00:58 UTC by dto » (Journeyer)

you're right, on-disk formats which are essentially literal dumps of in-memory data are a terribly bad idea. However, there is no reason that standardized serialization formats cannot be adopted, so that when objects are stored on disk they are in a portable format.

What would this change, though? The issue of standardized data formats has been around since the dawn of software. The reason it's so hard to progress is that it is very difficult to make data formats automatically implicit without also embedding too many specifics about the application. That is, any "implicit" data format will necessarily mirror in some way the in-memory structure, whether it is XML tags with fieldname+value or a binary dump. Computers do not have the creativity to do much more, so humans still need to invent data formats.

All that matters is that we agree on their syntax and semantics.

As for the portability of the format, I was talking about more than endian-ness. An application from vendor B won't read vendor A's files unless they either use the same code, or agree on a format that doesn't dictate the structure of the client application.

And the benefit of doing data storage this way, as I already mentioned, is that it's completely implicit as far as the application programmer is concerned. The OS or the persistence library or whatever is responsible for all the transparent storage. Joe Coder just goes about his merry business playing with all the objects in their native in-memory format, while magic is worked behind the scenes in order to make objects happy and portable on disk.

I am very skeptical of plans that require everyone to switch to a new (usually interpreted) language and new operating system in order to work. All this machinery needs to be in place in the OS/environment to be able to either write or read these files. (Otherwise we'd be doing it now.) The data format is implicitly constructed by the "persistence library", so I ask, how would any other application or other operating system read this data?

As for the link witten provided, yes, I can see where persistent memory-type systems can be useful and necessary, but none of this solves the problem of data interchange. ColdStore is not "binary xml"; the page says that it's for dynamic databases, object caches, and so on.

And by the way, text files are not exactly the robust and stable data warehouses you make them out to be. :) Ever download and install a new version of a Unix program that isn't quite compatible with its old data file format?

I mentioned human-readable text files as just one example of alternatives to just dumping C structs, there are others. And, in answer to your question, no that hasn't ever happened to me. Again, data interchange presupposes common data formats, and it takes care to keep things consistent. Notice XML hasn't solved this problem.

Data standardization, posted 1 Sep 2000 at 08:48 UTC by witten » (Journeyer)

dto :

What would this change, though? The issue of standardized data formats has been around since the dawn of software. The reason it's so hard to progress is that it is very difficult to make data formats automatically implicit without also embedding too many specifics about the application. That is, any "implicit" data format will necessarily mirror in some way the in-memory structure, whether it is XML tags with fieldname+value or a binary dump. Computers do not have the creativity to do much more, so humans still need to invent data formats.

All that matters is that we agree on their syntax and semantics.

As for the portability of the format, I was talking about more than endian-ness. An application from vendor B won't read vendor A's files unless they either use the same code, or agree on a format that doesn't dictate the structure of the client application.

I think what tetron was talking about was standardization in terms of implicit storage formats: endianness, pointer layout, etc... standardization in such a manner that objects are stored on disk in a portable format. Such things can be fairly well automated by computers.

What you seem to be talking about instead, and which I think is a very valid point, is standardization in terms of what particular structures are chosen to store the given data. And you're right.. people would still be required to agree on these data structures when different programs need to interoperate and share information. However, even though this is still necessary, it is much less of a bother when all the grunt work of splatting data to your harddrive is done for you (the pointers, etc.)

So yes, transparent storage wouldn't make everything involved in data interchange completely automatic, but it would, in my opinion, make the process a whole hell of a lot easier. And when people can agree on a common format for a particular type of data (which, who knows, might be easier to do when you have a standard set of data structures to work with), then things really can be implicit.

I am very skeptical of plans that require everyone to switch to a new (usually interpreted) language and new operating system in order to work. All this machinery needs to be in place in the OS/environment to be able to either write or read these files. (Otherwise we'd be doing it now.) The data format is implicitly constructed by the "persistence library", so I ask, how would any other application or other operating system read this data?

Why, of course, the same way that traditional operating systems read most data now: manual, laborious translation of file formats into memory data structures. After all, it would follow a set format for the underlying pointers and primitives. That's the part of this serialization that computers are good at automating.

You should be skeptical of anything that requires all kinds of new-fangled machinery in order to even function. But sometimes that's a good way to make progress.. trying something new, even if only as an experiment.

As for the link witten provided, yes I can see where persistent memory-type systems can be useful and neccessary, but none of this solves the problem of data interchange. ColdStore is not "binary xml," the page says that it's for dynamic databases, object caches, and so on.

Granted. But I think that it's easier to pass around and deal with an object or struct, from a programmer's perspective, than it is to pass around and deal with a dead data file. Even if they both require humans to agree on some data structures.

ok, posted 2 Sep 2000 at 23:10 UTC by dto » (Journeyer)

These are valid points, I see where I was misreading above. One thing I'm still not clear on is how any of this relates to whether or not data is stored in files (i.e. "abolish the filesystem, long live files.")

More thoughts about files, posted 3 Sep 2000 at 21:31 UTC by nymia » (Master)

Files and file systems remind me of the noun-verb relationship wherein data files are nouns and program files are verbs. Verb programs act on noun files, resulting in the creation of a verb or noun file. Of course, in the computer world, verbs can become nouns too, when a verb program uses a verb file as its input. But noun files are just passive objects; they have no mechanism for filtering data. With that, I think there is no problem with how verbs and nouns are stored. And the separation of verbs and nouns is a Good Thing. But the problem I see is the issue of mapping verbs to nouns and vice versa, the way a normal non-tech-savvy user would see it. I think these users are still trying to understand how verb files affect noun files and vice versa. So, IMHO, the improvement must be done at the mechanism level. The mapping function or mechanism should be obvious to the user. The improvement could come in the form of a layer on top of the existing one. Or it could be just graphical devices called views.

One thought I would like to share: are adjectives and adverbs - files that have the capability to describe or modify other files - considered important in file systems?

That's my two cents.

Relationship to the filesystem, posted 4 Sep 2000 at 01:39 UTC by witten » (Journeyer)

dto :

These are valid points, I see where I was misreading above. One thing I'm still not clear on is how any of this relates to whether or not data is stored in files (i.e. "abolish the filesystem, long live files.")

Here's my take on the relationship between these object stores and traditional filesystems... The need for a standard filesystem is greatly decreased when your fundamental interaction with data is done via objects in some sort of persistent store. Sure, you could make your persistent store a single file in a traditional filesystem, the same way that you can make a relational database a single file on top of ext2fs. But this approach is not really desirable, because doing so merely adds an unnecessary, unutilized layer between the object store and the disk drive. The objects don't care about file names or symbolic links or directories or resource forks or any of the other things you'd find in traditional filesystems; the object store has its own special semantics.

And furthermore, the performance requirements of an object store would likely be quite different from those of a normal filesystem. Typical stores work by sequentially spitting out dirty/altered in-memory data to a journal on disk every five minutes or so, rather than continually updating on-disk information and seeking all over the place as the system runs. I don't know whether such a difference would be enough to warrant the use of a special disk storage/retrieval layer created especially for persistent object stores, rather than simply using something like ext2fs, but it should certainly be a consideration when designing a system for high-performance object persistence.

Several years ago I read a paper on an object store that was actually faster than a normal filesystem for certain common operations, because less disk seeking was required. I can't recall whether this was from Eros or Grasshopper or something else, but I wish I could find the URL to post here..
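
The checkpoint-to-journal pattern described above looks roughly like this (a toy Python sketch; real stores write a binary journal, handle recovery, and track dirtiness automatically): dirty objects are appended sequentially every interval, rather than seeked-to and updated in place as they change.

    import pickle, time

    dirty = {}                       # object id -> current state

    def checkpoint(journal_path="journal.log"):
        if not dirty:
            return
        with open(journal_path, "ab") as journal:
            # one sequential append per checkpoint, no seeking around
            pickle.dump((time.time(), dict(dirty)), journal)
        dirty.clear()

    dirty["doc-42"] = {"title": "draft", "body": "..."}
    checkpoint()                     # in a real store, run every few minutes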

good point, posted 7 Sep 2000 at 05:13 UTC by dto » (Journeyer)

This is a response to the last couple of comments.

witten: I agree with you fully about developing custom file-systems, perhaps for persistent objects with high speed, when the situation requires them. I'm sorry I was so misled about what is really being discussed here; this is largely a result of the main article title being bogus, i.e. "filesystems are obsolete." It's hard to parse what this notion could mean because almost any layer of abstraction imposing structure on the raw disk for the purposes of storage and retrieval is a filesystem, and basically any named persistent resource is a file, and there the story ends with the definition of these terms. "Abolishing files and file systems" is not a coherent concept, it is a misunderstanding of terms. BTW This is not directed at you, as you are arguing for the development of new filesystems, which is great. But the tone of this whole article reminds me a bit of that TUNES project! :-)

Jacob Nielsen's take on filesystems, posted 29 Sep 2000 at 11:46 UTC by caolan » (Master)

Just throwing The Death of File Systems in here in case anyone wants to see Jacob Nielsen's take on filesystems. From a UI point of view files are a complete dead loss; I've seen it time and time again, people have absolutely no concept of where their files are in the filesystem hierarchy.

C.
