Shared Functions: A Replacement for Shared Libraries

Posted 9 Mar 2000 at 08:28 UTC by aaronl

I would like to propose an idea I had recently that could serve as a replacement for shared libraries. Since it would break compatibility and would take a lot of work to implement, I don't believe that this should be used in existing operating systems. Consider this essay as my view of what shared libraries should have been.

From a casual perspective, shared libraries seem like a good idea. Often they are. For example, when my program is written to be used with a widget set and a C library, it makes sense to link those dynamically. However, there are other situations where the question of which libraries to depend on becomes harder to decide. If someone writes a library of quick utility functions, should you copy the few functions you want, or link against the library and require users to download, possibly compile, and install it? This can really get out of hand when using several rare libraries. Copying and pasting functions results in code being duplicated across applications on the system, while linking to many uncommon shared libraries makes people who want to use your application go through extra steps to get it working (unless it is a Debian package :) ). Let's face it: in this age many of the people using open source programs are not hackers, and even if they are advanced programmers, that is no reason to make their lives harder.

First, let us return to the example of the library of utility functions. This could be a library several megabytes large, with every C utility function you could ever imagine. Just because you want to use several of these functions doesn't mean that a user should be forced into installing a large shared library. What if shared libraries were distributed at the function level? A program could depend on certain functions rather than on libraries. Combine this with a good software dependency management system like APT, and you cut down a lot of the cruft that must be installed. In this example, only the functions that the program actually used would be installed, saving a lot of space and network bandwidth. These functions would be shared across applications through some clever use of namespaces, perhaps a unique prefix prepended to each function name indicating which "collection" it belongs to. Some libraries, like GTK+ and OpenGL, already employ this namespace-collision-avoidance scheme.
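
To make the prefix idea concrete, here is a minimal sketch in C; the "strutil" collection and its functions are made up for illustration and don't belong to any real library:

    /* A hypothetical "strutil" collection: the strutil_ prefix names the
     * collection each shared function belongs to, so separately packaged
     * functions cannot collide with those from other collections (the
     * naming style GTK+ and OpenGL already use).                         */
    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    /* Imagine this single function shipping as its own tiny package. */
    char *strutil_trim(char *s)
    {
        char *end;
        while (isspace((unsigned char)*s))
            s++;                                 /* skip leading blanks  */
        end = s + strlen(s);
        while (end > s && isspace((unsigned char)end[-1]))
            *--end = '\0';                       /* drop trailing blanks */
        return s;
    }

    int main(void)
    {
        char buf[] = "   hello, shared functions   ";
        printf("[%s]\n", strutil_trim(buf));
        return 0;
    }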

The real beauty of this scheme is that no code ever has to be duplicated just to avoid dependencies that would be an inconvenience. If a program needs a function, that function is grabbed off a server instead of a complete shared library.

If shared libraries are a pain to download and install, how can individual functions possibly make it easier? They would not, _unless_ the system were designed to accommodate this with a smart package management system. Since this essay suggests standardizing on a new library format, it is not taking things much further to suggest standardizing on a package manager :). Using APT would mean that the distribution center for the libraries would NOT have to be standardized, since a user could just add the distribution center for a particular application to sources.list and grab the application. Dependencies would be handled automatically, by searching all of the distribution sources for the latest versions of the libraries containing the functions the program depends on. Having an official source for getting functions would not be a good idea in the free software world, but people could set up archives similar to metalab which would amass large volumes of functions and would have APT package list files. In this way any functions that a program depends on could be downloaded from one of the few mega-archives listed in APT's configuration file.

Of course, APT can download and install shared libraries automatically now. But the problem is that people are wary of adding dependencies because not many people use automated dependency systems like this. RPM will tell you that you have unmet dependencies, but it is up to you to find the packages. Since my idea is a complete fantasy anyway, I can add a standardized package distribution system to the list of other dreams it depends on without making it any less accessible :).

Note that when I talk about APT I am not trying to say that it would be the only eligible program for the task. Personally, I think it's not up to the task I'm describing, but I don't know of any similar programs. So let's not start a war.

I'd be interested in any comments that people have. I am essentially a newcomer to the Unix/Linux world and this idea may be dumb and boring to read about. However, I thought it was worth sharing. Well, it seemed like a good idea at the time ;-).


Not so simple..., posted 9 Mar 2000 at 09:32 UTC by lolo » (Journeyer)

Well, in fact, the suggestion you are making turns out to be very complex to implement. Here are a few technical objections to it:

  • For the C language a breakup of libraries at the function level may seem appropriate, but with C++ the problem is a little bit more difficult. Shall we split the libraries at the function level, or at the C++ class level? What about the functions that are defined implicitly, like constructors and destructors? Or the constructor calls for static objects that are invoked at program startup?
  • Even without considering languages other than C, I think that the breakup at the function level may not be appropriate. How do you handle global variables in this scheme? I also think that there would be some complex dependency issues inside the libraries themselves, where functions depend on each other (see the sketch after this list).
  • Who would be responsible for specifying the dependencies on the various functions your program needs? This is a complex task compared to just specifying the libraries you depend on.
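
As a rough C sketch of the global-variable and cross-dependency problem (the names here are invented for illustration): both exported functions touch the same piece of state and one calls the other, so you cannot install one of them without a matching version of the rest.

    /* counter.c -- illustrative only.  Splitting this below the file
     * level means the shared variable and all three functions still
     * have to travel together, with matching versions.               */
    static long counter_value = 0;   /* state shared by the functions below */

    long counter_get(void)
    {
        return counter_value;
    }

    long counter_add(long n)
    {
        counter_value += n;          /* shares state with counter_get()   */
        return counter_value;
    }

    long counter_reset(void)
    {
        long old = counter_get();    /* internal call: a cross-dependency */
        counter_value = 0;
        return old;
    }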

So IMHO the change you are proposing can be summarized as replacing a few macro-dependencies on a big bag of functionality (a library) with a lot of micro-dependencies on smaller entities that are themselves inter-dependent. You are introducing more complexity into the system.

The initial problem you were trying to address with this proposal was "the complexity of dependency management for end-users" (those installing the software). And the solution you proposed in the end ("shared functions" plus relying on a tool) has a flaw: it introduces more complexity into the system, and thereby makes it more failure-prone (is that English? :-).

A possible solution to the initial problem is the generalization of the use of tools like Debian's APT, just as you said. When using this kind of tool, the complexity is handled by the tool and by the people creating installable packages. That way the interface offered to end-users (those installing the software) is kept reasonably simple.

It could be made even simpler by using expert-system technology to translate statements like:

  • My name is John Q. Random.
  • I'm a teacher (math).
  • I have an interest in electronics.
  • I like playing chess and go.

Into:

  • Create a login for John Q. Random.
  • Install a set of educational software (a gradebook, math software, ...)
  • Install some board games (and provide the user info on how to play such games over the internet).
  • Install tools for drawing and simulating electronic circuits.

Hmm, well that's all for now.

Bundling is an issue for weakly-connected computers, posted 9 Mar 2000 at 10:21 UTC by Raphael » (Master)

What is a library? Basically, this is a collection of object files (.o files) that are bundled in a single package. Although it would be possible to distribute the object files separately, I think that it would be difficult to split these at the function level as you are proposing. In many cases, the functions contained in an object file must be kept together because they have cross-dependencies and they also depend on static functions that are not exported. So you cannot split these object files unless you decide to make all internal functions and data structures visible to the "outside", which is usually not a good choice because that goes against clean APIs and it makes it impossible to change the internals of the library without breaking the old applications.
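
A small C illustration of that last point (the function names are made up): the exported function leans on a static helper that is deliberately not part of the API, so the object file is the smallest sensible unit to ship unless you start exporting internals.

    #include <ctype.h>

    static int is_separator(int c)           /* internal, never exported */
    {
        return c == ',' || c == ';' || isspace(c);
    }

    int parse_count_fields(const char *line) /* the public entry point */
    {
        int fields = 0, in_field = 0;

        for (; *line != '\0'; line++) {
            if (is_separator((unsigned char)*line)) {
                in_field = 0;
            } else if (!in_field) {
                in_field = 1;
                fields++;                     /* found the start of a field */
            }
        }
        return fields;
    }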

So a more realistic version of your proposal would be to distribute object files separately, not functions.

That would be possible, but I am not sure that you would really gain anything with that. You explain that the main advantage of your proposal would be to save disk space by only installing the functions that are needed by the applications, instead of installing the whole libraries. But I think that you forgot one thing: in order to make it possible to distribute the parts separately, you also have to make sure that you get all the needed parts if there are some cross-dependencies between them. Shared libraries are a collection of object files; distributing them in a single package ensures that all cross-dependencies between these object files are satisfied. If you distribute the object files separately, then you must have a way of tracking down the dependencies, so that a mechanism similar to APT can get all the required files. This means that the object files (or some separate files) would have to contain not only the names of the external symbols used by the code, but also some version numbers and some other information to make the automatic retrieval easier. This takes some extra space, and if you end up needing all the files that were in the original library, you would have consumed more disk space than if you had installed the whole library as a single package.
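
For instance, the retrieval metadata might end up looking something like the following; this is purely a guess at what such records could contain, not any existing format:

    /* Hypothetical per-object-file dependency record: besides the external
     * symbol names already present in the object file, an APT-like tool
     * would need origin and version information to fetch the right pieces. */
    struct obj_dependency {
        const char *symbol;      /* external symbol this piece requires        */
        const char *collection;  /* which "collection" is known to provide it  */
        int         min_version; /* oldest version considered compatible       */
    };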

Another problem is that some mechanisms like APT are fine as long as your computer has a direct connection to the Internet, but are painful if you are not connected. If you have to transfer all files on floppy disks, resolving the dependencies for some packages can be a nightmare. For example, if you get a Linux distribution on CD-ROM, you can install everything and get a working system. But if you download and install some new package later and you discover (only after having transferred the files) that it requires some updates in other packages, then you will usually prefer to have to download two or three updated libraries instead of twenty or thirty object files.

The interdependence will be a mess, posted 9 Mar 2000 at 12:14 UTC by ralsina » (Master)

Suppose you have two functions that depend on each other, directly or through some other path (and there are a lot of them in a large library). You need to ensure that compatible versions of the pair are installed.

Considering that you would have to check every possible dependence path between two functions, it starts to get difficult quickly.

For instance, take a current library with 100 functions. Each function has 98! (give or take a few) ways in which it could depend on the others. So you have almost 100! possible dependency combinations to check.

In a system as complex as a current linux system, there wouldn't be enough disk space to store the dependency information. Not to mention that it would take quite a bit of work to figure them out ;-)

Of course you can simplify this by defining groups of functions and only caring about inter-group dependencies, but then it's the same as it is now, isn't it ;-)

there are two issues here., posted 9 Mar 2000 at 13:35 UTC by cmacd » (Journeyer)

I think that your idea has a couple of semi-related points.
The first is to advocate the use of tools such as the Debian dependency system to eliminate the need for a user to worry about which shared code a given application uses. One of the problems with the dominant closed-source desktop software is its tendency for new applications to distribute fresh, incompatible versions of shared code, breaking previously installed applications: the so-called DLL Hell.
I would not disagree with many of the points raised above. In a networked environment, where the distribution point can be trusted, it is very nice to be able to give one command and get the software brought up to the current version. For users who are on slow links, or who have to use sneakernet for upgrades, the missing link is a tool that could walk the dependency chain and get all the required updates in one swoop. This might consist of a downloadable database of the current status of all packages recognised by the distributor, and a local query program that would give the user a list of which supporting packages would have to be downloaded or installed from the original CD in order to install a new package.
The second point I think you are trying to make concerns the planning of the structure of shared code. One plan would be to make all-singing, all-dancing lib-everything packages, on the model of the C library. This plan has the advantage that one download installs many functions. The disadvantages include a large download for any change, and the possibility of a new version breaking something else. Your proposal seems to be asking for many small libraries, each basically containing only one or two related functions. This has the advantage that only the needed code would have to be downloaded and stored, but it requires many more small files, and more dependency checks and tools.
I suspect that the real issue is for planners to resist the temptation to throw every idea they have into a library, and instead to plan the uses that the library will serve. If there are unrelated functions in a library that you are writing, they should probably be moved into a separate package. Of course, one does not have that choice if the library is already used by another program.
The general rule of life applies: Keep It Simple. I suspect that in many cases that means starting a new library rather than adding unrelated functionality to an existing one. If that means creating libraries that contain only one function, then that is a valid choice.

Start up time ..., posted 9 Mar 2000 at 13:54 UTC by jamesh » (Master)

As your program requires more shared libraries, it takes longer to start up. For each library the program requires, the dynamic linker has to find the file where it is stored, possibly follow a few symbolic links and finally load the library into the program's address space.

With some of the programs in GNOME, this is already starting to become a problem. Increasing the number of libraries that need loading this much (a `shared function' is pretty much just a single-function shared library) will definitely have an effect on program startup time.
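
A rough way to see the effect is to time the dynamic linker locating and mapping a pile of tiny shared objects. The sketch below assumes files ./libf0.so through ./libf99.so already exist; it is only an illustration of the per-library overhead, not a careful benchmark (build with -ldl).

    #include <dlfcn.h>
    #include <stdio.h>
    #include <sys/time.h>

    int main(void)
    {
        struct timeval start, end;
        char path[64];
        int i;

        gettimeofday(&start, NULL);
        for (i = 0; i < 100; i++) {
            snprintf(path, sizeof(path), "./libf%d.so", i);
            if (dlopen(path, RTLD_NOW) == NULL)      /* resolve symbols eagerly */
                fprintf(stderr, "failed: %s\n", dlerror());
        }
        gettimeofday(&end, NULL);

        printf("loading 100 tiny libraries took %ld us\n",
               (long)((end.tv_sec - start.tv_sec) * 1000000L +
                      (end.tv_usec - start.tv_usec)));
        return 0;
    }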

Also, splitting up libraries can have negative consequences: currently libraries are internally consistent, version-wise, due to the way they are installed. By splitting things up like this, there is the possibility of having different, inconsistent versions of parts of a library, unless the system is designed very well.

Dependencies., posted 9 Mar 2000 at 14:06 UTC by caolan » (Master)

I have to say I'm in favour of the goals.

One example of the sort of thing that a layout like this would sort out is something I have with libwmf, where I want to use libxpm to read an xpm from file into xpm data. libxpm depends on X, which of course requires the usual 3 to 5 libraries. The configure script bulges at the seams to find libxpm (which could be anywhere) along with everything that X requires. A bit of overkill just to read in an xpm file. The xpm function in libxpm being used does not need X at all.

On related topics, it's a real pain in the ass to keep dependencies together in other projects, especially optional dependencies. Let's take wv: it needs nothing, but it would like to have an iconv implementation to convert charsets; if it's not there then wv works fine but can only output native UTF-8. Fine, this is the way I want it, but if I want others to link against wv I have to also install a script along the lines of the gnome-config thing from make install, so that a configure script searching for wv knows whether or not it has to link against libiconv (not to mention the countless other optional things that libwv might want: libwmf and its optional dependencies, libMagick, etc. etc.).
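
The pattern in code looks roughly like this; the function name and the HAVE_ICONV symbol are illustrative stand-ins for whatever the configure script actually defines:

    #include <stdlib.h>
    #include <string.h>

    #ifdef HAVE_ICONV
    #include <iconv.h>
    #endif

    /* Convert 'in' from UTF-8 to 'tocode'; the caller frees the result. */
    char *convert_charset(const char *in, const char *tocode)
    {
    #ifdef HAVE_ICONV
        size_t inleft = strlen(in), outleft = strlen(in) * 4;
        char *out = malloc(outleft + 1), *outp = out, *inp = (char *)in;
        iconv_t cd = iconv_open(tocode, "UTF-8");

        if (out == NULL || cd == (iconv_t)-1 ||
            iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1) {
            if (cd != (iconv_t)-1)
                iconv_close(cd);
            free(out);
            return strdup(in);       /* conversion failed: fall back to UTF-8 */
        }
        *outp = '\0';
        iconv_close(cd);
        return out;
    #else
        (void)tocode;
        return strdup(in);           /* no iconv found: native UTF-8 only */
    #endif
    }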

It's not ideal by any measure. Now I suspect that libtool can handle some of the workload of working out dependencies for me, but I haven't tracked down a simple example of what I have in mind. I'd really like something like this for me. A configure line like...
AC_SUPER_CHECK_LIB(wv, main),
which runs off and finds wv, works out its dependencies for me, and hands me back that list in LD_FLAGS or whatever (aside: I want my complete list of libraries to be sorted for me so that duplicates are removed). While I am at it, I want the autoconf macros for CHECK_LIB to do more for me: look in the X library location for stuff as well as /usr and /usr/local. Surprising the amount of stuff that ends up there! I also want it to search for ordinary includes in the X location.

On a related idea, wouldn't it be nice if every program was basically a tiny executable with all of its functions in a library? Say, for instance, even ls: libls would be nice. Get all the nice dir parsing for free. Get better fine control over its functioning than you could get with piping. I have been thinking recently about pulling down the source to some of the basic GNU utilities, which are very stable and pretty much unchanged for years, librarising the whole lot of them, and seeing what kind of speed difference it makes, with an eye to making the majority of the internal functions public and documented. I'm forever writing code to strip out parts of a pathname, for instance; there are a host of other simple examples which would be found floating around in common binaries.
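
Purely as a daydream of what such an API could look like (libls does not exist; every name below is invented):

    /* libls.h -- hypothetical interface for a "librarised" ls, so other
     * programs get the directory parsing and sorting for free instead of
     * piping ls and re-parsing its text output.                          */
    #ifndef LIBLS_H
    #define LIBLS_H

    #include <sys/types.h>

    struct ls_entry {
        char  *name;    /* file name within the directory */
        off_t  size;    /* size in bytes                   */
        mode_t mode;    /* file type and permission bits   */
    };

    /* Fill *entries with a sorted listing of 'path'.  Returns the number
     * of entries, or -1 on error.  Release the array with ls_free().    */
    int  ls_scan(const char *path, struct ls_entry **entries);
    void ls_free(struct ls_entry *entries, int count);

    #endif /* LIBLS_H */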

This is just some freeform thinking; no practical problems allowed to rain on my parade here. (Programs with 100 dependencies come to mind: slow. No other system would have these libraries, and a program being compiled under Solaris would end up requiring about half the GNU project to print "hello world". But still, it appeals to me.)

C.

more technical objections, posted 9 Mar 2000 at 15:38 UTC by graydon » (Master)

  • if there's static data in the .so, even if it's "proper" singleton data with a mutex on it and everything, you will need initialization and usage information to get things loading at the right time. look at global constructors in .so's: it's not trivial to decide at which point / in which order these sorts of things get run, and you'll need to embed that information in your APT workalike (a C sketch of the load-time issue follows this list).

  • C++ on linux lacks a "decoupleable" binary object model anyway. if your parent class changes, you are stuck needing to re-lay-out the children. so frequently "whole .so" upgrades of C++ programs are, at the moment, necessary if you want the thing to run, because some superclass somewhere changed slightly.
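
A C approximation of the first point, using GCC's constructor attribute to stand in for C++ global constructors (the names are made up; build it as a shared object, e.g. gcc -fPIC -shared -o libplugin.so plugin.c):

    #include <stdio.h>

    static int table_ready = 0;          /* static data living in the .so */

    __attribute__((constructor))
    static void plugin_init(void)
    {
        table_ready = 1;                 /* runs at load time, not at first call */
        fprintf(stderr, "plugin: tables initialised\n");
    }

    int plugin_lookup(int key)
    {
        if (!table_ready)                /* loaded "too early" and this fails */
            return -1;
        return key * 2;                  /* stand-in for a real table lookup */
    }

An APT workalike shipping plugin_lookup() on its own would have to know that plugin_init() and its static data must come along, and must run before the first call.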

my god, it's full of libraries ..., posted 9 Mar 2000 at 20:45 UTC by dan » (Master)

On a related idea, wouldn't it be nice if every program was basically a tiny executable with all of its functions in a library? Say, for instance, even ls: libls would be nice. Get all the nice dir parsing for free. Get better fine control over its functioning than you could get with piping.
... and end up with some really odd licensing issues, quite probably

Yes, that aside, it would be lovely if there were a slightly more sane interface to most common functions than the "everything is a string" approach that the shell command line provides.

I'd go a step further, and eliminate the need for the shell altogether by using something like gdb to call the functions interactively.

I'm not sure that C makes the best language in which to call these functions, though. Much of the power of the shell comes from being able to meddle with strings at low cost, and I guess some amount of that would still be necessary. So perhaps you'd need convenient string operators, which implies dynamic memory allocation, ideally with reference counting or some kind of GC.
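
A minimal sketch of the kind of refcounted string this implies, with all names invented for illustration:

    #include <stdlib.h>
    #include <string.h>

    struct rstr {
        int   refs;                 /* number of owners sharing the string */
        char *data;
    };

    struct rstr *rstr_new(const char *s)
    {
        struct rstr *r = malloc(sizeof *r);
        if (r == NULL)
            return NULL;
        r->refs = 1;
        r->data = strdup(s);
        return r;
    }

    struct rstr *rstr_ref(struct rstr *r)
    {
        r->refs++;                  /* another owner, no copy made */
        return r;
    }

    void rstr_unref(struct rstr *r)
    {
        if (--r->refs == 0) {       /* last owner frees the storage */
            free(r->data);
            free(r);
        }
    }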

You'd ideally want some syntax (like XML, maybe) that would let you create structures of objects at the command line.

... Stop me when you realise this is another poorly disguised ad for a Lisp listener.

Is this solving the wrong problem?, posted 10 Mar 2000 at 00:22 UTC by Ankh » (Master)

For years I've been interested in the idea of reducing program size; I don't think finer granularity is the answer.

One answer might be an object-code optimiser that can inline functions from libraries, reorder functions in the code to improve demand-paging performance, and optimise function calls. There's a lot an optimiser can do at this level. For example, some systems have calling conventions that require registers to be saved before entering a library function, but if you know the library function doesn't use those registers, you can remove that code and get a speedup.

This whole area is something I know Bjarne Stroustrup was hoping would develop for C++, because a C++ compiler would really benefit from a database of information about the classes, methods and functions in an application. It's crazy that the source to a function has to be in a header file in order for it to be inlined.
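
A small illustration of that last complaint: under today's separate compilation, the body has to sit in the header for an ordinary compiler to inline it into other translation units. The function here is just an example, using C99/GCC-style inline:

    /* clamp.h */
    #ifndef CLAMP_H
    #define CLAMP_H

    static inline int clamp(int value, int lo, int hi)
    {
        if (value < lo)
            return lo;   /* body is visible to every includer, so it can inline */
        if (value > hi)
            return hi;
        return value;
    }

    #endif /* CLAMP_H */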

I think Graydon mentioned to me he had seen a first attempt at a binary code optimiser at a Linux conference a year or two ago.

You could run such a program on a network server, so that when you run a program for the first time, if necessary, it gets optimised for you.

I think there is a difficulty with C in that, as we build larger and larger software, we start to need a "library of libraries" concept. A finer granularity of object libraries only makes sense to me if you have tools and infrastructure to manage it. Right now, linking against over 100 libraries will, for one thing, run into limitations in software on some platforms (e.g. a command-line length limit of 512 bytes if you're typing into a Unix tty driver on many systems!) but, worse, it becomes impossible for humans to manage. Today, humans can compile Unix programs. Well, OK, I'm subhuman and I can do it. If you lose that ability, do you risk damaging the growth of Unix?

If file size and speed are important to you, work on a binary code optimiser. Or optimizer, for USers :-)

Shared libs, posted 10 Mar 2000 at 01:51 UTC by djm » (Master)

I am not sure that the system you propose would give much of a net gain over shared libraries. To implement such a scheme would be complex and, as you mentioned, incompatible with what we are currently using.

The good thing about shared libs is that the cost (storage & memory) is amortised over the whole system. We could achieve a good deal of the advantages of shared functions simply by making the shared libs more granular.

Neat idea..., posted 10 Mar 2000 at 02:16 UTC by DaveD » (Observer)

Plenty of technical hurdles to overcome. Probably even more developer mindshare hurdles.

A better idea might be shared or distributed objects instead of functions. A function is pretty useless out of context. But being able to snag an object (e.g., an attenuation response spectrum) on demand, without grabbing an entire library for, say, non-linear seismic site response, could be mighty handy. All the code relevant to the spectrum is self-contained within the class object file, which is smaller than an equivalent library.

Check out NetSolve for a working implementation of something that may be very similar to the topic of discussion.

Java classes vs jarfiles, posted 10 Mar 2000 at 16:39 UTC by mbp » (Master)

This rather reminds me of the situation in Java, where people have a choice of distributing and linking against .class files, each containing the bytecode for one class, or .jar files, containing a bundle of classes. Generally people find it simpler to distribute jarfiles, but not always.

On the other hand the startup time to dynamically link all the little fiddly bits is one of the annoyances of working in Java, so...

FreeBSD ports, posted 29 Mar 2000 at 18:00 UTC by imp » (Master)

You should consider a slightly different approach to this problem. Your main objection to shared libraries seems to be that they are hard to download. I will grant this is true for programs built on gtk, especially multi-media ones that need multiple other support libraries for sound, graphics, video, etc.

However, you should be aware that the FreeBSD ports system makes this almost painless. All the inter-library dependencies are encoded into the ports system, so when you want the latest cool video player that will also display pictures, you type make all install in the right ports directory, and all dependencies are downloaded, built and installed.

This certainly is a much less radical solution than the shared functions that you are talking about. There is also much less of a chance for namespace collision, not to mention the global constructor problem or the shared/non-shared data problems that others have talked about.
