Advogato's Number: Peer to Peer

Posted 11 Oct 2000 at 21:44 UTC by advogato

Advogato is back, and this week he takes a look under the current hype wave surrounding peer to peer networking (known as P2P among the power elite), and suggests that it might be worthwhile for free software hackers, after all.

Unless you've been under a rock the past few months, you know that P2P is the Next Big Thing. It's part of Intel's marketing campaign for the P4, O'Reilly is doing summits on it, and Dave Winer has been spending a lot of time writing about it on Scripting News.

What's driving all this, of course, is the phenomenal growth of Napster. What's more, engineers immediately see how crappy the Napster protocols are, and obviously want to do better. Thus, the race is on for the Napster-killer. Dozens of contenders for this position are listed on the resources link of infoanarchy.org.

P2P is not exactly new - many of the RFCs that make up the design of the Internet have a peer to peer flavor to them. A great many classical Internet services consist of servers in a peer network, with users accessing those servers through clients. Email, news, and IRC fall into this category.

Each new service, you might imagine, falls somewhere on the spectrum between fully centralized and distributed in peer-to-peer fashion. Whether it sits at the appropriate spot on that spectrum influences the popularity of the service, at least indirectly through consequences such as availability, performance, and so on.

The explosive growth of the Web has generally been a movement towards more centralized services. With a few exceptions such as RSS, each Web server is an island unto itself. The distributed feel of the Web comes from the fact that the client aggregates all the Web servers in the world under a consistent interface with reasonable navigation between different servers. Why not a more distributed network, as envisioned by many hypertext visionaries?

I believe the main reason for the popularity of the Web is that providing a Web service is really easy. You just plug your functionality into some server and hook it up to the Internet. Yes, you have to pay for bandwidth and processing power, but that's cheap these days, especially compared with the added complexity of building a truly distributed service.

So why is Napster such a big hit? Basically, mass-scale music piracy is an application that centralized Web servers are not all that well suited to. First, because they're centralized, they're easy for the RIAA and other copyright enforcers to shut down. Second, because MP3 files are many megabytes, compared with the few hundred K typical of a Web interaction, the bandwidth costs are nontrivial. Peer to peer networking addresses both these problems.

That said, in Napster only the file transfer itself is peer to peer. All other services, including keeping track of who's out there, searching, and so on, are central. That makes the service pretty simple, but on the other hand makes it somewhat vulnerable. There are lawsuits underway to shut down Napster completely.
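
To make the split concrete, here's a rough sketch of the hybrid model: a central index answers registrations and searches, and only the transfer itself happens between peers. The names and structures below are invented for illustration; this is not Napster's actual protocol.

    # Minimal sketch of the hybrid model: a central index maps filenames
    # to the peers that hold them; the actual transfer is arranged
    # directly between peers.  Names and structures are illustrative.

    class CentralIndex:
        def __init__(self):
            self.files = {}            # filename -> set of peer addresses

        def register(self, peer, filenames):
            for name in filenames:
                self.files.setdefault(name, set()).add(peer)

        def unregister(self, peer):
            for holders in self.files.values():
                holders.discard(peer)

        def search(self, keyword):
            # Centralized search: one dictionary scan, trivially simple,
            # but also a single point of failure (and of legal attack).
            return {name: holders for name, holders in self.files.items()
                    if keyword.lower() in name.lower()}

    index = CentralIndex()
    index.register(("10.0.0.5", 6699), ["freebird.mp3", "stairway.mp3"])
    index.register(("10.0.0.7", 6699), ["freebird.mp3"])
    print(index.search("freebird"))   # transfer then goes peer to peer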

Thus, much of the effort on P2P networking is directed towards making services that are fully distributed, with no centralized server of any sort. Gnutella was an excellent proof of concept, but has been running into scaling and other practical limitations. Making a robust, scalable peer to peer network turns out to be a hard problem. We don't know how to do it yet. Thus, research is called for, and the experiments are indeed being done by the many P2P projects out there. The different directions being explored are fascinating, but beyond the scope of this essay (someone else, perhaps?). Well, ok, I will put in a plug for Mojo Nation as one of the more interesting projects, and an anti-plug for the crappy Napster clones that clutter up freshmeat these days. In my opinion, a new file sharing system is not worth doing unless you're going to learn something interesting from the effort.
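
To get a feel for why naive flooding hits a wall, here's a toy sketch (with an invented random topology) of Gnutella-style query flooding; the only point is that the message count blows up as the TTL grows, which is roughly the scaling problem Gnutella has been hitting.

    # Gnutella-style flooding: a query is forwarded to all neighbours
    # until its TTL expires.  Message count grows roughly with
    # degree**TTL.  Topology and parameters are made up for illustration.

    import random

    def build_random_network(n_nodes=200, degree=4, seed=1):
        random.seed(seed)
        return {n: random.sample([m for m in range(n_nodes) if m != n], degree)
                for n in range(n_nodes)}

    def flood_query(network, start, ttl):
        seen = {start}
        frontier = [start]
        messages = 0
        for _ in range(ttl):
            nxt = []
            for node in frontier:
                for neigh in network[node]:
                    messages += 1          # every forward is network traffic
                    if neigh not in seen:
                        seen.add(neigh)
                        nxt.append(neigh)
            frontier = nxt
        return len(seen), messages

    net = build_random_network()
    for ttl in (2, 4, 6):
        reached, msgs = flood_query(net, start=0, ttl=ttl)
        print(f"TTL={ttl}: reached {reached} nodes with {msgs} messages")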

Thus, it seems likely that the current demand for a robust music piracy service will fuel a working, high quality peer to peer network infrastructure before too long. What are the consequences for free software?

The most interesting thing about peer to peer networking is that it can provide services that do not need a business model. All centralized services, including Web-based ones, either require revenue or a subsidy to pay for the server and bandwidth. A lot of companies are going into the open source services business these days, using open source software as a way to deliver centralized, for-pay services. Peer to peer networking provides an alternative way to provide similar services.

What kind of services do I have in mind? Music piracy is one of the easiest, but once that's solved, I can see a broad variety of interesting ones, including:

  • ISOs and packages for updating free software systems.

  • Message systems. NNTP was generally not vulnerable to the failure of a single server, the way Advogato is.

  • Online backups. These are potentially much more convenient and up-to-date than classical backup techniques, but have been expensive in the commercial world.

  • CVS. Wouldn't it be cool if it didn't matter when a CVS server went down or got hosed?

Thus, I see a very bright future for peer to peer networking. Serious challenges lie ahead, though.

First, peer to peer networking is inherently much more complicated than a corresponding central service. Applications will not start happening until there is a good infrastructure in place, with reasonable interfaces. Most of the P2P services are fairly monolithic beasts with a single application (music piracy) hardwired in, but there are definite signs of movement towards more general solutions.

The second major set of issues is trust. I'll break these down into two categories: trust that a node is contributing to the overall functioning of the system, and trust in the quality of the data. These are two somewhat separate problems, but it is likely that similar tools can be used to solve both. For example, the Advogato trust metric was originally designed to make sure that servers in a distributed Public Key Infrastructure were "playing by the rules", but has actually found its main application (so far) as a way of authenticating users posting to the message board.
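
As a rough illustration of the flavor of such a metric (this is a toy, not the actual Advogato computation, which is based on network flow), consider a capacity-limited walk outward from a trusted seed over the certificate graph; capacity shrinks with distance, so one bad certification can only admit a bounded number of fake accounts.

    # Toy approximation of a flow-style trust metric: certificates form a
    # directed graph, the trusted "seed" gets a supply of capacity, and
    # capacity is handed out level by level so distant or weakly connected
    # nodes receive none.  Capacities here are invented.

    from collections import deque

    def accepted_nodes(certs, seed, capacities=(800, 200, 50, 12, 4, 2, 1)):
        """certs: dict node -> list of nodes it certifies.
        Returns the set of nodes reached while per-level capacity lasts."""
        accepted = {seed}
        level = {seed: 0}
        budget = list(capacities)
        queue = deque([seed])
        while queue:
            node = queue.popleft()
            depth = level[node] + 1
            if depth >= len(budget):
                continue
            for target in certs.get(node, []):
                if target in accepted or budget[depth] <= 0:
                    continue
                budget[depth] -= 1        # each acceptance consumes capacity
                accepted.add(target)
                level[target] = depth
                queue.append(target)
        return accepted

    certs = {"seed": ["alice", "bob"], "alice": ["carol"], "bob": ["mallory"],
             "mallory": [f"sock{i}" for i in range(1000)]}
    # the thousand "sock" accounts certified by mallory never get in
    print(sorted(accepted_nodes(certs, "seed", capacities=(2, 2, 2))))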

I'll give a more concrete example of trust in the context of online backups. Let's say you just bought a shiny new 60G hard drive and want to donate 10G of it to backups for your friends. Obviously, if you just open it up to everybody, there's a good chance of your 10G being immediately consumed, so your friends don't have access. On the other hand, manually configuring the access list could be a major pain, say if there are a hundred people each backing up an average of 100M of files. I believe that an Advogato-like trust metric could be a good solution in cases like this.
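
A minimal sketch of how that might look, with all names and numbers invented: peers accepted by some trust metric each get an equal slice of the donated space, and everyone else is simply refused.

    # Hedged illustration of the backup scenario: only peers accepted by
    # some trust metric may store data, and each gets an equal slice of
    # the donated 10 GB.  Names and numbers are invented for the example.

    DONATED_BYTES = 10 * 1024**3

    def make_backup_policy(accepted_peers):
        per_peer = DONATED_BYTES // max(len(accepted_peers), 1)
        usage = {peer: 0 for peer in accepted_peers}

        def store(peer, nbytes):
            if peer not in usage:
                return False              # not trusted: refuse outright
            if usage[peer] + nbytes > per_peer:
                return False              # trusted, but over quota
            usage[peer] += nbytes
            return True

        return store

    store = make_backup_policy({"alice", "bob", "carol"})
    print(store("alice", 90 * 1024**2))   # True: within the ~3.3 GB slice
    print(store("eve", 1024))             # False: never certified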

Of course, if a few stray people get access to backup services you're offering, you don't really care. Authenticating write access to things like CVS repositories is far more critical. This is a really tricky problem. Existing public key infrastructure implementations are woefully inadequate. Perhaps a good solution will develop, but for now, it seems like manual configuration is the right way to go. Making manual configuration convenient and reliable remains a challenge.

The third killer issue is robustness. In a distributed network, individual systems come and go randomly, fail, and become unresponsive. In Napster, one consequence is the relatively high likelihood that a file transfer will be interrupted partway. This kind of behavior is unacceptable for many applications, so any real solution will distribute the resources around the net so that individual nodes can fail without disrupting service.
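
As a sketch of that idea, here's a toy placement scheme that keeps several copies of each chunk on different peers, so a fetch can fall back to another holder when a node disappears; the replication factor is arbitrary, and a real system might well prefer erasure coding to plain copies.

    # Each chunk of a file is placed on several peers, so a download can
    # fall back to another holder when one disappears mid-transfer.
    # Replication factor and peer behaviour are invented for illustration.

    import hashlib
    import random

    random.seed(0)

    def place_chunks(chunks, peers, replicas=3):
        """Return chunk_id -> list of peers holding a copy."""
        return {hashlib.sha1(chunk).hexdigest(): random.sample(peers, replicas)
                for chunk in chunks}

    def fetch(chunk_id, placement, alive):
        for peer in placement[chunk_id]:
            if peer in alive:             # first reachable replica wins
                return peer
        return None                       # all replicas down: data lost

    peers = [f"node{i}" for i in range(10)]
    chunks = [b"chunk-a", b"chunk-b", b"chunk-c"]
    placement = place_chunks(chunks, peers)
    alive = set(random.sample(peers, 6))  # 4 of the 10 nodes have failed
    for cid in placement:
        print(cid[:8], "served by", fetch(cid, placement, alive))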

The robustness issue shares many goals with high availability (HA) clustering. However, peer to peer networking has a refreshing emphasis on dynamic and automatic configuration, while most HA work involves manual configuration of the cluster. This is a showstopper for consumer applications; I believe that either the HA world will have to learn how to do automatic configuration, or peer to peer networking will gradually displace it as it becomes more robust and capable.

Fourth, if it is to be successful, any real peer to peer network has to scale. And as we know from experience, scaling is a hard problem. In a truly scalable system, it shouldn't be necessary for all nodes to keep track of all other nodes in the network. This requires protocols that split the world up nicely so that you only have to know about a subset of the other nodes, as well as mechanisms for finding needed resources outside the subset you're tracking.
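
One well-known way to split the world up is consistent hashing: node names and keys are hashed onto a ring, and each node is responsible only for the keys that land near it. The sketch below is generic; it is not claimed to be the scheme of any particular project mentioned here.

    # Generic consistent-hashing sketch: hash both node names and keys
    # onto a ring, and make each key the responsibility of the first node
    # clockwise from it.

    import bisect
    import hashlib

    def ring_position(name):
        return int(hashlib.sha1(name.encode()).hexdigest(), 16)

    class Ring:
        def __init__(self, nodes):
            self.points = sorted((ring_position(n), n) for n in nodes)

        def owner(self, key):
            pos = ring_position(key)
            idx = bisect.bisect(self.points, (pos, ""))
            return self.points[idx % len(self.points)][1]

    ring = Ring([f"node{i}" for i in range(8)])
    for key in ("freebird.mp3", "stairway.mp3", "backup/alice/home.tar"):
        print(key, "->", ring.owner(key))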

These are all fascinating problems, and the amount of energy and talent that's going into finding solutions is exciting. An ideal outcome would be a robust set of protocols, and a library that implements all the hard parts while presenting a clean, simple interface for applications. This could well serve as the free middleware for the world's distributed applications.


Existing technologies, posted 11 Oct 2000 at 23:44 UTC by djm » (Master)

Authenticating write access to things like CVS repositories is far more critical. This is a really tricky problem. Existing public key infrastructure implementations are woefully inadequate.

Some of the infrastructure to do this already exists, in the form of KeyNote and SDSI. Both of these offer delegation of authorisation with cryptographic signatures, and both are cleaner than the X.509 model which deals more with authentication than authorisation.

Using SDSI or KeyNote, Alice can delegate access, subject to restrictions of her choice, to Bob. Bob can in turn delegate subject to both Alice's restrictions and any of his own (and so forth).
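
As a toy illustration of that rule (with signature checking omitted and all names invented - in KeyNote or SDSI each link in the chain would be a signed assertion), the effective rights at the end of a delegation chain are just the intersection of everything granted along it:

    # Each link in the chain can only narrow, never widen, the rights it
    # received.  Signature verification is deliberately left out of this
    # sketch; field names are invented.

    def effective_rights(chain, root_rights):
        """chain: list of (who, granted_rights) from the owner outward.
        Each grant is intersected with what the grantor actually had."""
        rights = set(root_rights)
        for who, granted in chain:
            rights &= set(granted)        # can't delegate more than you hold
        return rights

    chain = [
        ("alice", {"read", "write", "tag"}),   # owner delegates to Alice
        ("bob",   {"read", "write"}),          # Alice delegates to Bob
        ("carol", {"read", "write", "admin"}), # Bob tries to add "admin"
    ]
    # "admin" never makes it through: Bob didn't have it to give.
    print(effective_rights(chain, {"read", "write", "tag", "admin"}))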

This still leaves the problem of "Trust" that Advogato describes: Alice may trust Bob, but how can she trust the people whom Bob introduces? I would contend that we cannot rely on heuristic trust metrics to authorise people for free software development (particularly security critical stuff) and that vigilance will always be important.

http://research.microsoft.com/os/millennium/mgoals.html, posted 12 Oct 2000 at 06:05 UTC by lkcl » (Master)

The Millennium Project.
Other refs: http://research.microsoft.com/sn/Millennium/
http://research.microsoft.com/os/Farsite/

Hm. strange. i'm looking for the original stuff i saw there two years ago. they were planning a distributed, dynamic filesystem. if a single server, on an ISDN line, suddenly became popular, it would ask faster servers to help out, replicate the site (or refer to pre-replicated sites).

opinion. the concept of peer-to-peer that is "getting hyped" is that of decentralising at, i think, the "application layer". this should be distinguished from decentralisation at the protocol layer. you can have a client-server protocol that can be used at the application layer _both_ ways. i.e. one machine connects to another, and the other machine connects _back_ to you.

writing a peer-to-peer protocol is... well... you can do it, but it's much easier to double-up a client-server protocol, even if it's over the same TCP socket.

hm.

typical nerdy response from luke.

P2P applied to searching, posted 12 Oct 2000 at 12:14 UTC by dnm » (Master)

Bias disclosure: I currently work as a Research Scientist for openCOLA

Another interesting application of P2P (I'm getting sick of these terms, I propose we just use _2_ [blank2blank] and get on with the real work) is in building reputation markets to provide increased relevance for searching the 'Net for content. The basic premise is that, through the use of a trust metric, you create a reputation market of people who have consistently provided you with relevant results on a given topic. When searching for information, you can spontaneously create larger neighborhoods of relevant interests.

Should I search for information on something for which my current neighborhood has no relevant results, I create an immediate metric, the basis of which is that the current neighborhood has nothing I want; therefore, I look for other people, on the opposite end of the spectrum from my current peers, to bring into my neighborhood who do.
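
A minimal sketch of the idea, with an invented scoring scheme that is certainly not openCOLA's actual design: score peers per topic by the relevance of their past results, and take the top scorers as the neighborhood for that topic.

    # Peers that have returned relevant results for a topic are scored up;
    # a search neighborhood is just the top-scoring peers for that topic.

    from collections import defaultdict

    class ReputationMarket:
        def __init__(self):
            self.scores = defaultdict(float)     # (peer, topic) -> score

        def feedback(self, peer, topic, relevant):
            self.scores[(peer, topic)] += 1.0 if relevant else -0.5

        def neighborhood(self, topic, size=3):
            ranked = sorted(((s, p) for (p, t), s in self.scores.items()
                             if t == topic), reverse=True)
            return [p for s, p in ranked[:size] if s > 0]

    market = ReputationMarket()
    market.feedback("peer-a", "synthpop", relevant=True)
    market.feedback("peer-a", "synthpop", relevant=True)
    market.feedback("peer-b", "synthpop", relevant=False)
    market.feedback("peer-c", "kernels", relevant=True)
    print(market.neighborhood("synthpop"))       # ['peer-a']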

There's no shortage of technical implementation issues here, a major one being traffic handling. With so many requests being thrown back and forth, handling communications efficiently is an issue, and getting the network to dynamically create and destroy peer groups effectively without adding undue latency is tough.

Other potential avenues? I can think of a lot, but I think it's a far more practical exercise to simply look to human interactions and the social fabric. There's a rich pool of tools there, and not all of them map well to an abstract implementation suitable for computing. Many do, and those that are already there could use additional work to make them useful.

FreeNet, posted 12 Oct 2000 at 19:26 UTC by sab39 » (Master)

FreeNet is another P2P product which has some really interesting ideas about dealing with the issues you address. An interesting aspect of FreeNet, though, is that some of the things that were first to be added to Gnutella and Napster seem to be much lower on FreeNet's priority list. For example, FreeNet can't do searching at all yet; you have to know the "key" of the document you want to retrieve.

FreeNet seems to have many of the technical issues well in hand (especially scalability and attack-proofing) but it will be interesting to see whether it is possible to create an easy-to-use interface to go with their complex infrastructure.

P2P and P3P?, posted 13 Oct 2000 at 09:10 UTC by Raphael » (Master)

Using acronyms can be confusing sometimes, especially when you replace letters by numbers. So let's not confuse P2P (Peer-to-Peer) with P3P (Platform for Privacy Preferences). This may be a stupid joke, but maybe not. The P2P protocols could use a bit of P3P so that each peer knows what the other is going to do with the information they are exchanging.

And of course, this is totally unrelated to P4P (Pedals for Progress), PPP, PDP and other P*P TLAs...

Also, in reply to the last comment about FreeNet, I recommend that you read the previous article about FreeNet that was posted here a few weeks ago. It contains lots of interesting comments, especially mettw's reply to graydon. Personally, I think that a system encouraging people to do things that are outside any judicial control will probably do more harm than good (if you cannot find any democratic country in the world in which the things that you are doing are at least tolerated, then there is probably something wrong with you, not with the rest of the world).

Routing, posted 14 Oct 2000 at 11:24 UTC by Malx » (Journeyer)

With P2P (a heterogeneous network - HN?) we actually have a network of nodes running on top of the IP network, and there we must solve the same problems we already solve with routing today.
In FreeNet (and some others) a flood routing algorithm is used - the simplest one.
Does that mean a P2P protocol could use RIP, OSPF, or BGP for routing messages and queries between these higher-level nodes?

If yes - why not use the IP infrastructure that already exists (especially multicast) ;)
If no (because of anonymity) - could the algorithms be improved by using data contained in the messages themselves, in addition to the from/to values?

IRC, Jabber, and a distributed Advogato would need something like this.

How about certification and digital signatures? The current PKI infrastructure relies on certain IP addresses to validate certs, and on a tree-like structure.
Would it be simpler to have an algorithm that detects intrusion and restores data quickly, instead of an infrastructure that tries to make sure access rights are absolute?

Keynote, posted 15 Oct 2000 at 22:52 UTC by lkcl » (Master)

How about certification and digital signatures? The current PKI infrastructure relies on certain IP addresses to validate certs, and on a tree-like structure.
keynote's bindings are to digitally-signed "assertions", e.g. "application == sendmail -> true && uid < 1000 -> true". you _could_ create a keynote binding that is controlled by IP addresses.

whilst i am still having some conceptual difficulty integrating keynote into my head, i get it to an extent that tells me it's very useful.

digital signatures on regular expressions. gotta be worth something.

A plea for metadata, posted 16 Oct 2000 at 00:28 UTC by rillian » (Master)

I humbly suggest that a decent metadata scheme be added to the general requirements pool for this topic. Many of these 'peer-to-peer' transfer systems amount to a next-generation web. Wouldn't it also be nice if we got next-generation searching?

One of the worst things about the web is the 'keyword only' search system. Of course, the web per se doesn't have any built-in search capacity, but in practice that's what most everyone uses outside of limited domains where there are reasonable indexes (e.g. freshmeat or the IMDB).

Of course, people have been talking about adding better index capabilities for a long time. This is good because there's lots of work to draw on, but it has been ineffective on the web for lack of universal adoption. The situation with napster and friends is even worse, where often all you have to go on is a filename, and 'just click and see' is really tedious with non-text data.

We have a great chance here to implement a reasonable, robust, abuse-resistant metaindexing system and make it universally useful. The metadata can go in as an integral part of the trust and review metric and maintain itself the same way as the rest of the data in the system. I want to be able to search for "all materials with Suzanne Ciani as a creator or contributor dated after 1995, sorted by musicological recommendation, then quality."
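
As a small sketch of what that could look like (the records and the 'recommendation' score here are invented for the example), a handful of Dublin Core-ish fields already support the kind of query described above:

    # A few Dublin Core-like fields (title, creator, date) plus a
    # recommendation score; in a real system the metadata would ride
    # alongside the shared files and be covered by the same trust and
    # review machinery.  Records are invented.

    records = [
        {"title": "example-album-1", "creator": "Suzanne Ciani",
         "date": 1992, "recommendation": 0.9},
        {"title": "example-album-2", "creator": "Suzanne Ciani",
         "date": 1999, "recommendation": 0.8},
        {"title": "example-album-3", "creator": "Someone Else",
         "date": 2000, "recommendation": 0.99},
    ]

    def search(records, creator=None, after=None):
        hits = [r for r in records
                if (creator is None or r["creator"] == creator)
                and (after is None or r["date"] > after)]
        return sorted(hits, key=lambda r: r["recommendation"], reverse=True)

    for r in search(records, creator="Suzanne Ciani", after=1995):
        print(r["title"], r["date"])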

Please, make friends with a librarian. Meditate before dublincore.org until you achieve enlightenment. We'll all thank you later.

Search NG, posted 17 Oct 2000 at 19:48 UTC by Malx » (Journeyer)

It's a good idea to make distributed searches with trust metrics.
I think I will try to create such a system this year.

Also, www.google.com has the idea of using links from other sites to calculate relevance - it is a little like a trust metric :)
Another thing I want to note - you can search for pages, but you still can't search for an answer. You can't ask "What is the distance from the Earth to the Sun?" or "How many people use the Mozilla browser in Australia?"...

Last note - there is no question/answer system except USENET, and that has no relevance mechanism - so hard questions will be seen by the main developers only.
