Advogato
I finally got around to fixing the locks so that lock
contention won't
cause huge delays in reading pages. Writing (updating
diaries and the
like) can still be affected, but this is less urgent to fix.
LotR wrote:
We need a diary-writing trust metric!
Okay. I think you're right. I might well be motivated to
write a
generic metadata engine and apply it to the specific
application of
"how interesting is diary X?". Here's roughly how it will work.
When you're logged in, you'll get a chance to enter a
one-to-ten score
for another user's diary page. I might put this right under the
"Certify <user> as:" selection at the bottom of individual
person pages, but I'm also inclined to make it more
accessible, for example
allowing bulk updates on a customized version of the
recentlog page.
This goes into the database as generalized assertions. At first, the
only assertions that will be allowed are of the form
"<user>'s
diary is 7 on a one-to-ten scale", but the engine doesn't
care what
kind of assertions are present. "Roquefort is a particularly
fine
cheese" is also plausible. The reason for limiting the
assertion space
is to avoid scaling problems, which can become quite severe
as the
number of assertions scales up.
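
A minimal sketch of what a generalized assertion record could look like, restricted for now to one-to-ten diary ratings. The names here (`Assertion`, `diary_rating`, `AssertionStore`) and the "latest rating wins" rule are assumptions made for illustration, not the actual Advogato schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    asserter: str  # user who made the claim
    subject: str   # what the claim is about, e.g. "diary:bob"
    value: int     # the claimed score


def diary_rating(asserter, diary_owner, score):
    """Build the only kind of assertion allowed at first: a 1-10 diary score."""
    if not 1 <= score <= 10:
        raise ValueError("score must be on the one-to-ten scale")
    return Assertion(asserter, "diary:" + diary_owner, score)


class AssertionStore:
    """Keeps one assertion per (asserter, subject); a newer one replaces it."""

    def __init__(self):
        self._latest = {}

    def record(self, assertion):
        self._latest[(assertion.asserter, assertion.subject)] = assertion

    def about(self, subject):
        return [a for a in self._latest.values() if a.subject == subject]


store = AssertionStore()
store.record(diary_rating("alice", "bob", 7))
store.record(diary_rating("alice", "bob", 8))  # alice changes her mind
store.record(diary_rating("carol", "bob", 5))
print(sorted(a.value for a in store.about("diary:bob")))  # [5, 8]
```

The store itself does not care what `subject` names, so a cheese rating would fit just as well; only `diary_rating` enforces the narrow assertion space.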
Then, roughly nightly, there will be a process that computes metadata
scores, using the method I presented in my HOWTO. This
will compute a confidence value for each user in the trust
graph and
each assertion. You can see where the scaling problems come
from. I am
sure there exist techniques for storing this data more
sparsely, but
I'm not interested in doing that research now.
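
A back-of-the-envelope sketch of the scaling concern, reading "each user and each assertion" as one confidence value per (user, assertion) pair. That reading, the function names, and the site sizes below are all assumptions for illustration; the HOWTO method itself is not reproduced here.

```python
def dense_cells(n_users, n_assertions):
    """Confidence values stored if every (user, assertion) pair gets one."""
    return n_users * n_assertions


def sparse_cells(table, floor=0.0):
    """Values worth keeping if entries at or below `floor` are dropped."""
    return sum(1 for confidence in table.values() if confidence > floor)


# Hypothetical site: 1,000 users, each rating 20 diaries.
n_users = 1000
n_assertions = n_users * 20
print(dense_cells(n_users, n_assertions))  # 20000000

# A tiny made-up table keyed by (user, assertion id).
table = {("alice", "a1"): 0.9, ("bob", "a1"): 0.0, ("carol", "a1"): 0.4}
print(sparse_cells(table))  # 2
```

Under this reading the dense table grows with the product of users and assertions, which is why limiting the assertion space helps.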
Finally, the recentlog display will be annotated with the metadata
scores. I'll probably also put in a threshold option.
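
One way the annotation and threshold might fit together, as a sketch. The function name `annotate_recentlog`, the entry format, and the choice to keep diaries that have no score yet are assumptions, not a description of the real recentlog code.

```python
def annotate_recentlog(entries, scores, threshold=None):
    """Attach a metadata score to each (user, text) entry.

    Entries scoring below `threshold` are dropped. Entries with no score
    yet are kept, since there is nothing to judge them by (an assumption).
    """
    annotated = []
    for user, text in entries:
        score = scores.get(user)
        if threshold is not None and score is not None and score < threshold:
            continue
        annotated.append((user, score, text))
    return annotated


entries = [
    ("alice", "Fixed the locks."),
    ("bob", "Lunch was good."),
    ("carol", "New release."),
]
scores = {"alice": 8.2, "bob": 3.1}  # made-up scores; carol is unrated

for row in annotate_recentlog(entries, scores, threshold=5):
    print(row)
```

With a threshold of 5, alice and the unrated carol remain and bob is filtered out.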
bytesplit
I am trying my best to be patient with bytesplit. I realize
he is a human being like all of us, but for whatever reason he is driven by
demons that cause him to antagonize people here. I sincerely hope that he
is able to tame these demons, and interact positively with
Advogato.
At the same time, I realize this is unlikely. As such,
bytesplit is
providing an opportunity to look at the trust metrics and
the dynamics
of this site more critically. The current trust metric
certainly has
limitations, and is definitely not a magic bullet for making
this site
an interesting read and a comfortable place. That's up to us.
What the trust metric does do is automatically compute
membership in the community based on peer certifications.
While I
personally feel that bytesplit's contributions to free
software are
marginal at best, ten people here feel that his level of
interest is
high enough to rate an Apprentice cert. And, he does show
interest in
learning more, and his on-topic writings are perfectly
reasonable for
an aspiring apprentice. Given that, I don't think the trust
metric
should reject bytesplit's ranking.
All this is good motivation to implement the generalized
metadata as
proposed above. Unlike the existing trust metric, this
metadata system
would directly address quality and relevance of
writing. I'll be
very interested to see how it goes.
Cert inflation
We definitely have cert inflation here. Part of that is
because the trust metric is generous, and part of it is that
people here are generally doing an inaccurate job of
evaluating peer cert levels. This is useful information for
people trying to design metadata systems: a significant
fraction of the information input will simply be wrong.
I could certainly make the trust metric less generous. The
easiest way to do this would be to have negative
certifications as well as positive ones. But I'm not
convinced that cert inflation is the most important problem
in the world to solve.
Asynchrony
David McCusker called again, and we had another nice chat, this time
focussing on writing programs in asynchronous style. I think
it's a
hard problem. I think it's even worse for library writers,
because it
may not be realistic to assume that most users of your
library will
understand asynchronous programming very well. I told David
of X as a
cautionary tale. X actually has very sophisticated logic for
dealing
with asynchrony properly. For newcomers to X, this all seems
very
intimidating and complex (asynchronous grabs are a good case in
point). In fact, I think there is widespread failure in
levels above X
to deal with race conditions and the like correctly.
Every time you do something over the network, it's
asynchronous
whether you like it or not. Yet, event-driven programs seem
a lot more
complex than their simple, synchronous cousins. David would
like to
recapture that simplicity in asynchronous programs. A lot of
other
people have tried things in this direction, without very
happy results
so far. I feel that CORBA is a cautionary tale in this
regard. It
pretends that method calls are really local, when in reality
they're
decomposed into two asynchronous events, and of course all
kinds of
things can happen in the meantime.
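
A toy illustration of that last point: a "method call" on a remote object is really a request event followed by a reply event, and other traffic can land in between. Everything here (`FakeNetwork`, `Server`, the balance example) is made up for the sketch and has nothing to do with any real CORBA API.

```python
from collections import deque


class FakeNetwork:
    """Delivers events one at a time, in the order they were sent."""

    def __init__(self):
        self.in_flight = deque()

    def send(self, event):
        self.in_flight.append(event)

    def deliver_next(self):
        self.in_flight.popleft()()


class Server:
    def __init__(self):
        self.balance = 100


def remote_get_balance(net, server, on_reply):
    def request_arrives():
        snapshot = server.balance
        net.send(lambda: on_reply(snapshot))  # event 2: the reply

    net.send(request_arrives)  # event 1: the request


def remote_withdraw(net, server, amount):
    def request_arrives():
        server.balance -= amount

    net.send(request_arrives)


net, server = FakeNetwork(), Server()
seen = []
remote_get_balance(net, server, seen.append)  # looks like a local call
remote_withdraw(net, server, 80)              # someone else's request

while net.in_flight:
    net.deliver_next()

print(seen, server.balance)  # [100] 20
```

By the time the caller sees 100, the real balance is already 20. A call that looks local hides exactly this window.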
I haven't seen any of the details of Mithril yet, but
I'm fairly
skeptical that it will make asynchronous programming
accessible to
less-skilled programmers. On the other hand, I am perfectly
willing to
believe that it will be a good tool for expressing asynchrony
concisely, and thus useful for people who know what they're
doing.
One detail we touched on but didn't really go into was
whether the
fundamental message sending operation on channels should be
synchronous (as in CSP) or asynchronous. In CSP, if you send
a message
on a channel, but there is nobody ready to read on the
channel, you
block. The other way to do it is to append the message to a
queue.
Both are reasonable primitives, in that it's quite
straightforward to
simulate one in terms of the other. So which do you choose?
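
As a sketch of one direction of that simulation, here is a CSP-style blocking send built on top of an asynchronous queue. The class name `RendezvousChannel` is an assumption, and Python threads stand in for processes. Each message carries its own acknowledgement, so a sender only resumes once its own message has been read.

```python
import queue
import threading


class RendezvousChannel:
    """Synchronous send/recv layered on an unbounded asynchronous queue."""

    def __init__(self):
        self._queue = queue.Queue()

    def send(self, msg):
        taken = threading.Event()
        self._queue.put((msg, taken))  # the asynchronous primitive
        taken.wait()                   # block until a reader has the message

    def recv(self):
        msg, taken = self._queue.get()
        taken.set()                    # release the matching sender
        return msg


channel = RendezvousChannel()
sent = threading.Event()


def producer():
    channel.send("hello")
    sent.set()


threading.Thread(target=producer, daemon=True).start()
print(sent.wait(0.2))  # False: the sender is still blocked, nobody has read
print(channel.recv())  # hello
print(sent.wait(1.0))  # True: the sender resumed after the read
```

The opposite direction is the familiar one: put a buffering process between two synchronous channels and the sender no longer waits for the final reader.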
I mentioned that the CSP way might be easier to reason
about. There's
another issue that came to mind after our call: the queue
required for
the fully asynchronous case requires unbounded resources in the
general case. Obviously, in tiny embedded systems, this can
be a real
problem. On desktops, it's less clear. But if a system is
operating
under very high load, you probably want to worry about
whether the
queues will keep growing. Of course you can always implement
flow
control on top of async messages, but that's not really the
point. In
CSP, the default is not to grow unboundedly.
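
A small sketch of the resource difference, using Python's standard `queue.Queue` as a stand-in for a message channel. The helper `fill` and the message counts are made up for the example.

```python
import queue


def fill(channel, n_messages):
    """Try to enqueue n_messages with no reader; return how many fit."""
    accepted = 0
    for i in range(n_messages):
        try:
            channel.put_nowait(i)
        except queue.Full:
            break  # a blocking put() would make the sender wait here instead
        accepted += 1
    return accepted


unbounded = queue.Queue()         # maxsize 0 means no limit
bounded = queue.Queue(maxsize=3)  # the sender feels pressure after 3

print(fill(unbounded, 1000))  # 1000: memory grows with the backlog
print(fill(bounded, 1000))    # 3: the backlog is capped
```

With the bounded queue, a slow reader slows the sender down rather than consuming memory, which is the behaviour the CSP primitive gives by default.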
mwh: I haven't been following Stackless Python closely, but I am
aware of it. Looking briefly at the site, I see they are now
implementing a concurrency and channel approach directly
inspired by
Limbo and CSP. That could be very cool.