Older blog entries for titus (starting at number 139)

More trustiness

On Sunday, I hacked together a quick Python wrapper for raph's 'net_flow' implementation. Another hour or two of hacking has produced a Python implementation of the advogato trust metric (which actually consists of three distinct trust flows).

Steven Rainwater of robots.net graciously gave me access to his actual robots.net configuration file, and I verified that my net_flow Python code reproduces the actual robots.net certifications. So I'm fairly sure that the code functions properly -- this isn't too surprising, because it really is a very simple wrapper around raph's code.

In any case, you can download a tar.gz here; or, visit the README. It's fun to play with... note that the distribution contains HTML-scraped advogato and robots.net certifications from this Monday, so you can play with the actual network yourself. (Please don't scrape the sites yourself without asking raph or Steven; yes, I transgressed with advogato, but that doesn't mean you should ;)

Relative to raph's recent "ranting", I hope this little package inspires people to play with trust metrics. There are a couple of easy hacks people could do with this code:

  • Write Ruby, Perl, etc. wrappings (mebbe with SWIG);
  • Liberate the code from the GLib 2.0 dependency;
  • Look at the actual topology of the advogato.org network in a variety of ways;
etc.

Incidentally, it seems like I really do think best in code. This little exercise has given me a bunch of ideas, most of which only popped up once I got a working Python API and it was clear just how easy it would be to implement them...

--titus

Link Madness

Bill Moyers on this administration, the press, and secrecy.

Torture's Long Shadow by Vladimir Bukovsky. Whoo.

And, finally, as an antidote: waaaay too cute.

Trustiness

A spot of hacking tonight produced gratifying results. In Python,

from net_flow import TrustNetwork

capacities = [ 20, 7, 2, 1 ]
network = TrustNetwork()

network.add_edge("-", "el_seed")
network.add_edge("el_seed", "test1")
network.add_edge("test1", "test2")
network.add_edge("test3", "test4")
network.add_edge("-", "test4")

network.calculate(capacities)

for user in network:
    print user, '\t', network.is_auth(user)

produces

-               True
el_seed         True
test1           True
test2           True
test3           False
test4           True

which looks more or less correct to me; cf. http://www.advogato.org/trust-metric.html.
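The shape of that result is visible even without the flow computation: with capacities this generous, certification in the toy network reduces to reachability from the seed. Here's a minimal sketch of that (my own illustration, not the real net_flow code):

```python
from collections import deque

def reachable(edges, seed="-"):
    """Breadth-first search over certification edges from the seed."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    seen = set([seed])
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

edges = [("-", "el_seed"), ("el_seed", "test1"), ("test1", "test2"),
         ("test3", "test4"), ("-", "test4")]
# test3 is the only node with no path from the seed, so it alone fails.
```

The real metric also enforces the per-level capacities, which is what the max-flow computation is for; reachability only explains why test3 is the odd one out here.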

The net_flow Python module contains a class wrapper around a hand-written wrapper for mod_virgule's net_flow.c. Using net_flow with a scraped download of the current advogato certification network (sorry, raph...) I can reproduce much of the current list of certified masters. I get 602; the actual list contains 766 members. So obviously I'm still doing something wrong, but probably it's just a matter of reading the mod_virgule source code a bit better.

raph, if you happen to read this, could you verify that the capacities:

capacities = [ 800, 200, 200, 50, 12, 4, 2 ]

and the seeds

network.add_edge("-", "raph")
network.add_edge("-", "miguel")
network.add_edge("-", "federico")
network.add_edge("-", "alan")

are what advogato is currently using, please?

thanks,
--titus

p.s. I'll make net_flow available once I clean it up and validate it a bit more.

16 Dec 2005 (updated 16 Dec 2005 at 19:55 UTC) »
Collaborative Agile Test-Driven Development

Yeah, OK, that's a bit buzzwordy, hey? ;)

In addition to doing a bunch of time-critical experiments, applying for several independent positions, finishing up a consulting job, and trying to stay in shape with running, swimming, and ultimate frisbee, I've been working with Grig on an application for our PyCon tutorial.

I'm in a good mood because we just finished the prototype, which does everything that I want, albeit slowly and badly with an ugly interface.

Our first public release is scheduled for January 5th, at which point we will unveil the application & people can check out the development site and tests.

In the spirit of Grig's post, here are a few observations.

  • Trac does rock. The "view milestone ticket progress, by component" is my favorite view.

  • Being able to bounce ideas off of Grig is fantastic.

  • Being able to assign tasks to Grig via the Trac ticketing system is even more fun, even if it's a bit of a guilty pleasure ;).

  • There are lots of types of testing tools out there, and I don't know if anyone has ever sat down and implemented all of the different types on an open-source project. (e.g. not py.unit and nosetest on the same project, but unit tests, Web tests, performance tests, acceptance tests, log tests, pester-style code-rewriting tests, random form-filling tests, etc.) I doubt it's ever been done for a project this small: I don't think it will be much more than 1000 lines of core application code by the time we're done. The test code will easily outweigh the core application by a factor of 2-3, I bet.

  • Prototyping, ticketing, tests, and wikis all work together in an amazingly synergistic fashion.

Of course, now Grig and I have to worry that we'll unveil this app and people will go "ehh" and wonder why we're so excited. O well, that's a risk we'll have to take... those tutorials are non-refundable, right? ;)

Trac and SCGI/WSGI patches -- or, why to use Darcs

It's mildly annoying (and, more to the point, inconvenient to me ;) that the Trac people aren't checking in the WSGI support stuff. Right now I'm running several trac instances off of subversion latest, with the WSGI/SCGI patches in the development directory. I can't check the patches in myself, because I don't have (or want) Trac subversion access; I don't know if it's even possible to set up a patch stream with svn; forking is stupid; and I sure as hell don't want to set up tailor to convert between svn and darcs. What to do? Wait for Trac 2.0, I guess...

(As my advisor would say, "whine, bitch, moan, complain" (add a dismissive hand-wave in your imagination).)

This leads to a separate point: two of the best Python projects of all time, Trac and Mailman, don't use any of the Python Web frameworks, near as I can tell. From this, I conclude that either all Python Web frameworks suck (because we compulsively (re)invent different wheels) or alternatively Python core is 80% of a Web framework in itself (so people can roll their own in less time than it takes to understand an existing one).

Discuss amongst yourselves. (Add another dismissive hand-wave ;).

--titus

Laying eggs

I'm now firmly committed to PJE's setuptools/easy_install. It's invaluable as a way to make precompiled Python distributions available.

As part of our agile development tutorial at PyCon, Grig and I are developing an application. (More on the app anon: our first release is due by the end of the month.) The app depends on Durus and CherryPy, neither of which "just work" with easy_install on Windows.

For Durus, the problem is that it has a binary extension; you need a C compiler to build it.

For CherryPy, there's some issue with SourceForge download page breakage, and maybe something problematic with paths on Windows XP. (I haven't isolated the problem.)

So, I built eggs. I found a colleague upstairs, Brandon King, who had just gotten Windows compilation working for Python packages; after patching in

from setuptools import setup

to the top of setup.py for both packages, 'python setup.py bdist_egg' produced nice, functional eggs.
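For reference, the patch amounts to swapping the distutils import for the setuptools one, which makes the bdist_egg command available. A hypothetical minimal example (the metadata below is illustrative, not Durus's or CherryPy's):

```python
from setuptools import setup   # was: from distutils.core import setup

setup(
    name="example_package",    # illustrative metadata only
    version="1.0",
    py_modules=["example"],
)
```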

They are available at

http://issola.caltech.edu/~t/dist/

and can be grabbed with

easy_install -f http://issola.caltech.edu/~t/dist/ Durus

or

easy_install -f http://issola.caltech.edu/~t/dist/ CherryPy

I'm still having some issues installing things on Python-naive boxes (that is, Windows boxes with just a standard install of Python), but that will have to wait for the first release. (FWIW, the problem will probably be fixed by building eggs for the pywin32 code.)

PyPi

Now that I've been using PyPi and easy_install for a few weeks, I'd guess that about 80% of packages are directly and immediately installable via easy_install by typing 'easy_install package_name'. I've run across a bunch that aren't, though. Those include Zope3 and CherryPy; zope.testbrowser also had a problem, but I think that was an issue with the '.' in the middle of the name.

I would be very happy if it were possible to install every package on PyPi with easy_install, and it might be a worthwhile project to highlight those that can't, for whatever reason. Hmm, could become part of the Cheesecake project... a list of all the projects that don't work with easy_install, separated into lists of those that are easy_install's fault vs those that are the author's fault. (The latter would, of course, be in bright red with BLINK tags.) Perhaps we could even call the list "Sweaty Cheese" ;).

Another fantastically useful project would be to automatically download and build Windows and Mac OS X eggs for all of the PyPi projects. Hmm, there'd be some security issues, but I bet you could work something out with public keys where only packages authorized by some key authority would be automatically downloaded. Humm.

--titus

7 Dec 2005 (updated 7 Dec 2005 at 19:15 UTC) »
Oblique Strategies

I'm a big fan of Oblique Strategies; so once I found robin parmer's python implementation, I thought why not write a quick Web site for it?

Clearly I need to spend more time on work. But it was so quick and easy... ;)

Speaking of which...

It's just too easy

The discussion on Aaron Swartz's blog about rewriting reddit & web.py illustrates a few amusing points about Python. Apart from the downright absurdity of some of the discussion so far -- general Lisp snarkiness, and Aaron's assertion that all bajillion Python Web frameworks suck (except for his, which isn't available yet...) -- I think a few truths emerge.

The main truth is that it's clearly too easy to write your own Web framework in Python. It's less work to code a few hundred lines of Python than it is to understand someone else's few hundred lines of Python; it's also easier to continue thinking like you already do than it is to adapt your thinking to someone else's API. And, most important of all, a few hundred lines of Python is really all you need for a fully functional Web app framework. Thus, our massive proliferation of Web frameworks. (As Harald Massa writes: Python: the only language with more Web frameworks than keywords.)
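For what it's worth, the claim is easy to back up: here is the skeleton of a route-dispatching "framework" in a couple dozen lines of Python over WSGI. This is a toy sketch with names of my own invention, obviously, not any real project's code:

```python
routes = {}

def route(path):
    """Register a handler function for a URL path."""
    def decorator(func):
        routes[path] = func
        return func
    return decorator

def app(environ, start_response):
    """A complete WSGI application: look up the path, call the handler."""
    handler = routes.get(environ.get("PATH_INFO", "/"))
    if handler is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [handler().encode("utf-8")]

@route("/")
def index():
    return "hello, world"

# wsgiref.simple_server.make_server("", 8000, app).serve_forever()
# would serve it; nearly everything else a framework adds is convenience.
```

Templating, sessions, and database glue are where the real frameworks diverge -- but the dispatch core really is this small.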

Clearly, the only way to cut down on the number of Web frameworks is to make it much harder to write them. If Guido were really going to help resolve this Web frameworks mess, he'd make Python a much uglier and less powerful language. I can't help but feel that that's the wrong way to go ;).

Another truth that I stumbled over yesterday: it's much harder to write good, clean, maintainable, documented, tested code than it is to write a functional Web framework. Partly this is a matter of withstanding the test of time; partly it's a matter of development practices. If there's one thing I'd like to explore for my own projects, it's how to keep tests, documentation, and code all in sync. (Want to know how to do it? Come to our tutorial!)

I think the test for the future will be simple survival; this will be based on things like documentation more than on functionality. For example, Quixote, though powerful, suffers from poor documentation. CherryPy, which enforces a similar coding approach on apps, has an attractive, busy Web site. Getting started with CherryPy is simple; getting started with Quixote is not so simple. This really matters in the long run.

Picking a Web framework

The above thoughts occurred partly in the context of my own choice of frameworks for a new project.

I'm starting a new project for our Agile Development and Testing tutorial at PyCon, and I wanted to try something other than Quixote (just for novelty's sake, ya know?). My rough requirements were: must run on Windows; must not commit me to a database other than Durus, by e.g. requiring SQLObject; shouldn't be a huge framework (I'd like to distribute the entire thing in an egg); and needn't (shouldn't) be a full content-management system. So I took a look at QP, browsed around Django and TurboGears, visited web.py, and settled on CherryPy. (Actually, I started on CherryPy and then discovered that their introductory "hello, world" example didn't work with their latest stable release, 2.1.0, and gave up in frustration. Then I came back to it after striking out with the others.)

So, why CherryPy?

Django and TurboGears are too much. Django is more of a CMS, and TurboGears commits me to too many packages.

QP is largely undocumented and doesn't run on Windows. (The former is a real problem: I spent 15 minutes trying to figure out how to run QP on something other than localhost, and couldn't manage.)

web.py? Not available yet. Maybe it will be, maybe it won't be... but it's awfully hard for me to evaluate a package based on 50 lines of documentation and no code, although I applaud the attitude.

CherryPy is just what I need: lightweight, fairly obvious code on my side, nothing complex required. We'll see how I feel in a week or two ;). I've already broken it once or twice, and I think the internal code is more complex than it needs to be (based solely on ~30 minutes of browsing around to fix the breakage) but I needed to commit to something before Grig keelhauls me... and CherryPy is clearly the best-suited of the things I looked at.

And it's got a really attractive Web site...

In other news...

Commentary is one of the coolest things to cross my radar screen in a while. (It's an AJAX-y way of adding comments to Web sites.)

I think there's a really good application somewhere in here; something combining Raph's trust metric stuff with Commentary to make a community-wide commenting/annotation system. (I put some work into such a thing earlier this year, but decided to focus on twill first.) I hope someone else writes it so I don't have to ;).

--titus

socal-piggies dinner meet

Anyone in the LA area who is interested in Malaysian food & Python conversation on Wednesday, contact me soon -- I'm confirming reservations for Kuala Lumpur in Pasadena, at 7:30pm. (So far we have 11 people!)

--titus

5 Dec 2005 (updated 5 Dec 2005 at 18:44 UTC) »
Two interesting interviews.

How BioWare Makes Game Communities Work

An Interview with Scott Bakker

The Lone Software Developer's Methodology

Entertaining and interesting rant (with explanation).

When unit testers fail...

At some point over the last week, my nose unit tests for twill started flailing. Yes, not just failing, but flailing. There was so much output that it was tough to figure out exactly what was going on. I spent a few minutes across a few days trying to figure out what had changed; I'm pretty sure it was due to a nose version upgrade, now, but I could never actually figure out what version of nose recognized all my tests *and* still worked.

Whatever the proximal cause, it turns out I designed my unit tests incorrectly. The tests are in modules (one per file), with 'setup', 'test', and 'teardown' functions in each module. The latest version(s) of nose were discovering the 'setup' and 'test' functions and running (as near as I can tell) 'setup' twice, 'test' once, and 'teardown' zero times.

I'm still not sure if this is a bug; I couldn't figure out what the behavior of nose should be, because it's not explicitly documented on the nose page. (Maybe it's on the py.test page?)

Finally, I discovered 'setup_module' and 'teardown_module' and renamed setup and teardown in all my tests. That solved my problems.

I also learned (from the nose documentation) that you could specify a test collector in setup.py when using setuptools. So now 'python setup.py test' will run all the tests.
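The setup.py hookup is a single keyword argument; nose ships a collector for exactly this purpose (the metadata below is illustrative, not twill's actual setup.py):

```python
from setuptools import setup

setup(
    name="example_package",        # illustrative metadata only
    version="1.0",
    test_suite="nose.collector",   # makes 'python setup.py test' run nose
)
```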

The shocker for me in all of this was how uncomfortable I felt even thinking about modifying twill without the unit test framework. Even though I could execute the tests individually, I knew that I wouldn't run them nearly as frequently as I do when 'nose' is working. Automated tests are a really important security blanket now...

What users want, and what they'll get.

So, it appears that an increasingly common use case for twill is Web browsing.

When I started developing twill, I intended to develop a replacement for PBP. PBP was aimed at Web testing, and that's what I needed myself. Roughly, this meant:

  • a simple scripting language, with an interactive shell;
  • Functional cookies/formfilling/etc, so that interacting with a Web site actually worked;
  • a fair amount of assert stuff, like "code" (to make assertions about return codes) and "find" (to make assertions about presence of specific text/regexps).

Then I made a few design choices: to use python and python's Cmd for the interactive shell, and to use a metaclass to automagically import all twill.commands functions into the interactive shell. This meant that all of the functions accessible from twill were also directly accessible via Python, and the twill language docs functioned equally well as Python language docs.

Thus, twill could be used from Python code to do most Web browsing tasks that didn't involve JavaScript.

I was happy with this result, but it was largely unintended. I mostly use twill via the scripting language.

However, the early press on twill was from people like Grig and Michele, who talked glowingly about the ability to use twill from Python. This has led to people wanting more functionality: especially, sensible return values from twill commands. This, in turn, has led to a bit of caviling on my part about this direction, because I haven't really thought it through.

Anyway, the upshot is that I have to rethink my priorities for twill a bit. I was going to focus on recording functionality and extension management for the next release, but it seems like I should also work on simplifying and documenting the browser interaction code. Given the one-to-one mapping between twill.commands and the scripting language, I don't want to add things like return values and Python-specific documentation to the commands; perhaps I can satisfy people with a browser-object interface...

The big surprise for me -- and it really shouldn't have been a surprise -- is that people really seem to want to do Web browsing via a trivially simple interface: go, follow, submit, and get_page contain 90% of the desired functionality. mechanize and mechanoid are serious overkill for this level of Web browsing, and the fact that twill puts a high priority on "just working" (albeit at the expense of customizability) probably helps contribute to users' interest.

The simplest route for me to go is probably to work on two applications I've been planning: twill-crawler, and mailman-api. (Presumably the names give away the function ;).) Then I'll have some idea of what people need for Web browsing; right now I'm feeling a bit underplanned, so to speak.

--titus

Review of "Endless Forms Most Beautiful"

A review of "Endless Forms Most Beautiful" by Sean Carroll. (This is a book on evolutionary developmental biology, which is one of the things my lab works on.)

In-process WSGI testing

Reached a stable point with a little side project: wsgi_intercept. This is a broken-out version of my in-process WSGI testing stuff that works for all of the Python Web testing tools I could find. Specifically, I monkey-patched webtest and webunit; provided a urllib2-compatible handler; and subclassed the mechanoid, mechanize, and zope.testbrowser Browser classes.
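The common thread in all those patches is replacing a socket round-trip with a direct, in-process call to the WSGI app. Stripped of the monkey-patching, the core move looks like this (a toy illustration of my own, not wsgi_intercept's actual code):

```python
def call_wsgi_app(app, path="/"):
    """Invoke a WSGI app in-process and collect status, headers, and body."""
    environ = {
        "REQUEST_METHOD": "GET",
        "PATH_INFO": path,
        "SERVER_NAME": "localhost",
        "SERVER_PORT": "80",
        "wsgi.url_scheme": "http",
    }
    captured = {}
    def start_response(status, headers):
        captured["status"], captured["headers"] = status, headers
    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body

def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]
```

The monkey-patching is then just a matter of arranging for httplib (or a Browser class) to call something like this instead of opening a socket.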

I ran into some minor but nonetheless annoying problems along the way. For example, Zope (which is needed for parts of zope.testbrowser) cannot be installed with "easy_install zope"; the PyPi page doesn't link to a download page, and even when downloaded, Zope names its setup script 'install.py' instead of 'setup.py'. This confuses easy_install.

I also couldn't figure out contact info for either Zope or CherryPy (the owners of webtest). I didn't look terribly hard, but the PyPi contact e-mail for Zope is a subscriber's-only list, and <team at cherryp.org> (which is the e-mail address at the top of webtest.py) doesn't exist. CherryPy folks -- someone, please contact me if you want the patches to webtest (or just grab them from the wsgi_intercept code yourself, of course!)

And easy_install seems to be confused by packages with '.' in their names; zope.testbrowser doesn't install with just an 'easy_install zope.testbrowser'. (Spoiled, aren't I, to expect it all to work so easily!)

But these are only minor gripes. On the whole, the packages I downloaded and modified had nice, clean source code. I think there's something about people who write testing tools that leads them to clean up their code ;).

A few quick off-the-cuff opinions, while I'm at it:

  • zope.testbrowser is indeed a simple, clean interface to mechanize. (Python2.4 only, however.)

  • I like the way mechanoid (a fork of mechanize) has broken out the class structure of mechanize into files.

  • If I needed a minimal Web testing tool, I'd use webtest.

  • funkload (based on webunit) looks pretty neat. (In the Python world, it's probably the major competitor to twill; they're focusing a bit more on load-testing, though.)

  • people should use ClientCookie rather than urllib2, I think. It does more, and it's written by the same person ;).

  • One of mechanize's big problems is its retention of backwards compatibility. John seems intent on keeping python 2.2 and up working in mechanize; I think that complicates the code more than it should.

Anyhoo, g'nite.

--titus

twill 0.8

"85% unit tested". ;)

PyPi entry, announcement, ChangeLog.

--titus
