Older blog entries for follower (starting at number 31)

Work
Be happy for me, I have a job I enjoy... :-) (For at least the next month and a bit anyway.)

Development continues on the uncaught exceptions handler for Python, although mostly on the unit testing side which leads to:

  • A (possibly) interesting question: How do you unit test a replacement for the system exception handler? The unittest module, used as it is, always traps exceptions raised inside a test. So you can't check whether your exception handler captures uncaught exceptions, because they're no longer uncaught: unittest's own exception handler has already captured them... Got it? :-) My answer was to run the tests outside the standard unittest framework and somehow link them back into it. Which leads to...
  • External unit tests: I'm probably reinventing the wheel here, but I now have a new ExternalUnitTestMixin class which allows the standard unittest framework to run an outside command and test its output against the "correct" output (using os.popen4()). In my situation the command simply runs a Python script which tests that the exception handler prints the correct results (e.g. nothing, an incident id or a traceback, depending on the debug mode and where the exception occurred). Of course, when you're capturing exceptions the line numbers can change as you edit the code, so my comparison checker optionally filters out line number references to avoid this issue.

    The kinda nifty thing about this implementation is that the external unit tests are performed by running the same Python file that contains the "normal" unit tests as a script instead of importing it into the test runner. The external tests are all functions within the module which are extracted dynamically by a customised TestSuite class. I like it being done this way because it reduces the number of places you have to look for test code.

    Unfortunately I discovered that the unittest module's loadTestsFromModule() function doesn't automatically load TestSuites, only TestCases, which leads us to...

  • Importing test suites: I came up with a two line patch to loadTestsFromModule() to make it import any TestSuite subclasses it finds in addition to TestCases. This means you can use "tricky" custom TestSuites and they're still imported automatically, and treated as just another TestCase. Admittedly this is all a somewhat vague description, but hopefully the code's clearer, and it'll be better when/if I can put up some example somewhere.

  • All of this is being controlled by our unit test runner, which helpfully searches for, loads and runs all our unit tests with just one command. That's handy, although not that original. Hopefully I'll be able to put this up somewhere too.
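
A minimal sketch of the external-test idea described above, not the actual ExternalUnitTestMixin. It assumes a modern Python, so subprocess.run() stands in for os.popen4(). The names ExternalCommandMixin, assertCommandOutput, strip_line_numbers and CHILD_SCRIPT are made up for illustration, and the child script installs a toy handler rather than the real one.

```python
import re
import subprocess
import sys
import unittest

# Matches the "line 123" references that appear in Python tracebacks.
_LINE_REF = re.compile(r"line \d+")


def strip_line_numbers(text):
    """Mask line references so editing the code doesn't break comparisons."""
    return _LINE_REF.sub("line N", text)


class ExternalCommandMixin:
    """Hypothetical mixin: run a command and compare its combined output."""

    def assertCommandOutput(self, args, expected, ignore_line_numbers=False):
        result = subprocess.run(
            args,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout, as popen4 did
            text=True,
        )
        actual = result.stdout
        if ignore_line_numbers:
            actual = strip_line_numbers(actual)
            expected = strip_line_numbers(expected)
        self.assertEqual(actual.strip(), expected.strip())


# The child process raises outside of any unittest machinery, so the
# exception really is uncaught there and reaches the installed hook.
CHILD_SCRIPT = """\
import sys
sys.excepthook = lambda exc_type, value, tb: print("incident:", exc_type.__name__)
raise ValueError("boom")
"""


class UncaughtExceptionTest(ExternalCommandMixin, unittest.TestCase):
    def test_handler_sees_uncaught_exception(self):
        self.assertCommandOutput(
            [sys.executable, "-c", CHILD_SCRIPT], "incident: ValueError"
        )
```

Run through the usual runner (e.g. python -m unittest), the external check shows up like any other test case.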
7 Feb 2003 (updated 7 Feb 2003 at 08:02 UTC) »
Talkback ID functionality in Python (continued)

Thanks for your comments Adrian. Here's some responses... :-)

  • The final "complete" incident id that is generated will probably have the/a version number prepended; I think it's more useful than making it part of the hash itself. The final form is probably something like:
    <ver>-<id>-<line> (e.g. 501-33425-22) where <id> is generated from the hash of the function path. Keeping the version number separate means that exceptions from the same call path in different versions maintain the same id, which I think is useful.
  • The final purpose of all this is to create a "catchall uncaught exception handler" so you might not have to make time to do it yourself... :-) Essentially you'll be able to import the handler module, start it (which hooks the exception handler), and register some Reporters. The reporters determine how the otherwise unhandled exception is reported (surprise!). For example, the current very basic reporters display the incident id in the console and email a report to an email address. Other possibilities include writing a log file, paging a tech or displaying a message dialog--which are reporters we're likely to implement. (Could have a direct-to-web option too...)
  • For the purposes we're using the module for, we don't want to re-raise the exception--we want to suppress it completely as far as the user is concerned--but a reporter could be created to display the traceback as per normal behaviour.
  • I also briefly considered embedding a mini-stack trace in the incident id, but decided it would probably make it too long and enumerating the functions would make the system more difficult to use overall.
  • Thanks for mentioning anonymous functions and lambdas; I hadn't specifically thought about them. I haven't checked what the module does with them at the moment, but will have to do so. It depends on how Python reports them in the stack trace...
The module is mostly production ready in its current state and could be ready for release early next week--my employers are happy for it to be GPL'd.

Other stuff...

  • Having an employer that uses and is happy to release open source is cool.
  • Being paid to work with Python is (still) cool.
  • I have no idea what other people think about reading it, but looking at what I've written for this & my last entry, maybe I will yet get interested enough in something to actually write something significant about it.

Later...

CodeCon: How come it's taken me until now to realise that most of CodeCon takes place over the weekend, not during the week? That sudden realisation has greatly increased the likelihood of me going... (One other question, is it just my imagination or was the original discount price $75?)

5 Feb 2003 (updated 5 Feb 2003 at 09:27 UTC) »
Talkback ID functionality in Python
Interesting task at work today... While working on a wrapper to catch otherwise unhandled exceptions and deal with them "nicely", I started wondering how Full Circle Software's TalkBack software (their website is surprisingly difficult to find using Google...) and other similar products calculate their unique "Incident ID".

By my understanding, the key attributes of an incident id are:

  • It is short, say 5 to 7 digits.
  • Unique for the incident, i.e. no matter what machine the error occurs on, the same cause generates the same ID. (With some degree of certainty anyway...)

I had a look around on the net but after discovering that TalkBack is actually a closed source product, didn't find any implementation details. I eventually found Anet: A Network Game Programming Library which includes crash logging functionality. After a quick glance at the code (note: only the tar file seems to exist now) I couldn't readily identify a routine that calculates an incident id.

After further reading on crash signatures and the like I've hypothesized that the TalkBack ID and other similar incident ids are probably calculated by running some sort of hash algorithm over items (e.g. relative function call addresses) from the stack trace. But, I don't know for sure. Does anyone around here have any details on how TalkBack or BugToaster crash signatures/incident ids are created?

Anyway, I wanted to come up with some way to calculate a similar incident ID for otherwise uncaught Python exceptions.

The two approaches (with different trade-offs) I ended up with were to generate an incident id from either of the following sets of information found in the trace-back:

  • Line numbers
  • Function names

Using line numbers has the advantage that each exception in a particular function has a unique incident id, but the disadvantage that changes in the code (even the addition or deletion of comments) affect the id dramatically. This method is most suitable for stable final-release type code.

I think we've decided to use function names to generate the ids. Given a series of calls to the following functions (with an uncaught exception handled in the last function):

f1(), f2(), f3()

we assemble a string of the form "f1f2f3" (i.e. the function names concatenated together) and then get a hash from the string with Python's hash() function. Then, in order to get a "nicer" number we do a mod 100000 operation on the hash to get our incident id.

(Actually I ended up adding a line number (mod 100) to the end of the incident id also. If you had some standard way of enumerating exception types you could probably throw the exception type into the mix too.)

I think this will serve our purposes for a start. I figured the hashing algorithm can afford to be relatively simple (so using Python's builtin hash() is probably overkill...), as one would hope that the number of items in the hash space would be small (being the total number of paths to uncaught exceptions, after all!).
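
The function-name scheme above can be sketched roughly as follows. This is an illustration under assumptions, not the code from work: incident_id() is a made-up name, the "<id>-<line>" layout is a guess, and zlib.crc32() is swapped in for hash() because current Python versions randomise string hashes per process, which would break the "same cause, same ID on any machine" property.

```python
import traceback
import zlib


def incident_id(tb):
    """Return '<id>-<line>' for a traceback, based on its function names."""
    frames = traceback.extract_tb(tb)
    # Concatenate the function names along the call path, e.g. "f1f2f3".
    path = "".join(frame.name for frame in frames)
    # crc32 stands in for hash(): same input, same value on every machine.
    call_id = zlib.crc32(path.encode("utf-8")) % 100000
    # Line number (mod 100) of the frame where the exception was raised.
    line = frames[-1].lineno % 100
    return f"{call_id:05d}-{line:02d}"


def f3():
    raise RuntimeError("boom")


def f2():
    f3()


def f1():
    f2()


try:
    f1()
except RuntimeError as exc:
    print(incident_id(exc.__traceback__))
```

Note that the frame containing the try block is part of the traceback too, so its name ends up in the hashed string along with f1, f2 and f3.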

Would be interested in comments on this from anybody who's had more experience with this type of thing than me.

P.S. Being paid to work in Python is nice... :-)

salmoni: Glad you found the reference to Pychecker useful. It's a fair trade, I've been finding your diary entries on SalStat's development an interesting read. :-)

If you're working with packages at all let me know and I can send you my in-progress patch.

Oh, and you might be interested in looking at this comp.lang.python posting if you haven't seen it already. Word of mouth is where it's at!

Pychecker
Have been working at work on extending Pychecker ("lint for Python") to handle packages in addition to modules/scripts. It has been somewhat frustrating--partly due to cross-platform issues (see previous diary) and partly due to dealing with multiple layers of import fun...

On the plus side, the code seems to be well modularised, as I've just realised that I've pretty much only had to look at one file in order to add the functionality.

The current status is that it doesn't really work under Windows, but works most of the time under Linux. It now handles packages "good enough" that I've started actually applying it to code at work. Now starts the process of going through the results & working out what needs to be changed.

I like the fact that you can specify what code "issues" you want to be alerted to and which you want to ignore, even down to the module, method and variable level. Nice.

I've sent an "in-progress" patch to the developers, we'll wait and see if it works for them...

On that note, here's some Google bait if anyone finds themselves in the same situation as I was today:

When the documentation for Pychecker says "If you want to suppress warnings on a module/function/class/method, you can define a suppressions dictionary in .pycheckrc.", an example of how it looks in your 'rc' file is:

suppressions = {'foo.bar' : 'unusednames=foo,bar'}

i.e. the name of the suppressions dictionary actually *is* 'suppressions'...

balug
Went to the Bay Area Linux User Group last night. Nice sweet & sour pork. Won a book. What more can you ask for? :-)

codecon
Hmmm, CodeCon looks interesting. Seeing as I'm actually *in* San Francisco at the time I'm sorely tempted to go. The question is can I afford the money & the time off work? I think I'll wait and see how the month goes... Had a look over the conference web site, couldn't see any mention of when registrations close, if they do...

Cross-platform Python frustration

Hmmm, I seem to have come across a difference in the way Python (2.2ish) works on Linux compared to Windows.

Under Python 2.2.1 on Mandrake 9.0 the following code (in interactive mode):

__import__("test/test_ds")

results in:

<module 'test/test_ds' from 'test/test_ds.pyc'>

But, under Windows it results in an ImportError, even if I change the directory separator used.

Now, I realise it's illegal to specify a directory path when using the import statement, but which of the two above behaviours is "correct" for the __import__ function?
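
For what it's worth, using a dotted module name sidesteps the question, since that form does not depend on the platform's path separator. A small sketch, assuming a current Python with importlib; the package name demo_pkg is made up, and the script builds a throwaway package on disk just so it can run on its own.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk: demo_pkg/test_ds.py (hypothetical names).
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "demo_pkg")
os.mkdir(package_dir)
open(os.path.join(package_dir, "__init__.py"), "w").close()
with open(os.path.join(package_dir, "test_ds.py"), "w") as handle:
    handle.write("VALUE = 42\n")

# Make the temporary directory importable.
sys.path.insert(0, root)
importlib.invalidate_caches()

# Dotted name instead of a path such as "demo_pkg/test_ds".
module = importlib.import_module("demo_pkg.test_ds")
print(module.__name__, module.VALUE)
```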

sej: I downloaded Safari the other day and remember seeing something about the LGPL at the very bottom of the "click-thru" license agreement upon installation.
7 Jan 2003 (updated 7 Jan 2003 at 04:28 UTC) »
Employed!
I started my new job today! Yay. So it appears the Bay Area (did) have a job vacancy after all. (Thanks craigslist...)

Better still, I get to work under Linux (had the option between that & Windows), and use Python + wxPython. Sweet.

Used Gnome2 today for the first time. Some of the changes are kinda annoying. (Anyone know what the keyboard short-cuts to switch/move to a different virtual desktop are now?)

How to use open source tools to create a scientific presentation?

ladypine:
You might want to look at Bruce's LaTeX Handout System (v.1.0.1) which may also do what you want.

The system was used extensively at my University by my lecturers to produce slide presentations and hand-out notes.

26 Dec 2002 (updated 26 Dec 2002 at 04:43 UTC) »
San Francisco

Well, here I am in San Francisco at Christmas.

I'm still looking for a job until March 25, 2003.

I'm also keen to meet up with tech-related people in the Bay Area. I've been to a few gatherings already and it's been good to be around people who are into similar areas of interest.

Email providers

mascot: You might want to consider FastMail, I've been really impressed with them--even paid money for a lifetime membership. Nice, no-advert web interface and IMAP access too.
