Older blog entries for nconway (starting at number 11)

This thread on llvm-dev is worth reading if you're interested in compiler internals.
18 Feb 2005 (updated 19 Feb 2005 at 11:42 UTC) »

I've been thinking about static analysis for finding bugs off and on for the past 18 months or so; recently, I've been looking for a good open source static analysis tool. Unless I've managed to miss it, ISTM there isn't one.

Uno is the closest I've found, but it is pretty unpolished, and I don't believe it is DFSG free.

sparse, Linus' checker, may or may not be cool; I've tried to see what it's capable of, but wasn't able to make it catch anything more significant than minor stylistic errors in C code (e.g. extern function definitions, 0 used in a pointer context (rather than NULL), that sort of thing). (Side note: sparse doesn't even have a website, and it's primarily available via bk. Does Linus not want people to use his software?) I'll definitely take a closer look at this, anyway.

There are some more academic tools (like Uno, only even less practical). There's also Splint, but last I tried it, it emitted way too many bogus error reports, and required tons of annotations to be of any use.

Some random thoughts about the design of an open source static analysis tool:

  • A tool that hides a handful of legitimate error reports within thousands of bogus ones is essentially useless. Given the choice, it is better to miss a few problems than to warn the user about everything that might be bogus — false positives are bad.
    • A reasonable substitute would be some effective means of sorting error reports by their likelihood of legitimacy; if the tool generates thousands of bogus errors but places the legitimate errors at the top of the list, I'd be willing to live with it.
  • It ought to be easy to check an arbitrary base of code. That means understanding C99 plus all the GNU extensions, and providing an easy way to run the checker on some source (while getting header #includes right, running the necessary tools to generate derived files, and so on). Perhaps the easiest way to do that is to have the checker mimic the standard $CC command-line arguments; then the user could run the checker via make CC=checker.
    • This also means no annotations. They are ugly, they tie the source into one specific analysis tool, and they are labour intensive; the whole point is to find bugs with the minimum of labour by the programmer.
  • It ought to be possible to write user-defined extensions to the checker, to check domain-specific properties of the source. I've got no problem with annotations in this context — that's a sensible way to inform your checker extension about domain-specific properties.
  • The theory behind Dawson Engler's work on MC is a good place to start; it is more or less the state of the art, AFAIK. Unfortunately the tool they developed was never released publicly (from what I've heard it was somewhat of a kludge anyway, implementation-wise), and Engler's now commercialized the research at Coverity.
  • ckit might be worth using. Countless people have implemented C compiler frontends in the past, so it would be nice to avoid needing to reinvent that particular wheel.
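
To make the report-ranking idea from the list above a little more concrete, here is a minimal sketch in C. Everything in it is an assumption for illustration: the struct report layout, the confidence field, and the rank_reports name are hypothetical, and how a checker would actually estimate that confidence is the hard part and is not shown.

```c
#include <stdlib.h>

/* Hypothetical report record; the fields are assumptions for illustration. */
struct report {
    const char *message;
    double confidence;  /* estimated likelihood that this is a real bug, 0..1 */
};

/* qsort comparator: higher confidence sorts first. */
static int by_confidence_desc(const void *a, const void *b)
{
    const struct report *ra = a;
    const struct report *rb = b;

    if (ra->confidence < rb->confidence)
        return 1;
    if (ra->confidence > rb->confidence)
        return -1;
    return 0;
}

/* Reorder the reports in place so the most plausible ones come first. */
void rank_reports(struct report *reports, size_t n)
{
    qsort(reports, n, sizeof *reports, by_confidence_desc);
}
```

With something like this in place, a user could read only the top of the list and ignore the long tail of low-confidence reports.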

Speaking of tools for finding bugs, I've got to find some time to make valgrind understand region-based memory allocation.

Vacation

I took about a month off work. I was in Perth for about two weeks to celebrate Christmas with my aunt's family, and then in Cairns for about 10 days, doing some scuba diving with a friend who was over from Canada.

PostgreSQL

Started back at work last Monday. 8.0.0 got released, which is great -- this release has a ton of new functionality that I'm really happy about.

The tree is now open for 8.1 work, so I got a chance to check in some stuff that's been sitting on my hard drive for a while. Sped up rtree scan performance by about 10%; I have similar patches for GiST which I'll commit soon. The GiST stuff also overhauls memory management: GiST user-provided functions will now always be invoked in a short-lived memory context, so people implementing GiST-based indexes won't need to worry about freeing palloc'ed memory. One of the lessons of working on the PG source: region-based memory allocation is a Good Thing.
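
For anyone who hasn't run into region-based allocation before, here is a minimal sketch of the general idea in C. This is not PostgreSQL's memory context code: the struct region type and the region_* functions are hypothetical names, and the single fixed-size buffer is a simplifying assumption. The point is only that callers allocate freely and everything is released in one step, rather than freeing each allocation individually.

```c
#include <stddef.h>
#include <stdlib.h>

/* Used only to pick a conservative alignment for every allocation. */
union region_align {
    long long ll;
    long double ld;
    void *p;
    void (*fp)(void);
};

/* Hypothetical minimal region: one fixed buffer, bump-pointer allocation. */
struct region {
    char *base;
    size_t size;
    size_t used;
};

/* Returns nonzero on success. */
int region_init(struct region *r, size_t size)
{
    r->base = malloc(size);
    r->size = r->base ? size : 0;
    r->used = 0;
    return r->base != NULL;
}

/* Allocate n bytes from the region, or return NULL if it is full. */
void *region_alloc(struct region *r, size_t n)
{
    const size_t align = sizeof(union region_align);
    size_t rounded = (n + align - 1) / align * align;
    void *p;

    if (rounded < n || rounded > r->size - r->used)
        return NULL;  /* size overflow, or not enough room left */
    p = r->base + r->used;
    r->used += rounded;
    return p;
}

/* Release every allocation at once; the buffer itself is kept for reuse. */
void region_reset(struct region *r)
{
    r->used = 0;
}

/* Give the buffer back to the system. */
void region_destroy(struct region *r)
{
    free(r->base);
    r->base = NULL;
    r->size = 0;
    r->used = 0;
}
```

A caller in the style described above would call region_reset after each invocation of a user-provided function, so nothing that function allocated can leak.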

While cleaning up various things in PL/PgSQL (mostly memory management related), I noticed a buffer overrun in the parsing of refcursors. Patched that for 7.4 and 8.0.

I took a look at adding support for GCC's profile-guided optimization to the build system. I'm a little confused -- why don't more projects take advantage of this? Particularly when, say, building RPM packages, it would make sense to trade some extra compile-time for a few % improvement in runtime performance. On the other hand, I ran into some problems actually using the PGO support (e.g. this), so perhaps that's one reason PGO support hasn't (AFAICS) taken off.

robocoder: Thanks for mentioning the pending patent on ARC. Unfortunately, that came as quite a surprise. I passed on the bad news to the pgsql-hackers list, which started a spirited discussion of the topic. I'm not sure what the resolution to the problem is going to be; personally I think we ought to replace ARC with a simple LRU scheme in 8.0.1, and worry about a better, unencumbered replacement for 8.1. But in any case I'm glad we found out about the problem sooner rather than later.

Books

Just started reading Paul Graham's Hackers and Painters, and I'm really enjoying it so far. I also have Conrad Black's biography of FDR to start (1300 pages, yum).

An interesting statistic from the Economist:

One of the best statistics of the campaign is that people worth $1m-10m supported Mr Bush by a 63-37% margin, whereas those worth more than $10m favoured Mr Kerry 59-41%.

Robert Kagan's article Power and Weakness in Policy Review was written in 2002, but it's still a fascinating read. Choice quote:

Today's transatlantic problem, in short, is not a George Bush problem. It is a power problem. American military strength has produced a propensity to use that strength. Europe's military weakness has produced a perfectly understandable aversion to the exercise of military power. Indeed, it has produced a powerful European interest in inhabiting a world where strength doesn't matter, where international law and international institutions predominate, where unilateral action by powerful nations is forbidden, where all nations regardless of their strength have equal rights and are equally protected by commonly agreed-upon international rules of behavior. Europeans have a deep interest in devaluing and eventually eradicating the brutal laws of an anarchic, Hobbesian world where power is the ultimate determinant of national security and success.

Random Thought

Nothing can be more fallacious than to found our political calculations on arithmetical principles. Sixty or seventy men may be more properly trusted with a given degree of power than six or seven. But it does not follow that six or seven hundred would be proportionably a better depositary. And if we carry on the supposition to six or seven thousand, the whole reasoning ought to be reversed. The truth is, that in all cases a certain number at least seems to be necessary to secure the benefits of free consultation and discussion, and to guard against too easy a combination for improper purposes; as, on the other hand, the number ought at most to be kept within a certain limit, in order to avoid the confusion and intemperance of a multitude. In all very numerous assemblies, of whatever character composed, passion never fails to wrest the sceptre from reason. Had every Athenian citizen been a Socrates, every Athenian assembly would still have been a mob.

-- James Madison, The Federalist #55

13 Oct 2004 (updated 13 Oct 2004 at 08:24 UTC) »

There was an interesting thread on the GCC development list about what kind of optimizations can legally be performed on "explicit storage" (e.g. malloc in C, operator new in C++). Various folks raised concerns about how this changes programmer expectations and whether it is allowed by the C or C++ standards. Interestingly, Chris Lattner pointed out that LLVM actually implements this optimization, at least for malloc (as usual, C++ makes things more complicated, but even then LLVM could theoretically perform the optimization at link-time).
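
As an illustration of the kind of transformation at issue, here is a sketch in C. It is an assumption on my part that removing a malloc/free pair whose storage never escapes is representative of what the thread discussed; the function names and the -1 error convention are hypothetical. The second function shows what an optimizer might reduce the first one to.

```c
#include <stdlib.h>

/* Uses heap storage that never escapes the function. */
int sum_pair(int a, int b)
{
    int *tmp = malloc(2 * sizeof *tmp);
    int result;

    if (tmp == NULL)
        return -1;  /* hypothetical error convention */
    tmp[0] = a;
    tmp[1] = b;
    result = tmp[0] + tmp[1];
    free(tmp);
    return result;
}

/* What the function might look like once the allocation is removed. */
int sum_pair_optimized(int a, int b)
{
    return a + b;
}
```

The two versions agree whenever malloc succeeds, but the second one can never take the allocation-failure path, and it never calls a user-supplied malloc replacement. That difference in observable behaviour is the sort of thing that makes the legality question interesting.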

Since my last blog entry, I:

  • Finished my summer internship
  • Spent a few weeks in Toronto
  • Decided that I wanted to take twelve months off university.
  • Took a twelve month contract to work full-time on PostgreSQL for Fujitsu Australia Software Technologies.
  • Moved to Sydney, Australia

So, a lot is new :)

25 Aug 2004 (updated 26 Aug 2004 at 02:02 UTC) »

I have a problem

Okay, I'll admit it: I'm completely, helplessly addicted to editing Wikipedia. I've always thought the project was cool and I've contributed a few edits in the past, but the habit has really gotten out of hand recently:

  • Over the summer months I made about 2,500 edits (that said, most of them were small stuff like spelling fixes or changes for policy compliance).
  • I feel the urge to frequently advocate Wikipedia to friends, family, and random strangers.
  • I own Wikipedia clothing.
  • I've begun reading up on random subjects I know nothing about solely for the purpose of contributing to WP on them.

Winding up the summer

I took a break from the OSS world this summer to do another internship at a commercial software firm in Seattle (I did the same thing last summer). The group I was working in was doing some really cool work, although unfortunately the details are NDA. As fun as that was, I must confess it's a pleasure to get back to working on OSS.

Poker

I'm getting increasingly annoyed playing low-limit hold'em at casinos. Like any good geek, before playing poker for sizeable amounts of money I read a few books on the subject and learnt how to play "properly" -- tight and aggressive. "Get your money in when you've got the best of it, protect it when you don't," as they say. While I think I'm playing well, the results haven't been favourable: I've ended down the last four times I've gone to a casino. The most annoying thing is that I can't find any fault in my play -- given a second chance to play all those hands again, I'd play them mostly the same way. I'm tempted to blame my losses on bad luck / cold cards, but of course that's always easy to do. On the other hand, I've been cleaning up playing no limit online, so at least that's something.

Garden State

I saw Garden State recently and absolutely loved it. Natalie Portman stole the movie, I think. I bought the soundtrack the next day, which is great too. See this movie!

