Older blog entries for robertc (starting at number 154)

What do I do @ work?

I recently moved withing Canonical from being a paid developer of Bazaar to take on a larger challenge  Technical Architect for Launchpad. Its been two months now, and its time to put my head up out of the coal face, have a look around and regroup.

When I worked on Bazaar, every day when I started work got up I was working on a tool anyone can use, designed for collaboration upon sourcecode, for people writing software. This is a toolchain component right at the heart of the free software world. Bazaar and tools like it get used everyday to manage, distribute and collaborate on the sourcecode that makes up the components of Ubuntu, Debian, Fedora and so forth. Every time someone new starts using Bazaar for a new free or open source project, well I felt happy – happy that in my small part I’m helping with this revolution we’re carrying out.

Launchpad is pretty similar to Bazaar in some ways. Obviously they are both free software, both are written in Python, and both are sponsored by Canonical, my employer. And they both are designed to assist in collaboration and communication between free software developers – albeit in rather different ways.

Bazaar is a tool anyone can install locally, run as a command line, GUI, or local webserver, and share code either centrally (e.g. by pushing to Launchpad), or in a peer to peer fashion, acting as their own server.

Launchpad, by contrast is a website which (usually) folk will use as a service – in their browser, from the comand line – FTP (for package building), ssh (for Bazaar branch pushing or pulling), or even local GUI programs using the Launchpad API service. This makes it more approachable for first time collaborators, but its less able to be used offline, and it has all the usual caveats of web sites : it needs a username and password, it’s availability depends on the operators – on the team I’m part of. So there’s a lot less room for error: if we do something wrong, the system is unavailable, and users can’t just ‘apt-get install’ an older release.

With Launchpad our goal is to to get all the infrastructure that open source need out of the way, so that they can focus on their code, collaboration within their team – and almost uniquely – collaboration with other teams. As well as being open source, Launchpad is free for all open source projects to use – Ubuntu is our single biggest user – they use it for all bugtracking, translation and package building, and have a hugefraction of the total storage overhead in the database.

Launchpad is a pretty nice system, so people use it, and as a result (on a technical basis) it is suffering from its own success: small corner cases in the code turn up every day or two, code written years ago to deal with a relatively small data set now has to deal with data sets a thousand or more times larger (one table, for instance, has over 600,000,000 rows in it.

For the last two months then, I’ve been working on Launchpad. As Technical Architect, I need to ensure that the things that we (users, stakeholders and developers of Launchpad) want to do are supported by the structure of the system : the platform(s) we’re building on, the way we approach problems, coding standards and diagnostic tools. That sounds pretty dry and hands off, but I’m finding its actually very balanced. I wrote a presentation when I started the job, which encapsulated the challenges I saw in front of the team on this purely technical front, and what I thought I needed to do.

I think I was about right in my expectations: On a typical day, I’ll be hands on in a problem helping get it diagnosed, talking long term structural changes with someone around how to make things more efficient / flexible / maintainable, and writing a small patch here or there to help move things along.

In the two months since I took on this challenge, we’ve made significant headway on the problem of performance for Launchpad : many inefficient code paths have been identified and removed, some new infrastructure has been created as is being rolled out to make individual pages faster, and we’ve massively increased the diagnostic data we get when things go wrong. We’ve introduced facilities for responding more rapidly to issues in the software (but they have to be rolled out across the system) and I hope, over the next 4 months we’ll reach the first of my performance goals: for any webpage in Launchpad, it will complete rendering in 99% of the time. (Note that we already meet this goal if you measure the whole system, but this is biased by some pages being very frequently hit and also being very small).


Syndicated 2010-09-13 03:49:48 from Code happens

Subunit and nose

Looks like someone has come up with a nose plugin for subunit – excellent! http://www.liucougar.net/blog/projects/nose-subunit

In their post the author notes that subunit is not easy_installable at the moment. It will be shortly. Thanks to Tres Seaver there is a setup.py for the python component of Subunit, and he has offered to maintain that going forward. His patch is in trunk, and the next release will include a pypi upload.

The next subunit release should be pretty soon too – the unicode support in testtools has been overhauled thanks to Martin[gz], and so we’re in much better shape on Python 2.x than we were before. Python3 for testtools is trouble free in this area because confused strings don’t exist there :)


Syndicated 2010-07-01 21:59:32 from Code happens

Scary thought for the weekend

Reprap generation 20 or so + proprietary objects with embedded viruses. Real ones. (Consider what you can do in postscript…)


Syndicated 2010-05-21 20:21:30 from Code happens

Maintainable pyunit test suites

There’s a test code maintenance issue I’ve been grappling with, and watching others grapple with for a while now. I’ve blogged about some infrastructural things related to it before, but now I think its time to talk about the problem itself. The problem shows up as soon as you start writing setUp functions, or custom assertThing functions. And the problem is – where do you put this code?

If you have a single TestCase, its easy. But as soon as you have two test classes it becomes more difficult. If you choose either class, the other class cannot use your setUp or assertion code. If you create a base class for your tests and put the code there you end up with a huge base class, and every test paying the total overhead of your test needs, rather than just the overhead needed to test the particular system you want to test. Or with a large and growing list of assertions most of which are irrelevant for most tests.
The reason the choices have to be made is because test code is just code; and all the normal issues there – separation of concerns, composition often being better than inheritance, do-one-thing-well – all apply to our test code. These issues are exacerbated by pyunit (that is the Python ‘unittest’ module included with the standard library and extended by various projects)
Lets look some (some) of the concerns involved in a test environment: Test execution, fixture management, outcome decision making. I’m using slightly abstract terms here because I don’t want to bind the discussion down to an existing implementation. However the down side is that I need to define these terms a little.
Test execution – by this I mean the basic machinery of running a single test: the test framework calling into user code and receiving back an outcome with details. E.g. in pyunit your test_method() code is called, success is determined by it returning successfully, and other outcomes by raising specific exceptions. Other languages without exceptions might do this returning an outcome object, or passing some object into the user code to be called by the test.
Fixture management – the non trivial code that prepares a situation where you can make assertions. On the small side, creating a few object instances and glueing them together, on the large end, loading data into a database (and creating the database instance at the same time). Isolation issues such as masking out environment variables and creating temp directories are included in this category in my opinion.
Outcome decision making – possibly the most obtuse label I’ve ever given this, I’m referring the process of deciding *what* outcome you wish to have happen. This takes different forms depending on your testing framework. For instance, in Python’s doctest:
>>> x
45
provides a specification – the test framework calls str(x) and then compares that to the string ’45′. In pyunit assertions are typically used:
self.assertEqual(45, x)
Will call 45 == x and if the result is not True, raise an exception indicating a Failure has occured. Unexpected exceptions cause Errors, and in the most recent pyunit, and some extensions, other exceptions can signal that a test should not be run, or should have failed.
So, those are the three concerns that we have when testing; where should each be expressed (in pyunit)? Pragmatically the test execution code is the hardest to separate out: Its partly outside of ‘user control’, in that the contract is with the test framework. So lets start by saying that this core facility, which we should very rarely need to change, should be in TestCase.
That leaves fixture management and outcome decision making. Lets tackle decision making… if you consider the earlier doctest and assertion examples, I think its fairly clear that there are multiple discrete components at play. Two in particular I’d like to highlight are: matching and signalling. In the doctest example the matching is done by string matching – the reference object(s) are stringified and compared to an example the test writer provides. In the pyunit example the matching is done by the __eq__ protocol. The signalling in the doctest example is done inside the test framework (so we don’t see any evidence of it at all). In the pyunit example the signalling is done by the assertion method calling self.fail(), that being the defined contract for causing a failure. Now for a more complex example: testing a float. In doctest:
>>> “%0.3f” % x
0.123
In pyunit:
self.assertAlmostEqual(0.123, x, places=3)
This very simple check – that a floating point number is effectively 0.123 exposes two problems immediately. The first, in doctest, is that literal string comparisons are extremely limited. A regex or other language would be much more powerful (and there are some extensions to doctest; the point remains though – the … operator is not enough). The second problem is in pyunit. It is that the contract of assertEqual and assertAlmostEqual are different: you cannot substitute one in where the other was expected without partial function application – something that while powerful is not the most obvious thing to reach for, or to read in code. The JUnit folk came up with a nice way to address this: they decoupled /matching/ and /deciding/ with a new assertion called ‘assertThat’ and a language for matching – expressed as classes. The initial matcher library, hamcrest, is pretty ugly in Python; I don’t use it because it tries too hard to be ‘english like’ rather than being honest about being code. (Aside, what would ‘is_()’ in a python library mean to you? Unless you’ve read the hamcrest code, or are not a Python programmer, you’ll probably get it wrong. However the concept is totally sound. So, ‘outcome decision making’ should be done by using a matching language totally seperate from testing, and a small bit of glue for your test framework. In ‘testtools’ that glue is ‘assertThat’, and the matching language is a narrow Matcher contract (in testtools.matchers) which I’m going to describe here, in case you cannot or don’t want to use the testtools one.
class Matcher:
    def __str__(self):
        "Describe this matcher."""
    def match(self, something):
        """Determine if something is matched.
        :param something: Something to match.
        :return: None if something matched, or a Mismatch object otherwise.
        """
class Mismatch:
    def describe(self):
        """Describe a mismatch that has occured."""
This permits composition and inheritance within your matching code in a pretty clean way. Using == only permits this if you can simultaneously define an __eq__ for your objects that matches with arbitrarily sensitivity (e.g. you might not want to be examining the process_id value for a process a test ran, but do want to check other fields).
Now for fixture management. This one is pretty simple really: stop using setUp (and other similar on-TestCase methods). If you use them, you will end up with a hierarchy like this:
BaseTestCase1
 +TestCase1
 +TestCase2
 +BaseTestCase2
   +TestCase3
   +TestCase4
   +BaseTestCase3
     +TestCase5
     ...
That is, you’ll have a tree of base classes, and hanging off them actual test cases. Instead, write on your base TestCase a single glue method – e.g.
def useFixture(self, fixture):
      fixture.setUp()
      self.addCleanup(fixture.tearDown)
      return fixture
And then rather than having a setUp function which performs complex operations, define a ‘fixture’ – an object with a setUp and a tearDown method. Use this in tests that need that code::
def test_foo(self):
      server = self.useFixture(NewServerWithUsers())
      self.assertThat(server, HasUser('fred'))
Note that there are some things around that offer this sort of convention already: thats all it is – convention. Pick one, and run with it. But please don’t use setUp; it was a conflated idea in the first place and is a concrete problem. Something like testresources or testscenarios may fit your needs – if it does, great! However they are not the last word – they aren’t convenient enough to replace just calling a simple helper like I’ve presented here.
To conclude, the short story is:
  • use assertThat and have a seperate hierarchy of composable matchers
  • use or create a fixture/resouce framework rather than setUp/tearDown
  • any old TestCase that has the outcomes you want should do at this point (but I love testtools).

Syndicated 2010-05-09 16:21:35 from Code happens

LibrePlanet 2010 day 3

Free network services – A discussion session led by Bradley Kuhn, Mako & Matt Lee : Libre.fm encouraged last.fm to write an API so they didn’t need to screen scrape; outcome of the network services story still unknown – netbooks without local productivity apps might now work, most users of network office apps are using them because of collaboration. We have a replacement for twitter – status.net, distributed system, but nothing like facebook [yet?]. Bradley says – like the original GNU problem, just start writing secure peer to peer network services to offer the things that are currently proprietary. There is perhaps a lack of an architectural vision for replacing these proprietary things: folk are asking how we will replace ‘the cloud’ aspects of facebook etc – tagging photos and other stuff around the web, while not using hosted-by-other-people-services. I stopped at this point to switch sessions – the rooms were not in sync session time wise.

Mentoring in free software – Leslie Hawthorne: Projector not working, so Leslie carried on a discussion carried on from the previous talk about the use of sexual themes in promoting projects/talk content and the like. This is almost certainly best covered by watching the video. A few themes from it though:

  • for anyone considering joining a community, they are assessing whether that community is ‘people like us’ – and for many people, including both women *and* men, blatant sexuality, isn’t something that fits the ‘people like us’ assessment. Note that this is in addition to offensive and inappropriate aspects of the issue.
  • respect is a key element here: respect your community, respect potential contributors, and don’t endorse (even silently) disrespectful behaviour
  • Codes of conduct might be a good idea
  • The lack of support in the community has for at least one project led to a complete loss of the women contributors to that project – and they are still largely lacking many years later.

We then got Leslies actual talk. Sadly I missed the start of it – I was outside organising security guards because we had (and boy it was ironic) a very loud, confrontational guy at the front who was replying to every statement and the tone in the room had gotten to the point that a fight was brewing.

From where I got back:

  • Check your tone
  • help people be productive in your community
  • cultivate creativity
  • know yourself
  • do not get caught up in perfectionism
  • communicate – both big stuff, but also just take the time to talk – how are you going, etc.
  • Share your mistakes
  • Guide don’t order
  • Recognition = Retention
  • Recognition = Delegation – its ok to let other people be responsible for stuff
  • http://bit.ly/MentorGuide
  • http://bit.ly/MentoringArticle

Chris Ball, Hanna Wallach, Erinn Clark and Denise Paolucci — Recruiting/retaining women in free software projects. Not a unique problem to women – things that make it better for women can also increase the recruitment and retention of men. Make a lack of diversity a bug; provide onramps – small easy bugs in the bug tracker (tagged as such), have a dedicated womens sub project – and permit [well behaved :) ] men in there – helps build connections into the rest of the project. Make it clear that mistakes are ok. On retention… recognise first patches, first commits in newsletters and the like. Call out big things or long wanted features – by the person that helped. Regular discussion of patches and fixes – rather than just the changelog. CMU did a study on undergrad women participation in CS : ‘Lack of confidence preceeds lack of interest/partipation’. Engagement with what they are doing is a key thing too. ‘Women are consistently undervaluing their worth to the free software community’. ‘Its the personal touch that seems to make a huge difference’. ‘More projects should do a code of conduct – kudos to Ubuntu for doing it’ — Chris Ball.

I found the mentoring and women-in-free-software talks to have extremely similar themes – which is perhaps confirmation or something –  but it wasn’t surprising to me. They were both really good talks though!

And thats my coverage of LibrePlanet – I’m catching a plane after lunch :( . Its a good low-key conference, and well put together.


Syndicated 2010-03-21 17:03:28 from Code happens

LibrePlanet 2010 Day 2

John Gilmore keynote – What do we do next, having produced a free software system for our computers? Perhaps we should aim at Windows? Wine + an extended ndiswrapper to run other hardware drivers + a better system administration interface/resources/manuals. However that means knowing a lot about windows internals – something that open source developers don’t seem to want to do. We shouldn’t just carry on tweaking – its not inspiring; whats our stretch goal? Discussion followed – reactos, continue integrating software and people with a goal of achieving really close integration: software as human rights issue! ‘Desktop paradigm needs to be replaced’ : need to move away from a document based desktop to a device based desktop. Concern about the goal of running binary drivers for hardware: encourages manufacturers to sell hardware w/out specs; we shouldn’t encourage the idea that that is ok. Lots of concern about cloning, lots of concern about what will bring more freedom to users, and what it will take to have a compelling vision to inspire 50000 free software hackers. Free software in cars – lots of safety issues in .e.g brake controllers, accelerators.

Eben Moglen – ‘We’re at the inflection point of free software’ – because any large scale global projects these days are not feasible without free software. Claims that doing something that scales from tiny to huge environment requires ‘us’ — A claim I would (sadly) dispute. Lots of incoming and remaining challenges. ‘Entirely clear that the patent systems relationship to technology is pathological and dangerous’ – that I agree with! Patent muggings are a problem – patent holders are unhappy with patents granted to other people :) . Patent pools are helping slowly as they grow. Companies which don’t care about the freedom aspect of GPLv3 are adopting it because of the patent protection aspects. Patent system is at the head of the list of causes-of-bad-things affecting free software. SFLC is building coalitions outside the core community to protect the interests of the free software community. We are starting to be taken for granted at the high end of mgmt in companies that build on free software. … We face a problem in the erosion of privacy. We need to build a stack, running on commodity hardware that runs federated services rather than folk needing centralised services.

Marina Zhurakhinskaya on GNOME Shell: Integrates old and new ideas in an overall comprehensive design. Marina ran through the various goals of the shell – growing with users, being delightful, starting simply so new users are not overwhelmed. The activities screen looks pretty nice ;) The workspace rearrangement UI is really good. The notifications thing is interesting; you can respond to a chat message in-line in the notification.

Richard Stallman on Software as a Service – he presented verbally the case made in the paper. Some key quotes… “All your data on a server is equivalent to total spyware” – I think this is a worst-case analogy; it suggests that you can never trust another party: kindof a sad state of paranoia to assume that all network servers are always out to get you all the time. And I have to ask – should we get rid of Savannah then (because all the data is stored there) – the argument for why Savannah is not SaaS is not convincing: its just file storage, so what makes it different to e.g. Ubuntu One? “If there is a server and only a little bit of it is SaaS, perhaps just say don’t worry about it – because that little bit is often the hardest bit to replace.”  ”Lets write systems for collaborative word process that don’t involve a central server” — abiword w/the sharing plugin ? :) RMS seems to be claiming that someone else sysadmining a server for you is better than someone else sysadmining a time-shared server for you: I don’t actually see the difference, unless you’re also asserting that you’ll always have root over your’ own machine’. The argument seems very fuzzy and unclear to me as to why there is really a greater risk – in particular when there is a commercial relationship with the operator (as opposed to, say, an advertising supported relationship).


Syndicated 2010-03-20 21:09:34 from Code happens

LibrePlanet – GNU Hackers Meetup

GNU Hackers meetups are a face to face meeting to balance the online collaboration that GNU maintainers and contributors do all the time. These are  a recent (since 2007) thing, and are having a positive effect within GNU and the FSF.

The LibrePlanet 2010 GNU Hackers meetup runs concurrent with the first day of LibrePlanet.

We started with some project updates:

  • SipWitch – a project to do discovery of SIP endpoints and setup encryption etc. This looks quite interesting, and is looking for contributors.
  • Bazaar – I presented an update on where Bazaar is at and what we’re focusing on now and in the future:
    • short term: merging and collaboration:
      • merge behaviour
      • conflict behaviour
      • develop a rebase that can combine unrelated branches
      • looms to be polished, or pipelines extended – something to manage long-standing patches for distributions, or other environments that need long lived patch sets.
    • long term
      • continuing optimisation of network and local perf
      • meta-branch operations – mirror collections of branches,
      • work with many branches at once (many branches in one dir (a-la git, hopefully less confusing)
      • easier ‘get up and go’ for new contributors
    • now and forever
      • keep fostering community growth
      • we’re aiming for negative bug growth- get on top and stay there

Felipe Sanches presented his list of things that should be on the high priority project list:

  • accessibility since 1st boot
  • reconfigurable hardware development (FPGA tools) – this is particularly relevant for handling e.g. wifi cards that have a FPGA in the card, so we can replace the non-free microcode.
  • nonfree firmware issue

–lunch–

John Eaton on Octave. John compared the octave contributors – 30 or so over the years, and never more than 2 at a time. The Proprietary product Matlab that Octave is very similar to has 2000 staff working at the company producing it. Users seem to expect the two products to be equivalent, and are disappointed that Octave is less capable, and that the community is not as able to do the sort of support that a commercial organisation might have done. Octave would like to gain some more developers and be able to educe users more effectively – convert more to become developers.

Rob Myers, the chief GNU webmaster gave a description of his role: The webmasters deal with adding new content, dealing with mail to webmaster@, which can be queries for the GNU project, random questions about CDs, and an endless flood of spam. The webmasters project is run as a free software project – the site is in CVS (yes CVS), visible on Savannah. Templates could be made nicer and perhaps move to a CMS.

Aubrey Jaffer on cross platform. There is a thing called Water which is meant to replace all the different languages used in web apps – generates html, css, alters the DOM, does what you’d do with javascript. So there is a Water -> backend translator that outputs Java for servers, C# for windows, and so on. (I think, this wasn’t entirely clear). He went on to talk about many of the internals of a thing called Schlep which is used as a compiler to get scheme code running in C/C#/Java so as to make it available to Water backends in different environments.

Matt Lee spoke about GNU FM – GNU FM is a free ‘last.fm’ site. The site is running at http://libre.fm/.  24ish devs, but stalle after 6 months – whats next? Matt has started GNU Social to build a communication framework for GNU projects to talk to each other – e.g. for each GNU FM site to communicate on the back end, with a particular focus on doing social functionality – groups, friendships, personal info. The wiki page needs ideas!

GNU advisory board discussion…   too much to capture, but focused GNU wide issues – things like how projects get contributors, contributions, coordination. Teams were a big discussion point, bug trackers – how to coordinate teams followed up of that, and there is s ‘GNU Source Release Collection’ project to do coordinated releases of GNU software that are all known to work together.


Syndicated 2010-03-19 21:33:02 from Code happens

LCA 2010 videos are showing up

Not all the videos are there yet, but they are starting to show up . Yay. See http://mirror.internode.on.net/pub/linux.conf.au/2010/index.html or your local LA mirror.


Syndicated 2010-03-17 03:19:40 from Code happens

Yay Dell-with-Ubuntu down under


Dell has been offering Ubuntu on selected models for a while. I had however nearly given up hope on being able to buy one, because they hadn’t started doing that in Australia. I am very glad to see this has changed though – check out their notebook page. Not all models yet, but a reasonable number have Ubuntu as an option.

Yay!

Syndicated 2010-02-15 05:48:12 from Code happens

145 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!