Older blog entries for fxn (starting at number 519)

20 Aug 2009 (updated 20 Aug 2009 at 17:54 UTC) »

Ruby Regexps and Unicode

In Ruby 1.8 strings have no encoding associated; they are just a sequence of bytes from Ruby's point of view. Regexps are agnostic in that sense as well: they match bytes against bytes, unless you pass one of the flags /u for UTF-8, /s for SJIS, or /e for EUC-JP. By the way, note that /s in Ruby has a different meaning than in Perl, and it is not the only flag that conflicts.

If you set $KCODE to "u" then source code itself is assumed to be UTF-8 and Ruby turns the /u flag on. Ruby on Rails, for example, has done that since version 1.2.

AFAICT it is not clearly defined what support Ruby 1.8 provides for Unicode in regexps. For example, Flanagan & Matz have little about it except for some vague descriptions. You could say it is just not supported, but some things do work. For example, it is a known trick that counting /./ matches gives you the length in characters of a UTF-8 string, whereas #length returns the number of bytes.
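The counting trick can be sketched as follows. This is written for Ruby 1.9 or later, where strings carry an encoding and #length already counts characters, so #bytesize stands in here for what #length returned in 1.8. The sample string and variable names are just illustrations.

```ruby
# A minimal sketch for Ruby 1.9+. In Ruby 1.8, String#length itself
# returned the byte count that #bytesize reports here.
s = "cami\u00f3"                  # "camió": 5 characters, 6 bytes in UTF-8

byte_count = s.bytesize           # number of bytes
char_count = s.scan(/./mu).size   # one /./ match per character (/m lets . match newlines)
```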

A couple of important bits with definitely partial support are the character classes \w and \s (and thus their negations \W and \S).

In general, the definition of a word char depends on the locale. In Catalan "ò" is a word char. Regexp engines are typically locale-aware, and the meaning of \w depends on the locale. That is, \w is equivalent to [a-zA-Z0-9_] only in ASCII-like locales. In Ruby, if source code is UTF-8 and /u is enabled, "ò" matches \w.

That's important, of course: a Rails application that validates domain or account names against \w, for example, is permitting accented letters. If they should not be allowed you need to write the character class explicitly: [a-zA-Z0-9_].
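A minimal sketch of such an explicit check follows. The constant and method names are hypothetical, and the anchors \A and \z (rather than ^ and $) are my own choice so that the whole string is validated, not just one line of it.

```ruby
# Hypothetical validator: ASCII letters, digits, and underscore only.
ASCII_NAME = /\A[a-zA-Z0-9_]+\z/

def ascii_name?(name)
  !!(name =~ ASCII_NAME)
end
```

With this, ascii_name?("fxn_42") holds, while a name containing "ó" is rejected regardless of $KCODE or regexp flags.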

On the other hand, since "ò" and friends match \w you could be tempted to validate Unicode against \w. I certainly have been more than tempted :-). Wrong! There are characters that match but shouldn't, for example "¿", "¡", or "·".

With whitespace there's also poor support. NEL (U+0085) should belong to \s, but it doesn't in Ruby 1.8. A string that consists of NELs is not only not blank in Rails, it also matches \w in Ruby 1.8! Two gotchas for the price of one!

If you need proper Unicode support, among other goodies, you can switch to Oniguruma. That's the regexp engine used in Ruby 1.9, and it is available for 1.8 as a gem:

    sudo gem install oniguruma

That needs a C library available as a tarball, and also packaged for Ubuntu (at least):

    sudo apt-get install libonig-dev

The API is here.
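As a rough illustration of what proper Unicode support buys you, here is a sketch for Ruby 1.9 or later, where this engine is built in and Unicode properties such as \p{L} (letters) and \p{N} (numbers) can be used in ordinary regexp literals. The constant and method names are hypothetical, and this does not use the gem's own API.

```ruby
# Hypothetical validator for Ruby 1.9+: Unicode letters, numbers, underscore.
UNICODE_NAME = /\A[\p{L}\p{N}_]+\z/u

def unicode_name?(name)
  !!(name =~ UNICODE_NAME)
end
```

Under these assumptions "camió" is accepted, while strings containing "¿", "·", or NEL are rejected because those characters are neither letters nor numbers.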

3 Aug 2009 (updated 6 Aug 2009 at 08:46 UTC) »


I am excited to announce I joined Terry Jones and esteve in building FluidDB.

Very happy. Terry and Esteve are terrific, and I sincerely think FluidDB might be something revolutionary. I believe there's something latent there related to data sharing that could be big.

17 Jul 2009 (updated 17 Jul 2009 at 00:14 UTC) »

What is a browser?

Have a look at this video of a Google guy asking people in Times Square what a browser is. People have basically no idea.

I don't know whether the interviews in this video can really be extrapolated, but my instinct says there's something to it. When you are into technology you need to look ahead and construct the future, but a corner of your head has to keep you balanced and take into account that the man in the street is very, very far from your view. Just a reality check: your duty is to be ahead, as is the duty of any specialist in any field.

PS: I saw this video in a post by Seth Godin, who uses it to illustrate an unrelated point.

13 Jul 2009 (updated 13 Jul 2009 at 18:51 UTC) »

Rails Contributors

Almost a year ago I started to work on a script to count the number of people who have contributed to Ruby on Rails. The aim was to be able to give a good approximation in my keynote at Conferencia Rails 2008.

That's not a direct count: since Subversion does not track authors, credit was given by hand following a few conventions. For example, the committer typically put your name, email, nick, or whatever at the end of the commit message. Even nowadays with Git, the author of a commit to Rails is not always the Git author, so some munging is still needed for fine tracking.

So the script identified authors where they appear, and normalized the names to identify every handle, typo, etc. and map them to a "canonical" name.
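The idea can be sketched like this. It is not the actual script: the bracketed-credit convention, the alias table, and every name below are assumptions made up for illustration.

```ruby
# Hypothetical alias table: each handle or typo maps to one canonical name.
CANONICAL_NAMES = {
  "jdoe"     => "Jane Doe",
  "jane doe" => "Jane Doe",
  "jane deo" => "Jane Doe"   # a typo seen in some commit message
}

# Assumes credit is given in square brackets at the very end of the
# commit message, e.g. "Fix typo in docs [jdoe]".
def credited_names(message)
  credit = message[/\[([^\]]+)\]\s*\z/, 1]
  return [] unless credit

  credit.split(/\s*(?:,|&)\s*/).map do |raw|
    name = raw.strip
    CANONICAL_NAMES.fetch(name.downcase, name)
  end
end

# Counts distinct contributors after normalization.
def count_contributors(messages)
  messages.map { |m| credited_names(m) }.flatten.uniq.size
end
```

Names missing from the table are kept as written, so unknown handles still count once each until someone maps them by hand.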

I am very happy that effort took shape in the official Rails Contributors index, with design by José Espinal. It has been online for a while but I hadn't blogged about it yet.


Three years after its foundation, I left ASPgems at the beginning of June. I have no plan B; it was simply something I felt I had to do.

I did some contract work for the rest of June but I am currently taking a break with my family at the seaside. I am gonna have fun with my daughter on the beach, read, sleep, walk, ride our bikes, do open source, and get some perspective to think about what's next.

16 May 2009 (updated 16 May 2009 at 23:19 UTC) »

EuRuKo 2009

EuRuKo 2009 is over!

SRUG is very, very happy about the outcome. We put in the effort and organised the conference with enthusiasm, and people felt it and had a really great time. Talks were interesting and, most importantly, people had the chance to chat, sit on the grass, go to the beach at night...

We were honoured that Matz came to the conference to give the opening keynote. He took a 22-hour flight from Japan! We tried to make him feel at home. Matz actually attended the conference. I mean, you know those stars that give their keynote and then go sightseeing. Not Matz: he stayed at the conference and talked with everybody, and if you looked across the hall he was mixed in with the audience like any other attendee. Hat tip to him.

Next year EuRuKo goes to Kraków, our best wishes to the organisers. We met them in Barcelona and we are sure they are going to run an extraordinary conference.

/me waves from Scotland on Rails.

14 Mar 2009 (updated 14 Mar 2009 at 10:15 UTC) »

Why Did I Write Acme::Pythonic

Acme::Pythonic is a Perl module of mine that allows the user to write Pythonic code as valid Perl code. I mean, you feed this code to perl:

    use Acme::Pythonic; # this semicolon yet needed
    sub delete_edges:
        my $G = shift
        while my ($u, $v) = splice(@_, 0, 2):
            if defined $v:
                $G->delete_edge($u, $v)
                my @e = $G->edges($u)
                while ($u, $v) = splice(@e, 0, 2):
                    $G->delete_edge($u, $v)

and perl executes it right away, directly. There's no intermediate file being generated or anything. Sounds like magic unless you know what a source filter is.

But some people don't get that, even with the work behind this module, the test suite, etc., this module is just a fucking joke! That's why it belongs to the Acme:: namespace in the first place.

It is a joke about taking programming languages too seriously. To hell with that, there you have Python and Perl mixed together. Sublimation. Climax. You can put that code against a wall and do vipassana contemplating it, release your attachments to this mundane world!

Rails Documentation Team

Rails now has an official documentation team! That's Pratik, Mike, and me. I am very happy it converged this way; there has been a great deal of work in docrails and Rails Guides that finally takes shape.

12 Feb 2009 (updated 12 Feb 2009 at 01:16 UTC) »

Busy. Organizing two conferences: EuRuKo 2009 and RailsDevConf. I am also seen armed with a red pen in Rails Guides. The first semester at the University of Barcelona is over.

7 Jan 2009 (updated 7 Jan 2009 at 12:47 UTC) »


I have some stuff in the buffer to blog about. One of the entries will be about my brand new ebook reader iRex DR 1000S, which motivated my new year's pet project: unmac.

From a Mac you just pass documents to the ereader by drag and drop, and that clutters the file system with ghost "._*" files, Spotlight stuff, FSEvents stuff, etc. You don't see them because the interface hides dot files, but I wanted to have a clean SD card anyway.

In addition, Mac archivers like zip(1) or tar(1) and utilities like cp(1) and friends put resource forks and other stuff in hidden files as well. So, for instance, if you untar one of those archives on Windows/Linux/whatever you'll get that HFS baggage. Ever seen a "__MACOSX" directory appear out of nowhere? I learnt this the hard way.

Solution: I wrote unmac, a portable command-line cleaner of those Mac-related spurious files.
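To give a feel for the kind of scan involved, here is a minimal sketch. It is not unmac's implementation: the list of names, the method names, and the decision to only report paths rather than delete them are all assumptions of mine.

```ruby
require "find"

# Names assumed for this sketch; the real tool may cover other cases.
CRUFT_NAMES = %w[.DS_Store .Spotlight-V100 .fseventsd .Trashes __MACOSX]

# True for "._*" ghost files and for the well-known names above.
def mac_cruft?(basename)
  basename.start_with?("._") || CRUFT_NAMES.include?(basename)
end

# Returns the sorted list of spurious paths under root, without deleting.
def find_mac_cruft(root)
  found = []
  Find.find(root) do |path|
    next if path == root
    if mac_cruft?(File.basename(path))
      found << path
      # Do not descend into a directory that would be removed as a whole.
      Find.prune if File.directory?(path)
    end
  end
  found.sort
end
```

Reporting first and deleting in a separate step keeps the sketch safe to try on a real SD card.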
