Recent blog entries for fxn

15 Nov 2011 (updated 16 Nov 2011 at 14:17 UTC)

A Virtual Machine-based Development Environment

Like many independent software developers, I normally have several projects going on: clients, side projects, open source projects... Over the last few years I have refined a development environment optimized for that use case, and it has worked really well for me. Let me share it in this post.

Isolated Environments

In a normal day I may work for a client that has a Rails 2.3.5 application with Solr, memcached, some custom vhosts, PostgreSQL 8.4, and a CAS running on Tomcat. Later at night I may work on Ruby on Rails and need to, e.g., run the Active Record test suite with Ruby 1.9.3 and PostgreSQL 9.1.

If you multiply that by some number of projects, it becomes clear that a single machine can't possibly handle such a variety of environments in any predictable way. Well, you can predict chaos perhaps :).

And not only for today. If client C needs something three months after our last collaboration, I want to be able to launch his exact environment right away, no matter whether in the meantime I installed or uninstalled a gazillion things, upgraded the operating system, or got a new machine. I want to be able to launch the environment of client C anytime.

In summary, you need a robust setup that isolates every project you work on. Virtual machines solve that problem while letting you keep a single (real) development computer.

My development environment is totally based on virtual machines.

Software Choices

My laptop is a 13'' MacBook Pro from mid-2009, 4 GB of RAM, running Lion nowadays.

I have VMware Fusion, and all the virtual machines run Linux. Since as a user I love Mac OS X, I do most of my work in the host (I'll tell you how in a minute). In the virtual machine I basically need just a console, so a lightweight desktop is enough. My distro of choice nowadays is Lubuntu running the open-vm-tools Ubuntu package. That package provides desktop resize on window resize, copy & paste between the guest and the host, etc.

As I said, I do not work inside the virtual machine. I launch the virtual machine, and there I have the complete runtime I need. Web servers and test suites run in the virtual machine, but editing, browsing, etc. happen on the Mac. To accomplish that I have a couple of tricks.

First, sharing the file system. VMware allows you to mount your home directory in the guest via what they call Shared Folders. But the virtual machines should be self-contained, so the code of my clients should live there, not in the host. In addition to that, Shared Folders do not work well with git, I think because of the hard linking going on. I'd rather go the other way around: I mount the guest file system in the host via SSHFS. If you use MacPorts, just run sudo port install sshfs and you're golden. It looks like Homebrew also has an sshfs package. It surely works, but I don't use Homebrew.

Now, SSHFS mounts the file system through (local) SSH, which is not as fast as the hard drive. But I only need the files for editing, and I don't really care if saving a file takes 5 ms. So no big deal. The only thing you probably want to tweak is to disable automatic project tree sync on focus in your editor or IDE, if it has that feature.

Second, I have a personal rule: One project at a time. If I am working for client C, I am totally focused on C's project. That's all I want to have up. Given that rule, I can implement a couple of convenient simplifications: All virtual machines have the same IP, and all virtual machines have the same mount point. Let me explain that in a dedicated section.

One Single IP, One Single Mount Point

VMware has several network modes. My virtual machines are configured to run under the default NAT mode, in which VMware does DHCP for the guest. That by itself is a little bit cumbersome because you get different IPs in different sessions, and sometimes the IP even changes while the machine is running. I've found that fixed addresses work better for my needs, to the point that all the virtual machines have the same one.

The configuration for DHCP as of this writing lives in the file /Library/Preferences/VMware Fusion/vmnet8/dhcpd.conf, and to assign a fixed IP to a virtual machine you just need to know its MAC address. The MAC address can be manually set in the virtual machine settings, but I just grab whatever is printed by ifconfig eth0 after a standard installation.

Then, open the config file mentioned earlier and add towards the bottom something like this:


    host rails {
        hardware ethernet 00:0c:29:0a:98:b8;
        fixed-address 172.16.132.127;
    }

That's the DHCP configuration for the virtual machine where I have the development environment for Ruby on Rails. The host name "rails" is an arbitrary string. It has MAC address 00:0c:29:0a:98:b8 and a fixed IP of 172.16.132.127. I have a host configuration like that one per virtual machine.

To choose the IP, have a look at the subnet block generated by VMware towards the middle of the file:


    subnet 172.16.132.0 netmask 255.255.255.0 {
        range 172.16.132.128 172.16.132.254;
        ...
    }

According to the first line, you have to choose an IP within 172.16.132.* that is outside the range VMware hands out dynamically, given by the range line. My choice is 172.16.132.127.
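Both constraints on the fixed address (inside the subnet, outside the dynamic range) are easy to check mechanically, and the host stanza is easy to generate. Below is a minimal Ruby sketch using the standard ipaddr library. The helper names fixed_ip_ok? and dhcp_host_entry are hypothetical, and the subnet and range values are copied from the dhcpd.conf excerpt above. It also rejects the network and broadcast addresses, but it does not know about any other addresses VMware may reserve for itself (for instance for the host or the NAT gateway), so treat it as a sanity check only.

```ruby
require "ipaddr"

# Values copied from the VMware-generated subnet block (netmask 255.255.255.0 is /24).
SUBNET    = IPAddr.new("172.16.132.0/24")
DYNAMIC   = IPAddr.new("172.16.132.128").to_i..IPAddr.new("172.16.132.254").to_i
NETWORK   = SUBNET.to_range.first.to_i # 172.16.132.0
BROADCAST = SUBNET.to_range.last.to_i  # 172.16.132.255

# Hypothetical helper: true if ip is a host address in the subnet that
# VMware's DHCP server will not hand out dynamically.
def fixed_ip_ok?(ip)
  addr = IPAddr.new(ip)
  n = addr.to_i
  SUBNET.include?(addr) && !DYNAMIC.cover?(n) && n != NETWORK && n != BROADCAST
end

# Hypothetical helper: renders a host stanza like the one shown earlier.
def dhcp_host_entry(name, mac, ip)
  raise ArgumentError, "#{ip} is not a usable fixed address" unless fixed_ip_ok?(ip)

  <<~CONF
    host #{name} {
        hardware ethernet #{mac};
        fixed-address #{ip};
    }
  CONF
end

puts dhcp_host_entry("rails", "00:0c:29:0a:98:b8", "172.16.132.127")
```

The output can be pasted towards the bottom of dhcpd.conf; VMware Fusion may need a restart to pick it up.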

Since every virtual machine has the same IP, we can create an entry in /etc/hosts that gives us a single hostname to rule them all:


    172.16.132.127 vm

Mounting the file system is easily scriptable:

    sshfs -o StrictHostKeyChecking=no -o reconnect -o workaround=rename fxn@vm:. $HOME/vm

Unmounting is also easily scriptable:

    diskutil umount $HOME/vm 2>/dev/null
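Since the commands are the same for every virtual machine, a tiny wrapper is enough. This is a minimal Ruby sketch, assuming the vm entry in /etc/hosts, the fxn user, and the $HOME/vm mount point from above; the method name vm_command and the mount/umount subcommands are hypothetical. The argument lists are built in a plain function, and the commands only run when the file is invoked as a script with a subcommand.

```ruby
# Hypothetical wrapper around the sshfs/diskutil one-liners above.
MOUNT_POINT = File.join(Dir.home, "vm")

# Returns the argv for the requested action; running it is left to the caller.
def vm_command(action, user: "fxn", host: "vm", mount_point: MOUNT_POINT)
  case action
  when "mount"
    ["sshfs",
     "-o", "StrictHostKeyChecking=no",
     "-o", "reconnect",
     "-o", "workaround=rename",
     "#{user}@#{host}:.", mount_point]
  when "umount"
    ["diskutil", "umount", mount_point]
  else
    raise ArgumentError, "unknown action: #{action.inspect}"
  end
end

if __FILE__ == $PROGRAM_NAME && ARGV.first
  action = ARGV.first
  cmd = vm_command(action)
  if action == "mount"
    Dir.mkdir(MOUNT_POINT) unless Dir.exist?(MOUNT_POINT)
    system(*cmd) || abort("mount failed")
  else
    system(*cmd, err: File::NULL) # same effect as 2>/dev/null above
  end
end
```

Saved as, say, vm.rb, it would be run as ruby vm.rb mount or ruby vm.rb umount.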

Spurious Files

Some programs in the host may create Mac metadata files like .DS_Store and friends. I don't want such files in the virtual machines. The few programs that do that normally have a configuration option or some such to disable it. There's a shell one-liner for TextMate, for example; I no longer remember it because it's been a while since I used TextMate, but you can Google for it. AFAICT, Emacs, Vim, Sublime Text 2, and RubyMine leave no spurious files out of the box.

As a last resort, if I get any of these spurious files for whatever reason, I just run unmac. That's a little utility of mine, implemented as a Ruby gem, that cleans a given directory. To install it, run gem install unmac, possibly with admin privileges. In my experience I rarely need to run unmac, though.
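For illustration only, here is a minimal Ruby sketch of that kind of cleanup. It is not unmac's implementation or interface: the method name remove_mac_metadata is hypothetical, and the assumption that "spurious" means .DS_Store files plus AppleDouble ._* files is mine.

```ruby
require "find"
require "fileutils"

# Hypothetical cleaner (not the unmac gem): deletes .DS_Store and AppleDouble
# "._*" regular files under root and returns the list of deleted paths.
def remove_mac_metadata(root)
  doomed = []
  Find.find(root) do |path|
    next unless File.file?(path)

    name = File.basename(path)
    doomed << path if name == ".DS_Store" || name.start_with?("._")
  end
  FileUtils.rm_f(doomed)
  doomed
end
```

It would be called with the directory to clean, e.g. remove_mac_metadata(File.expand_path("~/vm/project")); since it deletes files, trying it on a copy first is a good idea.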

Backups

You do not want Time Machine to do incremental backups of your virtual machines because they are big files on disk that change continually. For backups I use Carbon Copy Cloner.

That's another big win. Backups are trivial: whether your hard disk breaks or you get a new computer, you are ready to work in no time, with the guarantee that all those complicated environments are consistent and safe. The peace of mind that gives is invaluable.

3 Oct 2010 (updated 3 Oct 2010 at 19:48 UTC)

Ruby, C, and Java are pass-by-value, Perl is pass-by-reference

Call semantics in languages that manage references often confuse people. It is a recurring thread in Java and Ruby. The reason is simple: "pass-by-reference" has the word "reference" in it, and thus people assume it has something to do with the language's "references". Not really.

In Ruby and Java, "reference" is a term close to the concept of "pointer" in C. You have a handle that somehow points to something, rather than being that very something. The language may hide the indirection for you: usage is straightforward in Ruby or Java. Not so in C or Perl, where you need explicit syntax, like an arrow, to go from a pointer/reference to the thing it points to.

When we talk about "pass-by-value", though, the fact that the language has references is irrelevant. We are really talking about where the value associated with a parameter's name is stored. In particular, we mean that the parameter in the callee gets its own storage, unrelated to the storage of the argument in the caller. Let's see this.

Say we have an assignment


    a = 1

This assignment means that somewhere there's an association between the name "a" and the value "1", which is itself stored somewhere:


    +-----+       +-----+
    |  a  | ----> |  1  |
    +-----+       +-----+

If you assign


    b = 1

conceptually we get two associations to two different value storages:


    +-----+       +-----+
    |  a  | ----> |  1  |
    +-----+       +-----+
 
    +-----+       +-----+
    |  b  | ----> |  1  |
    +-----+       +-----+

In particular, if in the next line we change b:


    b = 2

we all know the situation becomes


    +-----+       +-----+
    |  a  | ----> |  1  |
    +-----+       +-----+
 
    +-----+       +-----+
    |  b  | ----> |  2  |
    +-----+       +-----+

In particular a still holds 1.

In languages like Perl you can have this other diagram:


    +-----+
    |  a  | --+
    +-----+   |    +-----+
              +--> |  1  |
    +-----+   |    +-----+
    |  b  | --+
    +-----+

In Perl jargon you say that a and b are aliases. In that situation, any assignment to a is reflected in b, and any assignment to b is reflected in a. Those names are associated with the same storage area.

The terms "pass-by-value" and "pass-by-reference" are about names linked to storage, and with those pictures you can understand what they mean. I am going to ignore scope to simplify this and use different variable names on purpose, so this is not exact, but the essence is there.

Say you have


  def foo(b)
    ...
  end
  
  a = 1
  foo(a)

In a pass-by-value language the situation is:


    +-----+       +-----+
    |  a  | ----> |  1  |
    +-----+       +-----+
 
    +-----+       +-----+
    |  b  | ----> |  1  |
    +-----+       +-----+

The interpreter, or whatever runs your language, copies the storage area associated with "a" behind the scenes and associates the new copy with "b". That's why if you reassign b inside foo, a is unaffected.
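This is easy to observe in Ruby, a pass-by-value language. The sketch mirrors the foo example above; the body of foo is an illustrative assumption, and it returns its parameter only so the result can be inspected.

```ruby
def foo(b)
  b = 2 # rebinds the method's own name b; the caller's a is untouched
  b
end

a = 1
result = foo(a)
puts a      # prints 1
puts result # prints 2
```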

On the other hand, in a pass-by-reference language the situation is:


    +-----+
    |  a  | --+
    +-----+   |    +-----+
              +--> |  1  |
    +-----+   |    +-----+
    |  b  | --+
    +-----+

That's why you can implement a swap function that exchanges the values of the caller's variables in such languages.
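For contrast, in Ruby, which is pass-by-value, a swap method only exchanges its own local names and the caller's variables stay put. The idiomatic way to swap in Ruby is parallel assignment at the call site. The method name swap is just for illustration.

```ruby
# In a pass-by-value language this only swaps the method's local names.
def swap(x, y)
  x, y = y, x
  [x, y]
end

a = 1
b = 2
swap(a, b)
p([a, b]) # prints [1, 2]: the caller's variables are unchanged

# Idiomatic Ruby swaps at the call site with parallel assignment instead.
a, b = b, a
p([a, b]) # prints [2, 1]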

But I can change the state of a mutable object in Ruby/Java because I pass a reference!

That is true, but it has no bearing on this. Since Ruby is pass-by-value, you can be certain that when the method returns your variable will refer to the same object. object_id is guaranteed to be the same after a method invocation (modulo black magic). Same for Java.
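A small Ruby illustration of both halves of that statement: mutating the object through the parameter is visible to the caller, rebinding the parameter is not, and the caller's variable keeps referring to the same object. The method name shout! is hypothetical.

```ruby
def shout!(s)
  s.upcase!      # mutates the object both names refer to
  s = "replaced" # rebinds only the parameter
  s
end

greeting = +"hello" # unary plus gives a mutable string even with frozen literals
id_before = greeting.object_id
shout!(greeting)

puts greeting                        # prints HELLO
puts greeting.object_id == id_before # prints true
```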

But I can change the integer a variable holds by passing a pointer in C!

That is true, but you are not passing the integer, you are passing a pointer to the integer. Since C is pass-by-value, if you had a variable holding the pointer before the call, you can be totally certain the variable will hold the same exact pointer after the call.

Summary

The terms pass-by-value and pass-by-reference are about links from names to storage areas; they have nothing to do with the references or pointers of your language.

That's a bit simplified (in Perl, for example, the aliasing happens through the elements of @_), but that's the key idea.

8 Aug 2010 (updated 8 Aug 2010 at 19:43 UTC)

When Classes Leak Into Ruby Contracts

Sometimes Ruby APIs document non-Rubyesque expectations that artificially separate "classes" and "objects", à la Java. I'd like to give you a few examples to show what I mean, and explain later why that separation is artificial.

First example, the Rack specification says that

A Rack application is an Ruby object (not a class) that responds to call.

That's not very idiomatic: why are classes banned? A Ruby programmer would expect this shorter contract:

A Rack application is a Ruby object that responds to call.

That's it, the nature of the object is irrelevant to Rack, the only thing that matters is that the object responds to call (with such and such signature). Indeed a class that responds to call is a perfectly valid Rack application. The implementation is Rubyesque, but the wording in the docs is not.
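To make that concrete, here is a minimal sketch of two objects that satisfy the call-based contract: a class with a call class method, and a lambda. It does not load Rack. It just invokes call with an empty env hash, the way a Rack handler would invoke it with a real environment, and inspects the status/headers/body triple. HelloApp and hello_lambda are hypothetical names.

```ruby
# A class is an object too, so a class with a call class method works as an app.
class HelloApp
  def self.call(env)
    [200, { "content-type" => "text/plain" }, ["Hello from a class"]]
  end
end

# Any other object responding to call works the same way.
hello_lambda = ->(env) { [200, { "content-type" => "text/plain" }, ["Hello from a lambda"]] }

[HelloApp, hello_lambda].each do |app|
  status, _headers, body = app.call({})
  puts "#{status} #{body.join}"
end
```

The caller never asks whether app is a class; it only sends call.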

Another example taken from the chapter on routing of O'Reilly's Rails 3 in a Nutshell:

Constraints may either be specified as a hash, a class implementing a matches? class method, or a class that responds to a call method, such as a Proc object.

The suspicious bit in that contract is "a class implementing a matches? class method". There's no requirement in the routing system that you pass a class. All that matters is that you pass an object that responds to matches?, see:


    constraint.respond_to?(:matches?) && !constraint.matches?(req)

That's idiomatic Ruby, where the interface is the only thing that matters, classes are irrelevant.
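Any object will do. Here is a minimal sketch outside Rails: a constraint implemented as a module (so no class is involved), checked with the same expression quoted from the routing code. FakeRequest, AdminSubdomain, and rejected_by? are hypothetical stand-ins for the real request object and router internals.

```ruby
# Hypothetical stand-in for the request object the router passes in.
FakeRequest = Struct.new(:subdomain)

# A module is not a class, yet it can respond to matches? just fine.
module AdminSubdomain
  def self.matches?(request)
    request.subdomain == "admin"
  end
end

# The check quoted above, wrapped in a hypothetical helper.
def rejected_by?(constraint, req)
  constraint.respond_to?(:matches?) && !constraint.matches?(req)
end

puts rejected_by?(AdminSubdomain, FakeRequest.new("admin")) # prints false
puts rejected_by?(AdminSubdomain, FakeRequest.new("www"))   # prints true
```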

Classes Are Ordinary Objects

Technically in Ruby there are no "class methods" as opposed to "instance methods". Ruby only has instance methods. Let me summarize how this works.

If you define a Person class having a name instance method, instances of Person respond to name. No surprises here. But individual person instances can respond to more stuff:


    def person.custom_method
      ...
    end

In the example above, the object stored in the person variable also responds to custom_method. Such a method targeted to a particular instance is called a singleton method. You can for example build simple mocks this way:


    o = Object.new
    def o.name
      "John"
    end
    # Now pass o to code that expects anything responding to #name.

And you can also override methods defined in the class of the object.
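For instance, a singleton method can override an instance method of the object's class for that one object only, leaving other instances alone. The Person class and its default name are illustrative assumptions.

```ruby
class Person
  def name
    "Anonymous"
  end
end

person = Person.new
other  = Person.new

# Singleton method: overrides Person#name for this one object only.
def person.name
  "John"
end

puts person.name           # prints John
puts other.name            # prints Anonymous
p person.singleton_methods # prints [:name]
```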

In Ruby classes are objects. When you write Person, that's an ordinary constant. Totally ordinary. It is the same kind of ordinary constant as


    X = 1

No difference. You can think of


    class Person
    end

as being equivalent to


    Person = Class.new

The Ruby interpreter then processes the class definition body, but as far as the constant is concerned that's it. In fact, if you create an anonymous class and assign it to a constant later, the class automatically takes its name from the constant.
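That naming behavior is easy to see in a few lines. The names klass and Greeter are just for illustration.

```ruby
klass = Class.new do
  def greet
    "hi"
  end
end

p klass.name # prints nil: the class is anonymous so far

Greeter = klass
p klass.name        # prints "Greeter": named after the constant it was assigned to
p Greeter.new.greet # prints "hi"
```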

So, and this is a key point, the Person constant is ordinary: it happens to evaluate to a class object, the same way X above evaluates to an integer. And here is where Ruby deviates from other OO languages: classes are also ordinary objects, objects of type Class:


    klass = Person
    person = klass.new

That works because Person just evaluates to a class object. As with any other object, you can store classes in variables and pass them around, and that object responds to the new method, the same way person responds to name. Why? Because klass is an object of the class Class, which defines new among its instance methods. Simple and elegant.

So at this point you need to set aside mental models coming from other languages for a bit, and open your mind to the consequences of this particular OO model.

Since classes are ordinary objects, you can also define singleton methods on them, the same way we did with person before. Do you recognize this idiom now?


    class Person
      def self.find_by_name(name)
        ...
      end
    end

In the body of a class self is the class object in scope, and so that is just defining a singleton method on it.

All classes are instances of Class, and a "class method" is any method a class responds to, which may come from Class or be defined for a particular class, that is, a singleton method (possibly inherited from a superclass).
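Ruby's own reflection agrees with that description. In this sketch, find_by_name is the singleton method from the idiom above with a placeholder body, and new is an ordinary instance method of Class.

```ruby
class Person
  def self.find_by_name(name)
    "found #{name}" # placeholder body, just for illustration
  end
end

p Person.class                                 # prints Class
p Person.singleton_methods(false)              # prints [:find_by_name]
p Class.instance_methods(false).include?(:new) # prints true: new is defined by Class

# Both kinds are just methods the class object responds to.
p Person.respond_to?(:find_by_name) # prints true
p Person.respond_to?(:new)          # prints true
```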

Class Methods, Fine

I am fine with the term "class method" in Ruby as long as we know what we are talking about. Technically Ruby has no such thing, everything is an instance method, but you can take "class method" as short for "an instance method of the class object". Depending on their intended usage, class methods are also referred to as "macros", also a convenient term; think has_many in Active Record.

But even if you can talk about "class methods" in that sense, you rarely need to tell classes from non-classes in API contracts based on interfaces.

12 Jun 2010 (updated 12 Jun 2010 at 23:20 UTC)

Ruby Hero 2010

I was awarded Ruby Hero 2010 at RailsConf this week. I am deeply grateful for and honored by this recognition, which I received together with José Valim (devise, Rails core team), Nick Quaranto (RubyGems.org), Aaron Patterson (Nokogiri, SQLite driver, and a ton of other software), Wayne Seguin (RVM awesomeness), and Gregory Brown (Ruport, Prawn, Ruby Best Practices...).

6 May 2010 (updated 6 May 2010 at 11:28 UTC)

Progress [on The Last Supper] was steady but slow, as the artist worked on in his typical thoughtful and meditative way. He spent considerable time roaming the streets of Milan looking for suitable models for faces of the apostles. By 1497 the only part left to complete was the head of Judas. At that point, the prior of the convent became so impatient with Leonardo's slowness that he complained to the duke, who summoned the artist to hear his reasons for the delay. According to Vasari, Leonardo explained to the Moor that he was working on The Last Supper at least two hours a day, but that most of his work took place in his mind. He went on, slyly, to say that, if he did not find an appropriate model for Judas, he would give the villain the features of the petulant prior. Ludovico was so amused by Leonardo's reply that he instructed the prior to be patient and let Leonardo finish his work undisturbed.

The Science of Leonardo, Fritjof Capra, pages 95--96.

Rails Committer

I've been contributing to Ruby on Rails on a regular basis, with almost 100 code patches and about 500 doc patches, and was recently granted commit rights. That's so great! It will allow me to work with more agility and gives me a little more freedom to do stuff.

10 Mar 2010 (updated 10 Mar 2010 at 15:05 UTC)

An around filter to temporarily apply changes

A client of mine has an admin tool where people can edit stuff, but modifications need approval. They are stored in the application as pending changes rather than applied right away, as you normally would.

Changes are basically stored as a method name, the ID of the receiver, and serialized arguments (there are only a handful of them). Applying a change is a dynamic method invocation; you get the idea.
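A minimal sketch of that idea, under loud assumptions: the PendingChange struct, the in-memory Page model with its find lookup and rename method, and the JSON encoding of the arguments are all hypothetical stand-ins for the client's actual schema, models, and serialization format.

```ruby
require "json"

# Hypothetical in-memory model standing in for an Active Record class.
class Page
  RECORDS = {}

  attr_accessor :id, :title

  def initialize(id, title)
    @id = id
    @title = title
  end

  def self.find(id)
    RECORDS.fetch(id)
  end

  def rename(new_title)
    self.title = new_title
  end
end

# A pending change: model name, receiver ID, method name, serialized arguments.
PendingChange = Struct.new(:model_name, :receiver_id, :method_name, :args_json) do
  # Applying it is a dynamic invocation on the looked-up receiver.
  def apply
    receiver = Object.const_get(model_name).find(receiver_id)
    receiver.public_send(method_name, *JSON.parse(args_json))
  end
end

Page::RECORDS[1] = Page.new(1, "Old title")
change = PendingChange.new("Page", 1, "rename", JSON.generate(["New title"]))
change.apply
puts Page.find(1).title # prints New title
```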

My client wanted an approval interface where he could select a handful of changes to the same model, submit, and get a split view with the current website on the left, and the resulting website on the right.

The tricky part is that views may access the database, for example using named scopes (from an MVC viewpoint that's fine: the view is clean, it just happens that a named scope triggers a query). The 3-line solution was to write an around filter that started a transaction, yielded to the action, and rolled the transaction back.
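In Rails of that era, such a filter would plausibly be a method registered with around_filter that opens ActiveRecord::Base.transaction, yields to the action, and then raises ActiveRecord::Rollback, which the transaction block rescues silently. To keep the example runnable without Rails, the sketch below uses a hypothetical in-memory FakeStore whose transaction method mimics that rollback behavior by restoring a snapshot; only the shape of preview_with_rollback (a hypothetical name) is the point, not the storage.

```ruby
# Hypothetical stand-in for the database: transaction restores a snapshot
# when the block raises Rollback, mimicking ActiveRecord::Rollback.
class FakeStore
  class Rollback < StandardError; end

  attr_reader :data

  def initialize(data)
    @data = data
  end

  def transaction
    snapshot = Marshal.load(Marshal.dump(@data)) # deep copy of the state
    yield
  rescue Rollback
    @data = snapshot
    nil
  end
end

STORE = FakeStore.new({ "title" => "Current title" })

# The around-filter shape: open a transaction, yield to the action,
# then always roll back so the previewed changes never persist.
def preview_with_rollback
  STORE.transaction do
    yield
    raise FakeStore::Rollback
  end
end

rendered = nil
preview_with_rollback do
  STORE.data["title"] = "Proposed title" # apply the pending change
  rendered = STORE.data["title"]         # "render" the resulting site
end

puts rendered            # prints Proposed title
puts STORE.data["title"] # prints Current title
```

The action sees the changed data while it renders, but nothing survives the request.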

30 Jan 2010 (updated 30 Jan 2010 at 18:23 UTC)

Tracking Class Descendants in Ruby (II)

My previous post explains a way to keep track of a class' descendants, and encapsulates the technique into a module.

There are two things you may want to do differently. First, since all descendants inherit the descendants class method, you may prefer it to work for each of them, returning that class's own descendants. Second, the module defines the inherited class method in the base class itself, because it needs it to be a closure. That may work for some particular need, but it is not good for a generic solution: the inherited hook is the business of your client's code.

Now we'll see a different approach that addresses both concerns. Using the same hook any class in the hierarchy may easily keep track of its direct subclasses, and compute its descendants:


    class C
      def self.inherited(subclass)
        subclasses << subclass
      end
 
      def self.subclasses
        @subclasses ||= []
      end
 
      def self.descendants
        subclasses + subclasses.map(&:descendants).flatten
      end
    end

In the previous solution the inherited hook needed to ensure descendants was invoked on the root of the hierarchy. This solution doesn't care, precisely because it takes advantage of polymorphism: the way it is written, each class pushes into its own @subclasses instance variable, which is what we want.

The module that encapsulates that pattern is much simpler:


    module DescendantsTracker
      def inherited(subclass)
        subclasses << subclass
        super
      end
 
      def subclasses
        @subclasses ||= []
      end
 
      def descendants
        subclasses + subclasses.map(&:descendants).flatten
      end
    end
 
    class C
      extend DescendantsTracker
    end

You know extend is like doing an include in the metaclass of C. In particular we are not defining C.inherited; we are defining a method with the same name in an ancestor of the metaclass. That way C can still define its own inherited class method. A call to super within such a C.inherited goes to the next ancestor of the metaclass, eventually reaching the inherited from DescendantsTracker.
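A quick check of that claim: below, C extends the DescendantsTracker module from this post (repeated so the snippet is self-contained), defines its own inherited hook, and calls super. Tracking still works at every level of the hierarchy. The LOG constant and the classes A, B, and D are just for illustration.

```ruby
module DescendantsTracker
  def inherited(subclass)
    subclasses << subclass
    super
  end

  def subclasses
    @subclasses ||= []
  end

  def descendants
    subclasses + subclasses.map(&:descendants).flatten
  end
end

class C
  extend DescendantsTracker

  LOG = []

  # C's own hook; super reaches DescendantsTracker#inherited via the metaclass.
  def self.inherited(subclass)
    LOG << subclass
    super
  end
end

class B < C; end
class A < B; end
class D < C; end

p C.descendants # prints [B, D, A]
p B.descendants # prints [A]
p C::LOG        # prints [B, A, D]
```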

29 Jan 2010 (updated 30 Jan 2010 at 18:25 UTC)

Tracking Class Descendants in Ruby

I have been going through all the Active Support core extensions lately because I am writing the Active Support Core Extensions guide, due for Rails 3. There are some patches in master as a result of that walkthrough, and I am now focusing on keeping track of descendants in a class hierarchy.

A known technique uses ObjectSpace.each_object. That method receives a class or module as argument and yields all live objects whose class has that class or module among its ancestors. Since classes are instances of the class Class, you can select descendants of class C this way:


    descendants_of_C = []
    ObjectSpace.each_object(Class) do |klass|
      descendants_of_C << klass if klass < C
    end

That brute-force approach works, but it is inefficient. JRuby even disables ObjectSpace by default for performance reasons.

A better approach is to leverage the inherited hook. Classes may optionally implement a class method inherited that is called whenever they are subclassed. The subclass is passed as argument:


    class User
      def self.inherited(subclass)
        puts 0
      end
    end
 
    class Admin < User
      puts 1
    end
 
    # output is
    # 0
    # 1

That's a perfect place to keep track of descendants:


    class C
      class << self
        def inherited(subclass)
          C.descendants << subclass
          super
        end
 
        def descendants
          @descendants ||= []
        end
      end
    end

In that code we have an array of descendants in @descendants. That is an instance variable of the very class C. Remember classes are ordinary objects in Ruby and so they may have instance variables. It is better to use an instance variable instead of a class variable because class variables are shared among the entire hierarchy of the class and we need an exclusive array.

Another fine point is that we force descendants to be the one in the C class. If we didn't and we had A < B < C, the hook would be called when A was defined, but by polymorphism it would be B.descendants that got called, thus setting B's instance variable @descendants. That is not what we want.
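The difference is easy to observe. The sketch below puts the version from this post (explicit C.descendants) next to a hypothetical variant, Naive, that calls descendants on self instead; with a three-level hierarchy the naive variant files the grandchild under the intermediate class.

```ruby
class C
  class << self
    def inherited(subclass)
      C.descendants << subclass # always the root's array
      super
    end

    def descendants
      @descendants ||= []
    end
  end
end

class Naive
  class << self
    def inherited(subclass)
      descendants << subclass # self is the direct parent, not the root
      super
    end

    def descendants
      @descendants ||= []
    end
  end
end

class B < C; end
class A < B; end

class NaiveB < Naive; end
class NaiveA < NaiveB; end

p C.descendants      # prints [B, A]
p Naive.descendants  # prints [NaiveB]
p NaiveB.descendants # prints [NaiveA]
```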

The call to super is just a best practice. In general a hook like this should pass the call up the hierarchy in case parents have their own hooks.

That pattern can indeed be implemented in a module for reuse:


    module DescendantsTracker
      def self.included(base)
        (class << base; self; end).class_eval do
          define_method(:inherited) do |subclass|
            base.descendants << subclass
            super(subclass) # super needs explicit arguments inside define_method
          end
        end
        base.extend self
      end
 
      def descendants
        @descendants ||= []
      end
    end
 
    class C
      include DescendantsTracker
    end

A class only needs to include DescendantsTracker to track its descendants.

When the module is included in a class, Ruby invokes its included hook. The hook receives the class that is including the module, and we leverage that to inject the class methods we saw before. For inherited, we open the metaclass of base and define the method in a way that keeps base in scope, which we saw earlier is something we need. After that we add the descendants class method with an ordinary extend call.

Update: There's a followup to this post.
