Older blog entries for fxn (starting at number 535)

3 Oct 2010 (updated 3 Oct 2010 at 19:48 UTC)

Ruby, C, and Java are pass-by-value, Perl is pass-by-reference

Call semantics in languages that manage references often confuse people; it is a recurring topic in Java and Ruby discussions. The reason is simple: "pass-by-reference" has the word "reference" in it, and thus people assume it has something to do with the language's "references". Not really.

In Ruby and Java "reference" is a term that is close to the concept of "pointer" in C. You have a handle that somehow points to something, rather than being that very something. The language may hide the indirection for you: in Ruby or Java you use references directly, whereas in C or Perl you need explicit syntax, such as an arrow, to get from a pointer or reference to the thing it points to.

When we talk about "pass-by-value", though, whether the language has references is irrelevant. We are really talking about where the value associated with a parameter's name lives. In particular, pass-by-value means that the parameter's value is stored in an area unrelated to the caller's storage. Let's see this.

Say we have an assignment


    a = 1

This assignment means that somewhere there's an association between the name "a" and the value "1", which is itself stored somewhere:


    +-----+       +-----+
    |  a  | ----> |  1  |
    +-----+       +-----+

If you assign


    b = 1

conceptually we get two associations to two different storage areas:


    +-----+       +-----+
    |  a  | ----> |  1  |
    +-----+       +-----+
 
    +-----+       +-----+
    |  b  | ----> |  1  |
    +-----+       +-----+

In particular, if in the next line we change b:


    b = 2

we all know the situation becomes


    +-----+       +-----+
    |  a  | ----> |  1  |
    +-----+       +-----+
 
    +-----+       +-----+
    |  b  | ----> |  2  |
    +-----+       +-----+

In particular a still holds 1.

In languages like Perl you can have this other diagram:


    +-----+
    |  a  | --+
    +-----+   |    +-----+
              +--> |  1  |
    +-----+   |    +-----+
    |  b  | --+
    +-----+

In Perl jargon you say that a and b are aliases. In that situation, any assignment to a is reflected in b, and any assignment to b is reflected in a. Those names are associated with the same storage area.

The terms "pass-by-value" and "pass-by-reference" are about how names are linked to storage, and with those pictures you can understand what they mean. To keep this simple I am going to ignore scope and deliberately use different variable names, so this is not exact, but the essence is there.

Say you have


  def foo(b)
    ...
  end
  
  a = 1
  foo(a)

In a pass-by-value language the situation is:


    +-----+       +-----+
    |  a  | ----> |  1  |
    +-----+       +-----+
 
    +-----+       +-----+
    |  b  | ----> |  1  |
    +-----+       +-----+

The interpreter, or whatever runs your language, copies the storage area associated with "a" behind the scenes and associates the new copy with "b". That's why, if you reassign b inside foo, a is unaffected.
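A quick way to check the first diagram in Ruby is to reassign the parameter inside foo. The body below (b = 2) is only illustrative; the original elides it.

```ruby
# Reassigning a parameter only rebinds the callee's local name.
def foo(b)
  b = 2 # b now refers to a different object; the caller's a is untouched
  b
end

a = 1
foo(a)
puts a # prints 1
```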

On the other hand, in a pass-by-reference language the situation is:


    +-----+
    |  a  | --+
    +-----+   |    +-----+
              +--> |  1  |
    +-----+   |    +-----+
    |  b  | --+
    +-----+

That's why you can implement swap in such languages.
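For contrast, here is a minimal sketch of a swap attempt in Ruby, which is pass-by-value. The method only swaps its own local bindings, and the caller's variables stay put. The method name swap is hypothetical. The idiomatic Ruby alternative is multiple assignment at the call site.

```ruby
# A "swap" method in Ruby only swaps its own local bindings.
def swap(x, y)
  x, y = y, x
  [x, y]
end

a = 1
b = 2
swap(a, b)
p [a, b] # prints [1, 2]: the caller's variables are unchanged

# Idiomatic Ruby swaps the caller's variables with multiple assignment instead.
a, b = b, a
p [a, b] # prints [2, 1]
```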

But I can change the state of a mutable object in Ruby/Java because I pass a reference!

That is true, but it has no bearing on this. Since Ruby is pass-by-value, you can be certain that when the method returns, your variable will still refer to the same object: its object_id is guaranteed to be the same after a method invocation (modulo black magic). The same holds for Java.
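Both effects can be seen side by side in a small sketch. Mutating the object the parameter refers to is visible to the caller. Rebinding the parameter is not, and the caller's variable keeps the same object_id. The method name append_world is hypothetical.

```ruby
# Mutating the object a parameter refers to is visible to the caller,
# but the caller's variable still refers to the very same object.
def append_world(s)
  s << " world"  # mutates the shared String object
  s = "replaced" # rebinds the local name only
  s
end

greeting = +"hello" # unary + guarantees a mutable (unfrozen) string
id_before = greeting.object_id
append_world(greeting)

puts greeting                        # prints: hello world
puts greeting.object_id == id_before # prints: true
```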

But I can change the integer a variable holds by passing a pointer in C!

That is true, but you are not passing the integer; you are passing a pointer to the integer. Since C is pass-by-value, if a variable held the pointer before the call, you can be totally certain it will hold the exact same pointer after the call.

Summary

The terms pass-by-value and pass-by-reference are about links from names to storage areas; they have nothing to do with your language's references or pointers.

That's a bit simplified (in Perl, for example, the aliasing happens through the elements of @_), but that's the key idea.

8 Aug 2010 (updated 8 Aug 2010 at 19:43 UTC)

When Classes Leak Into Ruby Contracts

Sometimes Ruby APIs document non-Rubyesque expectations that artificially separate "classes" and "objects", à la Java. I'd like to give you a few examples to show what I mean, and then explain why that separation is artificial.

First example, the Rack specification says that

A Rack application is an Ruby object (not a class) that responds to call.

That's not very idiomatic: why are classes banned? A Ruby programmer would expect this shorter contract:

A Rack application is a Ruby object that responds to call.

That's it: the nature of the object is irrelevant to Rack, and the only thing that matters is that the object responds to call (with the expected signature). Indeed, a class that responds to call is a perfectly valid Rack application. The implementation is Rubyesque, but the wording in the docs is not.
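To make that concrete, here is a minimal sketch of a class acting as a Rack application through a call class method. HelloApp is hypothetical. To keep the example runnable without the rack gem, it invokes call directly with a bare env hash, roughly as a Rack handler would.

```ruby
# A class can itself be a Rack application: the class object responds to call.
# HelloApp is a hypothetical example; no Rack library is needed to run this.
class HelloApp
  def self.call(env)
    [200, { "content-type" => "text/plain" }, ["Hello from #{env['PATH_INFO']}"]]
  end
end

# A handler only cares that the object responds to call.
status, headers, body = HelloApp.call("PATH_INFO" => "/greet")
puts status    # prints: 200
puts body.join # prints: Hello from /greet
```

With Rack installed, a config.ru containing run HelloApp should serve it, since run accepts any object that responds to call.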

Another example taken from the chapter on routing of O'Reilly's Rails 3 in a Nutshell:

Constraints may either be specified as a hash, a class implementing a matches? class method, or a class that responds to a call method, such as a Proc object.

The suspicious bit in that contract is "a class implementing a matches? class method". There's no requirement in the routing system that you pass a class. All that matters is that you pass an object that responds to matches?; see:


    constraint.respond_to?(:matches?) && !constraint.matches?(req)

That's idiomatic Ruby, where the interface is the only thing that matters, classes are irrelevant.
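The sketch below shows that the check quoted above accepts both a plain object and a class. The helper rejects? is hypothetical and only mirrors that line. The constraint objects are illustrative, and the request is represented as a plain hash rather than a real Rails request.

```ruby
# Hypothetical helper mirroring the routing check quoted above.
def rejects?(constraint, req)
  constraint.respond_to?(:matches?) && !constraint.matches?(req)
end

# A plain object with a singleton matches? method (hypothetical constraint).
local_only = Object.new
def local_only.matches?(req)
  req[:ip] == "127.0.0.1"
end

# A class with a matches? class method works for exactly the same reason.
class AdminOnly
  def self.matches?(req)
    req[:admin] == true
  end
end

req = { ip: "127.0.0.1", admin: false } # a hash standing in for a request object
puts rejects?(local_only, req) # prints: false
puts rejects?(AdminOnly, req)  # prints: true
```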

Classes Are Ordinary Objects

Technically in Ruby there are no "class methods" as opposed to "instance methods". Ruby only has instance methods. Let me summarize how this works.

If you define a Person class having a name instance method, instances of Person respond to name. No surprises here. But individual person instances can respond to more stuff:


    def person.custom_method
      ...
    end

In the example above, the object stored in the person variable also responds to custom_method. Such a method targeted to a particular instance is called a singleton method. You can for example build simple mocks this way:


    o = Object.new
    def o.name
      "John"
    end
    # Now pass o to code that expects anything responding to #name.

And you can also override methods defined in the class of the object.
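For example, a singleton method can override an instance method that comes from the class, and super still reaches the class's version. The Person class and its name method below are illustrative.

```ruby
class Person
  def name
    "Anonymous"
  end
end

person = Person.new
other  = Person.new

# Singleton method: overrides Person#name for this one object only.
def person.name
  "#{super} (overridden)" # super calls Person#name
end

puts person.name              # prints: Anonymous (overridden)
puts other.name               # prints: Anonymous
p person.singleton_methods    # prints [:name]
```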

In Ruby classes are objects. When you write Person, that's an ordinary constant. Totally ordinary. It is the same kind of ordinary constant as


    X = 1

No difference. You can think of


    class Person
    end

as being equivalent to


    Person = Class.new

The Ruby interpreter then processes the class body, but as far as the constant is concerned, that's it. In fact, if you create an anonymous class and assign it to a constant later, the class automatically takes the constant's name.
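The naming behavior is easy to observe: an anonymous class has no name until it is first assigned to a constant.

```ruby
klass = Class.new # an anonymous class
p klass.name      # prints nil

Person = klass    # assigning to a constant names the class
p klass.name      # prints "Person"
p Person.equal?(klass) # prints true: one class object, two references to it
```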

So, and this is a key point, the Person constant is ordinary: it happens to evaluate to a class object, the same way X above evaluates to an integer. And here is where Ruby deviates from other OO languages: classes are ordinary objects too, objects of type Class:


    klass = Person
    person = klass.new

That works because Person just evaluates to a class object. As with any other object, you can store classes in variables and pass them around, and that object responds to new the same way person responds to name. Why? Because klass is an object of the class Class, which defines new among its instance methods. Simple and elegant.
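In the sketch below, a class object is stored in a variable, passed to a method, and sent new, and the code confirms that new is an instance method owned by Class. The build helper is hypothetical.

```ruby
class Person
  def name
    "John"
  end
end

klass  = Person    # store the class object in a variable
person = klass.new # new is just a method call on that object

p klass.class                       # prints Class
p Class.instance_method(:new).owner # prints Class: new is an instance method of Class

# Hypothetical helper: classes can be passed around like any other object.
def build(factory)
  factory.new
end

p build(Person).name # prints "John"
```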

So at this point you need to set aside the mental models you bring from other languages and open your mind to what follows from this particular OO model.

Since classes are ordinary objects, you can also define singleton methods on them, the same way we did with person before. Do you recognize this idiom now?


    class Person
      def self.find_by_name(name)
        ...
      end
    end

In the body of a class, self is the class object, so that just defines a singleton method on it.
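Introspection confirms that the "class method" is a singleton method of the Person object, stored in its singleton class, and that it can equally be defined from outside the class body. The method bodies are placeholders, and find_by_email is a hypothetical second method.

```ruby
class Person
  def self.find_by_name(name)
    "looking up #{name}" # placeholder body for illustration
  end
end

# The "class method" is a singleton method of the Person object...
p Person.singleton_methods                       # prints [:find_by_name]
# ...and it lives in Person's singleton class.
p Person.singleton_class.instance_methods(false) # prints [:find_by_name]

# Equivalent definition from outside the class body (hypothetical method).
def Person.find_by_email(email)
  "looking up #{email}"
end

p Person.respond_to?(:find_by_email) # prints true
```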

All classes are instances of Class. A "class method" is any method a class responds to, which may come from Class or be defined for that particular class as a singleton method.

Class Methods, Fine

I am fine with the term "class method" in Ruby as long as we know what we are talking about. Technically Ruby has no such thing: every method is an instance method, but you can take "class method" as short for "an instance method of the class object". Depending on their intended usage, class methods are also referred to as "macros", another convenient term; think of has_many in Active Record.

But even if you can talk about "class methods" in that sense, you rarely need to tell classes from non-classes in API contracts based on interfaces.

12 Jun 2010 (updated 12 Jun 2010 at 23:20 UTC)

Ruby Hero 2010

I was awarded Ruby Hero 2010 at RailsConf this week. I am deeply grateful and honored by this recognition, which I received together with José Valim (devise, Rails core team), Nick Quaranto (RubyGems.org), Aaron Patterson (Nokogiri, SQLite driver, and a ton of other software), Wayne Seguin (RVM awesomeness), and Gregory Brown (Ruport, Prawn, Ruby Best Practices...).

6 May 2010 (updated 6 May 2010 at 11:28 UTC)

Progress [on The Last Supper] was steady but slow, as the artist worked on in his typical thoughtful and meditative way. He spent considerable time roaming the streets of Milan looking for suitable models for faces of the apostles. By 1497 the only part left to complete was the head of Judas. At that point, the prior of the convent became so impatient with Leonardo's slowness that he complained to the duke, who summoned the artist to hear his reasons for the delay. According to Vasari, Leonardo explained to the Moor that he was working on The Last Supper at least two hours a day, but that most of his work took place in his mind. He went on, slyly, to say that, if he did not find an appropriate model for Judas, he would give the villain the features of the petulant prior. Ludovico was so amused by Leonardo's reply that he instructed the prior to be patient and let Leonardo finish his work undisturbed.

The Science of Leonardo, Fritjof Capra, pages 95–96.

Rails Committer

I've been contributing to Ruby on Rails on a regular basis, with almost 100 code patches and about 500 doc patches, and I was recently granted commit rights. That's so great! It will allow me to work with more agility and have a little more freedom to do stuff.

10 Mar 2010 (updated 10 Mar 2010 at 15:05 UTC)

An around filter to temporarily apply changes

A client of mine has an admin tool where people can edit stuff, but modifications need approval. Pending changes are therefore stored in the application rather than applied right away, as you normally would.

Changes are basically stored as a method name, the ID of the receiver, and serialized arguments; there are only a handful of them. Applying a change is just a dynamic method invocation, you get the idea.
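The post doesn't show how changes are stored or applied, so the following is only a sketch under stated assumptions. Change, Page, PAGES, apply, and the finder lambda are all hypothetical, Marshal stands in for whatever serialization the real app uses, and an in-memory hash replaces the database. It shows the idea of applying a stored change by dynamic invocation with public_send.

```ruby
# Hypothetical pending change: method name, receiver ID, serialized arguments.
Change = Struct.new(:method_name, :receiver_id, :serialized_args) do
  # Applying a change is a dynamic invocation on the looked-up receiver.
  def apply(finder)
    receiver = finder.call(receiver_id)
    receiver.public_send(method_name, *Marshal.load(serialized_args))
  end
end

# Hypothetical model and in-memory "table" for illustration.
Page  = Struct.new(:id, :title)
PAGES = { 1 => Page.new(1, "Old title") }

change = Change.new(:title=, 1, Marshal.dump(["New title"]))
change.apply(->(id) { PAGES.fetch(id) })
puts PAGES[1].title # prints: New title
```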

My client wanted an approval interface where he could select a handful of changes to the same model, submit, and get a split view with the current website on the left, and the resulting website on the right.

The tricky part is that views may access the database, for example through named scopes (from an MVC viewpoint that's fine: the view is clean, it just happens that a named scope triggers a query). The 3-line solution was an around filter that started a transaction, yielded to the action, and rolled the transaction back.
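The filter itself isn't shown. In a Rails app of that era, one plausible form (an assumption, not the author's code) wraps the action in ActiveRecord::Base.transaction and raises ActiveRecord::Rollback after yielding. To keep the sketch runnable with plain Ruby, a hypothetical in-memory Store replaces Active Record and loosely mimics that rollback behavior. The names Store, Store::Rollback, and preview_changes are all illustrative.

```ruby
# Hypothetical in-memory store whose transaction restores a snapshot on Rollback,
# loosely mimicking ActiveRecord::Base.transaction + ActiveRecord::Rollback.
class Store
  Rollback = Class.new(StandardError)

  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end

  def transaction
    snapshot = Marshal.load(Marshal.dump(@rows)) # deep copy of the current state
    yield
  rescue Rollback
    @rows = snapshot # discard everything done inside the block
    nil
  end
end

# The around-filter idea: open a transaction, run the action, always roll back.
def preview_changes(store)
  store.transaction do
    yield                 # apply the changes and render the "after" view
    raise Store::Rollback # then throw the changes away
  end
end

store = Store.new({ "title" => "Old" })
rendered = nil
preview_changes(store) do
  store.rows["title"] = "New"
  rendered = "<h1>#{store.rows['title']}</h1>" # the view sees the changed data
end

puts rendered            # prints: <h1>New</h1>
puts store.rows["title"] # prints: Old
```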

30 Jan 2010 (updated 30 Jan 2010 at 18:23 UTC)

Tracking Class Descendants in Ruby (II)

My previous post explains a way to keep track of a class' descendants, and encapsulates the technique into a module.

There are two things you may want to do differently. First, since all descendants inherit the descendants class method, you may prefer it to work for them as well, returning each class's own descendants. Second, the module defines the inherited class method in the base class itself, because it needs it to be a closure. That may work for some particular need, but it is not good for a generic solution: the inherited hook is the business of your client's code.

Now we'll see a different approach that addresses both concerns. Using the same hook any class in the hierarchy may easily keep track of its direct subclasses, and compute its descendants:


    class C
      def self.inherited(subclass)
        subclasses << subclass
      end
 
      def self.subclasses
        @subclasses ||= []
      end
 
      def self.descendants
        subclasses + subclasses.map(&:descendants).flatten
      end
    end

In the previous solution the inherited hook needed to make sure descendants was invoked on the root of the hierarchy. This solution doesn't need to care, because it deliberately takes advantage of polymorphism: as written, each class pushes into its own @subclasses instance variable, which is what we want.
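A short usage run of the class above (copied verbatim) shows that every class in the hierarchy now answers descendants for itself. The subclass names A, B, and D are illustrative.

```ruby
class C
  def self.inherited(subclass)
    subclasses << subclass
  end

  def self.subclasses
    @subclasses ||= []
  end

  def self.descendants
    subclasses + subclasses.map(&:descendants).flatten
  end
end

class A < C; end
class B < A; end # A.inherited(B) runs with self == A, so B goes into A's array
class D < C; end

p C.descendants # prints [A, D, B]
p A.descendants # prints [B]
p B.descendants # prints []
```

Since Ruby 3.1, Class#subclasses is built in; the singleton method defined above shadows it for C and its descendants.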

The module that encapsulates that pattern is much simpler:


    module DescendantsTracker
      def inherited(subclass)
        subclasses << subclass
        super
      end
 
      def subclasses
        @subclasses ||= []
      end
 
      def descendants
        subclasses + subclasses.map(&:descendants).flatten
      end
    end
 
    class C
      extend DescendantsTracker
    end

Recall that extend is like doing an include in the metaclass of C. In particular, we are not defining C.inherited; we are defining a method with the same name in an ancestor of the metaclass. That way C can still define its own inherited class method, and a call to super within such a C.inherited goes up to the next ancestor of the metaclass, eventually reaching the inherited from DescendantsTracker.
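To check that chain, here is a sketch in which C extends the module (copied verbatim) and also keeps its own inherited hook that calls super. The hook_calls bookkeeping is a hypothetical stand-in for whatever C's own hook would do.

```ruby
module DescendantsTracker
  def inherited(subclass)
    subclasses << subclass
    super
  end

  def subclasses
    @subclasses ||= []
  end

  def descendants
    subclasses + subclasses.map(&:descendants).flatten
  end
end

class C
  extend DescendantsTracker

  # C keeps its own inherited hook; super continues to DescendantsTracker#inherited.
  def self.inherited(subclass)
    C.hook_calls << subclass # hypothetical bookkeeping of C's own
    super
  end

  def self.hook_calls
    @hook_calls ||= []
  end
end

class A < C; end
class B < A; end

p C.hook_calls  # prints [A, B]: C's hook ran for both subclasses
p C.descendants # prints [A, B]: and so did the module's hook
p A.descendants # prints [B]
```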

29 Jan 2010 (updated 30 Jan 2010 at 18:25 UTC)

Tracking Class Descendants in Ruby

I have been going through all the Active Support core extensions lately because I am writing the Active Support Core Extensions guide, due for Rails 3. There are some patches in master as a result of that walkthrough, and I am now focusing on keeping track of descendants in a class hierarchy.

A known technique uses ObjectSpace.each_object. That method receives a class or module as argument and yields every live object that is a kind_of? that class or module. Since classes are instances of the class Class, you can select the descendants of class C this way:


    descendants_of_C = []
    ObjectSpace.each_object(Class) do |klass|
      descendants_of_C << klass if klass < C
    end

That brute-force approach works, but it is inefficient because it has to walk the object space. JRuby even disables ObjectSpace by default for performance reasons.

A better approach is to leverage the inherited hook. Classes may optionally implement a class method inherited that is called whenever they are subclassed. The subclass is passed as argument:


    class User
      def self.inherited(subclass)
        puts 0
      end
    end
 
    class Admin < User
      puts 1
    end
 
    # output is
    # 0
    # 1

That's a perfect place to keep track of descendants:


    class C
      class << self
        def inherited(subclass)
          C.descendants << subclass
          super
        end
 
        def descendants
          @descendants ||= []
        end
      end
    end

In that code we have an array of descendants in @descendants. That is an instance variable of the very class C. Remember classes are ordinary objects in Ruby and so they may have instance variables. It is better to use an instance variable instead of a class variable because class variables are shared among the entire hierarchy of the class and we need an exclusive array.
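The difference between the two kinds of variables is easy to demonstrate. In the sketch below, which uses hypothetical class names, a class variable is shared by a parent and its subclass, while a class-level instance variable belongs to a single class object.

```ruby
# Class variables (@@) are shared by the whole hierarchy...
class WithClassVar
  @@registry = []
  def self.registry
    @@registry
  end
end

class SubWithClassVar < WithClassVar; end
SubWithClassVar.registry << :x
p WithClassVar.registry # prints [:x]: the parent's array was modified too

# ...whereas a class-level instance variable belongs to one class object only.
class WithIvar
  def self.registry
    @registry ||= []
  end
end

class SubWithIvar < WithIvar; end
SubWithIvar.registry << :x
p WithIvar.registry    # prints []
p SubWithIvar.registry # prints [:x]
```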

Another fine point is that we explicitly call C.descendants rather than relying on self. If we didn't, and we had A < B < C, the hook would be called when A was defined, but by polymorphism it would be B.descendants that got called, thus setting B's instance variable @descendants. That is not what we want.
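Here is the pitfall in action. This deliberately broken variant calls descendants on self instead of on C, so A ends up in B's array and C never hears about it.

```ruby
# Deliberately buggy variant: descendants is called on self, not on C.
class C
  class << self
    def inherited(subclass)
      descendants << subclass # self is whichever class is being subclassed
      super
    end

    def descendants
      @descendants ||= []
    end
  end
end

class B < C; end
class A < B; end # triggers B.inherited(A), so A lands in B's array

p C.descendants # prints [B]: A is missing
p B.descendants # prints [A]
```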

The call to super is just a best practice. In general a hook like this should pass the call up the hierarchy in case parents have their own hooks.

That pattern can indeed be implemented in a module for reuse:


    module DescendantsTracker
      def self.included(base)
        (class << base; self; end).class_eval do
          define_method(:inherited) do |subclass|
            base.descendants << subclass
            # define_method does not support implicit-argument super
            super(subclass)
          end
        end
        base.extend self
      end
 
      def descendants
        @descendants ||= []
      end
    end
 
    class C
      include DescendantsTracker
    end

A class only needs to include DescendantsTracker to track its descendants.

When the module is included in a class, Ruby invokes the module's included hook. The hook receives the class that is including the module, and we leverage that to inject the class methods we saw before. For inherited, we open the metaclass of base and define the method in a way that keeps base in scope, which, as we saw before, is what we need. After that we add the descendants class method with an ordinary extend call.

Update: There's a followup to this post.

3 Jan 2010 (updated 3 Jan 2010 at 20:13 UTC)

Rails Tip: Avoid mixing require and Rails autoloading

I've seen warnings about constants being redefined in a few Rails applications. The problem was always the same: a file was autoloaded and required afterwards, which results in the file being interpreted twice. If the class or module defines ordinary constants, you may be lucky and see a warning; if not, you may not even be aware of it. Let me explain why this happens.

For example, given:


  # lib/utils.rb
  module Utils
    X = 1
  end

If we autoload the module and then require the file:


   $ script/runner 'Utils; require "utils"'

a warning is issued:


   warning: already initialized constant X

This is an artificial example, but in practice some initializer may, for instance, autoload Utils, and later a model may require lib/utils.rb. Of course you don't need to, and shouldn't, do that, but perhaps the model was written by someone not conversant with Rails, or whatever.

OK, the warning is telling us lib/utils.rb is being interpreted twice. That happens both in development and production modes, but for different reasons.

In development mode, Rails autoloading uses Kernel#load by default so that code can be reinterpreted on each request. So, the usage of the constant Utils triggers the interpretation of lib/utils.rb with load, and since require knows nothing about that file, it happily interprets its content again.
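The development-mode case can be reproduced with plain Ruby. In the sketch below, a throwaway file acts as a hypothetical stand-in for lib/utils.rb and counts how often it is interpreted. load does not record the file in $", so a later require runs it again.

```ruby
require "tmpdir"

Dir.mktmpdir do |dir|
  # Hypothetical stand-in for lib/utils.rb that counts its own executions.
  path = File.join(dir, "utils.rb")
  File.write(path, "$utils_runs = ($utils_runs || 0) + 1\n")

  load path    # what development-mode autoloading does (Kernel#load)
  require path # require finds nothing in $" for this file, so it runs it again

  puts $utils_runs # prints: 2
end
```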

In production mode Rails autoloading uses require, which is supposed to run the file only once, so what's the matter?

When require loads something, it stores its path in the array $":


   $ ruby -rdate -e 'puts $"'
   enumerator.so
   rational.rb
   date/format.rb
   date.rb

require checks that array to see whether a given file was already loaded before it attempts to load it. If it was, there's nothing to do and require just returns false. The point is that require does not detect whether two different paths point to the same file, and Rails autoloading passes absolute pathnames to autodiscovered source files:


   $ script/runner -e production 'Utils; require "utils"; puts $"' 
   ...
   /Users/fxn/tmp/test_require/lib/utils.rb
   utils.rb

Since the two entries do not match, require doesn't realize lib/utils.rb was already loaded, and the file is interpreted twice.
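The bookkeeping that require relies on can be observed with a throwaway file. The file name demo_utils.rb is hypothetical, and a temporary directory added to $LOAD_PATH plays the role of lib/. The first require interprets the file and records it in $". The second one finds the entry and returns false.

```ruby
require "tmpdir"

first = second = nil

Dir.mktmpdir do |dir|
  # Hypothetical stand-in for lib/utils.rb that counts its own executions.
  File.write(File.join(dir, "demo_utils.rb"), "$demo_utils_runs = ($demo_utils_runs || 0) + 1\n")
  $LOAD_PATH.unshift(dir) # make "demo_utils" resolvable, as lib/ is in a Rails app

  first  = require "demo_utils" # interprets the file and records its path in $"
  second = require "demo_utils" # the path is already in $", so it just returns false

  $LOAD_PATH.delete(dir)
end

p first  # prints true
p second # prints false
p $".any? { |f| f.end_with?("demo_utils.rb") } # prints true
p $demo_utils_runs # prints 1
```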

The solution to this gotcha is as simple as removing the call to require. An idiomatic Rails application names its files after the classes or modules they define and delegates their loading to the dependencies mechanism. Generally, the only calls to require are for external libraries.
