<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Advogato blog for avassalotti</title>
    <link>http://www.advogato.org/person/avassalotti/</link>
    <description>Advogato blog for avassalotti</description>
    <language>en-us</language>
    <generator>mod_virgule</generator>
    <pubDate>Thu, 23 May 2013 20:42:58 GMT</pubDate>
    <item>
      <pubDate>Wed, 30 Dec 2009 00:07:57 GMT</pubDate>
      <title>Why you cannot pickle generators</title>
      <link>http://www.advogato.org/person/avassalotti/diary.html?start=38</link>
      <guid>http://peadrop.com/blog/2009/12/29/why-you-cannot-pickle-generators/</guid>
      <description>&lt;p&gt;Joseph Turian wrote &lt;a href="http://blog.metaoptimize.com/2009/12/22/why-cant-you-pickle-generators-in-python-workaround-pattern-for-saving-training-state/" &gt;a post about pickling generators&lt;/a&gt; on his blog. In his post, he says:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;However, generators become problematic when you want to persist your experiment&#x2019;s state in order to later restart training at the same place. Unfortunately, &lt;a href="http://bugs.python.org/issue1092962" &gt;you can&#x2019;t pickle generators in Python&lt;/a&gt;. And it can be a bit of a &lt;a href="http://en.wiktionary.org/wiki/pain_in_the_ass" &gt;PITA&lt;/a&gt; to workaround this, in order to save the training state.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;This caught my attention, because I was involved in the decision he cites not to allow generators to be pickled in CPython. Although Joseph&amp;#8217;s examples are a bit convoluted, it is pretty clear why his generators cannot be pickled automatically: Python cannot pickle the operating system&amp;#8217;s state, such as file descriptors.&lt;/p&gt;

&lt;p&gt;Let&amp;#8217;s ignore that problem for a moment and look at what we would need to do to pickle a generator. Since a generator is essentially a souped-up function, we would need to save its bytecode, which is not guaranteed to be backward-compatible between Python versions, and its frame, which holds the state of the generator, such as local variables, closures and the instruction pointer. The latter is rather cumbersome to accomplish, since it basically requires making the whole interpreter picklable. So, any support for pickling generators would require a large number of changes to CPython&amp;#8217;s core.&lt;/p&gt;

&lt;p&gt;Now if an object unsupported by pickle (e.g., a file handle, a socket, a database connection, etc.) occurs in the local variables of a generator, then that generator could not be pickled automatically, regardless of any pickle support for generators we might implement. So in that case, you would still need to provide custom &lt;code&gt;__getstate__&lt;/code&gt; and &lt;code&gt;__setstate__&lt;/code&gt; methods. This problem renders any pickling support for generators rather limited.&lt;/p&gt;

&lt;p&gt;Anyway, if you need such a feature, then look into &lt;a href="http://www.stackless.com/" &gt;Stackless Python&lt;/a&gt;, which does all of the above. And since Stackless&amp;#8217;s interpreter is picklable, you also get process migration for free. This means you can interrupt a tasklet (the name for Stackless&amp;#8217;s green threads), pickle it, send the pickle to another machine, unpickle it, resume the tasklet, and voil&#xE0;, you&amp;#8217;ve just migrated a process. This is a freaking cool feature!&lt;/p&gt;

&lt;p&gt;But in my humble opinion, the best solution to this problem is to rewrite the generators as simple iterators (i.e., objects with a &lt;code&gt;__next__&lt;/code&gt; method). Iterators are easy to pickle, and space-efficient, because their state is explicit. You would still need to handle objects representing some external state explicitly, however; you cannot get around this.&lt;/p&gt;
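
&lt;p&gt;To make this concrete, here is a minimal sketch of a generator rewritten as an iterator class. The &lt;code&gt;Countdown&lt;/code&gt; class is a made-up example, not code from the post quoted above; it assumes Python 3 and nothing beyond the standard &lt;code&gt;pickle&lt;/code&gt; module.&lt;/p&gt;

```python
import pickle


class Countdown:
    """Iterator equivalent of a generator that yields n, n-1, ..., 1."""

    def __init__(self, start):
        # All the state lives in a plain attribute, so pickle can save it.
        self.remaining = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining == 0:
            raise StopIteration
        value = self.remaining
        self.remaining -= 1
        return value


it = Countdown(5)
first_two = [next(it), next(it)]            # consume 5 and 4
restored = pickle.loads(pickle.dumps(it))   # save and restore mid-iteration
rest = list(restored)                       # continues with 3, 2, 1
```

&lt;p&gt;Since the whole state is one integer attribute, the pickle stays small and the restored copy picks up where the original stopped.&lt;/p&gt;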
</description>
    </item>
    <item>
      <pubDate>Sun, 5 Apr 2009 13:06:04 GMT</pubDate>
      <title>Porting your code to Python 3</title>
      <link>http://www.advogato.org/person/avassalotti/diary.html?start=37</link>
      <guid>http://peadrop.com/blog/2009/04/05/porting-your-code-to-python-3/</guid>
      <description>&lt;div id=mp5&gt;
&lt;p&gt;See &lt;a href="http://www.advogato.org/mp5.html" &gt;the plain HTML version&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote class=ins&gt;
  The following is a write-up of the presentation I gave to a group of Python
  developers at &lt;a href="http://montrealpython.org/?p=43" &gt;Montreal Python
  5&lt;/a&gt; on February 26&lt;sup&gt;th&lt;/sup&gt;. This is basically an HTML-ified copy of the
  notes I prepared before the presentation. I haven&amp;#8217;t done any editing, so expect
  a few grammar mistakes here and there. My complete presentation slides are
  available &lt;a href="http://www.advogato.org/slides/mp5.pdf" &gt;here&lt;/a&gt;. A video was taped and should be
  released in the upcoming weeks (I will post a link here when I finally get
  my hands on it). Please note that if you are looking for a more complete (and
  more accurate) guide to Python 3, I highly recommend that you read the
  &lt;a href="http://docs.python.org/3.0/whatsnew/3.0.html" &gt;What&amp;#8217;s New In Python
  3.0&lt;/a&gt; document and the &lt;a href="http://python.org/dev/peps/" &gt;Python
  Enhancement Proposals&lt;/a&gt; numbered above 3000.
&lt;/blockquote&gt;

&lt;p&gt;You may wonder why we did Python 3 after all. The motivation was simple: to fix
old warts and to clean up the language before it was too late. Python 3 is not
a complete rewrite of Python; it is still pretty much the good old Python you all
love. But I am not going to lie. There are many changes in Python 3; many that
will cause pain when you port your code; and so many that I won&amp;#8217;t be able
to cover them all in this talk. That is why I will focus only on the changes
that you will need to know to port your code. If you want to learn about all the new
and shiny features, you will need to visit the python.org website and the
online documentation of Python 3.&lt;/p&gt;

&lt;p&gt;In the second part of this presentation, I will go over the steps needed to
port a real library to Python 3. Hopefully, this part will give you the basic
knowledge and tools to tackle the problems linked to the migration.&lt;/p&gt;

&lt;p&gt;Finally, I will give you an insider&amp;#8217;s view of the upcoming changes in
Python 3.1, which is supposed to be released later this year.&lt;/p&gt;

&lt;p&gt;Let&amp;#8217;s start with the most obvious change in Python 3: print is now a
function. Some people really don&amp;#8217;t like this change (mostly because it makes
hello world one character longer). But making print a function is actually a
good thing. First, it is more flexible; you can now change the string separator,
pass print as an argument or even override the function completely.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-04.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-04.png" /&gt;&lt;/a&gt;

&lt;p&gt; In addition, the syntax is much cleaner&amp;#8212;no weird &gt;&gt;sys.stderr
anymore. On the other hand, it is true that it takes a bit of time to get used
to the extra parentheses. Thankfully, converting your code to use the new
print() is easy and completely automated. You just run the 2to3 tool (I will
talk more about 2to3 later) and you&amp;#8217;re done.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-05.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-05.png" /&gt;&lt;/a&gt;

&lt;p&gt;There is one thing special about the keyword arguments of print(); they need
to be explicitly written out. In other words, they can only be supplied as
keywords and never as a positional argument.&lt;/p&gt;

&lt;p&gt;This behavior is actually a new feature in Python 3, called keyword-only
arguments. This is one of the things that might surprise you when you write new code
with Python 3 (it did surprise me more than once), since the error message is not
that great. It makes sense from an implementation point of view, but not so
much from the user&amp;#8217;s point of view. I hope someone will suggest something better in
the future, but in the meantime we have to live with this funky error
message.&lt;/p&gt;

&lt;p&gt;Keyword-only arguments are really useful when you have a function that takes
a variable number of arguments and you want to add options to
it&amp;#8212;just like &lt;code&gt;print()&lt;/code&gt;. Another good use of this feature is for
forcing your API users to explicitly write out their intent. For example, this
is currently the case for the list.sort() method and the sorted() function.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-06.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-06.png" /&gt;&lt;/a&gt;

&lt;p&gt;Finally, the syntax for making a function take keyword-only arguments is the
following:&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-07.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-07.png" /&gt;&lt;/a&gt;

&lt;p&gt;There is also a way to do the same thing in C, but that is out of scope of
this presentation.&lt;/p&gt;

&lt;p&gt;Now, let me introduce &lt;em&gt;the&lt;/em&gt; big change in Python 3: Unicode
throughout. &lt;ins&gt;(Ed. There was a big round of applause when I announced this at
Montreal Python. So, I guess the conversion pain was worth it.)&lt;/ins&gt; This is
huge; it took six Python Enhancement Proposals (PEPs, for short) to cover the
changes related to Unicode. And I am pretty sure that not everything is
covered in these. For this reason, I hope you will understand that I cannot
cover everything today. So, what are these changes?&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-09.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-09.png" /&gt;&lt;/a&gt;

&lt;p&gt;For one, all strings are Unicode by default. This means you cannot treat text
as bytes, and vice versa, anymore. For example, if you read some bytes from
disk or a network, you will need to decide whether it is data or text; and this
isn&amp;#8217;t always obvious. Is a filename data or text? Is a command-line argument
data or text? Or, is an environment variable data or text? In many cases, Python
core developers had to make compromises when converting the old APIs to
Unicode.&lt;/p&gt;

&lt;p&gt;So, let&amp;#8217;s examine the case of filenames. The first problem we run into is: how
do we detect the character encoding used by the filesystem? There is no
standard way of doing this that works on every platform supported by Python.
On MacOS X, life is simple; we just use UTF-8. On Windows, we can use the Wide
API and things mostly work. On Unix however, the encoding can be anything. So,
we cannot tell in advance what the encoding will be; we have to detect it at
runtime with the langinfo API (if present). And this leads to some interesting
bootstrapping issues, since some codecs in Python are not built-in. For
example, there are known problems with Python scripts running from a directory
whose path contains non-ASCII characters.&lt;/p&gt;

&lt;p&gt;Another problem we run into is: how should we handle filenames encoded
incorrectly? Even if we know that the filesystem uses UTF-8, that doesn&amp;#8217;t mean
all filenames will be valid UTF-8 byte sequences. On Unix for example, only
NUL and slash cannot appear in a filename; so, it is possible to
construct filenames that cannot be interpreted as a text string. And this is
basically what I want to say; it is not always clear what is text and what is
data. So in Python 3, most system APIs accept bytes as well as strings as a
work-around.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-12.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-12.png" /&gt;&lt;/a&gt;

&lt;p&gt;However, the problems I have described are not as bad as they sound. In most
cases, the Unicode enhancements will lead to better code and also fewer bugs.
And having Unicode throughout has opened the door for other
internationalization improvements as well. One of these improvements is that
non-ASCII identifiers are now supported (but not advocated).&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-13.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-13.png" /&gt;&lt;/a&gt;
&lt;a href="http://www.advogato.org/slides/mp5-14.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-14.png" /&gt;&lt;/a&gt;

&lt;p&gt;Another feature of Python 3 is the new I/O library designed with Unicode
in mind. From a core developer&amp;#8217;s point of view, this change is fairly large: a
departure from C stdio and a brand-new I/O class hierarchy written entirely in
Python (which is currently being rewritten in C for performance).  However,
from the point of view of a typical Python developer, there isn&amp;#8217;t much that has
changed. I/O still works the same as before; &lt;code&gt;open()&lt;/code&gt; still returns
a file-like object, which can be written to and read from just like before.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-15.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-15.png" /&gt;&lt;/a&gt;

&lt;p&gt;But if you want more control over your I/O, you can now have it. Just import the io
module, and use or derive a class that fits your needs. One nice thing about
the new I/O is that once you&amp;#8217;ve defined the raw byte-based interface, you can easily
add buffering and text-handling features.&lt;/p&gt;

&lt;p&gt;Take for example a network socket. What can we do with a socket? Well, we can
read some bytes from it and maybe also write to it. But, we cannot seek it
like a file. Usually, we call such objects streams. So, we can derive our
&lt;code&gt;SocketIO&lt;/code&gt; class from &lt;code&gt;io.RawIOBase&lt;/code&gt; and define our
methods. Need buffering? Just wrap an instance of &lt;code&gt;SocketIO&lt;/code&gt; with
&lt;code&gt;io.BufferedReader&lt;/code&gt; or &lt;code&gt;io.BufferedWriter&lt;/code&gt;. Need
text-handling too? Well, wrap your instance
with &lt;code&gt;io.TextIOWrapper&lt;/code&gt;. And that&amp;#8217;s all there is to it.&lt;/p&gt;
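
&lt;p&gt;The layering can be sketched without a real network connection. Below, &lt;code&gt;ChunkedRaw&lt;/code&gt; is a hypothetical stand-in for a &lt;code&gt;SocketIO&lt;/code&gt;-like class: it hands out in-memory data a few bytes at a time, roughly the way a socket might. Only &lt;code&gt;io.RawIOBase&lt;/code&gt;, &lt;code&gt;io.BufferedReader&lt;/code&gt; and &lt;code&gt;io.TextIOWrapper&lt;/code&gt; come from the standard library.&lt;/p&gt;

```python
import io


class ChunkedRaw(io.RawIOBase):
    """Raw, non-seekable stream that returns a few bytes per read."""

    def __init__(self, data, chunk_size=4):
        super().__init__()
        self._data = data
        self._pos = 0
        self._chunk_size = chunk_size

    def readable(self):
        return True

    def readinto(self, buf):
        # Copy at most chunk_size bytes into the caller's buffer.
        n = min(len(buf), self._chunk_size, len(self._data) - self._pos)
        buf[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n   # 0 signals the end of the stream


raw = ChunkedRaw("h\u00e9llo\nworld\n".encode("utf-8"))
buffered = io.BufferedReader(raw)                      # adds buffering
text = io.TextIOWrapper(buffered, encoding="utf-8")    # adds decoding
lines = text.readlines()
```

&lt;p&gt;The raw class only has to implement &lt;code&gt;readable()&lt;/code&gt; and &lt;code&gt;readinto()&lt;/code&gt;; buffering, decoding and line splitting all come from the wrappers.&lt;/p&gt;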

&lt;p&gt;If you&amp;#8217;re used to Java I/O libraries, this should sound fairly familiar; and
this is intentional. The main difference is that the new I/O in Python is simpler. If
you want to learn more about the details of the new I/O library, I encourage you
to read the PEP and the online documentation.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-16.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-16.png" /&gt;&lt;/a&gt;

&lt;p&gt;Now, let&amp;#8217;s talk about the change that will probably cause the most pain during
the transition: the standard library reorganization. In Python 3, many modules
were removed, renamed and repackaged. Initially, the reorganization was not
part of the plans for Python 3. But since Python 3 was going to be
backward-incompatible anyway, many developers (myself included) saw a chance
to clean up the library and remove the silly old stuff all at once. So,
instead of having many incompatible releases over time, we have one big
one.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-18.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-18.png" /&gt;&lt;/a&gt;

&lt;p&gt;Thankfully, the 2to3 tool will handle almost all the work for you.
Unfortunately, 2to3 won&amp;#8217;t help with removals. This means you will need to
change your code to stop using the removed modules before porting to Python 3. PEP 3108
documents all the changes we have made; it also suggests replacements for
modules that were removed. So, this should be the first place to look
whenever you have a problem with a reorganized module.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-19.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-19.png" /&gt;&lt;/a&gt;
&lt;a href="http://www.advogato.org/slides/mp5-20.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-20.png" /&gt;&lt;/a&gt;
&lt;a href="http://www.advogato.org/slides/mp5-21.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-21.png" /&gt;&lt;/a&gt;
&lt;a href="http://www.advogato.org/slides/mp5-22.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-22.png" /&gt;&lt;/a&gt;

&lt;p&gt;Also, if you use the pickle module, the standard library reorganization will
make it hard for you to create pickle data streams that work on both Python 2
and 3. The problem is that pickle saves class and function objects by named
reference. This means that if you have pickle data created with Python 2 that
contains an instance whose class was renamed in Python 3, pickle will not be
able to recreate the instance in question. Unfortunately, there is nothing yet
to help you with that problem. Although it is possible to subclass Unpickler
and modify it to rewrite names on the fly, this is not very convenient.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-23.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-23.png" /&gt;&lt;/a&gt;

&lt;p&gt;In addition to the stdlib reorganization, the behavior of some well-known APIs
has changed. In particular, many methods that used to return lists now return
an iterator or a view. For example, dict&amp;#8217;s &lt;code&gt;keys()&lt;/code&gt;, &lt;code&gt;items()&lt;/code&gt; and
&lt;code&gt;values()&lt;/code&gt; &lt;ins&gt;(Ed. The result of &lt;code&gt;values()&lt;/code&gt; is not actually a
set-like object, for the obvious reason that a dictionary may contain duplicate
values. This was an error on my part.)&lt;/ins&gt; no longer return lists; they
return a set-like object called a view. Personally, I found this change very
nice when working with graphs implemented using dicts, because I could now use
standard set operations, like union and difference, on the views.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-24.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-24.png" /&gt;&lt;/a&gt;

&lt;p&gt;Similarly, many built-in functions now return iterators instead of lists. This
is the case of &lt;code&gt;map()&lt;/code&gt;, &lt;code&gt;filter()&lt;/code&gt; and &lt;code&gt;zip()&lt;/code&gt;. 
For &lt;code&gt;map()&lt;/code&gt; and &lt;code&gt;filter()&lt;/code&gt;, it is typically a good idea
to rewrite them as list comprehensions. Another change in the same vein is that
&lt;code&gt;xrange()&lt;/code&gt; is the new &lt;code&gt;range()&lt;/code&gt;. For most code, this
requires no modifications. Again, 2to3 handles these changes for you.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-25.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-25.png" /&gt;&lt;/a&gt;

&lt;p&gt;Continuing on API changes, some special methods have been removed or renamed.
For example, the next() method on iterators is now called
&lt;code&gt;__next__()&lt;/code&gt;. To get the next item of an iterator, use the built-in
function &lt;code&gt;next()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Also, &lt;code&gt;__getslice__&lt;/code&gt; and friends were removed in favor of
&lt;code&gt;__getitem__&lt;/code&gt;.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-28.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-28.png" /&gt;&lt;/a&gt;

&lt;p&gt;The special methods &lt;code&gt;__hex__&lt;/code&gt; and &lt;code&gt;__oct__&lt;/code&gt; were 
removed in favor of &lt;code&gt;__index__()&lt;/code&gt;. Generally, this requires no
change in your code. Note, 2to3 will not remove the old methods.&lt;/p&gt;

&lt;p&gt;Another fairly important change in Python 3 is the simplification of the rules
for ordering comparisons. So in Python 3, the old three-way comparison rules
have been completely replaced by a much simpler (and faster too) mechanism
&lt;ins&gt;(Ed. There wasn&amp;#8217;t much rejoicing when I presented this change. People kept
asking why Python doesn&amp;#8217;t generate comparison methods automatically from
&lt;code&gt;__lt__&lt;/code&gt; and &lt;code&gt;__eq__&lt;/code&gt;)&lt;/ins&gt;.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-29.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-29.png" /&gt;&lt;/a&gt;

&lt;p&gt;Clearly, 2to3 won&amp;#8217;t translate old three-way compares, so you will need to
support three-way and rich comparisons if you want your code to work both on
Python 2 and 3. The changes needed are usually straightforward, so this is
generally not a problem.&lt;/p&gt;
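
&lt;p&gt;To give an idea of the change, here is a sketch of a class moved from &lt;code&gt;__cmp__&lt;/code&gt; to rich comparisons. The &lt;code&gt;Version&lt;/code&gt; class is invented for the example. It also uses &lt;code&gt;functools.total_ordering&lt;/code&gt;, which did not exist when I gave this talk (it arrived in Python 2.7 and 3.2) and which derives the missing ordering methods from &lt;code&gt;__eq__&lt;/code&gt; plus any one ordering method.&lt;/p&gt;

```python
import functools


@functools.total_ordering
class Version:
    def __init__(self, major, minor):
        self.parts = (major, minor)

    # Python 2 style, no longer consulted by Python 3:
    # def __cmp__(self, other):
    #     return cmp(self.parts, other.parts)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.parts == other.parts

    def __gt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.parts > other.parts


ordered = sorted([Version(3, 1), Version(2, 6), Version(3, 0)])
newest = max(ordered)
```

&lt;p&gt;Code that must run on both Python 2 and 3 can keep the commented-out &lt;code&gt;__cmp__&lt;/code&gt; alongside the rich comparison methods.&lt;/p&gt;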

&lt;p&gt;We already saw that the syntax for the print statement and Unicode strings has
changed. So, the remaining changes I want to talk about are the other
syntactic changes in Python 3. For the most part, the new syntax niceties are
also available in Python 2.6 as optional features; the difference in Python 3
is that you&amp;#8217;re now required to use them. But don&amp;#8217;t worry, 2to3 will handle these
changes fairly well. So what are these changes?&lt;/p&gt;

&lt;p&gt;First, we have new syntax for catching and raising exceptions. In particular,
the syntax for saving an exception was simplified to remove ambiguities.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-32.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-32.png" /&gt;&lt;/a&gt;

&lt;p&gt;Similarly, the syntax for raising exceptions was simplified. Note that all
exceptions must derive from BaseException or, more commonly, from the
Exception class. This was optional in Python 2; this is now enforced in Python
3. As a consequence of the new syntax, tracebacks must now be set explicitly
via the &lt;code&gt;__traceback__&lt;/code&gt; attribute. However if you need to do that,
 you probably want to check out a new feature called exception chaining.&lt;/p&gt;
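
&lt;p&gt;Here is a short sketch tying these pieces together: the &lt;code&gt;except ... as&lt;/code&gt; form, the &lt;code&gt;__traceback__&lt;/code&gt; attribute and exception chaining with &lt;code&gt;raise ... from&lt;/code&gt;. The &lt;code&gt;parse_port&lt;/code&gt; function is a made-up example.&lt;/p&gt;

```python
def parse_port(text):
    try:
        return int(text)
    except ValueError as exc:   # Python 2 also allowed: except ValueError, exc
        # "raise ... from ..." records the original error as __cause__
        raise RuntimeError("invalid port: %r" % text) from exc


try:
    parse_port("http")
except RuntimeError as exc:
    cause = exc.__cause__
    has_traceback = exc.__traceback__ is not None
```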

&lt;p&gt;In Python 3, we also have new syntax for specifying metaclasses. To do so, we
allowed keyword arguments after the list of base classes in the class
definition. Currently, this is only used to support the new metaclass syntax;
but this could be used for other purposes as well, as long as the metaclass used
supports it.&lt;/p&gt;
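
&lt;p&gt;A minimal sketch of the new syntax follows. The &lt;code&gt;Registry&lt;/code&gt; metaclass and the extra &lt;code&gt;priority&lt;/code&gt; keyword are hypothetical; they only illustrate how a metaclass can accept additional class keywords.&lt;/p&gt;

```python
class Registry(type):
    """Metaclass that records the name of every class created with it."""
    classes = []

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.classes.append(name)
        cls.options = kwargs    # extra class keywords, if any
        return cls

    def __init__(cls, name, bases, namespace, **kwargs):
        # Swallow the extra keywords before calling type.__init__
        super().__init__(name, bases, namespace)


# Python 2 spelled this as a __metaclass__ attribute in the class body.
class Plugin(metaclass=Registry, priority=3):
    pass
```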

&lt;p&gt;Continuing, relative imports now need to use the from-dot-package syntax. If
you omit the dot, it will be interpreted as an absolute import.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-33.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-33.png" /&gt;&lt;/a&gt;

&lt;p&gt;Now, let me show you two lovely additions to Python 3: set and dict
comprehensions. But first, I need to introduce the new syntax for set
literals.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-34.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-34.png" /&gt;&lt;/a&gt;

&lt;p&gt;We can almost guess what the syntax for set and dict comprehensions is.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-35.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-35.png" /&gt;&lt;/a&gt;

&lt;p&gt;So, we are now ready for the real thing: migrating to Python 3. There is more
than one way to approach the migration and there is no approach that will fit
all your needs. In many cases, you have to experiment and choose whatever works
best for you. Also, I am only going to cover the issue of migrating Python
code. If you want to migrate C extensions, you will need to check out the
online documentation.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-38.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-38.png" /&gt;&lt;/a&gt;

&lt;p&gt;The very first step before migrating is to verify that you have excellent test
coverage. If you do not have a test suite, this would be a good time to start
investing time in creating one. I wouldn&amp;#8217;t even think about migrating to Python
3 without a test suite, since it is practically impossible to predict where your
code is going to break.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-40.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-40.png" /&gt;&lt;/a&gt;

&lt;p&gt;Once you have verified that your test suite is in good shape, you should begin by porting
your code to Python 2.6; generally this is effortless. Then, turn on the -3
flag of Python 2.6. This will enable warnings about features that have been
removed or changed in Python 3. Run your tests and fix all the warnings you
see.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-39.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-39.png" /&gt;&lt;/a&gt;

&lt;p&gt;It is also a good idea to modernize your code at this stage, and to try to
reduce the semantic gaps as much as possible. For example, start using the iterator
version of &lt;code&gt;dict.keys()&lt;/code&gt;, &lt;code&gt;.values()&lt;/code&gt; and
&lt;code&gt;.items()&lt;/code&gt;; avoid implicit str and unicode coercions; use
&lt;code&gt;__getitem__&lt;/code&gt; instead of &lt;code&gt;__getslice__&lt;/code&gt;; etc. Doing this
will decrease the number of changes the 2to3 translator will have to make, and thus
reduce the chances of introducing new bugs.&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-42.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-42.png" /&gt;&lt;/a&gt;

&lt;p&gt;Once you are done with that, you are now ready to port your code to Python
3. This is where it becomes tricky. First, you will need to decide how you will
maintain the Python 3 version of your project.&lt;/p&gt;

&lt;p&gt;There are three main possibilities at this point. You can, for one, remove
support for Python 2 and move your project completely to Python 3. This is not
a good idea if you already have a lot of users (in the case of a library).&lt;/p&gt;

&lt;p&gt;Another possibility is to modify your code so that the 2to3 tool can translate
it, without manual intervention, to a working Python 3 version. This is the
approach recommended by Python&amp;#8217;s core-devel team if you maintain a library that
needs to support both Python 2.6 and Python 3. So when you make changes to your
code, you edit the Python 2.6 version and run the 2to3 tool again to forward
your changes, rather than editing the Python 3 version of the source
code. This approach works, but I find it unnecessarily painful as you still
end up maintaining a lot of cruft.&lt;/p&gt;

&lt;p&gt;So, the approach I prefer is to create a separate branch for Python 3 and
start maintaining two lines of development. This works great if you use one of
these fancy DVCSs, as you can do your changes in the Python 2 branch and then
forward your changes to the Python 3 branch by simply merging them. And when
there are incompatibilities, you can run the 2to3 tool on the Python 3 code and it will
fix them for you. An advantage of this approach is that it gives you a chance to clean
up your code and remove, from the Python 3 version, all that
backward-compatibility stuff you may have accumulated over the years. And for
many projects, this will be the only acceptable approach (mainly because of
the Unicode changes).&lt;/p&gt;

&lt;p&gt;Now, I would like to demo some of the features that are available that will
ease the transition to Python 3.&lt;/p&gt;

&lt;blockquote class=ins&gt;
  Ed. In this part, I showed a short demo of how to use 2to3 to
  convert &lt;a href="http://www.feedparser.org/" &gt;feedparser&lt;/a&gt; to Python 3.0.
  This portion of the presentation was not prepared in advance and was
  interactive. If you want to see it, you will need to watch the video.
&lt;/blockquote&gt;

&lt;a href="http://www.advogato.org/slides/mp5-44.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-44.png" /&gt;&lt;/a&gt;
&lt;a href="http://www.advogato.org/slides/mp5-45.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-45.png" /&gt;&lt;/a&gt;

&lt;p&gt;Concluding remarks:&lt;/p&gt;

&lt;a href="http://www.advogato.org/slides/mp5-47.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-47.png" /&gt;&lt;/a&gt;
&lt;a href="http://www.advogato.org/slides/mp5-48.png" &gt;&lt;img alt="" src="/slides/mp5-thumb-48.png" /&gt;&lt;/a&gt;

&lt;blockquote class=ins&gt;
  Ending note. If you appreciated the content of this presentation or have
  suggestions, let me know! I am currently planning to do another talk at
  Montreal Python about extending and interfacing external code with
  Python. This presentation would mostly cover how to write extensions using
  the C API of Python. As you can imagine, preparing a good presentation is
  a lot of work. So any encouragement is welcome.
&lt;/blockquote&gt;
&lt;/div&gt;
</description>
    </item>
    <item>
      <pubDate>Mon, 31 Dec 2007 00:06:01 GMT</pubDate>
      <title>How to not switch to Dvorak</title>
      <link>http://www.advogato.org/person/avassalotti/diary.html?start=36</link>
      <guid>http://peadrop.com/blog/2007/12/30/how-to-not-switch-to-dvorak/</guid>
      <description>&lt;p&gt;Once in a while, I practice to improve my touch typing skills. Most of
the time, I just find some online stuff or use KTouch. But today, I
wanted to try something different. I always hear good things about the
Dvorak keyboard layout &amp;#8212; i.e., how it&amp;#8217;s supposedly more efficient and
more comfortable than the Qwerty layout. Being a curious person, I
wanted to test this out.&lt;/p&gt;

&lt;p&gt;So when I opened up KTouch, I selected the Dvorak lecture, instead of
the typical Qwerty one. The first lessons were fairly easy. As I went
through the lecture, I managed to keep a fair pace and accuracy,
i.e., about 210 characters per minute with 95% accuracy. At about
the fifth or sixth lesson, I said to myself: &amp;#8220;Wow, I must have been a
Dvorak typist in another life.&amp;#8221; I was really impressed how quickly I
had learnt the basics of the layout and I was indeed starting to
believe that the Dvorak layout was vastly superior to Qwerty.&lt;/p&gt;

&lt;p&gt;Shortly after, I was sold. At this point, I was thinking about how
I was going to remap my Emacs key bindings. :-)&lt;/p&gt;

&lt;p&gt;However, when I got to the tenth lesson, I found something strange,
very strange. The letter &amp;#8216;q&amp;#8217; on the Dvorak layout was in the upper row
on the left &amp;#8212; exactly where it is on the Qwerty layout.&lt;/p&gt;

&lt;p&gt;I stop typing for a second&amp;#8230;&lt;/p&gt;

&lt;p&gt;&amp;#8230;and look at the keyboard displayed on the screen.&lt;/p&gt;

&lt;p&gt;&amp;#8220;asdf asdf asdf&amp;#8221;&lt;/p&gt;

&lt;p&gt;Oops! I had forgotten to change the actual layout of my keyboard. So, I
was still using Qwerty.&lt;/p&gt;

&lt;p&gt;Now, I realize that I have been a victim of what they call the &amp;#8220;placebo
effect&amp;#8221;. This little anecdote has certainly taught me to be more
careful, in the future, when trying something new sold as &amp;#8220;better&amp;#8221;.&lt;/p&gt;
</description>
    </item>
    <item>
      <pubDate>Fri, 14 Dec 2007 22:06:11 GMT</pubDate>
      <title>Changing</title>
      <link>http://www.advogato.org/person/avassalotti/diary.html?start=35</link>
      <guid>http://peadrop.com/blog/2007/12/14/changing/</guid>
      <description>&lt;p&gt;It has been a while since I have written a blog post. It&amp;#8217;s not that I
haven&amp;#8217;t tried, or that I am lacking ideas. It&amp;#8217;s just that these
things take forever to write. I found out the hard way that
perfectionism is the enemy of getting things done (or in fact of getting
them started in the first place).&lt;/p&gt;

&lt;p&gt;So now that I got a bunch of half-finished blog posts that I won&amp;#8217;t ever
finish, what should I do with them?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Debian Packaging 101 (Part 2)&lt;/li&gt;
&lt;li&gt;Great books for young programmers&lt;/li&gt;
&lt;li&gt;Playing with memory&lt;/li&gt;
&lt;li&gt;ICFP 2007: Entangled in strings&lt;/li&gt;
&lt;li&gt;Summer of Code 2007: Final thoughts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Well, I guess that is my chance to start breaking my perfectionist habits
and put them &lt;a href="http://peadrop.com/unwritten/" &gt;on the web&lt;/a&gt;. (When I said &amp;#8220;half-finished&amp;#8221; I really meant it, by the way).&lt;/p&gt;
</description>
    </item>
    <item>
      <pubDate>Thu, 18 Oct 2007 17:04:52 GMT</pubDate>
      <title>Shell tricks: shorthands</title>
      <link>http://www.advogato.org/person/avassalotti/diary.html?start=34</link>
      <guid>http://peadrop.com/blog/2007/10/18/shell-tricks-shorthands/</guid>
      <description>&lt;p&gt;Even with tab completion, typing long commands is tedious. But
there&amp;#8217;s something even worse: typing the same long commands again, and
again, and again&amp;#8230; So how do you solve that? It&amp;#8217;s simple: you shorten
them. Surprising, huh? Okay, enough theory; let me show you some
examples.&lt;/p&gt;

&lt;p&gt;Here&amp;#8217;s a tedious command of Type-A:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;% sudo aptitude install zsh
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Look at it carefully, since you will need to hunt these long commands
down until none remain. Now, let me explain how you execute such a
command. Open up your personal shell initialization file
(e.g., &lt;code&gt;~/.bashrc&lt;/code&gt; for Bash, &lt;code&gt;~/.zshrc&lt;/code&gt; for Zsh, etc.). Then, add the
following:&lt;/p&gt;

&lt;pre&gt;&lt;code class="prettyprint"&gt;alias spkgi="sudo aptitude install"&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Reload your shell and finally, enjoy:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;% spkgi zsh
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now I can introduce, as you can deduce, other shortened commands that
you can produce and reproduce:&lt;/p&gt;

&lt;pre&gt;&lt;code class="prettyprint"&gt;# Package Management
alias pkg="aptitude"
alias spkg="sudo aptitude"
alias spkgi="sudo aptitude install"
alias spkgu="sudo aptitude safe-upgrade"
alias spkgr="sudo aptitude remove"
alias spkgd="sudo apt-get build-dep"

# Miscellaneous Helpers
alias nc="rlwrap nc"
alias e=$EDITOR
alias se=sudoedit
alias reload="source ~/.zshrc"
alias g=egrep

# To produce annoying alliterations ;)
alias alli="cat /usr/share/dict/words | grep"&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Next, after the Type-A tedious commands, we have the Type-S ones. To
execute these, you will need some sort of special shell
support. So, here are some examples of the Type-S monstrosity:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;% find Lib/ -name '*.c' -print0 | xargs -0 grep ^PyErr
% find -name '*.html' -print0 | xargs -0 rename 's/\.html$/.var/'
% find -name '*.patch' -print0 | xargs -0 -I {} cp {} patches/
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I hope you start to see some patterns (if you don&amp;#8217;t, then try
harder). The first one could (and should) be rewritten as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;% rgrep --include='*.c' ^PyErr Lib/
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But that isn&amp;#8217;t short enough for me, so I have a short helper:&lt;/p&gt;

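&lt;pre&gt;&lt;code class="prettyprint"&gt;rg()
{
    # Usage: rg FILE-GLOB PATTERN [PATH-OR-GREP-OPTION...]
    local filepat="$1"
    local pat="$2"
    shift 2
    # Quote the expansions so that patterns containing spaces or glob
    # characters reach grep intact; search "." when no path is given.
    grep -Er --include="$filepat" "$pat" "${@:-.}"
}
# In Zsh, 'noglob' turns off globbing.
# (e.g., "noglob echo *" outputs "*")
alias rg='noglob rg'&lt;/code&gt;&lt;/pre&gt;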

&lt;p&gt;It is lovely to use:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;% rg *.c ^PyErr Lib/
% rg *.c PyErr_Restore . -C 10 | less
% rg *.[ch] stringlib
% rg *.c ^[a-zA-Z]*_dealloc Modules/ Objects/
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The second example is quite similar to the previous one. However, the
find/rename combination is much less common (at least for me) than the
find/grep one. This one needs to be broken into pieces. One obvious thing
to factor out is the &lt;code&gt;find -name&lt;/code&gt; with an alias:&lt;/p&gt;

&lt;pre&gt;&lt;code class="prettyprint"&gt;alias fname="noglob find -name"&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Using this alias, you can rewrite the second example as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;% fname *.html -print0 | xargs -0 rename 's/\.html$/.var/'
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It&amp;#8217;s better, but it&amp;#8217;s not short enough yet. The ugly part of this
command is the &lt;code&gt;-print0 | xargs -0&lt;/code&gt;. I hate to type that. Wouldn&amp;#8217;t
it be nice if we could define an alias for it? How about:&lt;/p&gt;

&lt;pre&gt;&lt;code class="prettyprint"&gt;alias each="-print0 | xargs -0"&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Unfortunately, that doesn&amp;#8217;t work, since aliases are only expanded if
they are in the command position. Luckily, Zsh has a neat feature
called global aliases, which does exactly what we want.&lt;/p&gt;

&lt;pre&gt;&lt;code class="prettyprint"&gt;alias -g each="-print0 | xargs -0"&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With this feature of Zsh, the second example becomes:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;% fname *.html each rename 's/\.html$/.var/'
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now, we can also attack the third one:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;% fname *.patch each -I {} cp {} patches/
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It is possible to shorten it a bit more by defining another alias combining
&lt;code&gt;each&lt;/code&gt; and &lt;code&gt;-I {}&lt;/code&gt;, but that won&amp;#8217;t make a big difference.&lt;/p&gt;

&lt;p&gt;Finally, there are the Type-R tedious commands. These are hard to
avoid, unless you&amp;#8217;re careful. Here are, again, some ridiculous examples to
help you recognize these redundant commands:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;% gcc -o stackgrow stackgrow.c
% pkg show emacs-snapshot-bin-common emacs-snapshot-common emacs-snapshot-gtk emacs-snapshot
% cat ../lispref.patch ../lwlib.patch ../etc.patch | patch -p1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To reduce these, you don&amp;#8217;t need to change your shell configuration; you
change your habits instead. Using alternations (also known as brace
expansion, which is non-standard but supported by most shells), you can
rewrite the first two examples as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;% gcc -o stackgrow{,.c}
% pkg show emacs-snapshot{{-bin,}-common,-gtk,}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now, you are surely asking yourself: &amp;#8220;what is different about the
third one?&amp;#8221; Well, think about it. Got it? No? Ah, come on, it is
easy. Here&amp;#8217;s a hint:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;% echo 'cat ../{lispref,lwlib,etc}.patch | patch -p1' | wc -c
45
% echo 'cat ../lispref.patch ../lwlib.patch ../etc.patch | patch -p1' | wc -c
61
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You like my hint, don&amp;#8217;t you? Here&amp;#8217;s the answer:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;% echo 'cat ../li\t ../lw\t ../et\t | patch -p1' | wc -c
37
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Tab completion doesn&amp;#8217;t work well with prefix alternations. Even if the
command using alternation is shorter, it still doesn&amp;#8217;t beat good old
tab completion.&lt;/p&gt;

&lt;p&gt;And that&amp;#8217;s all folks. I surely have plenty of other tricks to show,
but that will be for other posts of this short series.&lt;/p&gt;
</description>
    </item>
    <item>
      <pubDate>Mon, 17 Sep 2007 06:07:37 GMT</pubDate>
      <title>Pretty Emacs Reloaded</title>
      <link>http://www.advogato.org/person/avassalotti/diary.html?start=33</link>
      <guid>http://peadrop.com/blog/2007/09/17/pretty-emacs-reloaded/feed/</guid>
      <description>&lt;p&gt;My popular&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" rel="footnote" &gt;1&lt;/a&gt;&lt;/sup&gt; &lt;a href="http://peadrop.com/blog/2007/01/06/pretty-emacs/" &gt;Pretty Emacs&lt;/a&gt; package just got a tad better.  I
transferred the package to the brand new &lt;a href="https://help.launchpad.net/PPAQuickStart" &gt;PPA service&lt;/a&gt; provided by
Launchpad. So, what&amp;#8217;s new about the package? First, I am glad to announce
the long-awaited amd64 support. Also, I am adding Gutsy Gibbon to the
list of supported distributions.&lt;/p&gt;

&lt;p&gt;To use the updated package on Ubuntu 6.10 &amp;#8220;Edgy Eft&amp;#8221;, add the
following lines to your &lt;code&gt;/etc/apt/sources.list&lt;/code&gt; file:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;deb     http://ppa.launchpad.net/avassalotti/ubuntu edgy main
deb-src http://ppa.launchpad.net/avassalotti/ubuntu edgy main
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To use the package on Ubuntu 7.04 &amp;#8220;Feisty Fawn&amp;#8221;, add the following
lines to your &lt;code&gt;/etc/apt/sources.list&lt;/code&gt; file:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;deb     http://ppa.launchpad.net/avassalotti/ubuntu feisty main
deb-src http://ppa.launchpad.net/avassalotti/ubuntu feisty main
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To use the package on the development version of Ubuntu &amp;#8220;Gutsy
Gibbon&amp;#8221;, add the following lines to your &lt;code&gt;/etc/apt/sources.list&lt;/code&gt; file:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;deb     http://ppa.launchpad.net/avassalotti/ubuntu gutsy main
deb-src http://ppa.launchpad.net/avassalotti/ubuntu gutsy main
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Unfortunately, if you still use Ubuntu 6.06 &amp;#8220;Dapper Drake&amp;#8221;, you will
have to keep using the older package release from my original
repository. I still support Ubuntu 6.06, but I won&amp;#8217;t update the
package with newer snapshots.&lt;/p&gt;

&lt;p&gt;After adding the repository to your software source list, upgrade your
version of the package with:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sudo aptitude upgrade
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you do not have a previous version of the package already installed
and you desire to install it, do this instead:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sudo aptitude install emacs-snapshot emacs-snapshot-el
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Please note, my package does not have multi-tty support. If you want
multi-tty support and don&amp;#8217;t mind using bitmap fonts, use Romain
Francoise&amp;#8217;s excellent &lt;a href="http://emacs.orebokech.com/" &gt;package of the CVS trunk&lt;/a&gt; for Debian. Also,
when upgrading the package you might get the following warning
message:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;WARNING: untrusted versions of the following packages will be installed!

Untrusted packages could compromise your system's security.
You should only proceed with the installation if you are certain that
this is what you want to do.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is due to &lt;a href="https://bugs.launchpad.net/soyuz/+bug/125103" &gt;a bug&lt;/a&gt; in the PPA system. I believe that it will be
resolved quickly. So, you can safely ignore the warning message for
the moment.&lt;/p&gt;

&lt;p&gt;As a final note, thank you, everyone, for trusting me and giving me some
great feedback about the package. I would like to give special thanks to
&lt;a href="http://orebokech.com/" &gt;Romain Francoise&lt;/a&gt; and &lt;a href="http://mwolson.org/" &gt;Michael Olson&lt;/a&gt; for their work respectively on
emacs-snapshot and emacs22, during this summer.&lt;/p&gt;

&lt;div class="footnotes"&gt;
&lt;hr /&gt;
&lt;ol&gt;

&lt;li id="fn:1"&gt;
&lt;p&gt;A rough estimate tells me there are over 30&amp;#8197;000 people using my
package, where 88% of them are Feisty Fawn users and 11% are Edgy Eft
users.&amp;#160;&lt;a href="#fnref:1" rev="footnote" &gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    <item>
      <pubDate>Thu, 2 Aug 2007 20:09:33 GMT</pubDate>
      <title>Protected: Installation scripts review</title>
      <link>http://www.advogato.org/person/avassalotti/diary.html?start=32</link>
      <guid>http://peadrop.com/blog/2007/08/02/installation-scripts-review/feed/</guid>
      <description>&lt;form action="http://peadrop.com/blog/wp-pass.php" method="post"&gt;
    &lt;p&gt;This post is password protected. To view it please enter your password below:&lt;/p&gt;
    &lt;p&gt;&lt;label&gt;Password: &lt;input name="post_password" type="password" size="20" /&gt;&lt;/label&gt; &lt;input type="submit" name="Submit" value="Submit" /&gt;&lt;/p&gt;
    &lt;/form&gt;
</description>
    </item>
    <item>
      <pubDate>Fri, 6 Jul 2007 22:05:23 GMT</pubDate>
      <title>Minor annoyance with Planet</title>
      <link>http://www.advogato.org/person/avassalotti/diary.html?start=31</link>
      <guid>http://peadrop.com/blog/2007/07/06/minor-annoyance-with-planet/feed/</guid>
      <description>&lt;div&gt;&lt;/div&gt;

&lt;p&gt;Do you know how to fix Planet or WordPress so that, when I edit an old post, it does not pop back up on Planet?&lt;/p&gt;

&lt;p&gt;I do edit some of my posts, in particular the &lt;a href="http://peadrop.com/blog/2007/01/06/pretty-emacs/" &gt;Pretty Emacs&lt;/a&gt; one, fairly often. I love to have my blog aggregated, but I would hate spamming Planet Ubuntu readers with my old posts. Therefore, if I cannot fix this little annoyance, I will have no other choice but to remove myself from Planet Ubuntu.&lt;/p&gt;
</description>
    </item>
    <item>
      <pubDate>Fri, 6 Jul 2007 20:05:57 GMT</pubDate>
      <title>Summer of Code Weekly #4</title>
      <link>http://www.advogato.org/person/avassalotti/diary.html?start=30</link>
      <guid>http://peadrop.com/blog/2007/07/06/summer-of-code-weekly-4/feed/</guid>
      <description>&lt;div&gt;&lt;/div&gt;

&lt;p&gt;All is well for me and my project.  I finished the merge of
cStringIO and StringIO, and I am now moving to the more challenging
cPickle/pickle merge.  During the last two weeks, I mostly spent my
time analyzing the &lt;code&gt;pickle&lt;/code&gt; module and thinking about how I will clean up
cPickle.  My current plan is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Make cPickle&amp;#8217;s source code conform to &lt;a href="http://www.python.org/dev/peps/pep-0007/" &gt;PEP-7&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Remove the dependency on the now obsolete cStringIO.&lt;/li&gt;
&lt;li&gt;Benchmark cPickle and pickle.&lt;/li&gt;
&lt;li&gt;Add subclassing support to Pickler/Unpickler.&lt;/li&gt;
&lt;li&gt;Reduce the size of cPickle&amp;#8217;s source code based on the bottlenecks
  found by the benchmarks.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Hopefully, the cPickle/pickle merge will be as smooth (and as fun) as the
cStringIO/StringIO merge.&lt;/p&gt;
</description>
    </item>
    <item>
      <pubDate>Mon, 18 Jun 2007 19:07:45 GMT</pubDate>
      <title>Pickle: An interesting stack language</title>
      <link>http://www.advogato.org/person/avassalotti/diary.html?start=29</link>
      <guid>http://peadrop.com/blog/2007/06/18/pickle-an-interesting-stack-language/</guid>
      <description>&lt;p&gt;The &lt;code&gt;pickle&lt;/code&gt; module provides a convenient method to add data
persistence to your Python programs.  How it does that is pure magic
to most people.  However, in reality, it is simple.  The output of a
&lt;code&gt;pickle&lt;/code&gt; is a &amp;#8220;program&amp;#8221; able to create Python data-structures.  A
limited stack language is used to write these programs.  By limited, I
mean you can&amp;#8217;t write anything fancy like a for-loop or an
if-statement.  Yet, I found it interesting to learn.  That is why I
would like to share my little discovery.&lt;/p&gt;

&lt;p&gt;Throughout this post, I use a simple interpreter to load pickle
streams.  Just copy-and-paste the following code in a file:&lt;/p&gt;

&lt;pre&gt;&lt;code class="prettyprint"&gt;# Written for Python 2 (note the print statement).
import code
import pickle
import sys

sys.ps1 = "pik&gt; "
sys.ps2 = "...&gt; "
banner = "Pik -- The stupid pickle loader.\nPress Ctrl-D to quit."

class PikConsole(code.InteractiveConsole):
    def runsource(self, source, filename="&amp;lt;stdin&amp;gt;"):
        # A pickle stream is complete once it ends with the STOP opcode ('.').
        if not source.endswith(pickle.STOP):
            return True  # more input is needed
        try:
            print repr(pickle.loads(source))
        except Exception:
            # Report malformed streams without killing the console.
            self.showsyntaxerror(filename)
        return False

pik = PikConsole()
pik.interact(banner)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then, launch it with Python:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ python pik.py
Pik -- The stupid pickle loader.
Press Ctrl-D to quit.
pik&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So, nothing crazy &lt;em&gt;yet&lt;/em&gt;.  The easiest objects to create are the empty
ones.  For example, to create an empty list:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pik&amp;gt; ].
[]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Similarly, you can also create a dictionary and a tuple:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pik&amp;gt; }.
{}
pik&amp;gt; ).
()
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Notice that every pickle stream ends with a period.  That symbol pops
the topmost object from the stack and returns it.  So, let&amp;#8217;s say you
pile up a series of integers and end the stream. Then, the result will
be the last item you entered:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pik&amp;gt; I1
...&amp;gt; I2
...&amp;gt; I3
...&amp;gt; .
3
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As you see, an integer starts with the symbol &amp;#8216;I&amp;#8217; and ends with a
newline. Strings and floating-point numbers are represented in a
similar fashion:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pik&amp;gt; F1.0
...&amp;gt; .
1.0
pik&amp;gt; S'abc'
...&amp;gt; .
'abc'
pik&amp;gt; Vabc
...&amp;gt; .
u'abc'
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now that you know the basics, we can move to something slightly more
complex &amp;#8212; constructing compound objects. As you will see later,
tuples are everywhere in Python, so let&amp;#8217;s begin with that one:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pik&amp;gt; (I1
...&amp;gt; S'abc'
...&amp;gt; F2.0
...&amp;gt; t.
(1, 'abc', 2.0)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;There are two new symbols in this example, &amp;#8216;(&amp;#8217; and &amp;#8216;t&amp;#8217;. The &amp;#8216;(&amp;#8217; is
simply a marker.  It is an object on the stack that tells the tuple
builder, &amp;#8216;t&amp;#8217;, when to stop.  The tuple builder pops items from
the stack until it reaches a marker.  Then, it creates a tuple with
these items and pushes this tuple back on the stack.  You can use
multiple markers to construct a nested tuple:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pik&amp;gt; (I1
...&amp;gt; (I2
...&amp;gt; I3
...&amp;gt; tt.
(1, (2, 3))
&lt;/code&gt;&lt;/pre&gt;
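
&lt;p&gt;To make the marker mechanism concrete, here is a minimal sketch of a stack machine that understands only the marker, integers, the tuple and list builders, and the final period. It is an illustration written for Python 3, not the code of the &lt;code&gt;pickle&lt;/code&gt; module; the names &lt;code&gt;MARK&lt;/code&gt; and &lt;code&gt;run&lt;/code&gt; are made up for this example.&lt;/p&gt;

```python
# Hypothetical toy interpreter for a tiny subset of the pickle language.
MARK = object()  # unique sentinel pushed by the "(" symbol


def run(stream):
    stack = []
    pos = 0
    while True:
        op = stream[pos]
        pos += 1
        if op == "(":
            stack.append(MARK)
        elif op == "I":
            # An integer argument runs up to the next newline.
            end = stream.index("\n", pos)
            stack.append(int(stream[pos:end]))
            pos = end + 1
        elif op == "t" or op == "l":
            # Pop items until the nearest marker, then drop the marker.
            items = []
            while stack[-1] is not MARK:
                items.append(stack.pop())
            stack.pop()
            items.reverse()  # items were popped in reverse order
            stack.append(tuple(items) if op == "t" else items)
        elif op == ".":
            return stack.pop()
        else:
            raise ValueError("unsupported symbol: %r" % op)


print(run("(I1\n(I2\nI3\ntt."))  # (1, (2, 3))
```

&lt;p&gt;The inner &lt;code&gt;t&lt;/code&gt; stops at the second marker, so only 2 and 3 end up in the nested tuple; the outer &lt;code&gt;t&lt;/code&gt; then collects 1 and that tuple.&lt;/p&gt;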

&lt;p&gt;You use a similar method to build a list or a dictionary:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pik&amp;gt; (I0
...&amp;gt; I1
...&amp;gt; I2
...&amp;gt; l.
[0, 1, 2]
pik&amp;gt; (S'red'
...&amp;gt; I00
...&amp;gt; S'blue'
...&amp;gt; I01
...&amp;gt; d.
{'blue': True, 'red': False}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The only difference is that dictionary items are packed as key/value
pairs.  Note that I slipped in the symbols for &lt;code&gt;False&lt;/code&gt; and &lt;code&gt;True&lt;/code&gt;,
which look like the integers 0 and 1, but with an extra leading zero.&lt;/p&gt;

&lt;p&gt;Like tuples, you can nest lists and dictionaries:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pik&amp;gt; ((I1
...&amp;gt; I2
...&amp;gt; t(I3
...&amp;gt; I4
...&amp;gt; ld.
{(1, 2): [3, 4]}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;There is another method for creating lists or dictionaries.  Instead
of using a marker to delimit a compound object, you create an empty one
and add stuff to it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pik&amp;gt; ]I0
...&amp;gt; aI1
...&amp;gt; aI2
...&amp;gt; a.
[0, 1, 2]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The symbol &amp;#8216;a&amp;#8217; means &amp;#8220;append&amp;#8221;.  It pops an item and a list; appends
the item to the list; and finally, pushes the list back on the stack.
Here is how you build a nested list with this method:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pik&amp;gt; ]I0
...&amp;gt; a]I1
...&amp;gt; aI2
...&amp;gt; aa.
[0, [1, 2]]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If this is not cryptic enough for you, consider this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pik&amp;gt; (lI0
...&amp;gt; a(lI1
...&amp;gt; aI2
...&amp;gt; aa.
[0, [1, 2]]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Instead of using the empty list symbol, &amp;#8216;]&amp;#8217;, I used a marker
immediately followed by a list builder to create an empty list.  That
is the notation the &lt;code&gt;Pickler&lt;/code&gt; object uses, by default, when dumping
objects.&lt;/p&gt;
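
&lt;p&gt;You can check this yourself by asking &lt;code&gt;pickle&lt;/code&gt; for its oldest, text-based format. The sketch below assumes Python 3, where streams are byte strings; the variable names are only illustrative. Expect the real output to also contain &amp;#8216;p&amp;#8217; symbols, which store each list in a memo.&lt;/p&gt;

```python
import pickle

data = [0, [1, 2]]

# Protocol 0 is the human-readable format discussed in this post.
stream = pickle.dumps(data, protocol=0)
print(stream)

# The hand-written stream from the example above loads to the same value.
by_hand = pickle.loads(b"(lI0\na(lI1\naI2\naa.")
print(by_hand == data)
```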

&lt;p&gt;Like lists, dictionaries can be constructed using a similar method:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pik&amp;gt; }S'red'
...&amp;gt; I1
...&amp;gt; sS'blue'
...&amp;gt; I2
...&amp;gt; s.
{'blue': 2, 'red': 1}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;However, to set items in a dictionary you use the symbol &amp;#8216;s&amp;#8217;, not &amp;#8216;a&amp;#8217;.
Unlike &amp;#8216;a&amp;#8217;, it takes a key/value pair instead of a single item.&lt;/p&gt;

&lt;p&gt;You can build recursive data-structures, too:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pik&amp;gt; (Vzoom
...&amp;gt; lp0
...&amp;gt; g0
...&amp;gt; a.
[u'zoom', [...]]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The trick is to use a &amp;#8220;register&amp;#8221; (or, as it is called in &lt;code&gt;pickle&lt;/code&gt;, a memo).
The &amp;#8216;p&amp;#8217; symbol (for &amp;#8220;put&amp;#8221;) copies the top item of the stack into a memo.
Here, I used &amp;#8216;0&amp;#8217; for the name of the memo, but it could have been
anything.  To get the item back, you use the symbol &amp;#8216;g&amp;#8217;.  It will
copy an item from a memo and put it on top of the stack.&lt;/p&gt;
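
&lt;p&gt;As a quick check of the memo mechanism, the following sketch loads the same stream with Python 3 (so the stream is a byte string) and confirms that the second element really is the list itself rather than a copy. The variable names are only illustrative.&lt;/p&gt;

```python
import pickle

# "p0" stores the freshly built list in memo 0; "g0" pushes it again,
# and "a" appends the list to itself.
stream = b"(Vzoom\nlp0\ng0\na."
result = pickle.loads(stream)

print(result[0])            # zoom
print(result[1] is result)  # True
```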

&lt;p&gt;But, what about sets?  Now, we have a small problem, since there is no
special notation for building sets.  The only way to build a set is to
call the built-in function &lt;code&gt;set()&lt;/code&gt; on a list (or a tuple):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pik&amp;gt; c__builtin__
...&amp;gt; set
...&amp;gt; ((S'a'
...&amp;gt; S'a'
...&amp;gt; S'b'
...&amp;gt; ltR.
set(['a', 'b'])
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;There are a few new things here.  The &amp;#8216;c&amp;#8217; symbol retrieves an object
from a module and puts it on the stack.  And the reduce symbol, &amp;#8216;R&amp;#8217;,
applies a function to a tuple of arguments.  The semantics are the same as before: &amp;#8216;R&amp;#8217; pops a tuple
and a function from the stack, then pushes the result back on it.  So,
the above example is roughly the equivalent of the following in
Python:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; import __builtin__
&amp;gt;&amp;gt;&amp;gt; apply(__builtin__.set, (['a', 'a', 'b'],))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Or, using the star notation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; __builtin__.set(*(['a', 'a', 'b'],))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And, that is the same thing as writing:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; set(['a', 'a', 'b'])
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Or, even shorter, using the set notation from the upcoming Python 3000:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; {'a', 'a', 'b'}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;These two new symbols, &amp;#8216;c&amp;#8217; and &amp;#8216;R&amp;#8217;, allow us to execute arbitrary
code from the standard library.  So, you must be careful to &lt;em&gt;never&lt;/em&gt;
load untrusted pickle streams.  Someone malicious could easily slip a
command into the stream to delete your data.  Meanwhile, you can use that
power for something less evil, like launching a clock:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pik&amp;gt; cos
...&amp;gt; system
...&amp;gt; (S'xclock'
...&amp;gt; tR.
&lt;/code&gt;&lt;/pre&gt;
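
&lt;p&gt;If you must load streams you do not fully control, one common mitigation is to override &lt;code&gt;Unpickler.find_class&lt;/code&gt;, the hook that resolves every &amp;#8216;c&amp;#8217; symbol. The sketch below assumes Python 3 (where the built-in module is named &lt;code&gt;builtins&lt;/code&gt;); &lt;code&gt;SAFE_BUILTINS&lt;/code&gt;, &lt;code&gt;RestrictedUnpickler&lt;/code&gt; and &lt;code&gt;restricted_loads&lt;/code&gt; are names chosen for this example, and the whitelist is deliberately tiny. It reduces the risk but is not a guarantee of safety.&lt;/p&gt;

```python
import builtins
import io
import pickle

# Hypothetical whitelist: the only globals a stream may reference.
SAFE_BUILTINS = {"set", "frozenset", "range"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global lookup requested by the stream.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()


print(restricted_loads(b"cbuiltins\nset\n((S'a'\nS'b'\nltR."))

try:
    restricted_loads(b"cos\nsystem\n(S'xclock'\ntR.")
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```

&lt;p&gt;The second stream is rejected as soon as the &amp;#8216;c&amp;#8217; symbol is resolved, so &lt;code&gt;os.system&lt;/code&gt; is never called.&lt;/p&gt;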

&lt;p&gt;Even if the language doesn&amp;#8217;t support looping directly, that doesn&amp;#8217;t
stop you from using the implicit loops:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pik&amp;gt; c__builtin__
...&amp;gt; map
...&amp;gt; (cmath
...&amp;gt; sqrt
...&amp;gt; c__builtin__
...&amp;gt; range
...&amp;gt; (I1
...&amp;gt; I10
...&amp;gt; tRtR.
[1.0, 1.4142135623730951, 1.7320508075688772, 2.0, 2.2360679774997898,
2.4494897427831779, 2.6457513110645907, 2.8284271247461903, 3.0]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I am sure you could fake an if-statement by defining it as a
function, and then loading it from a module.&lt;/p&gt;

&lt;pre&gt;&lt;code class="prettyprint"&gt;def my_if(cond, then_val, else_val):
    if cond:
        return then_val
    else:
        return else_val&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That works well for simple cases:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; my_if(True, 1, 0)
1
&amp;gt;&amp;gt;&amp;gt; my_if(False, 1, 0)
0
&lt;/code&gt;&lt;/pre&gt;

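&lt;p&gt;However, you run into problems if you mix that with recursion, since Python evaluates every argument, including the recursive call, before &lt;code&gt;my_if&lt;/code&gt; ever runs:&lt;/p&gt;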

&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; def factorial(n):
...     return my_if(n == 1,
...                  1, n * factorial(n - 1))
... 
&amp;gt;&amp;gt;&amp;gt; factorial(2)
RuntimeError: maximum recursion depth exceeded in cmp
&lt;/code&gt;&lt;/pre&gt;
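
&lt;p&gt;In ordinary Python, the usual way around this is to delay each branch by wrapping it in a function, so that only the chosen branch is ever evaluated. The sketch below (Python 3; the parameter names &lt;code&gt;then_thunk&lt;/code&gt; and &lt;code&gt;else_thunk&lt;/code&gt; are mine) shows the idea. Note that it relies on &lt;code&gt;lambda&lt;/code&gt;, which has no counterpart in the pickle language, so it does not make recursive pickle streams any easier to write.&lt;/p&gt;

```python
def my_if(cond, then_thunk, else_thunk):
    # Only the selected branch is called; the other is never evaluated.
    if cond:
        return then_thunk()
    return else_thunk()


def factorial(n):
    return my_if(n == 1,
                 lambda: 1,
                 lambda: n * factorial(n - 1))


print(factorial(5))  # 120
```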

&lt;p&gt;On the other hand, I don&amp;#8217;t think you really want to create recursive
pickle streams, unless you want to win an obfuscated code contest.&lt;/p&gt;

&lt;p&gt;That is about all I had to say about this simple stack language.
There are a few things I haven&amp;#8217;t told you about, but I am sure you will be
able to figure them out.  Just read the source code of the &lt;code&gt;pickle&lt;/code&gt;
module.  And take a look at the &lt;code&gt;pickletools&lt;/code&gt; module,
which provides a disassembler for pickle streams.  As always, comments
are welcome.&lt;/p&gt;
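
&lt;p&gt;For instance, here is a small sketch (Python 3, with illustrative variable names) that disassembles one of the tuple streams from earlier in this post. Each line of the listing shows a position, the symbol, its official opcode name and its argument.&lt;/p&gt;

```python
import io
import pickletools

stream = b"(I1\nS'abc'\nF2.0\nt."

# Capture the listing in a string instead of printing it directly.
out = io.StringIO()
pickletools.dis(stream, out=out)
listing = out.getvalue()
print(listing)
```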
</description>
    </item>
  </channel>
</rss>
