Older blog entries for Omnifarious (starting at number 149)

Interesting design problem with serialization and deserialization

I have been working on a serialization framework for Python that I'm happy with. I want to be able to describe CAKE protocol messages clearly and succinctly. This will make it easier to tweak the messages without having to rip apart difficult-to-understand code. It will also make the code easier to understand if I drop the project again and come back to it years later, or if (by some miracle) someone else decides to help me with it.

Here is what I've come up with as the interface, along with one implementation of that interface for a simple type:

class Serializer(object):
    """This class is an abstract base class.  Derived classes, when
    instantiated, create objects that can serialize other objects of a
    particular type to a sequence of bytes, or alternately deserialize
    a sequence of bytes into an object of a particular type."""

    __slots__ = ('__weakref__',)

    def __init__(self):
        super(Serializer, self).__init__()

    def serialize(self, val):
        """x.serialize(value) -> b'serialized value'

        This is implemented in terms of serialize_iter by default.

        It is suggested that derived classes only implement serialize
        or serialize_iter and implement one in terms of the other."""
        if self.__class__ is Serializer:
            raise NotImplementedError("This is an abstract class.")
        return b''.join(x for x in self.serialize_iter(val))

    def serialize_iter(self, val):
        """x.serialize_iter(value) -> an iterator over the bytes
        sequences making up the serialized version of value."""
        if self.__class__ is Serializer:
            raise NotImplementedError("This is an abstract class.")
        return iter((self.serialize(val),))

    def deserialize(self, data, memo=None):
        """x.deserialize(data, [memo]) ->
        (value of the appropriate type, memoryview(remaining_data))

        data must be of type 'bytes', or 'memoryview'.  The memo must
        be a value extracted from a previous NotEnoughDataError.

        It is undefined what happens if you use memo and do not pass
        the same data (plus some possible extra data on the end) into
        deserialize that you originally passed in when you got the
        NotEnoughDataError you extracted the memo from.

        May raise a ParseError if there is a problem with the data.
        If the failure was because the parser ran out of data before
        parsing was finished, this is required to be a
        NotEnoughDataError."""
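        # Wrap bytes in a memoryview so later slicing doesn't copy;
        # a memoryview is passed through unchanged.
        if isinstance(data, bytes):
            data = memoryview(data)
        return self._deserialize(data, memo)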

    def _deserialize(self, memview, memo=None):
        """x._deserialize(memoryview) ->
        (value of the appropriate type, memoryview(remaining_data))

        Exactly like deserialize, except a memoryview object is
        required.  deserialize is implemented in terms of
        _deserialize.  Derived classes are expected to override
        _deserialize."""
        raise NotImplementedError("This is an abstract class.")


import numbers
import struct as _struct


class SmallInt(Serializer):
    """This class is for integers that are 8, 16, 32, or 64 bits long.
    They may be signed or unsigned.  No other sizes are supported.

    >>> s = SmallInt(2, True)
    Traceback (most recent call last):
        ...
    ValueError: size is 2, must be 8, 16, 32 or 64
    >>> s = SmallInt(8, True)
    >>> b = list(s.serialize_iter(5))
    >>> b == [b'\\x05']
    True
    >>> o = s.deserialize(b''.join(b))
    >>> o = (o[0], o[1].tobytes())
    >>> o == (5, b'')
    True
    >>> o = s.deserialize(b''.join(b) + b'z')
    >>> o = (o[0], o[1].tobytes())
    >>> o == (5, b'z')
    True
    >>> s = SmallInt(8, True)
    >>> b = s.serialize(-5)
    >>> b == b'\\xfb'
    True
    >>> s = SmallInt(8, True)
    >>> s = s.serialize(128)
    Traceback (most recent call last):
        ...
    ValueError: 128 is out of range for a signed 8 bit integer
    >>> s = SmallInt(64, False)
    >>> b = s.serialize(2**64-1)
    >>> b == b'\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff'
    True
    >>> s = SmallInt(64, True)
    >>> b = s.serialize(-2**63)
    >>> b == b'\\x80\\x00\\x00\\x00\\x00\\x00\\x00\\x00'
    True
    """

    # Big-endian struct format codes, keyed by (size in bits, signed).
    _formats = dict((
        ((8, True), '>b'), ((8, False), '>B'),
        ((16, True), '>h'), ((16, False), '>H'),
        ((32, True), '>i'), ((32, False), '>I'),
        ((64, True), '>q'), ((64, False), '>Q'),
    ))

    __slots__ = ('_size', '_signed', '_low', '_high', '_format')

    def __init__(self, size, signed):
        super(SmallInt, self).__init__()
        if size not in (8, 16, 32, 64):
            raise ValueError("size is %d, must be 8, 16, 32 or 64" % (size,))
        self._size = size
        self._signed = bool(signed)
        # Look up with the normalized bool so any truthy 'signed' works.
        self._format = self._formats[(size, self._signed)]

    def serialize(self, value):
        # numbers.Integral covers int (and long on Python 2).
        if not isinstance(value, numbers.Integral):
            raise TypeError("%r must be an int or long" % (value,))
        value = int(value)
        try:
            ret = _struct.pack(self._format, value)
        except _struct.error:
            raise ValueError("%d is out of range for %s %d bit integer"
                             % (value,
                                ("a signed" if self._signed
                                 else "an unsigned"),
                                self._size))
        return ret

    def _deserialize(self, memview, memo=None):
        numbytes = self._size // 8
        if len(memview) < numbytes:
            # Report how many more bytes are needed.
            raise _NotEnoughDataError(numbytes - len(memview))
        data = memview[0:numbytes].tobytes()
        remaining = memview[numbytes:]
        try:
            result = _struct.unpack(self._format, data)[0]
        except _struct.error as err:
            raise ParseError(err)
        return result, remaining

There is also a CompoundNumbered type for representing tuples. This allows you to represent structured messages with multiple fields. Here is an example of how you might represent CAKE new session messages:

cake_newsess_v2 = _serial.CompoundNumbered(
    _serial.Count(), # Version
    _serial.Count(), # Type
    _serial.KeyName(), # Destination key
    _serial.KeyName(), # Source key
    _serial.SmallInt(64, False), # Session serial #
    _serial.CountDelimitedByteString(), # Encryption header
    _serial.CountDelimitedByteString(), # Signature.
    _serial.FixedLengthByteString(32) # Header HMAC
)
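The post doesn't show how CompoundNumbered is implemented. As a rough, hypothetical sketch of how a tuple serializer of this kind could work, each field serializer consumes its own bytes and hands the remaining memoryview on to the next field, matching the (value, remaining_data) convention of the Serializer interface. `UInt` and `CompoundSketch` are made-up stand-ins for SmallInt and CompoundNumbered, not the actual `_serial` code, and error handling is reduced to a plain ValueError.

```python
import struct


class UInt(object):
    """Hypothetical stand-in for SmallInt(size, False): a big-endian
    unsigned integer of a fixed width in bytes."""

    _formats = {1: '>B', 2: '>H', 4: '>I', 8: '>Q'}

    def __init__(self, nbytes):
        self._nbytes = nbytes
        self._format = self._formats[nbytes]

    def serialize(self, value):
        return struct.pack(self._format, value)

    def deserialize(self, data):
        view = memoryview(data)
        if len(view) < self._nbytes:
            raise ValueError("need %d more bytes" % (self._nbytes - len(view)))
        value = struct.unpack(self._format, view[:self._nbytes].tobytes())[0]
        return value, view[self._nbytes:]


class CompoundSketch(object):
    """Hypothetical CompoundNumbered-style serializer: one serializer
    per tuple position, applied in order."""

    def __init__(self, *fields):
        self._fields = fields

    def serialize(self, values):
        if len(values) != len(self._fields):
            raise ValueError("expected %d values" % len(self._fields))
        return b''.join(f.serialize(v) for f, v in zip(self._fields, values))

    def deserialize(self, data):
        remaining = memoryview(data)
        result = []
        for field in self._fields:
            # Each field consumes its bytes and returns the rest.
            value, remaining = field.deserialize(remaining)
            result.append(value)
        return tuple(result), remaining


msg = CompoundSketch(UInt(1), UInt(1), UInt(8))
wire = msg.serialize((2, 7, 12345))
value, rest = msg.deserialize(wire + b'extra')
print(value, rest.tobytes())  # (2, 7, 12345) b'extra'
```

Because every field returns the unconsumed remainder, the compound type never needs to know field widths in advance, which is what lets variable-length fields like CountDelimitedByteString sit in the middle of a message.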

There is a problem though. The signature and header HMAC are supposed to be encrypted, but the deserializer can't know the key to use until it's decrypted the encryption header. This means that later parts of the deserialization process need to know about things from previous parts.

I have a way for the deserialization process to save state. When deserialization runs out of data and raises a NotEnoughDataError, the exception may carry a memo field. This memo can then be passed back in to resume close to where deserialization stopped. (Though now I'm sort of wondering if I shouldn't do something generator-based instead...)
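One way the generator-based alternative might look, as a Python 3 sketch (it relies on `yield from` returning a value, available since Python 3.3): the suspended generator itself remembers where parsing stopped, so there is no separate memo to pass back in. `read_exact`, `parse_message`, and `IncrementalParser` are hypothetical names, and the message layout (1-byte version, 2-byte length, payload) is invented for illustration.

```python
import struct


def read_exact(buf, n):
    """Sub-generator: suspend until buf (a bytearray shared with the
    driver) holds at least n bytes, then remove and return them."""
    while len(buf) < n:
        yield n - len(buf)          # tell the driver how many bytes are missing
    chunk = bytes(buf[:n])
    del buf[:n]
    return chunk


def parse_message(buf):
    """Hypothetical layout: 1-byte version, 2-byte length, payload."""
    version = struct.unpack('>B', (yield from read_exact(buf, 1)))[0]
    length = struct.unpack('>H', (yield from read_exact(buf, 2)))[0]
    payload = yield from read_exact(buf, length)
    return version, payload


class IncrementalParser(object):
    """Drives a parser generator.  The suspended generator plays the
    role the memo plays in the NotEnoughDataError design."""

    def __init__(self, parser_factory):
        self._buf = bytearray()
        self._gen = parser_factory(self._buf)
        self.needed = next(self._gen)   # run until the first shortfall

    def feed(self, data):
        """Append data; return the parsed value once complete, else None."""
        self._buf.extend(data)
        try:
            self.needed = next(self._gen)
        except StopIteration as done:
            self.needed = 0
            return done.value
        return None

    @property
    def remaining(self):
        """Bytes received past the end of the parsed message."""
        return bytes(self._buf)


p = IncrementalParser(parse_message)
for chunk in (b'\x01\x00', b'\x05he', b'llo!'):
    result = p.feed(chunk)
print(result, p.remaining)  # (1, b'hello') b'!'
```

The first two `feed` calls return None (with `needed` reporting 1 and then 3 missing bytes); the third completes the message and leaves the extra `b'!'` buffered for the next one.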

But this mechanism does not allow state to be passed forward from a previous deserializer to a later one. The same problem applies in the other direction too: when serializing, there is state that isn't really part of the data being serialized (like the current HMAC or encryption state) that the serializer needs to know in order to serialize properly.

I'm thinking of adding an optional context parameter to the serialization and deserialization functions that's just an empty dictionary into which this sort of state can be stuffed. But this seems really messy. Can anybody think of any better ways to do this that are fairly general?
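To make the context-dictionary idea concrete, here is a minimal, hypothetical sketch. The names `Blob`, `TrailingMac`, and `Compound` are invented, and HMAC-SHA256 over the preceding bytes is an assumption, not the CAKE specification. An early field records which key to use in `ctx`, the compound serializer records the bytes handled so far, and a trailing MAC field reads both. It also illustrates the messiness in question: the fields agree on dictionary keys like `'key_name'` purely by convention.

```python
import hashlib
import hmac
import struct


class Blob(object):
    """2-byte length prefix plus bytes; optionally records the value
    in ctx under ctx_name so later fields can see it."""

    def __init__(self, ctx_name=None):
        self.ctx_name = ctx_name

    def serialize(self, value, ctx):
        if self.ctx_name:
            ctx[self.ctx_name] = value
        return struct.pack('>H', len(value)) + value

    def deserialize(self, view, ctx):
        # No bounds checking here, for brevity.
        (n,) = struct.unpack('>H', view[:2].tobytes())
        value = view[2:2 + n].tobytes()
        if self.ctx_name:
            ctx[self.ctx_name] = value
        return value, view[2 + n:]


class TrailingMac(object):
    """HMAC-SHA256 over everything before it, keyed by the key whose
    name an earlier field stored in ctx['key_name']."""

    def _mac(self, ctx):
        key = ctx['keys'][ctx['key_name']]
        return hmac.new(key, ctx['so_far'], hashlib.sha256).digest()

    def serialize(self, value, ctx):
        return self._mac(ctx)          # the value is computed, not supplied

    def deserialize(self, view, ctx):
        mac = view[:32].tobytes()
        if not hmac.compare_digest(mac, self._mac(ctx)):
            raise ValueError("bad MAC")
        return mac, view[32:]


class Compound(object):
    """Runs fields in order, keeping ctx['so_far'] up to date."""

    def __init__(self, *fields):
        self.fields = fields

    def serialize(self, values, ctx):
        ctx['so_far'] = b''
        for field, value in zip(self.fields, values):
            ctx['so_far'] += field.serialize(value, ctx)
        return ctx['so_far']

    def deserialize(self, data, ctx):
        view = memoryview(data)
        ctx['so_far'] = b''
        out = []
        for field in self.fields:
            value, rest = field.deserialize(view, ctx)
            # Record exactly the bytes this field consumed.
            ctx['so_far'] += view[:len(view) - len(rest)].tobytes()
            out.append(value)
            view = rest
        return tuple(out), view


msg = Compound(Blob('key_name'), Blob(), TrailingMac())
keys = {b'k1': b'secret'}
wire = msg.serialize((b'k1', b'hello', None), {'keys': keys})
fields, rest = msg.deserialize(wire, {'keys': keys})
print(fields[:2])  # (b'k1', b'hello')
```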

Syndicated 2011-02-02 22:46:39 (Updated 2011-02-02 23:03:41) from Lover of ideas

Protocol buffers?

I have a problem for which protocol buffers seem like a good solution, but I'm reluctant to use them. First, protocol buffers include facilities for handling the addition of new fields in the future. This adds a small amount to a typical protocol buffer message, but it's a facility I do not need.

Also, I feel the variable-sized number encoding is less efficient than it could be, though this is a very minor issue. I also feel like I have a number of special-purpose data types that are not adequately represented.
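For reference, the base-128 varint scheme protobuf uses stores 7 bits of the value per byte, least-significant group first, with the high bit of each byte set when more bytes follow. The sketch below uses hypothetical function names (it is not the protobuf library's API) and handles non-negative integers only. That one continuation bit per byte is why values below 128 fit in a single byte while a full-range 64-bit value needs 10 bytes instead of 8.

```python
def encode_varint(n):
    """Base-128 varint for a non-negative int: 7 payload bits per byte,
    least-significant group first, high bit set on all but the last."""
    if n < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)   # more bytes follow
        else:
            out.append(low7)          # final byte: high bit clear
            return bytes(out)


def decode_varint(data):
    """Return (value, number_of_bytes_consumed)."""
    value = shift = 0
    for i, byte in enumerate(bytearray(data)):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")


for n in (1, 300, 2**32 - 1, 2**64 - 1):
    print(n, len(encode_varint(n)))  # 1, 2, 5 and 10 bytes respectively
```

For example, `encode_varint(300)` produces `b'\xac\x02'`.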

I'm also not completely pleased with the C++ and/or Python APIs. I think they contain too many googlisms. I would like to see public APIs published that were free of adherence to Google coding standards like do-nothing constructors and no exceptions.

I think, maybe, I will be using protocol buffers for some messages that are sent by applications using CAKE as a transport/session layer. These include some of the sub-protocols that are required to be implemented by a conforming CAKE implementation.

On a different note, I think Google's C++ coding standards are lowering the overall quality of Open Source C++ code. This isn't a huge effect, but it's there.

It happens because Google's good name is associated with a published set of C++ coding standards that include advice which, while possibly good for Google internally, is of dubious quality as general-purpose advice. It also happens because when Google releases the code for their internal tools to the Open Source community, that code follows Google's standards. And some of these standards make it hard to combine code that doesn't comply with them with code that does.

Syndicated 2010-12-04 23:26:39 (Updated 2010-12-04 23:28:28) from Lover of ideas

Today's XKCD

Normally XKCD is amusing for very positive reasons. But I frequently feel a lot like the guy with the beard in this cartoon. It's really frustrating. So today's XKCD, "Infrastructures", is darkly amusing to me. Freedom is such a hard sell before people lose it. People choose convenience every time, frequently until it's almost too late to fix the problem, all the while berating the people who were worried in the first place.

Syndicated 2010-05-22 00:05:25 from Lover of ideas

19 May 2010 (updated 19 May 2010 at 07:10 UTC)

Eben Moglen Tech Talk at Google

Eben Moglen is one of the principal lawyers behind the GPL. He's also a tireless free software advocate, and significantly more photogenic and diplomatic than Richard Stallman.

He recently gave this interesting tech talk at Google about the perception of Google by entities outside it. It was really well done, and struck a strong chord with me.

I've noticed that people are frequently incapable of believing that Google does some things for the reasons Google says it does them. For example (and I don't really have the time to find references just now), many people seem to think that Google Doodles, those fun, timely modifications to Google's main search page, are a marketing tool, when in fact they are largely done out of whimsy.

I suppose, in one sense, there is a marketing purpose. Google is projecting their image of themselves out into the world. It's brand building. But, on the other hand, there isn't. I doubt that Google Doodles started as a brand-building idea in some marketing department. I'm betting some random small group of people decided one day that it would be fun to do, the idea caught on, and now it's a tradition.

But people seem to want to analyze doodles for the marketing message they contain, despite the fact that there generally isn't one. The more enigmatic the doodle, the more determined people seem to be to find the marketing message in it.

This means there is a disparity in perception between people outside Google and people inside Google, one that might serve Google very poorly in the future. It's very important that Google understand this and respond appropriately. Perception is reality, and people and organizations live up to expectations. Google risks becoming what people perceive them to be unless they act to correct that perception.

Google also frequently doesn't realize how the fact that they are so large and powerful affects people's perceptions of them. Witness the brouhaha over Buzz. Google did do some somewhat wrongheaded things in introducing it, but Buzz was not anywhere near the privacy-destroying aggregator that people thought it was. And the fact that people perceived Buzz in this way seemed to mystify people inside Google, even though it was predictable given Google's size and people's perceptions.

Again, this points to a need by Google to better manage people's perceptions of them, and to manage their product releases better in terms of how people perceive them.

Eben Moglen suggests, quite wisely, that one thing Google could do is to change their policy on contributing internal changes back to Open Source projects. I think this is a good idea, but I doubt it will really be enough.

I am a little worried that if Google takes this advice to heart, they will grow a PR arm that does what every other PR arm in the world does: try to make sure that perception stays far more positive than reality, instead of simply trying to make perception match reality. But Google should do something, since I think people think far more ill of them than they generally deserve.

Google is, in fact, the only company I know of that has a revenue stream greater than 1 billion dollars a year that I actually have a positive opinion of.

Syndicated 2010-05-18 23:32:06 (Updated 2010-05-19 06:25:53) from Lover of ideas

The evils of Flash

This was a Slashdot comment, but I think it deserves a top-level post here. It's in response to the article "Apple’s attack on Adobe Flash, it’s all about online video" NOT. (I added the 'NOT' because that's the article author's conclusion.)

Pot calls kettle black, kettle complains, but it's just as black.

Flash is a despicable disgrace. Most of the time when I talk to a Flash developer, the thing they're happiest about is the control they get over my computer. This is a direct result of the Flash player being a closed-source piece of garbage that purposely caters to developers over end users. The Open Source gnash (not ganash) player has an option to pause a Flash program. The Adobe player will never, ever end up with that option, ever. Giving me control over my own computer is against Adobe's best interest. That makes Adobe's Flash player little more than a widely deployed trojan horse that, IMHO, is little better than spyware (Flash cookies anyone? Where's my control over those?).

I wouldn't complain so bitterly about this if the gnash player were actually a decent drop-in replacement for the closed-source Flash player, but it isn't. I have to choose either Flash that is broken most of the time but leaves me free to have my computer do what I want instead of what some random corporation wants, or Flash that works but costs me my freedom. I will choose my freedom, thank you very much, but I will be bitter about the stupid choice I'm forced to make.

So, when one maker of a closed, proprietary platform that steals people's freedom purposely does things to the detriment of another closed, proprietary platform that steals people's freedom, I can't help but cheer. And I hope Adobe finds a way to play nasty games with Apple too. The more these two companies can find ways to hurt each other, the more the rest of us benefit.

If Adobe Open Sourced the Flash player (I couldn't care less about the developer tools; they will end up with Open Source implementations no matter what Adobe does if the player is truly open), my objections to Flash would completely disappear. I could realistically choose a fully functional Flash player, and I'm certain I could find one with a pause button, or one that refused to store cookies for longer than a week. I could make it myself if I wanted to.

And lest you tell me that I'm just whining, the majority of large sites out there no longer look right without Flash. By not using Flash, I'm cut off from a significant part of the experience of the web. I shouldn't be forced to give up control of my computer in order to browse the web. That's a completely and utterly ridiculous assertion.

Syndicated 2010-05-07 17:27:02 (Updated 2010-05-07 17:54:09) from Lover of ideas

Walking data structures

It's common programmer tech speak to talk about 'walking' data structures, meaning following all the pointers around to put all the data back together again. I think that 'brachiation' is a more apt metaphor, and fits well with the concept of 'code monkey'.

Syndicated 2010-03-30 16:49:53 (Updated 2010-03-30 16:50:11) from Lover of ideas

I hate perl

Case in point: the Net::IP module. The documentation looks nice. It handles IPv6 and IPv4 addresses. It looks clean and simple.

Then I decided I would like IPv4-mapped IPv6 addresses to match the IPv4 address ranges I'm singling out for special treatment. So I looked into its tool for extracting an IPv4 address from an IPv6 address.

The call, ip_get_embedded_ipv4, doesn't seem to work on IPv6 addresses created with 'new'. It only works on IPv6 addresses represented as strings. This led me to dive into the implementation.
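For comparison, the check described above (treating an IPv4-mapped IPv6 address such as `::ffff:192.0.2.7` as its embedded IPv4 address before matching ranges) is short with Python 3's standard `ipaddress` module, whose `IPv6Address.ipv4_mapped` property returns the embedded IPv4 address or None. This is a sketch of the desired behavior, not Net::IP's API; `SPECIAL_V4_RANGES` and `in_special_range` are hypothetical names, and the ranges are placeholders.

```python
import ipaddress

# Hypothetical ranges singled out for special treatment (placeholders).
SPECIAL_V4_RANGES = [ipaddress.ip_network('192.0.2.0/24'),
                     ipaddress.ip_network('10.0.0.0/8')]


def in_special_range(text):
    """True if text is an IPv4 address in one of the ranges, or an
    IPv4-mapped IPv6 address (::ffff:a.b.c.d) whose IPv4 part is."""
    addr = ipaddress.ip_address(text)
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped       # unwrap ::ffff:a.b.c.d
    # An IPv6 address is never "in" an IPv4 network, so this is False
    # for ordinary (unmapped) IPv6 addresses.
    return any(addr in net for net in SPECIAL_V4_RANGES)


for a in ('192.0.2.7', '::ffff:192.0.2.7', '2001:db8::1'):
    print(a, in_special_range(a))
```

The point of the design is that the address object has one representation, and the IPv4 view of a mapped address is derived from it on demand rather than stored as a separate attribute.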

I discovered that there is no coherent internal representation, just a lot of different attributes that are used at different times for different purposes and are converted from one to another as needed.

Additionally, there appears to be no way to import particular symbols of certain classes from the module. You have to import them using the import statements specified in the documentation or take your chances on whether or not it will work. This is because the import mechanism and which symbols are global or not is handled in a fairly ad-hoc sort of way and re-implemented in each module according to the whims of the author.

It's really quite surprising the module works at all. And I'm left feeling like I really ought to re-write it if I want something I can count on.

In reality, looking at the module's implementation was a mistake. This is always what happens to me when I look at a perl module. Either it works in a completely mysterious way using language mechanisms I've never seen used before, or it works in a way that's totally broken and practically guaranteed to break for any use that varies from the specific use-cases described in the documentation. Frequently both are the case. Aigh! Run away!

I hope I can convince my new workplace to stop using perl.

Syndicated 2010-03-18 18:14:05 (Updated 2010-03-18 18:16:08) from omnifarious

31 Jan 2010 (updated 1 Feb 2010 at 09:20 UTC)

Why I hate all of Apple's new hardware

The iPod, the iPhone, and the new iPad. I hate them all. They are a horrible abomination that appeals to the worst in us: the part that thinks that if we just let someone else handle all the details for us, everything will be OK and we won't need or want to take any personal responsibility for the things we own; the part that believes convenience beats freedom.

And this isn't because they are small and not a 'full-fledged' computer or anything like that. I would love a world full of tiny useful gadgets that help people get stuff done without getting in their way. No, I hate them because you can't open them up and tinker with them. You can't make them do anything you want them to do, you can only make them do what Apple wants you to be able to do.

And in his short essay "Tinkerer's Sunset", this author has distilled for me at least one incredibly important reason why this freedom matters so much.

I got my start with computers because of that exact sense. This is the ultimate gadget! I can make it do absolutely ANYTHING! I just have to figure out how to tell it in a language it can understand.

None of the products I mention have that. They all treat 'developers' as a special class that you have to jump through hoops to become a member of (and what kid is going to go do that?). And even then, people who choose to be in that class still don't get to make the machine do anything, just what Apple approves of. That is very, very not OK.

I'm not an Apple hater here. I own one of their laptops because I get root access on it, just like I would own an iPhone if I got root access on it. The laptop is a good piece of hardware, and it's the only laptop I've ever used that I've really enjoyed using.

The most excusable of them all is the iPod. It masquerades as a simple, single-purpose device. But even then, the fact that Apple purposefully hobbles the platform in various ways in order to try to keep you from doing things Apple doesn't want you to do has kept me from even considering buying one.

It's my hardware! MINE! I should get to do whatever the heck I want to with it. This whole 'joint ownership' thing (especially when they pretend it isn't happening) with some large corporation is totally broken. It really distresses me that so many choose convenience over freedom (hint: it doesn't have to be a dichotomy, and I suspect that Google will get this right). My only, rather bitter, consolation is that such people will get the future they deserve.

Note that I am most definitely not insisting that everybody should open up their appliances and tinker with them. I don't want you all to become developers or anything like that.

What I'm insisting on is that you choose appliances that you can open up and tinker with. Not because you know you want to, but because having the freedom to do so taken away from you is very bad for everybody, especially children who will never get the chance to learn they enjoy tinkering because their corporate overlords forbid them from doing so.

Unfortunately, people who buy such devices may also end up, through their aggregate choices, dragging me into a future that I don't want. Network effects (in the marketing-speak sense) are king on computers. If freedom-destroying gadgets become popular, it becomes really hard to use anything but freedom-destroying gadgets.

Edited 2010-02-01 00:14 PST: People who commented before then are commenting on a diatribe where I didn't try nearly so hard to separate the nice things the gadget does from the freedom destroying effects of the policies of the corporation that makes it.

Syndicated 2010-01-31 19:32:38 (Updated 2010-02-01 08:15:51) from Lover of Ideas

This captures my feelings perfectly

I found a fantastic quote that I have to save, though it's only funny (or really irritating) to computer programmers.

Question: What's the difference between Java and Javascript?

One is essentially a toy, designed for writing small pieces of code, and traditionally used and abused by inexperienced programmers.

The other is a scripting language for web browsers.

Syndicated 2010-01-25 20:08:28 from Lover of Ideas

I'm pleased with myself

In answering a question on StackOverflow I appear to have independently re-invented the Curiously Recurring Template idiom as applied to polymorphic copy construction.

Until I tried to answer that person's question, I never realized there was such a nice and convenient way to avoid having to copy-and-paste cookie-cutter clone methods.

Syndicated 2010-01-14 17:10:38 from Lover of Ideas
