Older blog entries for DV (starting at number 148)

Finally finished the libxml2-2.6.0 release from hell at 3am yesterday. To give an idea, the ChangeLog entries since 2.5.10 are 1132 lines long. You can check the xmlsoft.org news to get a shorter version of what got in: a lot of new, cleaner parsing APIs, SAX2, proper error handling, the xmlWriter and Walker APIs, and lots of bug fixes. The resulting code is cleaner, more modular, smaller and faster even with the new stuff, and it should still be API and ABI compatible (I'm typing this on my Red Hat 9 desktop with libxml2-2.6.0 replacing the default 2.5.x, no trouble). There is a 2_5_X branch in CVS but it really should not be needed.

Now I need to 1/ close the related bugs 2/ make some new docs 3/ do the maintenance work that such a huge change will obviously generate 4/ look at some changes done at the spec level like XML-1.1 and the 3rd edition of XML-1.0. But I feel far more comfortable with the current framework than with the 2.5.x one.

William Brack has been helping me tremendously with libxml2/libxslt maintenance and improvement. I need to thank him, and maybe point out that he's probably the most senior hacker on the GNOME project. He operates from Hong-Kong, where he is semi-retired, and if I understand correctly he was already programming in the 60's. That didn't stop him learning Python in the last month to refactor the code generating the XML Unicode codepoint checks in libxml2, or gdb'ing and fixing parts of libxslt that I myself find scary to debug. Kudos to Bill !

Still working a lot on the libxml2 changes for 2.6.0. So far the feedback I got on the new API was good. I will need to make docs, yes this desperately needs better docs, but at least, like for the xmlReader API, I have an API I feel comfortable documenting. Getting reviews and patches: someone posted some code implementing something like the xmlReader but working on the tree. Considering that a lot of people seem to have trouble simply walking a tree structure, this may be useful. The negative part is of course the library size. Apparently some people have a hard time with this for desktop use while others are using libxml2 on embedded systems. Anyway I spent most of the week-end modularizing the library further, and there is now a configure flag --with-minimum, which turns off everything possible (including serialization, reader, and even the old SAX1), in which case the library is around 170Kbytes. One of the gains I expected to have in 2.6.0 was to drop the old DocBook SGML parser from the default library configuration. It was intended only as a way to help convert documents through xmllint and xsltproc, but apparently scrollkeeper still uses it. Dunno how to best proceed: at the moment the docbook parser simply calls the XML parser, so if documents were not converted it may fail. Of course I'm sure this is gonna be a pain. Since I really don't want a libxml2.so.3 or a libxml3 at the moment, I may have to reintegrate that code ... damn
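To make the "reader-like walker over an existing tree" idea a bit more concrete, here is a rough sketch in Python rather than C. It uses the standard xml.etree.ElementTree module, not libxml2, and the walk() function is a made-up name for illustration only. It visits an already built tree in document order without recursion and hands out start/end events, which is roughly what a pull-style cursor over a tree has to do.

```python
import xml.etree.ElementTree as ET


def walk(root):
    """Yield (event, depth, tag) tuples in document order, without recursion.

    Hypothetical helper for illustration; this is not a libxml2 API.
    """
    yield ("start", 0, root.tag)
    # Each stack entry is (element, depth, iterator over its children).
    stack = [(root, 0, iter(root))]
    while stack:
        elem, depth, children = stack[-1]
        child = next(children, None)
        if child is None:
            # No more children: this element is finished.
            stack.pop()
            yield ("end", depth, elem.tag)
        else:
            yield ("start", depth + 1, child.tag)
            stack.append((child, depth + 1, iter(child)))


doc = ET.fromstring("<doc><title>t</title><p><b/></p></doc>")
events = list(walk(doc))
for event, depth, tag in events:
    print("  " * depth + event + " " + tag)
```

The caller only consumes events one at a time and never has to write the tree recursion itself, which seems to be the part people get wrong.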

I'm in the process of making changes to the error reporting in libxml2: last error information will be available either globally (per-thread) or within the parser context. Programmers will be able to gather the structured information either synchronously, from the call-back that is already available, or asynchronously, when the API call returns. It's like a GError but with far more specialized information. While I think it's a great enhancement, it's a PITA to do, going over the full code trapping the callback and making structured calls instead, sigh...
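A minimal sketch of that error model, in Python rather than C and under loud assumptions: every name below (StructuredError, ParserContext, report, get_last_error, parse) is hypothetical and only mirrors the description above, not the actual libxml2 interfaces. The error is a structured record, it is stored both in the context and in a per-thread slot, and the optional callback gets it synchronously.

```python
import threading
from dataclasses import dataclass


@dataclass
class StructuredError:
    domain: str   # which part of the library raised it, e.g. "parser"
    code: int     # numeric error code
    level: str    # "warning", "error" or "fatal"
    message: str
    line: int


_last = threading.local()  # one "last error" slot per thread


class ParserContext:
    def __init__(self, on_error=None):
        self.on_error = on_error   # optional synchronous callback
        self.last_error = None     # last error seen in this context


def report(ctxt, err):
    """Record the error in the context and per-thread, then call the callback."""
    ctxt.last_error = err
    _last.error = err
    if ctxt.on_error is not None:
        ctxt.on_error(err)


def get_last_error():
    """Return the last error raised in the calling thread, or None."""
    return getattr(_last, "error", None)


def parse(ctxt, text):
    """Toy 'parser': only checks that '<' and '>' counts balance."""
    ctxt.last_error = None
    _last.error = None
    if text.count("<") != text.count(">"):
        line = text.count("\n") + 1
        report(ctxt, StructuredError("parser", 1, "fatal",
                                     "unbalanced angle brackets", line))
        return False
    return True


seen = []
ctxt = ParserContext(on_error=seen.append)
ok = parse(ctxt, "<doc>\n<oops")
```

After the call returns, the caller can ignore the callback entirely and just inspect get_last_error() or ctxt.last_error; another thread asking for its own last error sees nothing.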

It's interesting to see the reactions on the libxml2 list to some geeky topics like how C code gets optimized, performance, or pure POSIX conformance. There is an interesting talent pool, and I find myself asking more and more and really appreciating the quality of the feedback. To me that's one of the main interests of developing open source: you get direct contact with your best users, and the users can be in direct contact with the developer. As the project evolves, trust is built, and information, skills and process get integrated. A good project will distill the best from the talent available; this is enjoyable and, if successful, brings quality code (and docs) as a result.

Decided to unsubscribe from the OASIS XML-Dev mailing list. I'm already getting enough mail, and the constant pressure affects my mood, I tend to overreact. The best I can do is to get off lists where I'm annoyed, because I can become annoying.

More work on libxml2 internals, there is a lot going on there. I designed and implemented a new set of APIs far cleaner than the old ones (which are preserved), allowing more flexibility, control and speed, and getting rid as much as possible of the use of global variables to control parsing options. Reimplementing the parsing part of xmllint on top of the new API made the improvement clear. This should avoid using the low level interfaces of libxml2 for 99% of the use cases. It gives the capability to cleanly reuse a parser context when parsing a succession of files (things like gconf or RPC-like protocol implementations should see a clear boost), while still allowing access at the SAX (1 or 2) level if needed. For more details see libxml2 CVS head, or http://mail.gnome.org/archives/xml/2003-September/msg00146.html for more complete explanations. Libxml2-2.6.0 will be a revolution. I intend to preserve API and ABI compatibility, but most of the benefits will only be obtained by making a small change in the calling code.
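A toy illustration of the design point, again in Python and not the libxml2 API: options are passed as flags on each call instead of being set through globals, and one context object is reused for a succession of documents. ParseOption, ParserContext and read() are hypothetical names, and the context here only carries a counter; in a real parser it would hold the buffers and tables that are worth reusing.

```python
import enum
import xml.etree.ElementTree as ET


class ParseOption(enum.IntFlag):
    """Hypothetical per-call options, combined with '|'."""
    NONE = 0
    NOBLANKS = 1        # drop whitespace-only text between elements
    NONE_ON_ERROR = 2   # return None instead of raising on a parse error


class ParserContext:
    """Reusable parsing state; options come with each call, not from globals."""

    def __init__(self):
        self.documents_parsed = 0

    def read(self, text, options=ParseOption.NONE):
        try:
            root = ET.fromstring(text)
        except ET.ParseError:
            if options & ParseOption.NONE_ON_ERROR:
                return None
            raise
        if options & ParseOption.NOBLANKS:
            for elem in root.iter():
                if elem.text is not None and not elem.text.strip():
                    elem.text = None
                if elem.tail is not None and not elem.tail.strip():
                    elem.tail = None
        self.documents_parsed += 1
        return root


ctxt = ParserContext()
a = ctxt.read("<a>\n  <b/>\n</a>", ParseOption.NOBLANKS)
b = ctxt.read("<a>\n  <b/>\n</a>")
c = ctxt.read("<a>", ParseOption.NOBLANKS | ParseOption.NONE_ON_ERROR)
```

Two calls on the same context behave differently only because of the flags given to each call, so nothing leaks from one parse to the next.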

The blog-applet "Add Link" doesn't work for me :-\

There is a new mailing-list for Relax-NG created by James Clark, http://relaxng.org/mailman/listinfo/relaxng-user , it is interesting to see RNG support starting to come into editors and other tools.

Went to my sister's wedding this week-end in Le Havre. I'm tired due to the event, the excess of food and beverages and also the travel: the train on the way back to Paris broke down and I missed the last connecting train to Grenoble. I finally arrived home at 1am, going through Lyon and taking a bus.

Released a second beta of the upcoming libxml2-2.6.0 in the xmlsoft.org FTP test subdir, and I also asked for feedback on more upcoming changes. I described in http://mail.gnome.org/archives/xml/2003-September/msg00115.html some of the changes I think I'm gonna add; some of them might be on the verge of breaking API and ABI compatibility. I'm also removing DocBook SGML support, it was intended only as a help for converting docs, not as official support for the format (it's far too broken for this anyway).

Just read miguel's last blog entry and felt compelled to answer a bit:

1/ being fast is fine, but you must first be conformant to the spec, otherwise the usefulness of using XSLT in the first place is moot. Conformance is what takes time and a lot of effort. Sorry but I got negative feedback on the conformance of the Mono XML layer, I hope this gets fixed and that the XSLT implementation won't be bad either. I would be interested to know how many of the bug tests I accumulated over the years in libxslt/tests/* actually pass correctly with your implementation.

2/ libxslt is thread safe in the sense that concurrent transformations can be done in parallel one per thread

3/ Saxon is not the fastest XSLT implementation. It's probably the best w.r.t. conformance.

4/ Yes libxslt takes a KISS approach, I didn't work on trying to do query optimization, well I did a bit but not much. It didn't try to build a specific tree model, reusing the libxml2 one. I did not code libxslt for performance actually.

I installed 0.7 I hope this will work

Took the afternoon off, after 2 weeks hacking like crazy fixing and rewriting the XML parser internals I needed a break. Climbed a mountain nearby, blue sky, very nice, but when I tried to take a picture my Canon S20 signaled a problem with the CF card. Quite surprising as I took a picture of the cat yesterday and just recharged the battery, and I basically never remove the CF card. Alas once back home it was still broken. Unscrewed the chassis, looked inside, looks okay, and the CF card works in my laptop. Actually I get the same error whether I have a card inserted or not :-( . This camera is exactly 3 years old, I hate this...

Found some mushrooms on the way back, enough for a large pan. I'm used to recognising them, those were really good.

I'm quite surprised that anonymous voting, which is a fairly important part of the democratic process, could get so much pushback (from a minority clearly). This really pisses me off, I still didn't get any reason why in practice it's a bad thing, people object that they "feel" it might give trouble. And clearly the current process is not good: nobody should be able to act on the way a given person voted if that person doesn't intend to make it public. This is a fundamental right protecting the voter, who can then act in perfect freedom. Getting negative and unjustified feedback on this really makes me sick ! The main reason I was interested in building the foundation was to be able to guarantee a clean process where the coders can work without feeling the weight of the companies involved, while still preserving the interests of those companies in order to get funding. Seeing the childish (sorry!) reaction of blocking a fundamental and important change toward this goal without clear reason is extremely demotivating. Maybe it's time I focus on something else; considering how the libxml2/libxslt user base is growing, maybe a smaller organization dedicated to this code base starts to make sense too.

It's Sunday evening, it was a bad day, except for the walk and the mushrooms, and I feel quite negative right now :-(

Released libxml2-2.5.11 to fix a couple of critical bugs from libxml2-2.5.10 just before the Gnome 2.4 release.

And just committed to CVS a complete change of the XML parser to use a different SAX2 based interface; this will provide namespace support and attribute defaulting for SAX users. There are also major changes inside the parser, like string interning and avoiding memory allocations and copies. It's not all finished but it starts looking solid in my regression tests. I labelled that version as 2.6.0beta1 and uploaded it for testing in the xmlsoft.org FTP area.

Tested the change on my severn beta test box: apparently the gnome desktop is coming up correctly, as well as a few applications I tried, so binary compatibility seems to work so far. BTW the gnome info on that desktop indicates 2.3.7, and even on a low powered Celeron 400 + BP6 + 128MB of ram the desktop is usable :-)

More cleanups to the rpmfind.net boxes, it looks better inside even if the outside view didn't change much. The SuSE and Freshrpms distros are now split into more useful subsets, this should help using the search results. So far nobody complained so the new infrastructure is probably working.

Hardware failures: last week it was my keyboard Alt keys which stopped working suddenly (thanks Nabil for finding a replacement), now it's the CD burner. Okay it was old, I got it in 98, and this was also my last piece of SCSI equipment plugged in. So I bought a new DVD drive online, that will make a nice addition if it arrives and I can make it work. The cat is currently playing with a tiny screw I forgot on the floor...

Started to work on the SAX update of the XML parser. This is a serious piece of work: it will push namespace checking down into the core, and the goal is also to improve performance with fewer string copies. I should also be able to move to "immutable" document building where all name strings are allocated from a dictionary, limiting malloc, speeding up freeing of trees and possibly boosting locality of access a lot for the xmlReader. But this is a lot of work, the next libxml2 release is likely to be 2.6.0 and not 2.5.11 !
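The dictionary idea can be sketched in a few lines of Python (the real thing is C inside the parser; NameDict and lookup are hypothetical names used only here). Equal name strings are mapped to one shared object, so a name is stored once and two names can be compared by identity instead of character by character.

```python
class NameDict:
    """Maps equal strings to one shared object (a toy string-interning table)."""

    def __init__(self):
        self._names = {}

    def lookup(self, name):
        # setdefault returns the already stored string if an equal one exists,
        # otherwise it stores and returns this one.
        return self._names.setdefault(name, name)

    def __len__(self):
        return len(self._names)


names = NameDict()
# Build equal strings at runtime so they start out as separate objects.
first = names.lookup("".join(["ti", "tle"]))
second = names.lookup("".join(["tit", "le"]))
other = names.lookup("para")

print(first is second)  # same object: comparison by identity is enough
print(len(names))       # number of distinct names stored
```

In C the same trick also means that freeing a tree does not have to free each name, since the dictionary owns them all.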

Just found that I had been added to Planet Gnome, I really need to see the picture they put for me, so let's write a diary.

Cleaning up the rpmfind.net servers: I spent most of the week-end trying to fix the mirrors, clean up the databases, ditch more rpm2html C code and replace it with easier to read/write/maintain python scripts. This is a whole interconnected mess of C/python/php scripts manipulating SQL (MySQL), exchanging XML between the servers, and generating HTML at some point. Maintenance is hairy but it works, and I basically didn't touch it for 2 years. As a result the cleanup took a lot of time, effort, and CPU time (if only ALTER TABLE DIS/ENABLE KEYS or SELECT within SELECT were implemented in the installed MySQL I would have lost far less time and CPU, oh well, I hate databases anyway).

I also noticed that my SuSE mirror rsync password does not work on the speakeasy box, and that rpmfind now gets denied access for KDE rsync. Sent mail about the situation asking for new access, but didn't get any reply so far. Each database contains on average: 165,000 packages, 200 distributions/releases, 1+ million Requires, 400,000 Provides and 3.5+ million paths for files present in the packages.

So if rpmfind.net is acting funny, that's normal, I'm playing with it again. Doing some attempts at load balancing too, YMMV. Some statistics from the query pages seem to indicate that 45 million searches have been run on the 3 boxes in the last 2 years. It's a fairly large number in itself, but relatively small compared to the number of Linux/RPM distro deployments. I draw the conclusion that a lot of people are actually not updating once installed, or at least not that many are updating using the network, or they don't know about rpmfind.

Oh and I replaced the xmlsoft.org and rpmfind search pages with the ones for the protest against software patents in Europe. I got 5 mails about it, ranging evenly from hostile to supportive. I then discovered that far more messages pointed at it on a slashdot thread. That's fine, don't send me mail, go argue on /. that suits me perfectly :-) . Now that the vote has been postponed to Sep 22 I think I will get the pages back soon.

In other news, my brother got married. I borrowed Nabil's Olympus E10 digital camera, it's a reflex with a smartcard disk. I took 400+ pictures over the 2 days, basically the batteries were running out before the disk got filled. The marriage was "traditional", well organized, I had some fun, and some of the pictures are great. I loved that camera.

I also discussed on IRC with Rik the problem of rogue applications taking down the desktop (which I naively pointed out in my last diary as something to fix). Of course he has looked at the problem for quite a while. The crucial point is how to detect that an application is doing something wrong when consuming resources, and how to distinguish that from acceptable processing. There is apparently no good way to do this and heuristics don't work. I think it has to be a policy, and then finding ways to express policy rules to the kernel is not easy either... the problem seems just too hard :-(

Looking into the possibility of changing my ADSL provider (from Nerim to Free), but since I rely on ADSL for my work, I will start by getting a second phone line and get a parallel setup then switch only when everything works. I expect some pain nonetheless !

Oh ... and it's my birthday today.

10 Aug 2003 (updated 11 Aug 2003 at 00:13 UTC) »

hadess: football ain't too bad a sport, but football supporters, oh man, what a nice team you're in ... >;->

Now that I have been fooled into writing in this stupid HTML form, maybe I should try to entertain the masses bored on a Monday morning at work, fine ... Notes: 1/ I'm not drunk 2/ Monday morning at work in summer must suck, I'm sure 3/ federico, I promised you to write this entry 3 weeks ago, so now you know how I handle promises on IRC or deadlines ...

I went to Linux Symposium, it was fun but heavy (technically that is, I didn't drink much this year), 4 tracks in parallel for 4 days might be a bit too much. There were some really interesting talks, very technical though, and my OS knowledge is getting rusty, or simply improvements at the Linux kernel level are getting more and more difficult as the kernel matures (hint: there is plenty of stuff left to do in userland !). ajh and the team ran a fine conference again, in spite of some missing wireless network equipment. It was good to see malcolm, I was quite interested by his presentation on SMIL, that's some of the part needed in userland. I feel a bit sorry for robla but it will be hard to standardize on a library which doesn't allow building non-opensource applications. Nat's presentation was interesting too, crazy as usual, but I feel more in need of information selection than of blasting more data which "might" be relevant on my screen. Do we really need an "automatic FYI system" ? Well maybe, if we can make it smart. Ahum, about being smart, when an application takes over all resources WE SHOULD NOT HAVE TO REBOOT as the quickest way to regain control (for those who missed the demo, the data acquisition and inference processing went wild and Nat's desktop became totally unresponsive, rebooting the laptop was the only solution). Okay maybe this would be a desktop only setting, but there should be a way to get the kernel to stop any program thrashing the system for more than 15 seconds. Then you send a DBus message (now that would have made another nice presentation) about it and the GUI asks "Application foo is misbehaving: kick the baby ? Yes / No", and acts accordingly. The kernel must stop the program activity first to even get a chance to ask. When the mouse doesn't redraw anymore, from a desktop user perspective, the machine is dead anyway !
Rik's talk about VM was nice, but all of this works when the constraints are linear. When you get something like an application taking over all resources and thrashing, it is non-linear behaviour and the usual tuning doesn't work (and ulimit stuff is simply not sufficient, I want very large applications running, but I don't want them taking over the system !)

Also went to Red Hat, good to resynch with the colleagues from time to time. Interesting talks with jbj and other folks interested in the metadata set and format needed for yum, apt, up2date, red-carpet and such. People interested might want to subscribe to the list, but please let's keep this focused.

Made a couple of releases, libxml2-2.5.9 and libxslt-1.0.32, over the week-end. 90% of it is pure bug fixes, coming from a lot of contributors (makes me happy) and especially William Brack, a serious coder in Taiwan. I hope I will have a chance to meet him one day, seems we never fly to the US at the same time <grin/>.

My little brother is getting married in 2 weeks, and my sister announced she would too in September. Must be the hot summer or something ... seriously it's really hot and dry here, some of the forests around look like in the fall, leaves are falling, fires are a disaster. If it happens to really be a consequence of our use of fossil fuel and excess of energy consumption, it's getting really worrisome ... The "good" point is that, contrary to coral dying in tropical countries, the weather craziness is starting to affect the western lifestyle, so maybe with a good feedback loop people may regulate their behaviour (or governments may force them to), but I'm dreaming out loud :-( . People driving their 2 ton SUVs to go to work in town simply piss me off ... you bastards !

I brought back from my US trip the South Park season 2 DVDs, and Snatch because it was cheap and the movie was excellent IIRC, I remember the sound track was colorful (like in SP, surprise :-). While rereading this entry I'm afraid SP is affecting my language, sorry. At least now I know where "1/ collect underpants 2/ ??? 3/ profits" actually comes from, I'm learning !!!
