11 Jan 2008 faw   » (Journeyer)

Bits from the Debian i18n meeting (Extremadura 2007)
From December 12th to December 15th, Junta de Extremadura hosted another one of the Debian Meetings; five i18n guys shared ideas, food, buses and fun with the Debian KDE maintainers. We would like to thank Extremadura for hosting us during the Hispalinux Meeting 2007. The event was held at the Universidad de Derecho (Law University) in Caceres, Spain.

These are the minutes, results and notes from our work. It is a brief but hopefully complete description of what we have done and what is still missing or pending.

Thanks to Cesar (cek) we had the chance to work on churro (i18n.debian.net) locally. The server is still running a 2.4 kernel because of some "tick" problems with the 2.6 series; the last one tried was 2.6.21, and we should try newer ones in order to support upgrades and not get stuck with 2.4. We hope Cesar will find time to test new Debian kernels.

First, let me introduce everybody to the services, robots and resources being hosted by i18n.d.n:

  • MoinMoin wiki for local and simple reference documentation; it contains links to all the resources below. (http://i18n.debian.net/wiki/)

  • Pootle experimental server

  • dl10n scripts, aka dl10n robots (codename Lion); these scripts are responsible for the status of the pseudo-URLs used by some translation teams, by Project Smith and by the NMU Priority List for the i18n NMU Campaign

  • Synchronization of the i18n material used by the Debian website to generate translation statistics about PO and PO-debconf

  • Generation of Compendium PO files per-language

  • Different types of statistics

  • Other services not visible to users, such as a full source mirror for stable, testing, unstable and experimental, used by the scripts and robots.

  • DDTP, Debian Descriptions Translation Project

  • DDTSS, the Debian Distributed Translation Server Satellite, a web front-end for DDTP, now integrated with DDTP to use the database back-end instead of the e-mail interface.

At some point we also found it important to clearly state the acronyms and names used in related DDTP projects/tools:

  • DDTS, Debian Description Translation Server: this is the main "back-end" used in DDTP; it tends to be the interface between translator tools (present and future ones) and the database.
  • ddt.cgi, a CGI interface that is able to provide info for specific packages or translations, including diffs, related packages and active/inactive descriptions.
  • DDTC, the old (and still functional) command line client for DDTP.

We took the chance to organize a few things on churro: old accounts were cleaned out and removed, and we moved from /org to /srv and got more GBs of space for the "playground". Old files were also removed and some are scheduled for deletion in early 2008. With the reallocation of /org we also found some more space for /home and /var, we reorganized some of the links on the web space (especially to remove services from people's accounts), and we changed the mirror script to also synchronize the Packages and Contents files.

Grisu and Martijn worked mainly on DDTP and DDTSS integration. DDTSS now provides statistics for stable, testing and unstable. We are also working with Debian Med to provide support and infrastructure for a specific audience, such as packages related to medicine. The conversion to talk directly to the DDTP/DDTS database also provided:

  • Fetching new translations is almost instantaneous and marks the translation as requested (avoiding duplicated work via the e-mail interface).

  • Once a translation has received sufficient reviews, the upload is instant.

  • Committed DDTS / DDTSS / DDTP website generation into SVN
    • Added READMEs for the above directories

Because of its integration with the database back-end used by DDTP, DDTSS now identifies users through authentication. Quick trivia: DDTP now comprises 25 languages occupying 18 GBytes.

A few days before the meeting we received the offer to use "AUTOBYHAND" to upload a package with the Translation-* files. The package is now called 'ddtp-translations', and we worked during the meeting to create scripts to build the package and to test it on the archive side. This approach allows the Debian i18n Team to upload new translations and remove old (or inactive) ones without bothering the FTP Master Team. Special thanks to Anthony Towns, who has been working with us to provide tips, fixes and info on how to produce the package and the scripts. The code is available in the debian-l10n SVN under pkg-ddtp-translations.

In our case, "BYHAND" processing consists of a simple tarball of the {main,contrib,non-free}/i18n/Translation-* files. We decided to work on a set of scripts to make it easier to create new packages (ddtp-translations) in a consistent way while keeping debian/changelog up-to-date. We also made some suggestions for the script that will run on the archive side to check the tarball structure, based on the examples of debian-maintainers and debtags (tags-override).
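
To give an idea of what a tarball structure check could look like, here is a minimal sketch in Python. It is not the script running on the archive side: the function names, the language-code pattern and the accepted compression suffixes are all assumptions made for illustration. It only verifies that every member sits under {main,contrib,non-free}/i18n/ and is named Translation-*.

```python
import io
import re
import tarfile

# Assumed layout: <component>/i18n/Translation-<lang>, optionally compressed.
# The language-code pattern and the suffixes are illustrative guesses.
COMPONENTS = ("main", "contrib", "non-free")
ALLOWED_DIRS = set(COMPONENTS) | {c + "/i18n" for c in COMPONENTS}
TRANSLATION_FILE = re.compile(
    r"^(main|contrib|non-free)/i18n/Translation-[a-z]{2,3}(_[A-Z]{2})?(\.(gz|bz2))?$"
)


def check_tarball(fileobj):
    """Return the member names that do not fit the expected layout."""
    problems = []
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            name = member.name
            if name.startswith("./"):
                name = name[2:]
            if name in ("", "."):
                continue  # the archive root itself
            if member.isdir():
                if name not in ALLOWED_DIRS:
                    problems.append(name)
            elif not member.isfile() or not TRANSLATION_FILE.match(name):
                # Reject symlinks, devices and anything not named Translation-*.
                problems.append(name)
    return problems


def build_tarball(names):
    """Hypothetical helper: build an in-memory tarball for trying the check."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            data = b"dummy\n"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    sample = build_tarball(["main/i18n/Translation-de", "main/i18n/evil.sh"])
    print(check_tarball(sample))
```

The check is a whitelist on purpose: anything that is not explicitly expected is reported, which seems the safer default for a tarball that gets unpacked automatically.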

One of our initial targets for the meeting with regard to Pootle and Debian was to try big PO files per language. Fortunately, Nicolas and Friedel were able to increase Pootle's performance enough to get a few languages from DDTP loaded in Pootle. Using the upstream Pootle-diet branch, which uses a database back-end for the generation of statistics, the time to browse the DDTP POs of a language (~20,000 files) went down to about a dozen seconds.

Speaking about Pootle, Friedel gave us a good picture of what is coming next in terms of Pootle's development. There are improvements planned in the areas of permissions and rights delegation, as well as file management (for projects and templates). Improved management of terminology projects is also planned.

Improvements in the QA capabilities of the translate toolkit and Pootle are planned to help with the "false positives" of the pofilter checks. Better reuse of existing translations will become possible by using better translation memory techniques. There is also work planned on formats and converters involving, for example, XLIFF, TMX, TRADOS and WordFast.

Another task that had been pending for quite a while was the CVS migration to SVN; it is now done, with a new layout. Commits to the CVS were disabled, and every script or resource depending on CVS should be changed to use SVN. For now, we are publishing (via HTTP) the status files generated by the pseudo-urls robots until we can fix the scripts to re-enable the commit of the files. You can find them here: http://i18n.debian.net/debian-l10n/status/

We are pretty happy with the changes and results of the work during those days, but we still have some items pending on our TODO list:

  • More advertisement and usage information about PO Compendiums
    Two use cases are identified:
    • Filling new PO files.
    • QA work to find inconsistent translations.
    Maybe Eddy would love to do that? :-)

  • Extend the duration of the statistics history. (Nekral)

  • Debian packages of the services running on churro
    • DDTP (Grisu)
    • DDTSS (Martijn)
    • dl10n (Nekral)

  • DDTP: add some scripts to handle packages with version in the description (e.g. kernel and kernel modules) (Martijn)

  • DDTP: Standard generation of the translation tarballs (faw)

  • DDTP: document the bracketed stats on the main page (faw)

  • DDTC: should be updated to match the current features. Documentation to ease integration with procmail. (Nekral - low priority)

  • Implement mail service for translation teams with their own robots (e.g. Dutch) (faw)

  • Collect data from http://www.debian.org/devel/website/stats/ (Nekral)

  • The pages at http://www.debian.org/intl/l10n/po/ are built from the churro material. It would make more sense to build these statistics on churro (Nekral)
    • We could "fork" the page and add some fancy new features on these pages (Nekral)
    • Add information from the coordination page to indicate that a translation is ongoing. (Nekral)

  • Pootle: missing review indication. Hard with PO back-end. (Friedel)
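
Of the two PO Compendium use cases in the TODO list above, the QA one (finding inconsistent translations) is easy to sketch. The Python snippet below is only an illustration, not one of the scripts on churro: the function name and the (package, msgid, msgstr) input format are assumptions, and parsing the PO files themselves is left out.

```python
from collections import defaultdict


def find_inconsistent(entries):
    """Group translations by msgid and keep those translated in several ways.

    entries: iterable of (package, msgid, msgstr) tuples (assumed format).
    Returns {msgid: {msgstr: [packages using that translation]}}.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for package, msgid, msgstr in entries:
        if msgstr:  # skip untranslated messages
            seen[msgid][msgstr].append(package)
    return {
        msgid: dict(variants)
        for msgid, variants in seen.items()
        if len(variants) > 1
    }


if __name__ == "__main__":
    sample = [
        ("pkg-a", "Cancel", "Cancelar"),
        ("pkg-b", "Cancel", "Anular"),
        ("pkg-c", "Yes", "Sim"),
    ]
    for msgid, variants in find_inconsistent(sample).items():
        print(msgid, variants)
```

A reviewer would still have to decide whether two different translations of the same string are a real inconsistency or a legitimate difference of context.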

There are a couple more reports to be sent, but they are more focused on i18n-specific questions, tools and plans for 2008, so those will probably be sent only to the debian-i18n mailing list. If you are interested, please stay tuned. :-)
Posted on d-d-a: http://lists.debian.org/debian-devel-announce/2008/01/msg00002.html
And a big thanks to Nicolas François (aka nekral), who helped me a lot with making notes, preparing the text and reviewing it, and who was patient enough to wait for the report while I was solving some personal problems.
