Older blog entries for timj (starting at number 19)

Sayebackup.sh – deduplicating backups with rsync

About

By popular request, I’m putting up a polished version of the backup script that we’ve been using over the years at Lanedo to back up our systems remotely. The script uses a special feature of rsync(1) v2.6.4 to create backups that share storage space with previous backups by hard-linking unchanged files.
The various options needed for rsync and ssh to minimize transfer bandwidth over the Internet, the time-stamping of backups and the handling of several rsync oddities warranted encapsulating the logic in a dedicated script.
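
Under the hood, the space sharing boils down to rsync’s --link-dest option: files that are unchanged relative to a previous snapshot are hard-linked instead of copied. The following is only a rough sketch of that idea, not the script’s actual command line (the exact option set and the use of the bak-current symlink are assumptions made for illustration):

  # hypothetical manual equivalent of one backup run: unchanged files are
  # hard-linked from the previous snapshot (referenced via bak-current)
  rsync -az --delete \
        --link-dest="$PWD/bak-current" \
        user@example.com:mydir \
        "bak-$(date +%F-%T)-snap/"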

Resources

The GitHub release tag is here: backups-0.0.1
Script URL for direct downloads: sayebackup.sh

Example

This example creates two consecutive backups and shows their sizes.

$ sayebackup.sh -i ~/.ssh/id_examplecom user@example.com:mydir # create backup as bak-.../mydir
$ sayebackup.sh -i ~/.ssh/id_examplecom user@example.com:mydir # create second bak-2012...-snap/
$ ls -l # show all the backups that have been created
drwxrwxr-x 3 user group 4096 Dec  1 03:16 bak-2012-12-01-03:16:50-snap
drwxrwxr-x 3 user group 4096 Dec  1 03:17 bak-2012-12-01-03:17:12-snap
lrwxrwxrwx 1 user group   28 Dec  1 03:17 bak-current -> bak-2012-12-01-03:17:12-snap
$ du -sh bak-* # the second backup is smaller due to hard links
4.1M    bak-2012-12-01-03:16:50-snap
128K    bak-2012-12-01-03:17:12-snap
4.0K    bak-current
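
To verify that the savings indeed come from hard links, you can check link counts in the second snapshot: any file that is stored only once but referenced from both snapshots shows a link count greater than one (the directory name below is taken from the listing above):

  $ find bak-2012-12-01-03:17:12-snap -type f -links +1 | head -n 3
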
Usage
Usage: sayebackup.sh [options] sources...
OPTIONS:
  --inc         make reverse incremental backup
  --dry         run and show rsync with --dry-run option
  --help        print usage summary
  -C <dir>      backup directory (default: '.')
  -E <exclfile> file with rsync exclude list
  -l <account>  ssh user name to use (see ssh(1) -l)
  -i <identity> ssh identity key file to use (see ssh(1) -i)
  -P <sshport>  ssh port to use on the remote system
  -L <linkdest> hardlink dest files from <linkdest>/
  -o <prefix>   output directory name (default: 'bak')
  -q, --quiet   suppress progress information
  -c            perform checksum based file content comparisons
  -x, --one-file-system  disable crossing of filesystem boundaries
  --version     script and rsync versions
DESCRIPTION:
  This script creates full or reverse incremental backups using the
  rsync(1) command. Backup directory names contain the date and time
  of each backup run to allow sorting and selective pruning.
  At the end of each successful backup run, a symlink '*-current' is
  updated to always point at the latest backup. To reduce remote file
  transfers, the '-L' option can be used (possibly multiple times) to
  specify existing local file trees from which files will be
  hard-linked into the backup.
 Full Backups:
  Upon each invocation, a new backup directory is created that contains
  all files of the source system. Hard links are created to files of
  previous backups where possible, so extra storage space is only required
  for contents that changed between backups.
 Incremental Backups:
  In incremental mode, the most recent backup is always a full backup,
  while the previous full backup is degraded to a reverse incremental
  backup, which only contains differences between the current and the
  last backup.
 RSYNC_BINARY Environment variable used to override the rsync binary path.
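
Because the snapshot names sort chronologically, pruning old backups is easy to script outside of sayebackup.sh. The following one-liner is merely an illustration, not part of the script, and the retention count of 30 is an arbitrary example; it relies on GNU coreutils:

  # keep only the 30 newest snapshots, delete the rest (illustration only)
  $ ls -d bak-*-snap | sort | head -n -30 | xargs -r rm -rf
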
See Also

Testbit Tools – Version 11.09 Release


Syndicated 2012-12-01 02:32:59 from Tim Janik

ListItemFilter Mediawiki Extension

For a while now, I’ve been maintaining my todo lists as backlogs in a Mediawiki repository. I regularly derive sprints from these backlogs for my current task lists, which means identifying important or urgent items that can be addressed next; for really huge backlogs this can be quite tedious.

A SpecialPage extension that I’ve recently implemented now helps me with this process. Using it, I automatically get a filtered list of all “IMPORTANT:”, “URGENT:” or otherwise classified list items. The special page can be used on its own or via template inclusion from another wiki page. The extension page at mediawiki.org has more details.

The Mediawiki extension page is here: http://www.mediawiki.org/wiki/Extension:ListItemFilter

The GitHub page for downloads is here: https://github.com/tim-janik/ListItemFilter


Syndicated 2012-11-23 17:58:17 from Tim Janik

Meeting up at LinuxTag 2012

 

Like every year, I am driving to Berlin this week to attend LinuxTag 2012 and enjoy its excellent program. If you want to meet up and chat about projects, technologies, Free Software or other things, send me an email or leave a comment on this post and we will arrange it.


Syndicated 2012-05-15 13:12:56 from Tim Janik

Testbit Tools Version 11.09 Released

No Bugs
(Image: Mag3737)

 

And here’s another muffin from the code cake factory…

About Testbit Tools
The ‘Testbit Tools’ package contains tools proven to be useful during the development of several Testbit and Lanedo projects. The tools are Free Software and can be redistributed under the GNU GPLv3+.

This release features the addition of buglist.py, which helps with generating reports and summaries from your favorite Bugzilla installation.

Downloading Testbit Tools
The Testbit Tools packages are available for download in the testbit-tools folder, the newest release is here: testbit-tools-11.09.0.tar.bz2

Changes in version 11.09.0:

  • Added buglist, a script to list and download bugs from bug trackers.
  • Added buildfay, a script with various sub commands to aid release making.
  • Fixed version information for all tools.
  • Added support for the Xamarin Bug Tracker to buglist.py.

    Feedback

    If you find this release useful, we highly appreciate your feature requests, bug reports, patches or review comments!

    See Also

    1. The Bugzilla Utility buglist.py – managing bug lists
    2. Wikihtml2man Introduction – using html2wiki


    Syndicated 2011-10-01 00:18:55 from Tim Janik

    Wikihtml2man Introduction (aka html2man, aka wiki2man)

    Wiki↠HTML↠Man

     

    What’s this?
    Wikihtml2man is an easy-to-use converter that parses HTML sources, normally originating from a Mediawiki page, and generates Unix Manual Page sources from them (it is also referred to as an html2man or wiki2man converter). It allows project documentation to be developed online, e.g. by collaborating in a wiki. It is released as free software under the GNU GPLv3. Technical details are given in its manual page: Wikihtml2man.1.

    Why move documentation online?
    Google turns up a few alternative implementations, but none of them seem to be designed as a general purpose tool. With the ubiquitous presence of wikis on the web these days and the ease of content authoring they provide, we’ve decided to move manual page authoring online for the Beast project. Using Mediawiki, manual pages turn out to be very easy to create in a wiki; all that’s then needed is a backend tool that can generate Manual Page sources from a wiki page. Wikihtml2man provides this functionality based on the HTML generated from wiki pages: it can convert a prerendered HTML file or download the wiki page from a specific URL. HTML was chosen as the input format to support arbitrary wiki features like page inclusion or macro expansion, and to potentially allow page generation from wikis other than MediaWiki. Since wikihtml2man is based purely on HTML input, it is of course also possible to write the Manual Page in raw HTML, using tags such as h1, strong, dt, dd, li, etc., but that’s much less convenient than a regular wiki engine.

    What are the benefits?
    For Beast, the benefits of moving some project documentation into an online wiki are:

    • We increase editability by lifting review requirements.
    • We get quicker edit/view turnarounds, e.g. through the page preview functionality in wikis.
    • We can assimilate documentation contributions from non-programmers.
    • Easier editing may lead to richer documentation that is better and more frequently updated.
    • Other projects also seem to make good progress by opening up some development parts to online web interfaces, like: Pootle translations, Transifex translations or PHP.net User Notes.

    What are the downsides?
    We have only recently moved our pages online and still need to gather some experience with the process. The possible downsides we see so far are:

    • Sources and documentation can more easily get out of sync if they don’t reside in the same tree. We hope to mitigate this by increasing the documentation update frequency.
    • Confusion about revision synchronization, with the source code using a different versioning system than the online wiki. We are currently pondering automated re-integration into the tree to counteract this problem.

    How to use it?
    Here’s wikihtml2man in action, converting its own manual page and rendering it through man(1):

      wikihtml2man.py http://testbit.eu/Wikihtml2man.1?action=render | man -l -
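
    Since the converter also accepts prerendered HTML, an offline run should work along these lines (the local file name here is made up, and passing a file path instead of a URL is an assumption about the command line interface):

      # hypothetical offline variant: convert a previously saved HTML export
      wikihtml2man.py ./Wikihtml2man.1.html | man -l -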

    Where to get it?
    Release tarballs shipping wikihtml2man are kept here: http://dist.testbit.eu/testbit-tools/.
    Our Tools page contains more details about the release tarballs.

    Have feedback or questions?
    If you can put wikihtml2man to good use, have problems running it or have other ideas about it, feel free to drop me a line. Alternatively, you can add your feedback and any feature requests to the Feature Requests page (a forum will be created if there’s any actual demand).

    What’s related?
    We would also like to hear from other people involved in projects that are using or considering wikis to build production documentation online (e.g. in a manner similar to Wikipedia). So feel free to leave a comment about your project if you do something similar.

    See Also

    1. New Beast Website – using html2wiki
    2. The Beast Documentation Quest – looking into documentation choices


    Syndicated 2011-05-12 23:49:23 from Tim Janik

    Attending LinuxTag 2011

     

    Like every year, I am driving to Berlin this week to attend LinuxTag 2011 and enjoy its excellent program. If you want to meet up and chat about projects, technologies, Free Software or other things, send me an email or leave a comment on this post and we will arrange it.

    Syndicated 2011-05-09 12:29:55 from Tim Janik

    BEAST v0.7.4 released

    BEAST/BSE version 0.7.4 is available for download at:

    BEAST is a music composition and modular synthesis application that runs under Unix; it is released as free software under the GNU LGPL. Refer to the About page for more details.

    The 0.7.4 release integrates the bse-alsa package, several speedups, important bug fixes and translation updates.

    Please feel free to provide useful feedback or contribute on IRC, the mailing list and in the Wiki.

    TRANSLATORS: Please help us to improve the BEAST translation: just download the tarball, edit the respective .po file in the po/ directory and email it to us, or submit translations directly via the Beast page at Transifex.

    Overview of Changes in BEAST/BSE 0.7.4:

    • Renamed the project to Better Audio System / Better Sound Engine
    • Moved project website to: http://beast.testbit.eu/
    • Various build system fixes [stw,timj]
    • License fixups for some scripts [stw]
    • Fixed subnormal tests on AMD64 if SSE unit is in DAZ mode [stw]
    • Replaced slow resampler checks with a much faster resampling test [stw]
    • Performance improvements for various tests [stw]
    • GLib 2.28 unit test porting [stw]
    • Speed improvements for record field name [stw]
    • Fixed XRUNs in ALSA driver on 64bit systems [timj]
    • Added beast.doap [Jonh Wendell]
    • PO handling improvements.
    • Updated German translation.
    • Updated Norwegian bokmål translation [Kjartan Maraas]
    • Added e-Telugu translation [Veeven]


    Syndicated 2011-04-09 01:50:16 from Tim Janik

    Human Multitasking

    Multitasking Mind
    (Image: Salvatore Vuono)

     

    The self-deceiving assumption of effective human multitasking.

     

    People often tell me they are good at multitasking, i.e. handling multiple things at once and performing well while doing so. Yet the human brain can only make a single conscious decision at a time. To understand this, we need to consider that making a conscious decision requires attention, and the very concept of attention means activating the information contexts relevant to an observation or decision while inhibiting other, irrelevant information.

    The suppression involved in attention control makes it harder for us to resume a previously executed task; this is why interruptions such as an incoming call, an SMS or a doorbell affect our workflows so badly. Even just deciding whether to take a call already requires diverting attention.

    Relatedly, processing emails or surfing while talking to someone on the phone results in bad performance on both tasks, because the attention required for each necessarily suppresses resources needed by the other. Some actions don’t suffer from this competition: we can walk and breathe or balance ourselves just fine while paying full attention to a conversation. That’s because we learned early in our lives to automate these seemingly mundane tasks, so they no longer require conscious attention.

    Studies [1] [2] have shown time and again that working on a single task in isolation, avoiding frequent context switches, yields vastly better results in a shorter time frame. This can be further improved by training in concentration techniques such as breath meditation, autogenic training or muscle relaxation.

    Here are a number of tips that help put these findings to practical use:

    1. Let go of the idea of permanent reachability; nothing is so urgent that it cannot wait the extra hour it takes to be handled efficiently.
    2. Make up your own mind about when to process emails, SMS, IM, news, voice messages.
    3. Start growing a habit of processing things in batches, e.g. walk through a list of needed phone calls in succession, compose related replies in batches, first queue and later process multiple pending reviews at once, queue research tasks and walk through them in a separate browsing session, etc.
    4. Enforce non-availability periods where you cannot be interrupted and may concentrate on tasks of your choice for an extended period.
    5. Schedule phone meetings in advance, ensure everyone has an agenda at hand for the meeting to avoid distractions (Don’t Call Me, I Won’t Call You).
    6. Deliberately schedule relaxation phases, e.g. take a 5-minute break away from the screen every hour, ideally moving and walking around; rest breaks are needed after 90 minutes at the latest.


    Syndicated 2011-03-31 01:11:37 from Tim Janik

    Lanedo at CeBIT 2011

    This week, our people are running the Lanedo booth at CeBIT in Hannover.

    CeBIT Logo

    Everybody is invited to come and visit us in hall 2, booth D44/124, in the open source park. We will give introductions to our services, talk about current and future developments around GTK+ and Tracker, and discuss anything else you want to approach us with.


    Syndicated 2011-02-28 20:04:15 from Tim Janik

    Using mod_disk_cache with MediaWiki

     

    MediaWiki is a pretty fast piece of software out of the box. It’s written in PHP and covers a lot of features, so it can’t serve pages in zero time, but it’s reasonably well written and allows the use of PHP accelerators or caches in most cases. Since it’s primarily developed for Wikipedia, it’s optimized for high-performance deployments; caching support is available for Squid, Varnish and plain files.

    For small-scale use cases like private or intranet hosts, running MediaWiki uncached works fine. But once it’s exposed to the Internet, regularly crawled and potentially linked from other popular sites, serving only a handful of pages per second quickly becomes insufficient. A very simple but effective measure in this scenario is enabling Apache’s mod_disk_cache.

    Here’s a sample benchmark for the unoptimized case:

    $ ab -kt3 http://testbit.eu/Sandbox
    Time taken for tests:   3.33173 seconds
    Total transferred:      301743 bytes
    Requests per second:    6.26 [#/sec] (mean)
    Time per request:       159.641 [ms] (mean)
    Transfer rate:          96.93 [Kbytes/sec] received

    Now we configure mod_disk_cache in apache2.conf:

    CacheEnable   disk /
    CacheRoot     /var/cache/apache2/mod_disk_cache/

    And enable it in Apache:

    $ a2enmod disk_cache
    Enabling module disk_cache.
    Run '/etc/init.d/apache2 restart' to activate new configuration!
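
    One caveat: mod_disk_cache does not limit the amount of disk space it uses, so the cache directory keeps growing. Apache ships the htcacheclean utility to keep it in check; a possible invocation (the 30 minute interval and the 300M limit are arbitrary picks for illustration) looks like this:

    # clean the cache every 30 minutes and keep it below 300M
    $ htcacheclean -d30 -n -t -p /var/cache/apache2/mod_disk_cache -l 300M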

    This in itself is not enough to enable caching of MediaWiki pages, however; the reason lies in some bits of the HTTP header information MediaWiki sends:

    $ wget -S --delete-after -nd http://testbit.eu/Sandbox
    --2011-02-09 00:48:21--  http://testbit.eu/Sandbox
      HTTP/1.1 200 OK
      Date: Tue, 08 Feb 2011 23:48:21 GMT
      Vary: Accept-Encoding,Cookie
      Expires: Thu, 01 Jan 1970 00:00:00 GMT
      Cache-Control: private, must-revalidate, max-age=0
      Last-Modified: Tue, 08 Feb 2011 03:24:32 GMT
    2011-02-09 00:48:21 (145 KB/s) - `Sandbox' saved [14984/14984]

    The Expires: and Cache-Control: headers both prevent mod_disk_cache from caching the contents.

    A small patch against MediaWiki-1.16 fixes that by removing Expires: and adding s-maxage to Cache-Control:, which allows caches to serve “stale” page versions that are only mildly outdated (by a few seconds).

    mw-disk-cache-201101.diff

    With the patch, the headers changed as follows:
    $ wget -S --delete-after -nd http://testbit.eu/Sandbox
    --01:03:03--  http://testbit.eu/Sandbox
     HTTP/1.1 200 OK
     Date: Wed, 09 Feb 2011 00:03:03 GMT
     Vary: Accept-Encoding,Cookie
     Cache-Control: s-maxage=3, must-revalidate, max-age=0
     Last-Modified: Tue, 08 Feb 2011 03:24:32 GMT
    01:03:03 (386.21 MB/s) - `Sandbox' saved [14984/14984]

    Upon inspection, there’s no Expires: header anymore, and Cache-Control: has been adapted as described. Let’s now rerun the benchmark:

    $ ab -kt3 http://testbit.eu/Sandbox
    Time taken for tests:   3.5511 seconds
    Total transferred:      38621189 bytes
    Requests per second:    831.14 [#/sec] (mean)
    Time per request:       1.203 [ms] (mean)
    Transfer rate:          12548.95 [Kbytes/sec] received

    That looks good: 831 requests per second instead of 6!

    Utilizing mod_disk_cache with MediaWiki can easily increase the number of possible requests per second by more than a factor of one hundred for anonymous accesses. The caching behavior of the above patch can also be enabled for logged-in users by adding this setting to MediaWiki’s LocalSettings.php:

    $wgCacheLoggedInUsers = true;

    I hope this helps people out there to speed up their MediaWiki installations as well. Happy tuning! ;)


    Syndicated 2011-02-09 01:03:45 from Tim Janik
