Older blog entries for yeupou (starting at number 137)

27 Dec 2010 »

Keeping the dpkg installed software database clean

The system on my workstation was installed in December 2008. Actually, I installed the Debian AMD64 version over an i386 version on the same box, which had been installed around 2003.

Debian ships tools that make it easy to keep a clean system. For instance, debfoster lets you easily get rid of libraries and the like that are no longer necessary: you just have to select the important pieces and it will remove any software that is not required by one of these. And apt-get, nowadays, just like deborphan used to, even warns you when some software is no longer required and provides the autoremove command line argument, which does the job automatically.

(debfoster is, supposedly, deprecated in favor of aptitude, just like apt-get. Well, I like debfoster.)

That being said, if I run dpkg --list | grep ^r | nl | tail -n 1 on this box, after only one year, I get 617 lines about removed software I do not care about. Mostly, they were kept in the dpkg database because I (or me using the system) modified their conffiles. The following will clean this:

for package in `dpkg --list | grep ^r | cut -f 3 -d " "`; do dpkg --purge $package; done && debfoster
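The one-liner above relies on cut and on the exact spacing of the dpkg --list output. As a minimal sketch of the same idea, the snippet below selects packages by state with awk instead. list_rc_packages is a hypothetical helper name, and it assumes that only packages in state rc (removed, configuration files remaining) should be purged, which is slightly stricter than grep ^r.

```shell
#!/bin/sh
# Sketch: print the names of packages whose dpkg state is "rc"
# (removed, configuration files still present).
# list_rc_packages is a hypothetical helper; it reads the output of
# `dpkg --list` on standard input and prints one package name per line.
list_rc_packages() {
    # Field 1 is the two-letter state, field 2 the package name.
    awk '$1 == "rc" { print $2 }'
}
```

On a real system it would be fed with dpkg --list and piped to dpkg --purge as root, for instance: dpkg --list | list_rc_packages | xargs -r dpkg --purge (the -r option of GNU xargs skips the command when there is nothing to purge).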

Syndicated 2010-12-26 23:30:40 from # cd /scratch

12 Nov 2010 »

Using partitions labels

Recent Linux versions (yes, I’m talking about the kernel here – Linux is not an operating system) introduce new IDE drivers. This implies a change in the device naming convention. Instead of hda, hdb, etc, you get sda, sdb, etc, just like SCSI drives.

I have three hard disks on my main workstation – plenty of partitions. So in my case, it makes sense to use a unique identifier for each partition so nothing breaks whenever I add/remove a drive or boot an older kernel with the previous IDE drivers.

There are already unique ids for each partition, available using the command blkid. It returns unbearable and meaningless, but very unique, ids like af8485cf-de97-4daa-b3d9-d23aff685638.

So it is best, for me at least, to label partitions properly according to their content and physical location, which makes for unique ids too in the end.

For ext3 partitions, I just did:

e2label /dev/sda2 sg250debian64
e2label /dev/sda3 sg250home

For the swap, e2label cannot help, so we set the label with mkswap, recreating it:

swapoff /dev/sda1
mkswap -L sg250swap /dev/sda1
swapon -L sg250swap

For ntfs partitions, I did:

apt-get install ntfsprogs
ntfslabel /dev/sdb1 hi150suxor
ntfslabel /dev/sdb2 hi150suxor2

Then, /etc/fstab must be edited as:

LABEL=sg250swap none swap sw 0 0
LABEL=sg250debian64 / ext3 errors=remount-ro 0 1
LABEL=sg250home /home ext3 defaults 1 2
LABEL=hi150suxor /mnt/suxor ntfs-3g defaults,user,noauto 0 0
LABEL=hi150suxor2 /mnt/suxor2 ntfs-3g defaults,user,noauto 0 0
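Before rebooting on such an fstab, it may be worth checking that every label it mentions really exists. The following is only a sketch under assumptions: fstab_labels and check_labels are hypothetical helper names, and the check supposes that udev exposes labeled partitions under /dev/disk/by-label, as it usually does on a Debian system.

```shell
#!/bin/sh
# fstab_labels FILE: print each label referenced as LABEL=... in the
# first field of an fstab-style file (comment lines are skipped since
# their first field is "#...").
fstab_labels() {
    awk '$1 ~ /^LABEL=/ { sub(/^LABEL=/, "", $1); print $1 }' "$1"
}

# check_labels [FILE]: report labels that have no matching entry in
# /dev/disk/by-label (assumption: udev populates that directory).
check_labels() {
    for label in $(fstab_labels "${1:-/etc/fstab}"); do
        [ -e "/dev/disk/by-label/$label" ] || echo "no partition labeled $label"
    done
}
```

Running check_labels with no argument reads /etc/fstab and prints nothing when every label is found.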

Finally, the grub (or any other boot loader) config should be updated to reflect that. However, unless I’m mistaken, with grub2 as shipped by Debian, everything is generated using scripts that do not seem to handle labels.

Syndicated 2010-11-11 23:38:24 from # cd /scratch

8 Nov 2010 »

Minimalistic BitTorrent client over NFS/Samba

Not quite AJAX

While current trends in the music/movie industry will surely encourage the development of a new generation of peer-to-peer software, the same way they made CD burners cheap in less than a decade, I’m still quite happy with BitTorrent.

I used torrentflux for quite some time. Shipped with Debian, installed on my local home server, accessible to any box on the network over https, it works, even if its interface is not exactly eye candy. I just had to configure web browsers to access http://server/torrentflux/index.php?url_upload=$ each time they hit a .torrent file. But even if a web interface may be powerful and user-friendly, I resent torrentflux for making me click plenty of times (at least twice just to start a download), after having logged in.

I took a look at rTorrent. It works by watching a directory for new .torrent files and loading them automatically. Wonderful. Sadly, you have to log in over SSH and then manually select, in a text user interface, which download you actually want to start.

I liked the idea of dragging’n’dropping .torrent files into one directory. It can be done over NFS or Samba, with no additional login. I have those already set up on my server. The next step is to handle queue management with the same directory.

I came up with the idea of using a command line BitTorrent client through a script that would watch the damn NFS/Samba directory. It would:
– notice and register newly dropped .torrent files
– allow forcibly pausing/removing any designated torrent
– allow forcibly pausing all downloads
– warn by mail whenever a download is completed and unregister the relevant torrent

So I wrote such a script, designed to drive the transmission daemon as shipped by Debian testing. It looks for files in a given directory, named according to the following syntax:
– $file.torrent = torrent to be added
– $realfile.hash = torrent being processed (delete it to remove the torrent)
– $realfile.hash- = torrent paused
– $realfile.hash+ = torrent (supposedly) completed and already removed
– all- = pause all

Here’s the HOWTO:

apt-get install transmission-cli screen
adduser torrent
echo "torrent: youruser" >> /etc/aliases

su torrent
cd ~/
mkdir watch download
exit

mkdir -p /server
ln -s /home/torrent /server

Obtain the uid/gid of the torrent user, needed below:

cat /etc/passwd | grep torrent

Here I get 1003/1003.

Edit /etc/exports to set up NFS access (this assumes your NFS server is already set up), add:

# every box on the network get rw access to rtorrent
/home/torrent/download 192.168.1.1/24(rw,sync,all_squash,anonuid=1003,anongid=1003)
/home/torrent/watch 192.168.1.1/24(rw,sync,all_squash,anonuid=1003,anongid=1003)

On each NFS client, add in /etc/fstab (you must create mount points):

server:/home/torrent/download /mnt/torrent/download nfs rw,nolock 0 0
server:/home/torrent/watch /mnt/torrent/watch nfs rw,nolock 0 0

Edit /etc/samba/smb.conf to set up Samba access (this assumes your Samba server is already set up), add:

[Download]
path = /home/torrent/download
browseable = yes
public = yes
valid users = youruser
force user = torrent
force group = torrent
writable = yes

[Watch]
path = /home/torrent/watch
browseable = yes
public = yes
valid users = youruser
force user = torrent
force group = torrent
writable = yes

Restart NFS/Samba servers, mount networked file system on the clients.

Add a startup script for transmission-daemon, edit it if need be (daemon configuration is done here), fire it up:

cd /etc/init.d/
wget http://yeupou.free.fr/torrent/init.d/torrent
update-rc.d torrent defaults 80
/etc/init.d/torrent start

At any time, you can check the current daemon process with screen:

screen -r torrent

Add torrent-watch.pl in /usr/local/bin or /usr/bin (anywhere in $PATH):

cd /usr/local/bin
wget http://yeupou.free.fr/torrent/torrent-watch.pl
chmod a+x torrent-watch.pl

Check that it runs properly. Drag’n’drop any .torrent into /home/torrent/watch and run:

su torrent
torrent-watch.pl
cat status

If everything is ok, add in /etc/cron.d/torrent:

* * * * * torrent cd ~/watch && /usr/local/bin/torrent-watch.pl

You’re done.

Syndicated 2010-11-08 17:16:25 from # cd /scratch

8 Oct 2010 »

Release: SeeYouLater 1.1

I’ve just released SeeYouLater 1.1 (it fetches a list of IPs of known spammers and bans them by putting them in /etc/hosts.deny). This is a small cleanup release: it now avoids duplicates in both the database and hosts.deny.

You can obtain it on the Gna! project page using SVN or debian packages.

Syndicated 2010-10-08 14:49:33 from # cd /scratch

19 Sep 2010 »

Being warned of pending packages upgrades with apt-warn

I started using GNU/Linux with RedHat 5.2. It came with plenty of packages (GNOME 0.20, Linux 2.0.36, etc) and I was quite happy to deal with RPM (RedHat Package Manager, hum) telling me which package is required to install another one and which package contains which files. You simply had to go to RPMFind.net to get missing packages. If no package was available, you could write a clean RPM spec to build one or use checkinstall to build RPMs on the fly when doing make install. That was more than ten years ago; still, nowadays Microsoft Windows XP (sorry, I never used Vista/7) has no clean packaging system that I know of: you have a clumsy list of installed software (InstallShield, whatever it means), no clear idea of dependencies, you can remove pieces of software required by other still installed software, and there are plenty of installed pieces of software that you have no way of clearly listing.

At that time, I had a Pentium II 350 MHz and a Pentium 200 MMX as workstations and a Pentium 133 MHz as home server. Soon enough, I had the idea of writing a script to produce a list of installed packages readable over the intranet, and so I published a BASH-based script that outputs an HTML view of an RPM database, called pdbv, standing for Package DataBase View, the first version, 1.0.0, being released in June 2002. On the Gna! project page, when listing pros and cons of pdbv, the first pro that came up was “it does not require lucid/gtk+/qt or other big libs”: nowadays, GTK+ and Qt probably no longer strike anyone as “big (bloated) libraries” and I assume Lucid is no longer even installed on most GNU/Linux systems. Later, I rewrote pdbv in Perl, which made it faster and lighter. Here are demos of pdbv: pdbv 1.x with French locale, pdbv 2.x.

As you can see browsing pdbv’s demos, it obviously also supports dpkg (Debian Package, duh). I gradually switched over to Debian GNU/Linux for two reasons: apt-get and the stable/testing/unstable branching. Apt-get was the end of wasting time on RPMFind. Debian stable offers astonishing stability for servers while testing/unstable provides brand new desktop software in a timely fashion.

Nowadays, I spend less time dealing with computers and I no longer rely much on pdbv. Due to lack of support (I guess I’m to blame; but KPackage or Synaptic are surely more useful to end users anyway), it will be removed from Debian at its next stable release (it is still in Debian lenny but no longer in testing). I no longer care much about which software is installed; I use debfoster to keep my systems clean (I know, just like apt-get, debfoster is deprecated in favor of aptitude, but I cannot help using it instead).

However I’d like to know which upgrades are pending. For this reason (and I’m quite sure I’m reinventing something that already exists, but I failed to find it and I wanted it my way), I wrote a small script called apt-warn that runs apt-get update and then warns you of pending updates (only if it has not warned you about them already). It requires Apt::Pkg. It is supposed to be installed as a cronjob in /etc/cron.daily. Running on my workstation this morning, it outputs:

Follows 4 newly updated package(s) that you could upgrade on bender:
hicolor-icon-theme (0.11-1 -> 0.12-1)
sudo (1.7.2p7-1 -> 1.7.4p4-2)
xserver-common (2:1.7.7-4 -> 2:1.7.7-6)
xserver-xorg-core (2:1.7.7-4 -> 2:1.7.7-6)
Follows 5 recently updated package(s) that you also could upgrade:
autopoint (0.18.1.1-1 -> 0.18.1.1-2)
gettext (0.18.1.1-1 -> 0.18.1.1-2)
gettext-base (0.18.1.1-1 -> 0.18.1.1-2)
login (1:4.1.4.2-1 -> 1:4.1.4.2+svn3283-1)
passwd (1:4.1.4.2-1 -> 1:4.1.4.2+svn3283-1)

Autopoint, gettext, login and passwd pending upgrades were already warned about yesterday. A second run will return no output since there is no other available upgrade not already warned about.
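The "warn only once" behaviour can be sketched in a few lines of shell. This is not the code of apt-warn (which is written in Perl), only an illustration of the idea: report_new_upgrades, the pending list and the state file are assumptions made for the example, with one "package (old -> new)" entry per line.

```shell
#!/bin/sh
# report_new_upgrades PENDING_FILE STATE_FILE
# Print the pending entries that were not reported on a previous run,
# then remember the whole current pending list for next time.
# Hypothetical helper; both files hold one entry per line.
report_new_upgrades() {
    pending_file=$1
    state_file=$2
    # First run: start from an empty memory.
    [ -f "$state_file" ] || : > "$state_file"
    tmp=$(mktemp) || return 1
    sort -u "$pending_file" > "$tmp"
    # comm -23 keeps lines found only in the first (sorted) file.
    # The state file is sorted too, since this function is what writes it.
    comm -23 "$tmp" "$state_file"
    # Replacing the state also forgets packages upgraded in the meantime.
    mv "$tmp" "$state_file"
}
```

A second call with the same pending list prints nothing, which matches the "second run will return no output" behaviour described above. The pending list itself could come, for instance, from the Inst lines printed by apt-get --simulate upgrade.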

Syndicated 2010-09-19 10:19:39 from # cd /scratch

9 Sep 2010 »

Getting MPlayer to cope cleanly with redshift

Redshift is a nice tool that adjusts the color temperature of your screen according to your surroundings. As a result, your eyes hurt less if you are working in front of the screen at night.

It is easy to set up; for instance, it is already packaged for Debian (package redshift). Once it is installed, you have to determine the longitude and latitude of your position – googling around should do. Then you can run some tests to define which range of temperature you want redshift to work with – I like it cold, so I go from 6500 to 9300. Finally, you add it to autostart, the way you want.

In my case, I added redshift -l 48.799:2.505 -t 6500:9300 & just before startkde in my ~/.xsession

Easy, isn’t it? Sure. But when I watch TV with MPlayer or any video with SMPlayer, especially around 1 AM, I’d like the color temperature back to normal. And, no, I’m not fond of the idea of doing a killall each time I start watching a video and then calling redshift again afterwards when I’m done.

Configure SMPlayer to use this MPlayer wrapper that kills and starts redshift at the right time

So I wrote a wrapper that sends SIGTERM to any redshift process when starting, calls MPlayer and then, when it is over, restarts redshift. It uses basic perl functions so it has no dependencies. You may however edit it to set the latitude/longitude and temperature range to whatever you like.
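The wrapper itself is a Perl script and is not reproduced here. The following is only a rough shell sketch of the behaviour just described, under assumptions: play_without_redshift, PLAYER, REDSHIFT and REDSHIFT_ARGS are names invented for the example, and the redshift options are the ones from my ~/.xsession.

```shell
#!/bin/sh
# Rough sketch of the wrapper logic (the real wrapper is written in Perl).
# PLAYER, REDSHIFT and REDSHIFT_ARGS are assumptions for this example.
PLAYER=${PLAYER:-mplayer}
REDSHIFT=${REDSHIFT:-redshift}
REDSHIFT_ARGS="-l 48.799:2.505 -t 6500:9300"

play_without_redshift() {
    # Ask any running redshift to quit (SIGTERM); it is fine if none runs.
    pkill -TERM -x "$REDSHIFT" >/dev/null 2>&1 || true
    # Run the real player with whatever arguments the frontend passed.
    status=0
    "$PLAYER" "$@" || status=$?
    # Bring redshift back in the background, whatever the player returned.
    # REDSHIFT_ARGS is left unquoted on purpose so it splits into options.
    "$REDSHIFT" $REDSHIFT_ARGS >/dev/null 2>&1 &
    return "$status"
}
```

Saved as an executable script ending with play_without_redshift "$@", it could be given to SMPlayer in place of the mplayer binary, since it passes arguments through and returns the exit status of the player.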

It should behave just as if you were using MPlayer, so you can configure SMPlayer, or any MPlayer frontend, to use this wrapper. Obviously, this wrapper could be modified to work with vlc, xine or any other video rendering engine.

Syndicated 2010-09-09 14:55:00 from # cd /scratch

13 Aug 2010 »

Slaying Spams with both Bogofilter and SpamAssassin embedded in exim

Ads are spam. Good thing with the internet’s ads is that you can set up countermeasures.

(Disclaimer: yes, there is nothing new here, just an example of setup)

I have plenty of email addresses from different providers, some are definitely history. I could go through the websites of all of these and set up forwarding for the ones I no longer use but still want to be able to get mail from, just in case. Well, I would do that if I was using my mail client to fetch mails – because otherwise fetching mails would actually take ages.

But, as I have a local underclocked home server, I find it way easier and more potent to, instead, use ESR’s fetchmail to download them all to a single account that is accessed by my mail client through IMAPS. I have a /etc/fetchmailrc like:

poll pop.free.fr with proto POP3
  user 'XXX' there with password 'XXX' is 'localuser' here
poll imap.gmail.com with proto IMAP
  user 'XXX@gmail.com' there with password 'XXX' is 'localuser' here with ssl
  user 'XXZ@gmail.com' there with password 'XXZ' is 'localuser' here with ssl

Fetchmail downloads mails and then relies on the installed SMTP server, which is Exim, to deliver them to the end user’s mailbox, accessible through IMAPS.

What’s so nifty about it? Well, mails will also be filtered for spam. As it happens on the local home server, it will be unnoticeable for the end user, that is, me. We’ll use several anti-spam tools, not caring about redundancy and time consumption: DNSBLs, Bogofilter, SpamAssassin, razor2.

So, here we go. Note that Exim (exim4) in Debian uses the user Debian-exim. localuser is the recipient end user.
We will create a system group dedicated to spam checking to easily share bayesian databases:

# addgroup --system spamslayer
# adduser Debian-exim spamslayer
# adduser localuser spamslayer

* Bogofilter is a bayesian spam filter. It is said to be faster and less time-consuming than SpamAssassin’s own bayesian filter, so we will run mails through it first. It is installed with the debian package.

Edit /etc/bogofilter.cf as follows:

bogofilter_dir=/var/lib/bogofilter
db_transaction=yes

The bayes directory must be created by hand:

# mkdir /var/lib/bogofilter
# chgrp spamslayer /var/lib/bogofilter
# chmod 2777 /var/lib/bogofilter

* SpamAssassin is a powerful, at the cost of time-consumption, spam-killer. It is installed with the debian package.

In the following site-wide config /etc/spamassassin/local.cf, I use bayesian filters, razor2, several DNSBLs and I adjust some tests according to my needs:

# Save spam messages as a message/rfc822 MIME attachment instead of
# modifying the original message (0: off, 2: use text/plain instead)
#
report_safe 1
# Set which networks or hosts are considered 'trusted' by your mail
# server (i.e. not spammers)
#
trusted_networks 192.168.1.
# Locales
#
# (I only receive mails in English or French)
ok_locales en fr
# Set the threshold at which a message is considered spam (default: 5.0)
#
required_score 3.3
# Use Bayesian classifier (default: 1)
#
# (I created the relevant directory)
use_bayes 1
bayes_file_mode 0777
bayes_path /var/lib/spamassassin-bayes/bayes
score BAYES_20 0.3
score BAYES_40 0.5
score BAYES_50 0.8
score BAYES_60 1
score BAYES_80 2
score BAYES_95 2.5
score BAYES_99 6
# Bayesian classifier auto-learning (default: 1)
#
# (I may change that, not sure about it)
bayes_auto_learn 1
# Set headers which may provide inappropriate cues to the Bayesian
# classifier
#
bayes_ignore_header X-Bogosity
bayes_ignore_header X-Spam-Flag
bayes_ignore_header X-Spam-Status
# use razor
# (/etc/razor is the standard debian path)
use_razor2 1
razor_config /etc/razor/razor-agent.conf
score RAZOR2_CF_RANGE_51_100 3.2
# some rbl checks are already made by exim, at RCPT time, not all.
skip_rbl_checks 0
rbl_timeout 30
score RCVD_IN_SBL 15
score RCVD_IN_XBL 15
score RCVD_IN_SORBS_HTTP 15
score RCVD_IN_SORBS_SOCKS 15
score RCVD_IN_SORBS_MISC 15
score RCVD_IN_SORBS_SMTP 15
score RCVD_IN_SORBS_ZOMBIE 15
# adjust some tests scores: lower DUL test
score FROM_ENDS_IN_NUMS 0.2
score FROM_HAS_MIXED_NUMS 0.2
score FROM_HAS_MIXED_NUMS3 0.2
score RCVD_IN_NJABL_DUL 0.1
score RCVD_IN_SORBS_DUL 0.1
# lower stupid test
score DNS_FROM_SECURITYSAGE 0.0
# adjust some tests scores
score FAKE_HELO_HOTMAIL 3
score FORGED_HOTMAIL_RCVD 3
score HTML_FONT_BIG 2.4
score NO_REAL_NAME 2
score RCVD_IN_BL_SPAMCOP_NET 3
score SUBJ_ILLEGAL_CHARS 4.8
score EXTRA_MPART_TYPE 2.8
score SUBJ_ALL_CAPS 2.6
# increase all scores related to drugs: what do I care, duh
score DRUGS_ANXIETY 5
score DRUGS_ANXIETY_EREC 5
score DRUGS_ANXIETY_OBFU 5
score DRUGS_DIET 5
score DRUGS_DIET_OBFU 5
score DRUGS_ERECTILE 5
score DRUGS_ERECTILE_OBFU 5
score DRUGS_MANYKINDS 10
score DRUGS_MUSCLE 5
score DRUGS_PAIN 5
score DRUGS_PAIN_OBFU 5
score DRUGS_SLEEP 5
score DRUGS_SLEEP_EREC 5
score DRUGS_SMEAR1 5
# same goes for porn
score AMATEUR_PORN 5
score BEST_PORN 5
score DISGUISE_PORN 5
score DISGUISE_PORN_MUNDANE 5
score FREE_PORN 5
score HARDCORE_PORN 5
score LIVE_PORN 5
score PORN_15 5
score PORN_16 5
score PORN_URL_MISC 5
score PORN_URL_SEX 5
score PORN_URL_SLUT 5

The bayes directory must be created:

# mkdir /var/lib/spamassassin-bayes
# chown Debian-exim /var/lib/spamassassin-bayes
# chmod 0777 /var/lib/spamassassin-bayes

Obviously, it implies that razor2 must be properly installed. We install the debian package then set it up. Remember it must run with user Debian-exim, so we do:

# chown -R Debian-exim:spamslayer /etc/razor
# su Debian-exim
$ razor-admin -home=/etc/razor -register
$ razor-admin -home=/etc/razor -create
$ razor-admin -home=/etc/razor -discover

To save resources, we start SpamAssassin as a daemon (spamd), which will be called using its specific client (spamc). Before using the initd script, edit /etc/default/spamassassin as follows:

# Change to one to enable spamd
ENABLED=1
# SpamAssassin uses a preforking model, so be careful! You need to
# make sure --max-children is not set to anything higher than 5,
# unless you know what you're doing.
OPTIONS="--create-prefs --max-children 5 --helper-home-dir -u Debian-exim -g spamslayer"
# Cronjob
# Set to anything but 0 to enable the cron job to automatically update
# spamassassin's rules on a nightly basis
CRON=1

All that being done, you’ll want to (re)start the daemon with the relevant initd script (/etc/init.d/spamassassin restart here).

* Now we’ll tune Exim to call, all by itself, first Bogofilter and then SpamAssassin, only if necessary. We use the split configuration in /etc/exim4/conf.d/. That is Debian-specific, I think, but it does not make any difference anyway.

First we define useful transports in /etc/exim4/conf.d/transport/35_spamblock (the name 35_spamblock is arbitrary and the number does not matter here):

spamslay_bogofilter:
  driver = pipe
  command = /usr/sbin/exim4 -oMr spamslayed-bogofilter -bS
  use_bsmtp = true
  transport_filter = /usr/bin/bogofilter -l -p -e
  home_directory = "/tmp"
  current_directory = "/tmp"
  # must use a privileged user to set $received_protocol
  # on the way back in!
  user = Debian-exim
  group = spamslayer
  log_output = true
  return_fail_output = true
  return_path_add = false
  message_prefix =
  message_suffix =
#
spamslay_spamd:
  driver = pipe
  command = /usr/sbin/exim4 -oMr spamslayed-spamd -bS
  use_bsmtp = true
  transport_filter = /usr/bin/spamc
  home_directory = "/tmp"
  current_directory = "/tmp"
  # must use a privileged user to set $received_protocol
  # on the way back in!
  user = Debian-exim
  group = spamslayer
  log_output = true
  return_fail_output = true
  return_path_add = false
  message_prefix =
  message_suffix =

Second we define routers, here in /etc/exim4/conf.d/router/350_spamblock – the order matters, here it is just after 300_exim4-config_real_local and before 400_exim4-config_system_aliases:

# first bogofilter
spamslay_router_bogofilter:
  # When to scan a message :
  # - it isn't already flagged as spam
  # - it has not yet been spamslayed at all
  condition = "${if and { {!eqi{$h_X-Spam-Flag:}{yes}} {!eq {$received_protocol}{spamslayed-bogofilter}} {!eq {$received_protocol}{spamslayed-spamd}} }}"
  driver = accept
  transport = spamslay_bogofilter
#
# second spamd
spamslay_router_spamd:
  # When to scan a message :
  # - it isn't already flagged as spam
  # - it has not yet been spamslayed with SA
  condition = "${if and { {!eqi{$h_X-Spam-Flag:}{yes}} {!match{$h_X-Bogosity:}{^Yes}} {!eq {$received_protocol}{spamslayed-spamd}} }}"
  driver = accept
  transport = spamslay_spamd
#
# This route will send any mail that got here to the devnull alias, that
# should be configured in /etc/aliases to be a real link to /dev/null.
# This route should get only mails that have spam score higher than 14.
# This will affect users mails!
spamslay_killit:
  condition = "${if ge{$h_X-Spam-Level:}{\*\*\*\*\*\*\*\*\*\*\*\*\*\*} {1}{0} }"
  driver = redirect
  data = spam
  file_transport = address_file
  pipe_transport = address_pipe

* Next step: now that spams are flagged, it makes sense to put them apart. I do this with procmail. Here’s the relevant bit of /home/localuser/.procmailrc:

IMAPDIR=$HOME/.Maildir/
ISDIR="/"
DOT="."
# tagged by hand, to be learned from by both SpamAssassin and Bogofilter
spam=$IMAPDIR$DOT"Poubelle.Spam"$ISDIR
# by spamd
spamBySA=$IMAPDIR$DOT"Poubelle.SpamSA"$ISDIR
# by bogofilter
spamByBg=$IMAPDIR$DOT"Poubelle.SpamBg"$ISDIR
expirable=$IMAPDIR$DOT"Poubelle.Expirable"$ISDIR
#
:0
* ^X-Spam-Status: Yes
$spamBySA
:0
* ^X-Spam-Flag: YES
$spamBySA
#
:0
* ^X-Bogosity: Yes
$spamByBg

* Training bayesian filters.

Now that spam ends up in a specific mailbox/maildir, both SpamAssassin and Bogofilter bayesian filters can be trained to be effective. We add the following in /etc/cron.d/bayes:

# trains bayesian filters
BASEDIR="/home/localuser/.Maildir"
SPAMDIR_MANUAL="$BASEDIR/.Poubelle.Spam/cur/ $BASEDIR/.Poubelle.Spam/new/ $BASEDIR/.Poubelle.Spam"
SPAMDIR_SA="$BASEDIR/.Poubelle.SpamSA/cur/ $BASEDIR/.Poubelle.SpamSA/new/ $BASEDIR/.Poubelle.SpamSA"
SPAMDIR_BG="$BASEDIR/.Poubelle.SpamBg/cur/ $BASEDIR/.Poubelle.SpamBg/new/ $BASEDIR/.Poubelle.SpamBg"
#
# spamd: can handle easily bogofiltered found spams
25 * * * * localuser /usr/bin/sa-learn --spam $SPAMDIR_MANUAL $SPAMDIR_BG >/dev/null
#
# bogofilter: not sure how it would cope with spamd headers so we'll avoid them
# for now
# (-u was not set as it is discouraged perf-wise in bogofilter's manual)
# Dirty hack to cope with rights issues: running as root - not great
28 * * * * root /usr/bin/bogofilter --register-spam -B $SPAMDIR_MANUAL $SPAMDIR_BG && chown Debian-exim -R /var/lib/bogofilter

Obviously, if you want it to learn from plenty of different users, you’ll have to think of something more elaborate.
Anyway, regarding plenty of users, it would actually probably be wise to think twice about the whole concept of sharing bayesian filters that may not be accurate at all for very different users.

I’m not very happy with the handling of read/write access to bogofilter files; it remains to be cleaned up. Obviously, one alternative would have been to avoid meddling with Exim and to run both bogofilter and spamd via procmail. Sure, it would not have been a site-wide setup, but for a few users, ~/.procmailrc can be replicated easily. But actually I enjoy messing with Exim, that’s kind of a hobby. I skipped here the part where we call DNSBLs in Exim (working out-of-the-box anyway). And on a production server, with the SMTP wide open to the web, it is possible to follow this approach to shut off spammers at SMTP time (which saves a huge amount of resources) and even ban them.

Syndicated 2010-08-13 14:56:03 from # cd /scratch

4 Aug 2010 »

Underclocking, going backward?

Do you remember back in the days when a Pentium III doing 600MHz was awesome? At that time, when guys at Intel were foretelling that the increase of processor clock rates would have no end, or at least none that they could possibly envision, you’d see that oath as a testimony of faith in a future of endless possibilities, gaming-wise.

Weirdos...

Later on, Intel went as far as releasing a Pentium 4 which was a degraded version of the Pentium III. Less complex, with fewer instructions, it was able to go higher in clock rate than any Pentium III, something like 1.8GHz easily. It went on. People even bought laptops with a Pentium 4 2.6GHz. And then people started noticing: hey! it’s winter, it’s freezing damn cold outside but I’m not even forced to turn the heat on! Or funnier, hey, why is my brand new laptop making more noise than a vacuum cleaner? And what black magic made power supplies become a noticeably costly component of a computer?

Well, that’s all about physics. And there’s not much to do about it. The faster the central processing unit runs, the more energy it will burn, the hotter it will get.

AMD was smart enough to soon start shipping processors with a lower clock rate than Intel’s for the same effective potential. It was also smart enough to brand them accordingly, for instance as something like 3200+, to tell they would be as potent as a Pentium 4 3.2GHz while they had a way slower clock rate.

Intel could surely not go backward too obviously – and publicly recognize AMD as the wittier one. But they could not lose the growing laptop market, where the heat issue (not to mention the impact on battery life) was too much of a problem with the Pentium 4, so they invented the Pentium M… based on the Pentium III, of course.

Considering the unavoidable antagonism between speed and energy consumption, the best idea that someone (who, I do not know) came up with was to enable the operating system to set the clock rate according to the current need. It comes under many different names (Cool & Quiet, whatever) and I believe it is now available with most recent processors. On Debian, you just have to install cpufrequtils, load the relevant kernel module for your CPU (powernow_k8 for instance on my AMD Athlon 64 X2 Dual Core workstation) and then pick a policy. Yes, you have to pick a policy, like on demand, performance, etc.
Obviously, there is a performance loss (hence the name, performance, of the policy which actually only sets the clock rate to max) since you are not always running as fast as possible: there is always a delay needed for the operating system to understand that you now need full speed when it was idle just before. The purpose of the different policies is to optimize this delay – tuning inertia, in which regard on demand is simply harsher than conservative.
The next step would be to have the operating system guess whether you’ll need full speed or not according to what you are actually doing (which software you run, etc) and what you are about to do according to past usage (yes, logging what you usually do and making guesses).
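To see which policy (the kernel calls it a governor) is currently in use, the cpufreq driver exposes one scaling_governor file per CPU in sysfs. The sketch below simply prints them; show_governors is a made-up helper name, and the optional argument only exists so the function can be pointed at another directory than /sys/devices/system/cpu.

```shell
#!/bin/sh
# show_governors [ROOT]: print the current cpufreq governor of each CPU.
# Hypothetical helper; ROOT defaults to the usual sysfs location.
show_governors() {
    root=${1:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip CPUs without cpufreq support (or an unmatched glob).
        [ -r "$f" ] || continue
        cpu=${f#"$root"/}   # strip the leading directory
        cpu=${cpu%%/*}      # keep only "cpuN"
        printf '%s: %s\n' "$cpu" "$(cat "$f")"
    done
}
```

Changing the policy is then a matter of running, as root, something like cpufreq-set -g ondemand from cpufrequtils.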

So currently, on a workstation like mine, using cpufreq on demand is probably a wise choice. Most of the time, it will run slower than it could, because you do not need full power of a recent processor to browse over the web, reply to mails and whatever crap like that you may want to do. And when you’re compiling a piece of code, when you are encoding a piece of music, then you’ll have full power. I never or rarely use GNU/Linux to play games so inertia is not a crucial issue – however, to play games, it would surely be best to set the policy to performance, even if after the game is started it will likely, anyway, request full power (surely, you configure your games to the best resolution, anti-aliasing, etc, that your box can handle, don’t you?).

(Not to mention that, gaming-wise, graphical cards now do a big part of the job, the most important anyway, making CPU less important by comparison to so-called GPU… but that’s another story)

So, now, I’ll get straight to the point. I also run a little shuttle box as a local server. It serves files, it is up 24h/24 and does plenty of small things. It comes with a Celeron 2.6GHz but it would surely do as well with a slower clock rate. With the idea in mind of reducing the heat of this processor as much as possible, I searched the web on the subject of underclocking. The mainboard of the shuttle, by design, does not allow making this processor run slower than it does. There is no possibility of playing with cpufreq or the like with a Celeron – which is actually a crappy Pentium (no L2 cache, fewer instructions, etc).

Pentium 4 2.80GHz running at 1.40GHz

I found interesting, however, the idea of buying a processor designed to run with a faster front side bus than the one of the mainboard we have. It relies on the fact that the processor clock rate is actually determined by both the clock multiplier and the front side bus (FSB).
This shuttle’s front side bus runs at 400MHz. If I pick a processor designed for, say, an 800MHz front side bus, which is usual for Pentium 4 around 3GHz, it will run at half its rated speed.
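The arithmetic behind this can be written down in one line. It is only a sketch, assuming the clock multiplier is locked so that the clock rate scales linearly with the front side bus; underclocked_mhz is a name made up for the example.

```shell
#!/bin/sh
# underclocked_mhz RATED_MHZ RATED_FSB_MHZ BOARD_FSB_MHZ
# Clock rate = multiplier x FSB. With a locked multiplier, a processor
# rated for one FSB and plugged on a slower one scales down by the
# ratio of the two bus speeds. Hypothetical helper, integer MHz only.
underclocked_mhz() {
    echo $(( $1 * $3 / $2 ))
}

# Example: a 2800MHz processor rated for an 800MHz FSB on a 400MHz board:
#   underclocked_mhz 2800 800 400    -> 1400
```

So a processor sold as 2.8GHz for an 800MHz bus should end up at 1.4GHz on a 400MHz board.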

So I spent nine euros to get a (used – but the Celeron 2.6GHz is not brand new either) Pentium 4 2.8GHz. And now, my shuttle runs at 1.4GHz. Processor temperature is around 35°C, and the sole fan of the box spins at around 1300RPM. Nice side effect: this processor has Intel’s Hyper-Threading (simili multiprocessor), which is definitely good for a server.

The only remaining thing to do is to undervolt it now.

Syndicated 2010-08-04 14:58:50 from # cd /scratch

30 Jul 2010 »

Videos misc scripts: saving space, resetting subtitles

These days, while according to neutral sources the movie industry has never been so juicy, making it obviously necessary to restrict freedom in France in the name of its survival, I thought it would be nice to share two small scripts that are handy when dealing with videos on your hard drive.

Surely, you wouldn’t store videos downloaded over the internet that you haven’t paid for. I guess that’s immoral since in 2009 the US box office topped $10bn for the first time in history, during a worldwide economic downfall. An advertisement paid for by the guys that made these $10bn, while bankruptcy was really an option for major financial institutions and eviction just the same for poor tenants, said it is piracy – while it is still hard for me to envision how it relates to these events occurring on a regular basis in Malaysia, Cameroon, the Red Sea, etc.

And if you are not concerned by the moral issue (a communist like Jesus Christ, aren’t you?), maybe you are afraid of getting caught anyway. Well, it is unlikely that the police would come to your house with a search warrant looking for piracy evidence. Mostly because there is no such thing in French law as a search warrant. Indeed, the police are entitled to enter your house in some cases: in three cases only. First, there is what is named commission rogatoire, an order given to a policeman by a judge to do something specific in his name, such as searching your house. It obviously looks like a US/UK common law search warrant but it is not: sure, it gives the same rights to the police, but it is not a usual procedure in France as it applies only to criminal investigations (information judiciaire), not to trivial misdemeanours/regulatory offences. Second, in the case of an enquête de crime ou délit flagrant (a felony or misdemeanour punishable by jail time that just occurred), the police can enter your house without your consent. Third, what is actually the only case that would allow the police to enter your house for a regulatory offence (which is what this piracy is actually more or less about) is the enquête préliminaire – funnily, in this case, the police require your (written) consent to enter. I guess that if I had this kind of piracy evidence at home and the police came to my doorstep asking to enter in an enquête préliminaire, I would probably not consent.

And I would not even dare to bring up the penalty introduced by this HADOPI law. It is said that, as a friendly reminder of your place, which you should not have forgotten so easily (customer, yes, that’s you – nothing else – and even if you do no harm, it is not up to you to proceed otherwise as you’d wish), by law, your Internet access will be cut off while you still pay for it. Well, French penal law also states that “Nul n’est responsable que de son propre fait”, meaning that you can only be punished for your own doings. Sure, there are exceptions (a boss who somehow forced an employee to misbehave, etc). But none that I can think of covers the case of two persons living in the same house and being punished together for the actions of only one of them, without the knowledge and consent of the other. In principle, this idea is unlawful, a regression of two thousand years of penal law, dropping us back to the days when you were entitled to take possession (by force if necessary) of the belongings of a Spartan you had just met because, as an Athenian (or whatever), you had recently been despoiled by another Spartan, no matter that they had no ties aside from their citizenship.

1. So there is a script called dir2x264.sh that I wrote for the purpose of saving harddisk space by cleanly converting .avi and .mpg files to an x264-encoded .mp4 file.

It could surely be tuned – I noticed issues when dealing with .mkv files. So far it uses mencoder (mplayer’s encoder) with lavc as audio codec and x264 as video codec. So obviously it requires mencoder, with support for these two (usual) codecs.

To use it, go to the directory where you have .avi or .mpg files, put the script in there and call it (it will always process all the files in the current directory, `pwd`).

$ cd ~/myvideos
$ chmod +x dir2x264.sh # (if not made executable already)
$ ./dir2x264.sh

It logs the work done in dir2x264log, which makes it easy to evaluate the hard disk space saved.
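To give an idea of the overall shape of such a script, here is a minimal sketch. It is not dir2x264.sh itself: the function names (encode_one, file_size, convert_dir) and the log line format are made up for the illustration, and the mencoder call only carries the two codec switches mentioned above (-ovc x264, -oac lavc). The real script most likely passes more options (container, bitrate, etc.), which are left out here because they are not known.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual dir2x264.sh.

# Encode one file. Only the two codec switches named in the text are used;
# container and quality options are deliberately omitted (assumption).
encode_one() {
    mencoder "$1" -ovc x264 -oac lavc -o "$2"
}

# Print the size of a file in bytes.
file_size() {
    wc -c < "$1" | tr -d ' '
}

# Convert every .avi and .mpg file of the current directory and log
# the size before and after in the given log file.
convert_dir() {
    log=${1:-dir2x264log}
    for src in *.avi *.mpg; do
        [ -e "$src" ] || continue      # glob pattern matched nothing
        dst="${src%.*}.mp4"
        [ -e "$dst" ] && continue      # never overwrite an existing result
        if encode_one "$src" "$dst"; then
            printf '%s: %s -> %s bytes\n' \
                "$src" "$(file_size "$src")" "$(file_size "$dst")" >> "$log"
        fi
    done
}
```

Calling convert_dir from within ~/myvideos would then process the directory. The originals are kept in this sketch, so nothing is lost if an encode turns out badly.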

2. In case you cannot find the exact matching subtitle file for your video (have you tried SMPlayer?) but found one that is just fine except for a delay between the sound/image and the subtitles, subtitle_reset.pl could do the trick for you. It depends on Subtitles.pm (libsubtitles-perl in Debian).

This script takes two command line arguments: --file and --time (in seconds, positive or negative), so the usage is quite obvious.
It will make a backup of the original file. If you run it several times to finely adjust your file, it will always restart from the backup file, unless you removed it, obviously.

Note, however, that this script will not help if the problem is that the video and the subtitle file have different frame rates. You may want to give a try to subs, a script that is now shipped with Subtitles.pm (it was not there, or I missed it, when I first wrote this one).
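To make the idea of a constant delay concrete, here is a minimal sketch of what shifting subtitles amounts to. It is not subtitle_reset.pl and does not use Subtitles.pm: it is a plain awk filter wrapped in a shell function (shift_srt is a made-up name), and it assumes a simple SRT file whose timing lines look like "HH:MM:SS,mmm --> HH:MM:SS,mmm". Times that would become negative are clamped to zero.

```shell
#!/bin/sh
# Hypothetical sketch: shift every timestamp of an SRT file by a fixed delay.
# Usage: shift_srt DELAY_IN_SECONDS < in.srt > out.srt
shift_srt() {
    awk -v delay="$1" '
    # Convert HH:MM:SS,mmm to milliseconds, add the delay, convert back.
    function shifted(ts,    p, ms, h, m, s) {
        split(ts, p, "[:,]")
        ms = ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
        ms += int(delay * 1000 + (delay < 0 ? -0.5 : 0.5))
        if (ms < 0) ms = 0
        h = int(ms / 3600000); ms -= h * 3600000
        m = int(ms / 60000);   ms -= m * 60000
        s = int(ms / 1000);    ms -= s * 1000
        return sprintf("%02d:%02d:%02d,%03d", h, m, s, ms)
    }
    # Timing lines are rewritten, everything else is copied as-is.
    / --> / { print shifted($1) " --> " shifted($3); next }
    { print }
    '
}
```

For instance, shift_srt -1.5 < movie.srt > movie.shifted.srt would make every subtitle appear a second and a half earlier. Since the same offset is applied everywhere, this cannot fix a frame-rate mismatch, where the drift grows over time.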

Syndicated 2010-07-30 20:08:39 from # cd /scratch

7 Apr 2010 »

RSS feeds: HTML output with rawdog from akregator’s OPML

Akregator, a KDE RSS aggregator

RSS feeds are probably one of the most useful tools of today’s internet. Obviously, it is not really complicated to find interesting pages on the web. It is way harder to keep up to date, however. These feeds fix that issue. I will not explain what RSS feeds are but will focus on how I use them.

On my main workstation, with KDE, I use Akregator, which aggregates all the feeds. It is nicely integrated into the environment: in Konqueror, with one click, I can add whatever RSS feed is mentioned in the headers of an HTML page. After adding RSS feeds, I can sort them into categories I defined.

Akregator's feeds in rawdog's HTML output

It happens from time to time that I want to access my RSS feeds from another computer over the network, or even with my laptop over the web. Here comes rawdog, an “RSS Aggregator Without Delusions Of Grandeur”. I picked it because it is easy to set up and lightweight (unlike TinyRSS etc). This aggregator is installed on my local network server, uses Akregator’s list of feeds and produces a multicolumn HTML output that Apache serves.

First, on my main workstation (the one on which I use Akregator), I set up a cronjob that copies Akregator’s feeds list to my user account on the server. Note that I use SSH with a key with no passphrase to do so.

/etc/cron.d/rawdog:

25 * * * * user if [ -e ~/.kde/share/apps/akregator/data/feeds.opml ]; then scp ~/.kde/share/apps/akregator/data/feeds.opml server:~/ 1> /dev/null ; fi

(It is nice that Akregator uses the OPML format and not a specific config file)

Next, on the server side, where rawdog has been installed (nothing specific here, it is shipped by Debian), I created a rawdog user account, then made a symlink from /home/rawdog to /var/www/rss.

We first need to provide rawdog with Akregator’s OPML, which it does not support. To do this, I fetched a Perl script made by Tero Karvinen that I edited so it supports categories. It results in the following opml_to_rawdog.pl, stored in /home/rawdog/scripts (access to the /var/www/rawdog/scripts directory over HTTP being forbidden by Apache).
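For readers who only want the gist of such a conversion, here is a rough sketch in shell. It is not the Perl script mentioned above and it ignores categories entirely. It assumes that each feed appears in the OPML as an outline element with a double-quoted xmlUrl attribute, and that a line of the form "feed PERIOD URL" is what rawdog expects in its config. The function name opml_to_feeds and the default period of 1h are made up for the example, and grep -o is a common extension rather than strict POSIX.

```shell
#!/bin/sh
# Hypothetical sketch: turn the xmlUrl attributes of an OPML file into
# rawdog "feed" lines. Categories are not handled.
# Usage: opml_to_feeds FILE [PERIOD]
opml_to_feeds() {
    period=${2:-1h}
    grep -o 'xmlUrl="[^"]*"' "$1" |
        sed -e 's/^xmlUrl="//' -e 's/"$//' -e 's/&amp;/\&/g' |
        while IFS= read -r url; do
            printf 'feed %s %s\n' "$period" "$url"
        done
}
```

Something like opml_to_feeds feeds.opml > feeds would produce a file that can be pulled in from the main config with an include line. A real XML parser is more robust than grep and sed for this, which is one good reason to use a proper script instead.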

We set a cronjob to produce a feeds list that rawdog can handle, /etc/cron.d/rawdog:

# Make sure we have the latest feeds, imported from akregator
# if there is a user/feeds.opml, compare it with the current one
30 * * * * root cd /home/rawdog && if [ -e /home/user/feeds.opml ]; then if [ ! -e feeds.opml ] || [ "`diff /home/user/feeds.opml feeds.opml`" != "" ]; then scripts/opml_to_rawdog.pl /home/user/feeds.opml > feeds ; fi ; fi
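The same "only regenerate when the OPML changed" logic can also live in a small helper script, which is easier to read than a cron one-liner. The following is a sketch under assumptions: sync_feeds is a made-up name, the converter is passed as an argument, and the helper refreshes the local copy of feeds.opml after a successful conversion, a step the cron line above does not show and that may well be handled elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: regenerate the rawdog feed list only when the
# uploaded OPML differs from the last one processed.
# Usage: sync_feeds UPLOADED_OPML LOCAL_COPY CONVERTER OUTPUT
sync_feeds() {
    src=$1 dst=$2 converter=$3 out=$4
    [ -e "$src" ] || return 0          # nothing uploaded yet
    if [ ! -e "$dst" ] || ! cmp -s "$src" "$dst"; then
        # Keep the local copy in sync only if the conversion succeeded
        "$converter" "$src" > "$out" && cp "$src" "$dst"
    fi
}
```

From cron, this could be called as sync_feeds /home/user/feeds.opml feeds.opml scripts/opml_to_rawdog.pl feeds after the cd /home/rawdog. cmp -s is used instead of comparing the output of diff with an empty string, since only the exit status matters here.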

I put a rawdog config file in /home/rawdog and edited it to suit my needs. Most notably, I edited /home/rawdog/config as follows:

maxarticles 50
datetimeformat %d %B, %Hh%M
template templates/page
itemtemplate templates/item
outputfile index.html
showfeeds false

feeddefaults
killtags true
truncate 220

# this is the file generated by opml_to_rawdog.pl
include feeds

It relies on two template pages stored in the templates/ directory: templates/page and templates/item. Not surprisingly, the layout is based on a CSS file called style.css – you will have to edit it to match category names. It also requires the truncate plugin, to be stored in the plugins directory.

The last step is to update /etc/cron.d/rawdog to actually generate the HTML output:

# Run every 9 minutes
*/9 * * * * rawdog cd ~/ && /usr/bin/rawdog -d ~/ -u -w 2>> errors
# once per month, clean up the errors list
10 2 2 * * rawdog rm -f ~/errors

That’s all folks (even if there is room for improvement)!

Syndicated 2010-04-07 22:37:54 from # cd /scratch
