Recent blog entries for berend

An update on the Linux NFS server problems I had: massively high i/o by process jbd2/xvda1-8.

My initial theory was a file system corruption issue. Fixing that didn't actually work, even though initially it looked like it did.

Then I thought it might be a Linux kernel issue. But even that didn't permanently fix it.

The issue just appeared again: I added one more php5-fpm server to the mix, an identical copy of another one, and bang, the problem came back.

My latest solution: I was using an lvm2 stripe, switched to an md stripe. Seems to work so far.

26 Jun 2014 (updated 26 Jun 2014 at 23:33 UTC) »

Suffered from very slow scp to a FreeBSD server. rootbsd said the hpn patch should help, but that it was already applied in FreeBSD 10. That was the pointer I needed.

By default FreeBSD has some kind of scp transfer performance improvement patch:

# Disable HPN tuning improvements.
#HPNDisabled no

When I changed this to:

HPNDisabled yes

My scp performance jumped from about 300KB/s to 8MB/s in this particular instance. I'm now applying this to all my FreeBSD boxes; on my local network, performance jumped from 58MB/s to 71MB/s.
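To check a number of boxes quickly, a small helper along these lines might do. This is only a sketch: the function name hpn_disabled is made up for illustration, it assumes the first uncommented HPNDisabled line is the one sshd uses, and it assumes an absent setting means "no", as the commented-out default above suggests.

```shell
# Hypothetical helper: print the effective HPNDisabled value ("yes" or "no")
# found in an sshd_config-style file given as $1.
# Assumption: the first uncommented occurrence wins, and a missing
# setting means "no" (the commented-out default shown above).
hpn_disabled() {
    awk '
        # Commented lines start with "#", so their first field never matches.
        tolower($1) == "hpndisabled" { print tolower($2); found = 1; exit }
        END { if (!found) print "no" }
    ' "$1"
}
```

On FreeBSD the file to pass would normally be /etc/ssh/sshd_config, so the call is hpn_disabled /etc/ssh/sshd_config. Note that sshd has to reload its configuration before a change takes effect.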

I think HPN tuning should be disabled by default!

As an update to my last post on weird Ubuntu 12.04 NFS4 server load: repairing the ext file system actually didn't really work.

Redid the test, problems came right back.

Next thing I did was turn the root volume into xfs: better, but still writing 1MB/s, with 50% i/o utilisation for the root disk.

So perhaps a Linux kernel thing. Created a 13.10 NFS server, problem disappeared.

28 Mar 2014 (updated 3 Apr 2014 at 03:28 UTC) »

Really weird issue yesterday trying to move a customer to AWS. Testing was all fine, but when we switched the ip address, the system ground to a halt.

The cause was the NFS server, which became unresponsive, so the web and php5 farms stopped. Using iotop I found out that this was caused by the jbd2 process, jbd2/xvda1-8 in my case. jbd2 basically was at 100% i/o. Initially I thought perhaps the instance was faulty, so I built a new NFS server (simply replaying my ansible script). Got exactly the same behaviour on the new server: all i/o ground to a halt as jbd2 took over as soon as I did even simple things like checking out a Drupal repository (so a single client, doing an svn co).

But why would jbd2 kick in? With iostat -x 1 I determined that we were writing 5MB/s to the root file system. That made no sense. There is nothing on this NFS server that would do that. All data is on separately mounted EBS disks. The root file system is ext4, but all the other file systems are xfs! And the clients only mount the xfs file systems.

Using a suggestion to debug what's going on, I tried:

echo 1 > /sys/kernel/debug/tracing/events/ext4/ext4_sync_file_enter/enable

Waited for a minute and then did:

cat /sys/kernel/debug/tracing/trace

Got a lot of lines like:

nfsd-943   [000] 8559086.521147: ext4_sync_file_enter: dev 202,1 ino 30703 parent 30666 datasync 0
nfsd-942 [000] 8559086.527871: ext4_sync_file_enter: dev 202,1 ino 30703 parent 30666 datasync 0

OK, clearly the NFS daemon causes a lot of file sync calls (with datasync 0, so full syncs rather than data-only ones). But why would this have an effect on the root file system?
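With a minute's worth of trace output it helps to count the events per device and inode rather than eyeball them. A minimal sketch, assuming the trace lines look like the two shown above; the function name summarize_sync_events is made up for illustration.

```shell
# Hypothetical helper: read ftrace output on stdin and print
# "count dev inode" for every (dev, inode) pair seen in
# ext4_sync_file_enter events, busiest first.
summarize_sync_events() {
    awk '
        /ext4_sync_file_enter:/ {
            dev = ""; ino = ""
            # The value follows its keyword, e.g. "dev 202,1 ino 30703".
            for (i = 1; i < NF; i++) {
                if ($i == "dev") dev = $(i + 1)
                if ($i == "ino") ino = $(i + 1)
            }
            if (dev != "" && ino != "") count[dev " " ino]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -rn
}
```

Run it as summarize_sync_events < /sys/kernel/debug/tracing/trace. If one inode dominates, and assuming dev 202,1 really is the root file system (as the jbd2/xvda1-8 process name suggests), find / -xdev -inum 30703 should show which file is being synced. Writing 0 to the same enable file switches the trace event off again.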

After more googling I found this comment:
Problem vanished after fsck'ing my ext4 partitions

Huh? Worth a try. Stopped NFS server, mounted root disk on another server, ran fsck:

# fsck /dev/xvdf
fsck from util-linux 2.20.1
e2fsck 1.42 (29-Nov-2011)
cloudimg-rootfs: clean, 62399/524288 files, 378952/2097152 blocks

and reattached. Problem solved!!

I have no explanation for this behaviour, except that maybe the latest Ubuntu 12.04 LTS AMI has a bad disk.

PS: I now believe I was wrong, see this update.

The C programming language has been a plague of security problems for decades. When oh when will programmers abandon this language? It's unsafe under any conditions. Who needs type checking? Gotos are not harmful. It's depressing. We have known this stuff since the 60s.

Still using Ed L. Cashin's 2001 backup scripts on my network, a bit sad perhaps. But haven't found a more flexible environment.

I moved it along to support xfs, and now zfs. My backups are written to disk first, then concurrently to tape while the next backup starts. All without too much work.

All other backup tools look like a bit too much work, and you might get stuck when technology moves on.

Kevin Drum on healthcare insurance, so true:

Let’s be honest. What we all want is unlimited access to medical care; unlimited access to any procedure we want no matter how pricey; unlimited choice of physicians; instant availability of doctors every time we get an ear ache; and we’d like all this for free. That’s what we want. And we’re annoyed when we don’t get it.

At Xplain Hosting we have been hosting Drupal since 2008. First on Ubuntu 8.04, systems built from scratch. To upgrade to another version, you build another system from scratch.

Have now switched to Ansible, so hopefully never have to do that again. And we now have good documentation of what a working system looks like.

Upgraded from FreeBSD 8.x-STABLE to FreeBSD 9.2-PRERELEASE. Disaster. Went back to FreeBSD 9.1. Still disaster. Every few hours the kernel panics, and the server reboots. Have made major changes, went from i386 to amd64. Same bug. Somewhere PF has become unreliable in FreeBSD 9.

Not good. Have used FreeBSD since version, never had these problems.

Not sure what to do now. Probably simplify my pf rules, which won't be easy, to see if the problem disappears that way.

Have gone to the 20th anniversary of the Manukau Symphony Orchestra (MSO). We've heard them since they played in the Papatoetoe Town Hall.

They had commissioned a piece by Gareth Farr, interesting, but won't work well via CD/Spotify I'm afraid. Usually a problem with modern music. Next piece was Mozart's Violin Concerto No.3 in G, K.216. I'm not a Mozart fan usually, unless played on period instruments, but Loata Mahe gave a great solo, a really individual performance.

The big piece was Mahler. The first Mahler for the orchestra, and a really great performance from Uwe Grodd and the orchestra. Uwe was able to draw the music out of the orchestra; the beginning of the first movement especially was very well done. I liked the third movement, and the first half of the fourth movement also went exceptionally well. Great night.
