Older blog entries for forrest (starting at number 81)

20 May 2005 (updated 20 May 2005 at 06:21 UTC) »

Well, mirwin's attempt to link to an essay my wife wrote finally got me to post here again (it's actually here; it ends in '.html' not '.htm'). It's a little stilted because her assignment required her to use specific terms from her Sociology class, but at least going to school can get her to write.

She has talked about writing some short stories to illuminate how China has changed in recent years. I hope she will.

I'm a temporary bachelor again, as my wife and half-year-old son went back to China to visit her parents. (Here are a few pictures.)

I'm a little disappointed my Hanzi Quiz program hasn't been more popular. It's popular in its niche: it was number one in a Google search for "hanzi" for years, until it got beaten out by the venerable Hanzi Smatter.

The thing is, while I took special care to deal with the unique issues surrounding the use of Chinese characters, what I've written is a general purpose study program. The categories could be anything: a dozen different languages (see any one and guess any other), common names versus latin names for plants or animals, dates versus historical events ... absolutely anything. And nobody has taken advantage of this.
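To make the "see any one and guess any other" idea concrete, here is a minimal sketch. The card format and the function names (`makeQuestion`, `checkAnswer`) are hypothetical, not the actual Hanzi Quiz data layout; the point is only that nothing in the quiz logic needs to know what the categories mean.

```javascript
// Hypothetical card format: one object per card, one key per category.
const hanziCards = [
  { hanzi: "水", pinyin: "shui3", english: "water" },
  { hanzi: "火", pinyin: "huo3", english: "fire" },
];

// The same structure works for a completely different subject.
const plantCards = [
  { common: "red maple", latin: "Acer rubrum" },
];

// Show one category, ask for another. Any pair of keys will do.
function makeQuestion(card, shown, asked) {
  return { prompt: card[shown], answer: card[asked] };
}

// Compare a typed guess to the answer, ignoring case and stray spaces.
function checkAnswer(question, guess) {
  return guess.trim().toLowerCase() === question.answer.toLowerCase();
}
```

For example, `makeQuestion(plantCards[0], "common", "latin")` prompts with the common name and expects the latin one, with no hanzi-specific code involved.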

Even within the realm of Chinese characters, almost no one has changed the quiz data; many links to my site say "here's a very basic hanzi quiz", but none of these people even seems to realize they can take it and make it as advanced as they please.

Part of the problem is the difficulty of editing the utf-8 entries ... while I could enter the Chinese using emacs, I had to whip up a little perl program to convert "pinyin with trailing tone numbers" to "pinyin with (accent marks as) tone marks". I bundled that program in my tarball, but I don't think anyone else is going to use it.
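The conversion itself is easy to sketch. The following is not the bundled perl program, just a minimal JavaScript illustration of the usual placement rule (the mark goes on "a" or "e" if present, on the "o" of "ou", otherwise on the last vowel). Treating "v" as a stand-in for "ü", accepting only lowercase input, and the function names are all assumptions made for the example.

```javascript
// Tone-marked forms of each vowel, indexed by tone number 1 to 4.
const TONE_MARKS = {
  a: "āáǎà",
  e: "ēéěè",
  i: "īíǐì",
  o: "ōóǒò",
  u: "ūúǔù",
  "ü": "ǖǘǚǜ",
};

// Find the position of the vowel that carries the tone mark.
function markedVowelIndex(syllable) {
  const a = syllable.indexOf("a");
  if (a !== -1) return a;
  const e = syllable.indexOf("e");
  if (e !== -1) return e;
  const ou = syllable.indexOf("ou");
  if (ou !== -1) return ou; // the "o" of "ou"
  for (let i = syllable.length - 1; i >= 0; i--) {
    if (TONE_MARKS[syllable[i]]) return i; // otherwise the last vowel
  }
  return -1;
}

// Convert one syllable; tones 0 and 5 are treated as neutral (no mark).
function convertSyllable(letters, tone) {
  const syllable = letters.replace(/v/g, "ü"); // assumption: "v" means "ü"
  const t = Number(tone);
  if (t < 1 || t > 4) return syllable;
  const i = markedVowelIndex(syllable);
  if (i === -1) return syllable;
  const marked = TONE_MARKS[syllable[i]][t - 1];
  return syllable.slice(0, i) + marked + syllable.slice(i + 1);
}

// Convert every "letters + digit" run in a string (lowercase input only).
function numbersToMarks(text) {
  return text.replace(/([a-zü]+)([0-5])/g, (match, letters, tone) =>
    convertSyllable(letters, tone)
  );
}
```

With this sketch, `numbersToMarks("zhong1 guo2")` yields "zhōng guó". A real converter would also need to handle capitals and the "u:" spelling of "ü".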

A potential change for Hanzi Quiz, which I am half-done with but dropped because I doubted anyone else would be interested, is to translate that perl code into javascript, so the quiz entries (the "cards") can be written with tone numbers, helping somewhat with the data entry problem. That javascript's done, but that lets me do a second part:

Before the quiz begins, show a screen with all the accented characters used to display the pinyin. If the user can't see all of the characters correctly (which has been a problem in some environments) they can choose to use the tone numbers instead. Yay! What's more...

... using an intro page can allow the user to select which "quiz" to load. (This is a trick I found that works on moz and ie: you can choose which .js to load based on a choice from a dropdown list, but you have to do something like document.write("</SCR" + "IPT>") to keep the browser from getting confused.)
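A minimal sketch of that trick follows. The quiz file names and the function names are hypothetical. The closing tag is split because, when this code sits inside an inline script block, the HTML parser would otherwise treat the literal closing tag inside the string as the end of that block.

```javascript
// Hypothetical quiz files; the real file names are not given here.
const QUIZ_FILES = {
  basic: "quiz_basic.js",
  advanced: "quiz_advanced.js",
};

// Build the tag text for the chosen quiz. The closing tag is written in
// two pieces so the browser does not end the enclosing script early.
function scriptTagFor(choice) {
  return '<SCRIPT SRC="' + QUIZ_FILES[choice] + '"></SCR' + 'IPT>';
}

// Browser-only: call this while the page is still being parsed,
// passing the value picked in the dropdown list.
function loadQuiz(choice) {
  document.write(scriptTagFor(choice));
}
```

Keeping the string building in its own function means it can be checked outside a browser; only `loadQuiz` touches `document`.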

The problems are:

  • The whole thing about the tone numbers pushes the code in the direction of being specifically about hanzi, especially with the "can you see these chars" intro page. (Although quizzes on any subject could still be run in it, and just not use the pinyin stuff.)
  • Being able to select many possible quizzes requires someone to come up with the quiz content. I was really hoping someone else would do this part, like a teacher or something. But I haven't had much luck getting anyone else interested.
So, that project languishes.

I could go on about other stuff (like should I try a new distro?) but I'm already up way too late as it is. Maybe later.

Hey everyone, I'm back.

I haven't been working on any projects lately, and I'm kind of wondering what I should do next. I've put some effort into trying to figure out why my Hanzi Quiz program is hosed in Safari (and Konqueror): that's led me to submit a couple of bug reports.

The second of those reports is so whack that I wonder if something must just be wrong with my system: I first discovered that some accented characters don't show up in utf-8 encoding (although Chinese chars are fine), and then I tried iso-8859-1 ... and accented characters didn't appear then either!

I thought that had to be a mistake, so I went and found the link to the famous French newspaper Le Monde, and that showed accented characters just fine. But when I viewed the html source, I found that all the accented characters were numeric entities! Do you Europeans never use the actual iso-8859-1 byte values (or utf-8 byte sequences) for your accented characters?

It strikes me as near impossible that Konqueror, as a European project, would fail to catch this -- but that's what's happening on my (Debian unstable) system.

In any case, Konq (and therefore Safari) looks so hopelessly hosed to me that I'm not too inclined to try to make any serious DHTML work for them. I think I should find another project for a while.

On a personal note, my wife and I are expecting a son in late October. That's an exciting prospect, but also somewhat scary.

11 May 2004 (updated 11 May 2004 at 07:21 UTC) »
Pardon my French ...

Scanning my web server logs, I found a French site with a link to my Hanzi Quiz program.

(Dang, I'm happy my program seems to be so popular. But I guess I'm not too surprised, because I looked around a lot before setting out to write it, and there really was nothing else like it.)

Unfortunately, the description of my program on this site n'est pas exactement vrai. It says "Pour Linux", but the reality is that it's written in Javascript and happily runs in Mozilla and Internet Explorer, and probably other browsers with Javascript support (although, notably, Safari has problems).

It's possible to edit the description, but ... well I took French in high school, but I'm sure if I tried to say what I just said in English above, it would be at best un-idiomatic and quite possibly incorrect or even unintelligible.

I could, of course, write English into their French site, but that doesn't seem right.

Could someone who knows the language please update this entry for me? I'd be very grateful.

Your tax dollars at work ...

This is what you will find at the website http://www.immigration.gov/ (slightly edited to only use advogato-allowed tags):


404 - Requested Page Has Moved
The website for U.S. Citizenship and Immigration Services (USCIS) moved to http://uscis.gov. You will be redirected to the requested page only after clicking on the "redirect me" link below.

All web pages at http://www.immigration.gov and http://www.bcis.gov have been moved to http://uscis.gov

If you attempted to reach this website through a bookmark, please change the bookmark.

If you were referred to this address by another website, please contact the owners of that site to inform them of this change.

redirect me to uscis.gov

Thank you, U.S. Citizenship and Immigration Services (USCIS)


404? Uh ... a redirect is a 302, right? But you're not actually being redirected at all. What do the headers actually say?

HTTP/1.x 200 OK
Server: Microsoft-IIS/4.0
Date: Fri, 09 Apr 2004 01:54:57 GMT
Content-Type: text/html
Cache-Control: private

Oh, so someone using IIS can't even set up a redirect, and thinks they're using that sophisticated web lingo correctly by prominently labeling this page "404".

I could make a lot of jokes about government stupidity here, but I guess this page pretty much speaks for itself. As a taxpayer, I want to know why I'm paying for another IIS server, when Apache is so well proven? Why am I paying the salary of the bozo who set up that page when there are so many high school students who can set up a proper redirect?
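For comparison, here is a minimal sketch of what a real redirect looks like at the HTTP level: a 3xx status plus a Location header, with no page body needed. A permanent move like this one is usually signalled with 301 (302 is the temporary variant). The handler name is made up for the example, and it is written against Node's standard request/response objects rather than anything IIS-specific.

```javascript
// New home of the site, taken from the notice quoted above.
const NEW_SITE = "http://uscis.gov";

// Answer every request with a permanent redirect to the same path
// on the new site.
function redirectHandler(req, res) {
  res.writeHead(301, { Location: NEW_SITE + req.url });
  res.end();
}

// To try it with Node's built-in http module:
// require("http").createServer(redirectHandler).listen(8080);
```

A browser or crawler that receives this response follows the Location header on its own, so bookmarks and inbound links keep working without a "redirect me" link.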


26 Mar 2004 (updated 26 Mar 2004 at 07:24 UTC) »

Which Linux Laptop?

At work, I'm now doing some development on a java webapp written with Struts. Sadly, this means I'm now mostly using Windows ... even though everything I need to work on runs fine under Linux.

I have been using a Linux box at work that's a 200MHz or so cast-off, and have a kvm switch to go between it and my Windows box. Until this latest project, I've only used the Windows box for mail. Needless to say, the Linux box isn't quite up to the task of this java development. Even the 1GHz Windows box I have with 512 MB RAM is clunky.

Although it really shouldn't matter what my desktop is, I'm struck by how clunky it is to do anything even a little complex without multiple virtual desktops. I'm always staring at the little rectangles (now almost squares) in the start bar, trying to decipher which of the many windows I have open is the one I'm looking for.

Although I doubt my work would spring for a decent Linux box for me, I think I could get a docking station if I brought my own laptop. Since I've wanted to get a linux laptop anyway, I've been thinking about that more and more.

The requirements are a little different than what I might choose on my own. A fast CPU, lots of RAM (at least a gig), and a big hard disk are what I need. Battery life is not a consideration in this case: I'll almost always be using this unit at work or at home, with external power.

All other things being equal, I prefer AMD CPUs, but that's just wanting to support the underdog, and is for me much less important than functionality.

I've always been intimidated by the idea of buying a laptop for running Linux; they're expensive and have all sorts of quirks which make it a gamble.

I'm soliciting recommendations, and I hope some of you will help me out before using Windows drives me (any more) batty.

Thanks for any advice!

You Rock!

Within minutes of my last post describing the gcc weirdness I was experiencing, tk put his finger on the basic problem: I was running into a stack size limit. And he gave me a workaround much simpler than my use of malloc.

Clueless as I am, I was still left wondering what had happened to my configuration, since my program worked back in August. AlanShutko provided the vital information: the stack size limit is a function of the shell. The ulimit -s command he suggested revealed that I do indeed have a stack size limit, set at 8192.
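The arithmetic lines up with the test case in the 21 Mar entry further down the page. Bash reports `ulimit -s` in units of 1024 bytes, so 8192 means an 8 MiB stack. A rough sketch of the numbers (the variable names are just for illustration):

```javascript
// Value reported by `ulimit -s`, in units of 1024 bytes.
const limitKiB = 8192;
const limitBytes = limitKiB * 1024; // 8388608 bytes

// Largest array size that still worked in the hello world test case.
const largestWorkingArray = 8385040;

// Stack space left over for everything else (environment, startup
// code, the frames for main and printf).
const headroomBytes = limitBytes - largestWorkingArray; // 3568 bytes
```

So the array alone was within about 3.5 KB of the limit, and one more byte pushed the call to printf past it. How much of that remainder is already spoken for depends on the environment and startup code, so the exact threshold will likely differ from system to system.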

I can only guess this limit is compiled into the executable, because it's not in /etc/profile, ~/.bash_profile, or ~/.bashrc. My /bin/bash is dated Feb 22, so it has been replaced since my program last worked.

Finally, haruspex raises an interesting question: why wasn't the large unused array in my sample program simply optimized away? He tested this with gcc 2.95.4, and just now I tested with 3.3.3, and using -O2 doesn't get rid of the unused variable. This observation makes me feel "hey, this isn't just about me being an idiot". Cool.

Thanks, guys.

21 Mar 2004 (updated 21 Mar 2004 at 07:27 UTC) »
Debian "sid" gcc problem

I hate it when I get a problem with a basic piece of infrastructure and I don't know which piece is responsible for the error. Where does one begin to file a bug report?

Since this is my diary, I feel that I can at least blow off steam by posting here ... and maybe (often!) someone will give me a clue. That's got to be one of the most inefficient ways ever of filing a bug report, but hey ... it's just my diary, ok?

I run Debian "sid" (a.k.a. "unstable") and I was revisiting some code which worked on August 19th, only to have it segfault on me.

I eventually found I could reproduce the problem just by declaring a large array. Here is (literally) a hello world program which exhibits the problem on my machine:

#include <stdio.h>

int main() {
    /* char arr[8385040];  no problem */
    char arr[8385041];  /* segfault! */
    printf("Hello, World!\n");
}
The segfault occurs on the printf statement. WTF?

I get the problem with either of these:

 gcc -o hello hello.c && ./hello 
 gcc-2.95 -o hello hello.c && ./hello 

Can anyone else reproduce this, or am I just going crazy?

As a workaround, I found that I can use malloc to create my (syntactic equivalent to a) large array with no problem.

... random personal stuff ...

I just got back from my 4th trip to China. We're having trouble getting permission for my parents-in-law to visit the U.S. (the U.S. government objects, not the Chinese) and my mom really wanted to meet them. So we took her to Wuhan, and of course visited famous sites like the Great Wall while we were there.

I like China. I wouldn't want to live there, but I think it would be cool if I could work there for a year or so. I'm too tied to job security, though, to actually try to make that happen.

And oh yeah, Happy Birthday to me. I'm now 42 years old.

I never realized how hard (impossible) it was to even get a modestly complex html table to render correctly in modern browsers: I wrote an account of my frustrations (with screenshots) here.

Ok, I've got my perl script for price-shopping at CD Baby done. It adds the number of tracks as well as the prices to the genre list you select, and sorts it by price.

You still can't tell how long any of the CDs are, which is whack: number of tracks, although very poor, is the best clue available. (From now on, I'm writing the artists to ask how long their CDs are before I buy them!)

Find cdbaby_cheapskate.pl on my hacks page.

Java driving me nuts

I'm starting to get involved with a java development group at work, and I'm trying to get tomcat set up on my Debian box at home.

It's not going smoothly, and unfortunately no one has replied to my post to the debian-java mailing list, although it's been several days.

If someone here could offer me a clue, or advise me where else I should be asking, I'd be most grateful.

It seems that all my java experiences are going like this. It's not so much that it's a steep learning curve, it's that certain pieces of information I need are nowhere to be found.


I finally placed an order with the independent CD store that all the anti-RIAA slashdotters always mention. I talk about what I ordered over on LJ.

While they're really cool, I have two problems with their site:

  1. They don't encourage price shopping, and
  2. They don't say how long any of their CDs are.
I wrote a little perl script to get around (1): it takes one of their "all artists of a given genre" html pages, fetches all the prices and outputs the html with the list sorted in price order, and with prices displayed. (I'll be putting it out on my hacks page once I get it a little more user-friendly.)
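The sorting step is the simple part. Here is a minimal sketch of it, not the actual perl script: the album rows are made-up placeholders standing in for whatever the scraping step extracts, and the field and function names are assumptions. Prices are kept in cents so the comparison stays in whole numbers, and a track count is carried along since that is the only length clue the site offers.

```javascript
// Placeholder rows, as they might look after scraping a genre page.
const albums = [
  { title: "Album A", priceCents: 1297, tracks: 10 },
  { title: "Album B", priceCents: 995, tracks: 8 },
  { title: "Album C", priceCents: 1500, tracks: 12 },
];

// Return a new list ordered from cheapest to most expensive.
function sortByPrice(list) {
  return [...list].sort((x, y) => x.priceCents - y.priceCents);
}

// Format one row for display, with the price shown up front.
function formatRow(album) {
  const price = "$" + (album.priceCents / 100).toFixed(2);
  return price + "  " + album.tracks + " tracks  " + album.title;
}
```

Something like `sortByPrice(albums).map(formatRow).join("\n")` then gives a listing with the cheapest CD first.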

But shopping on price alone is not enough; the shortest CD I bought was 37:29 and the longest was 70:02 -- that's a huge difference. Duration isn't everything, but it's certainly an important factor.

I wrote to CD Baby asking them to include this information in their CD descriptions. I don't know how they'll respond, but if more people request the same thing (hint hint), I guess they're more likely to notice.
