Advogato: Blog for dhd

Cool. I got the Sphinx-II "continuous audio" module working with select(2) in Perl. So I can get rid of this horrible bit of code in my speech I/O framework, since I don't have to fork off a blocking process to do speech recognition anymore:

	# ARGH ... POE has messed with %SIG
	@SIG{keys %SIG} = ('DEFAULT') x keys %SIG;

The Sphinx-II "non-blocking" utterance processing interface is kind of broken... it processes only a single frame of data at a time, which is way less than the amount typically available to read from the audio device, and there's no explicit function to flush unprocessed frames (though you can just call uttproc_rawdata() with an empty buffer and blocking on :-) Fortunately, recognition is so much faster than real time that there's absolutely no reason to use non-blocking mode, even in a single-threaded program.

Once the 0.3 release happens I will volunteer to take a hacksaw to all the redundant and poorly-designed interfaces in Sphinx-II, fix them up, and properly document them.

In other news, I'm learning POE incrementally ... I've just taken the first step towards using it as more than just a convenient wrapper for select(2). Namely, I've taken my random collection of states handling the Festival server, Sphinx, audio I/O, and "dialog management" (such as it is - currently this is just "Hello World" and repeating back what the user says), and split them up into multiple sessions. Soon I'll take a stab at packaging them up into actual components.

And I just discovered that the ALSA emulation of the OSS interface is not quite bug-compatible. In ALSA, select(2) on PCM devices (including /dev/dsp) works as expected. With the kernel OSS drivers, you have to call read(2) on /dev/dsp before you can select it for reading, and if you start writing to it, even if your sound card is capable of full-duplex, you will no longer be woken up on read. Total fucking brain damage. Sigh...

23 Jan 2001 dhd » (Master)