Older blog entries for lethal (starting at number 11)

770 Notes

pycage, while MPU-side decoding is the easiest way to go, DSP-side will still be beneficial (albeit somewhat more complicated). Whether the benefits are worth the effort is another matter. The tools that you need to roll your own codecs are available, and you can do this mostly in C without having to resort to too much tms320c55x assembly. The biggest issue is likely familiarizing yourself with the DSP kernel, the socket node interfaces, and so forth. Most of this is documented pretty well at the dspgateway page.

For the adventurous, there's still an unused mailbox line between MPU and DSP on 1710 in the current implementation that could probably be round-robin'ed pretty easily. We also presently don't make use of hardware page table walking, which makes the exmap interface a bit clunky (essentially wiring TLB entries by hand, but at least they're pre-faulted).

It would also be interesting to see how the FP-driven codecs compare to the integer-based one under EABI with a soft-float toolchain. ogg123 might even be usable out of the box with soft-float (though likely at higher CPU utilization than the numbers that have been quoted). On another note, it's also pretty easy to figure out DSP load average through the sysfs interface, so it may be worthwhile to profile some of that, especially if the DSP ends up getting more heavily loaded.

Haven't posted here in a while. Work is keeping me busy. As is getting the kernel running on SH-2A on the MS7206SE01 board.

On the sh front, things have been progressing nicely with the new clock and timer frameworks. The timer stuff is still in need of being extended to more transparently deal with multiple timer channels, but this can wait until the timesource driver stuff on l-k sorts itself out. No use redoing the timer stuff twice..

On another note, the cpufreq driver still needs to be reworked for the clock framework as well. This will still take a bit of doing, but in the end it should leave us with a single driver capable of dynamic scaling on every CPU subtype that hooks in to the clock framework (this will go on the TODO list for now).

With sh64, things have also been pretty quiet. Ran in to some fairly consistent slab corruption that seems to have only popped up in recent kernels, so I suppose it's time to dig out the redzoning for non-BYTES_PER_WORD minaligned architectures patch and get slab debugging working again. Unfortunately the UW SCSI drives I was using that managed to trigger this on my Cayman both ended up killing themselves. Let's see how far we get with onboard IDE.. judging by the schematics, at least PIO was wired right, and should mostly work (DMA on the other hand..). Some of the GPIO configuration in the SuperIO is probably still off (since much of that was borrowed from microdev), so it seems there will be more than one thing to debug..

And just to show how often I actually log in to this thing, I seem to have had the following paragraph started, which was amusingly retained (from some time in 2003):

More uClibc hacking today and the last couple of days. Started working on the shared loader backend for sh64, which is now at the point where most of the work is done, but now there's just a lot of debugging and testing left. At least some good has come out of it so far, it turned out that the R_SH_IMM_MEDLOW16 relocation was broken in multiple ways in glibc, so I ended up fixing that while writing up the relocation handling code for uClibc. Regardless, the uClibc stuff is in pretty good shape now, so the next logical step is to start tinkering with buildroot and friends, though that will still have to wait till after some more debugging time.

The ironic thing is that years later, the sh64 ldso stuff needs to be fixed again due to some ABI changes, though I have so far been successfully putting it off. ldso is vindictive ;-)

Disclaimer: As nothing really interesting has been happening lately, be forewarned that this entry will be somewhat dry and generally boring, even if for some reason you _are_ interested in the state of Linux/SH-2 support.

Lots of SH-2 hacking lately, quite exhausting, though still quite fun. The VBR semantics are completely different in relation to the SH-3/4, so this buys entry.S a much needed overhaul. Unfortunately this also required some changes in semantics, at least on the SH-2 side for the general-purpose exception handling code -- though this is all quite hacky already, especially given the number of different registers and register names, etc.

Another minor nuisance, gcc sanely labels things like saving off ssr as an SH-3 and up instruction, but binutils subsequently defaults to accepting virtually anything as valid. binutils CVS now seems to properly support a processor family flag that clearly defines this, so that should be dealt with relatively well once I get finished hacking that.. this will be an interesting contrast to gcc flags by ABI level, so hopefully that will all work out cleanly. Between that and the latest -fno-zero-initialized-in-bss mess with 3.3, I definitely hope we won't need more stupid gcc/binutils version specific checks for the kernel build, as these are already starting to add up..

Additionally, the fixed references to arch/foo/kernel/vmlinux.lds.S in the top-level kernel Makefile are truly annoying. This now forces anyone who wants to use multiple ld scripts to either make a wrapper script with ifdef abuse, or do gross symlinking hacks at build time. This is certainly a disappointing step back in comparison to the 2.4 behavior..

Back to the SH-2 issue, it should be a lot easier to identify what still needs to be done (other than things like the system call interface, which still needs cleanup for things like TRA referencing, INTEVT/EXPEVT stuff I just finished) once the aforementioned binutils issues are out of the way. It's quite bothersome to identify problem spots when the assembler will knowingly accept accesses to things like different register banks and ssr/spc, etc. even when these don't actually exist on the SH-2.. though I'm sure there will be quite a few. At least now with the exception vector, early SCI console, XIP, etc. out of the way, we should be set to actually start debugging on live silicon.. Now the only other trick is getting the page_alloc2 stuff updated and merged, and getting the overly pesky inode and dentry cache hash tables reduced in size -- there's not a whole lot of room when you've only got 512KiB of RAM to work with..

Also got some 7760 IPR patches sitting in my home directory, this is pretty much the last remaining portion of the 7760 backend that needs to go in (I did the exception vector / sh-sci / etc. stuff previously). So this is definitely good news, even though it reminds me that I still need to get the 7040/7044/7045/etc. stuff figured out and written..

Lastly, also got some uClibc hacking done. Some relatively uneventful sh64 syscall updates to satisfy current busybox, etc. Just finished off the pthreads work, so now we should be good to go for static pthreads.. that still leaves the ldso work, but that can wait for another day (particularly as it's rather mind numbing). After that, we should be able to start doing sh64 builds under buildroot, should be fun.

Well, decided to give gnome 2.4 a try. This proved to be rather entertaining, as the last time I attempted to build gnome from source was many years ago, and that required much hacking just to get the thing to pretend to build. Gave garnome a try, and that seemed to work pretty well, though several packages needed some persistent prodding before they wanted to work on my rh7.3 workstation. Now just have to wait and see how painful this will be under osx. Also, in regards to all of the recent gnome-blog traffic in the recentlog, I was surprised to have it die randomly after hardly any text entry. I'll stick with mozilla and safari for now.

Got around to starting on some DocBook stuff for sh, which also proved to be interesting. Most of it is behaving quite well, except I seem to be getting duplicated description entries from referenced source. I've not seen this before, and don't see anything obvious looking at the parser. This needs more investigation. Oddly enough, this seems to only occur on certain source files, and is completely isolated to the description, as all other fields are parsed correctly. Most irritating.

Minor other work on the sh tree. Added in compatibility hooks for the old ISA DMA API to wrap to the SH DMA API, which did a pretty decent job of outlining a lot of the limitations with the old API on this particular hardware. However, for anyone wanting less-than-exciting single-address DMA transfers without hacking things for the new API, this seems to work just fine. We also now do proper cpu flag reporting as well as some cache reporting in cpuinfo, though nothing particularly exciting.

Spent a bit of time working on AICA / SPU related things on the DC today. Started out writing a module for the g2 dmac to do spu dma directly from the aica channel, though this still needs much debugging. So far we don't seem to be able to keep consistent data in registers (i.e., write out a p2seg addr and get back the same address in an entirely different segment). Other things always read 0, which in itself isn't a problem, but the lack of the completion interrupt firing certainly creates some issues. The joys of undocumented hardware.. back to the wince dump.

On a similar note, we can use a channel on the sh dmac itself for writing out the buffer, which is what is happening for testing now. Unfortunately, since we only have 4 channels on the 7750, this isn't an option in the long run. At least now I can look at optimizing some of the completion / signalling code in the subsystem so we don't use quite as many cycles (polling for residue sucks). However, the good news is that even when we're constantly polling for residue, cpu usage is still down from the old manual copy / wait on fifo method. Once the remaining performance things are ironed out, there should be no problems dealing with any high-bitrate samples thrown at it. This will be even more fun once zx80user gets his alsa driver finished and supporting all of the aica channels, instead of just the two channels supported by the oss driver.

Merged / cleaned / rewrote random parts of SnapGear's SH-DSP patch from the uClinux tree a few days ago, which turned out to be quite fun. As a result, rewrote most of the cpu init code, which should now be much easier to work with (especially for adding probe hooks / setting cpu flags). So far this is proving to be quite clean.

Also got most of the 8139too hacks cleaned up and merged in both 2.4 and 2.6, which handily knocks off another thing from the HEAD TODO list, and nukes yet more common code cruft from CVS.

More random tree maintenance. Cleaned up random bits of the SMP code, and made the SMP kernel compile again, though it will likely be a while before I get around to testing this. Being the sole user of the SMP code, however, makes this perfectly reasonable.

Quite a bit of this stuff also needs documenting, which I can safely say is one of my least favourite activities (which Documentation/sh seems to reflect). Perhaps it's time to take another look at the DocBook stuff, as it would be quite nice to organize most of this into a general sort of sh architecture guide, instead of just random text files. This also incidentally happens to be another point on the TODO list. Now I just need the motivation to write documentation instead of hacking on code. The general tediousness of debugging SPU DMA might spur this on quicker than anticipated.

Falling behind on posting again, so here's a quick recap on what I've been hacking on recently.

Made quite a lot of progress on everything DMAC related the last few days. We're now much closer to something resembling a real subsystem, though there are still a few minor things to work out (including polling threads per DMA engine for doing large unblocked transfers). As a test, I also wrote a quick and dirty clear_page() and copy_page() using a dual-address mode configured channel on the SH DMAC, which ended up working quite well, despite some icache oddities in the clear_page() case which still need further debugging.

With the birth of the new SH-specific dma subsystem, I was also prompted to move the PCI stuff around again, and now we have a nice new shiny arch/sh/drivers/ where pci and dma stuff live. In the future, I suppose I'll move the sh cpufreq stuff here as well, as it really has no place in the arch/sh/kernel/ hierarchy .. though that's something for another day.

Anyways, now that I've got dual-address mode DMA as well as cascading to the PVR2 DMA in the Dreamcast, I suppose it's time to try to figure out how to tie this into pvr2fb in some sane fashion. For one, I don't seem to be able to use the user address of the write buffer as a source address for the DMA transaction, so something needs to be done here so we don't have to have the copy_from_user() overhead before we can start up the DMA transfer. Not entirely sure how DRM/DRI deals with this sort of stuff, though I suppose that's the next logical point to start looking at.

This would be much less of a headache in uClinux.. ;-)

Falling somewhat behind in posting frequency, so I suppose now's a good time to make yet another entry. Nothing overly eventful lately, managed to finish off most of the remaining issues with the store queue API I've been working on, which was nice. Unfortunately the only way I could get the cleanup and flushing for userspace mappings implemented in a clean fashion entailed adding back in the unmap and sync ops to the vm_operations_struct. These seem to have been removed in 2.4-test time, mostly because no one was using them. Hopefully it won't be too much of a fight to try and get these merged back into 2.6 proper .. otherwise it'll just be another thing stopping sh from working out of the box on vanilla 2.6.

On another note, it appears that mrbrown's ps2 "exploit" has been slashdotted. This wouldn't be much of a problem, except for the fact that that exploit happens to be posted on the same machine I use as a mailserver and for my IRC sessions. Lag suddenly has new meanings. This particular exploit is quite exciting though, I'm almost tempted to take one of my ps2s and write a native pong clone that doesn't happen to be RTE or reload1 encumbered .. though I shouldn't get carried away. Unfortunately for some (particularly certain petty individuals with some inferiority issues to work out), this is instantly viewed as a method for furthering rampant piracy. Regardless, this certainly seems like good news for the ps2dev community.

Did a number of non-kernel things today -- which in itself doesn't seem to happen very often. Ported uClibc CVS HEAD to sh64 this morning, based off of some ancient patches from SuperH. Nothing too exciting port-wise, static linking is the only thing that really works at the moment, as I still have to sit down and hack on the ldso and libpthread interfaces, but I'll leave that for another day. A static hello world stripped comes in at a hefty 41k, which certainly has glibc beat. Patches off to andersee, and should hopefully be merged soon. So far so good.

Also spent some time hacking on a simple RDF/RSS parser using libxml2. This is a result of not being able to find a suitable tool that did what I wanted. Originally I was going to hack this into mutt directly, but it seems much more logical to roll this into some sort of fetchmail-like tool for doing the initial fetching/parsing/sorting. Then I can dump this stuff straight into mbox format and pick it up through mutt that way (since I can also add the mailboxes directly and get notification on updates, while keeping the fetching tool running in the background). I suppose another alternative would be to hack this into fetchmail directly, but the tools are different enough that this probably isn't an overly useful approach (not to mention the blatant disregard for sane headers).
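
As a rough illustration of the "dump this stuff straight into mbox format" step, here is a minimal sketch in C. It leaves the libxml2 parsing out entirely and only shows the output side, assuming the common mbox convention of a "From " separator line per message and ">" quoting for body lines that start with "From ". The struct feed_item type and the write_mbox_entry() name are made up for the example and are not taken from the actual tool.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one already-parsed feed item. */
struct feed_item {
    const char *feed;   /* feed name, reused as the sender */
    const char *title;  /* item title, used as the subject */
    const char *date;   /* asctime-style date for the separator line */
    const char *body;   /* item text, may span several lines */
};

/* Append one item to an mbox stream. Returns 0 on success, -1 on write error. */
int write_mbox_entry(FILE *out, const struct feed_item *it)
{
    const char *p = it->body;

    /* Separator line, minimal headers, then the blank line before the body. */
    fprintf(out, "From %s %s\n", it->feed, it->date);
    fprintf(out, "From: %s\nSubject: %s\n\n", it->feed, it->title);

    while (*p) {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);

        /* Quote body lines that a reader would mistake for a separator. */
        if (strncmp(p, "From ", 5) == 0)
            fputc('>', out);
        fwrite(p, 1, len, out);
        fputc('\n', out);
        p += len + (eol ? 1 : 0);
    }

    fputc('\n', out);   /* a blank line ends the message */
    return ferror(out) ? -1 : 0;
}
```

Appending to a per-feed file opened with fopen(path, "a") would give mutt one mailbox per feed to watch, though locking against a concurrently running mutt is not handled here.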

Today was mostly PCI cleanup day, as the vast majority of the day was spent cleaning up most of the PCI mess for sh on 2.6. Managed to hunt down a rather obscure bug with the PCI auto code (cloned from MIPS, which cloned it from PPC, etc.), which was causing the dreamcast BBA to not respond properly. This ended up being a problem when poking the mem BAR to read its size and then writing out a new auto configured value for the BAR. As a workaround, we have to write back the pre-configured address that it comes up with after being powered on. It's not exactly obvious why this is the case, especially given the fact that the BAR value isn't even the same as the address range that we use for accessing the board later on. Presumably a good chunk of this still needs further debugging. The odd thing is that this doesn't end up being a problem for the I/O BAR, but even when using PIO for device programming, the MAC still comes back garbled. Given the fact that some similar issues are showing up on 7751R, I suspect there's still something else that needs fixing. Most annoying.
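
For anyone unfamiliar with the BAR poking involved, the usual sizing dance is: save the register, write all ones, read back to see which address bits stuck, and then write a value back. The sketch below models that in plain C against a fake register rather than real config space, and it restores the original power-on value as in the workaround described above instead of assigning a new address. The fake_bar struct and the helper names are invented for the example and are not the actual PCI auto code.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for one 32-bit memory BAR. Real code would go through
 * PCI config space accessors instead. */
struct fake_bar {
    uint32_t value;  /* current register contents */
    uint32_t size;   /* decoded region size, a power of two >= 16 */
};

static uint32_t bar_read(const struct fake_bar *b)
{
    return b->value;
}

static void bar_write(struct fake_bar *b, uint32_t v)
{
    assert(b->size >= 16 && (b->size & (b->size - 1)) == 0);

    /* Address bits below the region size are hardwired to zero, and
     * the low 4 flag bits of a memory BAR are read-only. */
    b->value = (v & ~(b->size - 1)) | (b->value & 0xfu);
}

/* Returns the region size in bytes, or 0 if the BAR is unimplemented. */
uint32_t probe_mem_bar_size(struct fake_bar *b)
{
    uint32_t orig = bar_read(b);
    uint32_t mask;

    bar_write(b, 0xffffffffu);
    mask = bar_read(b) & ~0xfu;  /* drop the flag bits */
    bar_write(b, orig);          /* put the power-on value back */

    return mask ? ~mask + 1u : 0;
}
```

A BAR that decodes 1MiB reads back 0xfff00000 after the all-ones write, which works out to a size of 0x100000.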

On another note, it looks like the rest of my sh updates made it in in time for 2.4.22-rc1, which means that -rc1 should now be in pretty good shape for both sh _and_ sh64. This is definitely good, since the sh stuff was way out of sync for way too long. This should at least cut down on how much time I'll have to spend maintaining the 2.4 stuff. Now it's just 2.6 that needs more attention.. particularly for sh64, which I still need to port *grumble*.

Yesterday was mostly spent working on a new API for the SH-4 store queues. This ended up going pretty well, except now there's still some address translation issues to work out. Namely, we have to have an implicit mapping from the store queue virtual address to the associated physical address. Doing this by hand works fine, but then we lose the mapping when the TLB is flushed. Alternatives here are either wiring the TLB entry (which also entails moving the rest of the SH stuff over to array access of the UTLB), or putting together a pte and shoving it in the page tables by way of update_mmu_cache() or something similar. Wiring the entry is the best way to go if we end up using the queues quite heavily at a given point in time, since we can keep the translation around and survive a context switch which may flush the TLB on an ASID counter wrap around. However, if we aren't using the queues, it makes no sense to keep the TLB entry locked down. As such, I'll probably have to go with both methods, or some sort of hybrid between them. Will have to look into this more later.. back to profiling the TLB flushing overhead for now, and then off to start implementing array access and porting over my sh64 tlb interface for doing mostly the same thing.
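
To make the wired versus non-wired tradeoff a bit more concrete, here is a toy model in C. It is not the SH-4 UTLB and has nothing to do with the real array access interface; it only shows the behaviour being weighed above: a wired entry survives a flush, a normal one has to be faulted back in, and unwiring once the queues go idle frees the slot again. All of the toy_tlb names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_TLB_ENTRIES 8

struct toy_tlb_entry {
    uint32_t vaddr, paddr;
    bool valid;
    bool wired;  /* wired entries are skipped by a flush */
};

struct toy_tlb {
    struct toy_tlb_entry e[TOY_TLB_ENTRIES];
};

/* Install a translation in the first free slot; returns the slot or -1. */
int toy_tlb_insert(struct toy_tlb *t, uint32_t vaddr, uint32_t paddr, bool wired)
{
    for (int i = 0; i < TOY_TLB_ENTRIES; i++) {
        if (!t->e[i].valid) {
            t->e[i] = (struct toy_tlb_entry){ vaddr, paddr, true, wired };
            return i;
        }
    }
    return -1;
}

/* Flush everything except wired entries (think ASID wrap-around). */
void toy_tlb_flush(struct toy_tlb *t)
{
    for (int i = 0; i < TOY_TLB_ENTRIES; i++)
        if (!t->e[i].wired)
            t->e[i].valid = false;
}

/* Release a wired slot once the queues are idle; the next flush drops it. */
void toy_tlb_unwire(struct toy_tlb *t, int slot)
{
    assert(slot >= 0 && slot < TOY_TLB_ENTRIES);
    t->e[slot].wired = false;
}

/* Returns true and fills *paddr on a hit. */
bool toy_tlb_lookup(const struct toy_tlb *t, uint32_t vaddr, uint32_t *paddr)
{
    for (int i = 0; i < TOY_TLB_ENTRIES; i++) {
        if (t->e[i].valid && t->e[i].vaddr == vaddr) {
            *paddr = t->e[i].paddr;
            return true;
        }
    }
    return false;
}
```

A hybrid along these lines would wire the store queue mapping while the queues are busy and fall back to an ordinary page table entry otherwise, at the cost of one less slot for everything else while the entry is locked down.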

Put off hacking on the DMAC stuff, since it still isn't appealing any way you want to look at it. I'll probably get to this later once I'm done with some other things, or when I get bored enough to want to actually look at it again.

Spent a few hours being blinded by my IRC client while hunting down various new fansubbed anime. As anyone who has done this before in a console IRC client can tell you, it's not pretty. Thankfully, ripping out mIRC color support made things somewhat more bearable, but not much. Now if only usenet/freenet/<insert random buzzword compliant network of the month here> were more practical for this sort of thing.. although it does seem that BitTorrent is slowly making inroads here, so perhaps this won't be such a headache in the future.

While letting that and BitKeeper play the "let's see who can trash my dialup the most" game, I actually managed to get some work done. Spent the last couple of hours beating more of the sh boards into submission for the 2.6 tree, which mostly entailed ripping out huge chunks of I/O routines which we now deal with through generic wrappers that are filled in at run-time instead. This has made a large amount of board-specific code a lot cleaner, and generally smaller. Now it's time to start gutting useless hacks to appease gcc 2.x, since for one, sh is hopelessly broken there, and two, we can't even rely on it to build the kernel reliably anymore. At least this will clean up some of the machvec and syscall code.
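
The "generic wrappers that are filled in at run-time" idea boils down to a table of function pointers with sane defaults that a board only overrides where it has to. The following is a small userspace sketch of that pattern in C, with a fake port array standing in for hardware. The board_io_ops struct, the shifted_board_ops example board, and the io_inb()/io_outb() names are all invented here and do not match the real machvec layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-board I/O operations; NULL means "use the generic one". */
struct board_io_ops {
    uint8_t (*inb)(unsigned long port);
    void (*outb)(uint8_t val, unsigned long port);
};

static uint8_t fake_ports[256];  /* stand-in for real I/O space */

static uint8_t generic_inb(unsigned long port)
{
    return fake_ports[port & 0xff];
}

static void generic_outb(uint8_t val, unsigned long port)
{
    fake_ports[port & 0xff] = val;
}

/* Example board quirk: ports are spaced four bytes apart on reads. */
static uint8_t shifted_inb(unsigned long port)
{
    return generic_inb(port >> 2);
}

const struct board_io_ops shifted_board_ops = {
    .inb = shifted_inb,
    .outb = NULL,  /* falls through to the generic routine */
};

static struct board_io_ops io_ops = { generic_inb, generic_outb };

/* Called once at board setup: only override what the board provides. */
void board_io_setup(const struct board_io_ops *board)
{
    assert(board != NULL);
    if (board->inb)
        io_ops.inb = board->inb;
    if (board->outb)
        io_ops.outb = board->outb;
}

/* The wrappers everything else calls, regardless of board. */
uint8_t io_inb(unsigned long port)
{
    return io_ops.inb(port);
}

void io_outb(uint8_t val, unsigned long port)
{
    io_ops.outb(val, port);
}
```

With this shape, a board file shrinks to its quirks plus a single board_io_setup() call, which is roughly where the size reduction comes from.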
