Older blog entries for danstowell (starting at number 80)

17 Mar 2014 (updated 19 Mar 2014 at 13:15 UTC) »

How long it takes to get my articles published - update

Here's an update to my own personal data about how long it takes to get academic articles published. I've also augmented it with funding applications, to compare how long all these decisions take in academia.

It matters because, especially for an early-career researcher, if it takes a year for a journal article to come out (even after the reviewers have said yes), that's a year of not having it on your CV.

So how long do the different bits take? Here's a bar-chart summarising the mean durations in my data:

The data is divided into 3 sections: first, writing up until first submission; then, reviewing (including any back-and-forth with reviewers, resubmission etc); then finally, the time from final decision through to publication.

Firstly, note that there are not many data points here: for example, I have one journal article that took an extremely long time after acceptance to actually appear, and this skews the average. But it's certainly notable that the time spent writing is generally dwarfed by the time spent waiting - and in particular that it's not necessarily the reviewing process itself that forces us all to wait: various admin things such as typesetting seem to take at least as long. Whether or not things should take that long, well, that's up to you to decide.

Also - I was awarded a fellowship recently, which is great - but you can see in the diagram that I spent about two years repeatedly getting negative funding decisions. It's tough!

This is just my own data - I make no claims to generality.

Syndicated 2014-03-17 15:23:03 (Updated 2014-03-19 09:11:29) from Dan Stowell

Python scipy gotcha: scoreatpercentile

Agh, I just got caught out by a "silent" change in the behaviour of scipy for Python. By "silent" I mean it doesn't seem to be in the scipy 0.12 changelog even though it should be. I'm documenting it here in case anyone else needs to know:

Here's the simple code example - using scoreatpercentile to find a percentile for some 2D array:

import numpy as np
from scipy.stats import scoreatpercentile
scoreatpercentile(np.eye(5), 50)

On my laptop with scipy 0.11.0 (and numpy 1.7.1) the answer is:

array([ 0.,  0.,  0.,  0.,  0.])

On our lab machine with scipy 0.13.3 (and numpy 1.7.0) the answer is:

0.0

In the first case, it calculates the percentile along one axis. In the second, it calculates the percentile of the flattened array, because in scipy 0.12 someone added a new "axis" argument to the function, whose default value None means to analyse the flattened array. Bah! Nice feature, but a shame about the compatibility. (P.S. I've logged it with the scipy team.)
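If you want code that behaves the same on either side of that change, one workaround (just a sketch - not the only option) is to stop relying on the default and spell out the axis yourself, e.g. via numpy's own percentile function:

import numpy as np

a = np.eye(5)

# Explicit about the axis, so the behaviour doesn't depend on the library version:
print(np.percentile(a, 50, axis=0))  # per-column median: array of five zeros
print(np.percentile(a, 50))          # median of the flattened array: 0.0

That way the per-column and flattened-array behaviours are both explicit, whatever scipy you happen to have installed.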

Syndicated 2014-02-14 08:37:38 (Updated 2014-02-14 08:38:35) from Dan Stowell

7 Feb 2014 (updated 8 Feb 2014 at 20:14 UTC) »

How to analyse pan position per frequency of your sound files

Someone on the Linux Audio Users list asked how they could analyse a load of FLAC files to work out whether it was true, for their music collection, that bass frequencies (below about 150 Hz, say) tend to be centre-panned. Here's my answer.

First of all, by coincidence I happen to know that Pedro Pestana recently published a nice analysis of exactly this phenomenon at the AES 53rd conference. He looked at hundreds of number-one singles to determine the relationship between panning and frequency in the habits of producers/engineers for popular tracks. The paper isn't open access, unfortunately, but there you go.

So anyway here's a Python script I just wrote: script to analyse your audio files and plot the distribution of panning per frequency. And here's how it looks when I analyse the excellent Rumour Cubes album:

(Just to stress, this is a simple analysis: it looks at the spectral representation of the complete mix, and doesn't infer anything clever about the component parts of the mix.)
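In case the link goes stale, here's the general shape of the analysis (a rough sketch, not the exact script linked above: the filename is a placeholder and I'm assuming the soundfile module is available for reading FLAC). Take short-time spectra of the left and right channels, compute a pan index per frequency bin from the relative magnitudes, and pool everything into a 2D histogram of pan position against frequency:

import numpy as np
import soundfile as sf              # assumed available, for reading FLAC/WAV
import matplotlib.pyplot as plt

def pan_per_frequency(path, fftsize=2048, hop=1024):
    """Rough sketch: histogram of pan position against frequency for one stereo file."""
    audio, sr = sf.read(path)       # shape (numframes, numchannels)
    if audio.ndim != 2 or audio.shape[1] != 2:
        raise ValueError("expecting a stereo file")
    window = np.hanning(fftsize)
    freqs = np.arange(fftsize // 2 + 1) * float(sr) / fftsize
    pans, panfreqs = [], []
    for start in range(0, len(audio) - fftsize, hop):
        frame = audio[start:start + fftsize]
        left = np.abs(np.fft.rfft(frame[:, 0] * window))
        right = np.abs(np.fft.rfft(frame[:, 1] * window))
        total = left + right
        keep = total > 1e-6         # skip near-silent bins
        # Pan index in [-1, 1]: -1 is hard left, 0 is centre, +1 is hard right
        pans.append((right[keep] - left[keep]) / total[keep])
        panfreqs.append(freqs[keep])
    plt.hist2d(np.concatenate(pans), np.concatenate(panfreqs), bins=[41, 100])
    plt.xlabel("Pan position (left to right)")
    plt.ylabel("Frequency (Hz)")
    plt.show()

pan_per_frequency("some_album_track.flac")   # placeholder filename

Nothing clever: just counts per (pan, frequency) bin, which is enough to show the shape of the panning habits.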

See any patterns? The pattern I was looking for is a bit subtle, but it's right down at the bottom below 100 Hz (i.e. 0.1 kHz on the scale): the bass tends to "pinch in" and not get panned around so much as the other stuff.

This analysis of Lotus Flower by Radiohead (by Daniel Jones) shows the effect more clearly.

This is what's generally observed, and widely known in mixing-engineer "folklore": pan your bass to the centre, and do what you like with the rest. Not everyone agrees on the reasons: some people say it's because off-centre bass can cause the needle to skip out of the groove on vinyl records; some people say it's because we can't really perceive spatialisation very well at low frequencies; some people say it's just to maximise the energy in the mix. I have no comment on what the reasons might be, but it's certainly folk wisdom for various audio people, and empirically you can test it for yourself by analysing some of your music collection.

NOTE: Code and image updated 2014-02-08, thanks to Daniel Jones (see comments below) for spotting an issue.

Syndicated 2014-02-07 16:41:20 (Updated 2014-02-08 15:13:39) from Dan Stowell

Gaussian Processes: advanced regression with sounds, and with geographic data

This week I was learning about Gaussian Processes, at the very nice Gaussian Processes Winter School in Sheffield. The term "Gaussian Processes" refers to a family of techniques for inferring a smooth surface (1D, 2D, 3D or more) from a set of sampled noisy data points. Essentially, it's an advanced and mathematically very sound type of regression.

Don't get confused by the name, by the way: your data doesn't have to be Gaussian, and Gaussian Process regression doesn't always produce smooth Gaussian-looking results. It's very flexible.

As an example, here's a first pass I did of analysing the frequency trajectories in a single recording of birdsong.

I used the "GPy" Python package to do all this. Here's their GPy regression tutorial.

I do want to emphasise that this is just a first pass; I don't claim it's a meaningful analysis yet. But there are a couple of neat things about the analysis:

  1. It can combine periodic and nonperiodic variation (by combining periodic and nonperiodic covariance kernels). Here I used a standard RBF kernel, plus a periodic kernel which repeats every 1 syllable and another periodic kernel which repeats every 3 syllables, which reflects the patterning of this song bout well (there's a code sketch of this kernel combination just after this list).
  2. It can represent variation across multiple levels of detail: unlike many other regression/interpolation methods, it can produce fast wiggles in some places and broad curves in others.
  3. It gives you error bars, which are derived from a proper Bayesian posterior.
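Here's a schematic of that kernel combination in GPy (a minimal sketch with made-up data, using the current GPy API rather than the exact code behind my plot; the toy signal and the period values are placeholders):

import numpy as np
import GPy

# Toy stand-in for a frequency trajectory sampled along the song bout
X = np.linspace(0, 12, 200)[:, None]
Y = (np.sin(2 * np.pi * X) + 0.3 * np.sin(2 * np.pi * X / 3.0)
     + 0.05 * np.random.randn(200, 1))

# Nonperiodic variation, plus repetition every 1 syllable and every 3 syllables
kernel = (GPy.kern.RBF(input_dim=1)
          + GPy.kern.StdPeriodic(input_dim=1, period=1.0)
          + GPy.kern.StdPeriodic(input_dim=1, period=3.0))

model = GPy.models.GPRegression(X, Y, kernel)
model.optimize()    # maximise the marginal likelihood
model.plot()        # posterior mean with Bayesian error bars

The kernel hyperparameters (variances, lengthscales) get fitted along with everything else when you call optimize().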

So now here's my second example, in a completely different domain. I'm not a geostatistician but I decided to have a go at reconstructing the hills and valleys of Britain using point data from OpenStreetMap. This is a fairly classic example of the technique, and OpenStreetMap data is almost a perfect fit for the job: it doesn't hold any smooth data about the surface terrain of the Earth, but it does hold quite a lot of point data where elevations have been measured (e.g. the heights of mountain peaks).

If you want to run this one yourself, here's my Python code and OpenStreetMap data for you.

This is what the input data look like - I've got "ele" datapoints, and separately I've got coastline location points (for which we can assume ele=0):

Those scatter plots don't show the heights, but they show where we have data. The elevation data is densest where we have mountain ranges etc, such as central Scotland and in Derbyshire.

And here are two different fits, one with an "exponential" kernel and one with a "Matern" kernel:

Again, the nice thing about Gaussian Process regression is that it seamlessly handles smooth generalisation as well as occasional patches of fine detail where needed. How good are the results? Well, it's hard to tell by eye, and I'd need some official relief-map data to validate them. But from looking at these two, I like the exponential-kernel fit a bit better - it certainly gives an intuitively appealing relief map in central Scotland, and visually it's a bit less blobby than the other plot. However, it's a bit more wrong in some places, e.g. an overestimated elevation in Derbyshire (near the centre of the picture). If you ask an actual geostatistics expert, they will probably tell you which kernel is a good choice for regressing terrain shapes.
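For a flavour of how such a comparison is set up in GPy (a stripped-down sketch, not the linked code itself: the data here is faked, and for the full point set you'd probably want a sparse approximation):

import numpy as np
import GPy

# Fake stand-ins for the OpenStreetMap points: columns are (longitude, latitude),
# target is elevation in metres ("ele" tags, plus coastline points at zero)
X = np.column_stack([np.random.uniform(-6, 2, 500),    # longitudes
                     np.random.uniform(50, 59, 500)])  # latitudes
Y = np.random.uniform(0, 1000, (500, 1))               # elevations

def fit_terrain(kernel):
    model = GPy.models.GPRegression(X, Y, kernel)
    model.optimize()
    return model

model_exp = fit_terrain(GPy.kern.Exponential(input_dim=2))
model_mat = fit_terrain(GPy.kern.Matern32(input_dim=2))

# Predict on a regular grid to render as a relief map
lon, lat = np.meshgrid(np.linspace(-6, 2, 100), np.linspace(50, 59, 100))
grid = np.column_stack([lon.ravel(), lat.ravel()])
elev_mean, elev_var = model_exp.predict(grid)   # and likewise for model_mat

Swapping the kernel is a one-line change, which is what makes it easy to eyeball the exponential fit against the Matern one.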

The other thing you can see in the images is that it isn't doing a very good job of predicting the sea: often the prediction dips down to an altitude of zero at the coast and then pops back upwards beyond it. No surprises there, for two reasons: firstly, I didn't give it any data points over the sea; and secondly, I'm using "stationary" kernels, meaning there's no reason for the algorithm to believe the sea behaves any differently from the land. This would be easy to fix by masking out the sea, but I haven't bothered.

So altogether, these examples show some of the nice features of Gaussian Process regression, and, along with the code, that the GPy module makes it pretty easy to put together this kind of analysis in Python.

Syndicated 2014-01-17 07:18:48 (Updated 2014-01-17 07:29:01) from Dan Stowell

OpenStreetMap UK: what should we do this year?

As a contributor to OpenStreetMap, one thing I've been wondering recently is what sort of map data we should collect for the UK, now that the coverage has already got good. Since OpenStreetMap covers the UK so well, when you're out and about with a printed-out map and a pen it's very rare that you find anything significant that isn't mapped already - sometimes a new street or a missing church. You could pour your time into mapping increasingly obscure things, whatever you're interested in. But what would be the most useful things to map in the UK over the coming year - things that are not just interesting to map but could be practically useful to people? Some thoughts:

  • Addresses. I kind of don't like mentioning this, because I find it boring to map addresses, and I'd much rather that the UK address data magically appeared from some big open-data source. But addresses are obviously really useful for so many things: routing, looking up shops, etc. Coincidentally, Simon Poole (chair of OSM Foundation) also says address collection is the thing we need, for OSM in general not just UK.
  • Postcodes. In the UK, postcodes are really important for satnav routing etc. For some reason I suspect that collecting postcodes could be less mind-numbing than collecting addresses, but just as useful. See Jerry's blog about UK postcodes in OSM for an analysis of where we are with postcodes... about 3% of them. As he says, we need to do better than this - so how best to collect them?
  • Footpaths. Really important for planning walking routes, whether in the city or the countryside. We also need to mark when footpaths have steps or are otherwise no good for wheelchairs/prams. (It's also handy to know when footpaths are full-blown rights of way, or just "permissive" access.) In his speech at State Of The Map 2013, Peter Eastern mentioned that they estimated UK footpath data was still pretty incomplete. I often use OSM for planning walking routes - it has loads of footpaths that no other services have, but I do still often go walking somewhere and find new footpaths that aren't in there yet. I don't know how we could specifically push for more footpath mapping - all I will say is please help us and map walking routes :)

Some notes on other things which I'm not sure how vital they are:

  • Buildings. I know when we've been doing London mapping meet-ups, Harry Wood has mentioned that OSM's buildings coverage for London is rather patchy. You can see it on the map - there are pockets full of buildings mapped, and large pockets with none. But... is this a bad thing? What would we want buildings mapped for? I know they're useful in fancy 3D map renderings, but for more practical purposes...? I'm guessing it's not that crucial, though it might relate a bit to the address mapping.
  • Shops. It's great to have shops, restaurants, pubs and other local businesses in OSM. Once you start mapping these, though, you notice there's quite a rapid turnover - your high street probably gains/loses a shop every 3 months or so, at a wild guess. So this data is useful, but it's less permanent than all the other stuff I've mentioned so far. I'd suggest there's no point having a big push to map every shop in every high street, we just need to let the momentum build to a point where that happens under its own steam.
  • Postboxes. Again Jerry has a detailed breakdown, and says we need to map them more. Plus Robert Whittaker has some data mining tools about postbox completeness. On the other hand, is it really that urgent to map postboxes? It doesn't feel anywhere near as critical as mapping addresses, walking routes, etc. The only use case I can think of is "where's the nearest postbox?" which is rarely a critical matter.
  • GPX traces. After MapBox published their beautiful rainbow GPS map tiles which provide a lovely way to see the GPS traces contributed by the community, I noticed at least two villages where there were basically zero traces uploaded. Are GPS traces important to UK mapping? The coverage of the aerial imagery is good, and generally quite well GPS-aligned, so... do we need more GPS traces around the UK? I genuinely don't know, and would be interested to find out either way.
  • Grit bins. Something I noticed a couple of winters ago: it would be really handy to have every grit bin mapped, for the day when it's freezing cold outside, the grit bins are all hidden under a foot of snow, and you need to clear a driveway. That's just one little thing that I don't think anyone has particularly focussed on, so a little call-out - please map amenity=grit_bin when you see them!

I'd be grateful for any feedback on the thoughts above, including other things that could be priorities. Just one UK mapper's perspective.

Syndicated 2014-01-01 13:44:07 (Updated 2014-01-01 13:44:55) from Dan Stowell

18 Dec 2013 (updated 18 Dec 2013 at 23:12 UTC) »

SuperCollider inspired web audio coding environments

SuperCollider is an audio environment that gets a lot of things right in terms of hacking around with multichannel sound, live coding and composing the different structures you need for music.

So it's no surprise that in the world of Web Audio currently being born, various people are getting inspired by SuperCollider. I've seen a few people make pure-JavaScript systems which emulate SuperCollider's language style. So here's a list:

I think there's at least one I've forgotten. Please let me know if you spot others - I'd be interested to keep tabs.

So there are obvious questions: is this a duplication of effort? should these people get together and hack on one system? is any one of them better than the others? I don't know if any of them is better, but one thing I know: it's still very early days in the world of Web Audio. (The underlying APIs aren't even implemented fully by all major browsers yet.) I'm sure some cool live coding web systems will emerge, and they may or may not be based on the older generation. But there's still plenty of room for experimentation.

Syndicated 2013-12-18 12:06:50 (Updated 2013-12-18 17:16:00) from Dan Stowell

Fact check: is it true that 1/3 of GP surgeries fail health standards?

The results of an inspection of GP surgeries came out last week, widely reported/headlined as "one third of GP surgeries" failing basic health standards. So is it true that one third of GP surgeries fail basic standards? No, and for a very simple reason.

The Care Quality Commission surveyed 910 GP surgeries (out of 8000 total) and found failings in one-third of them. But how did they pick the surgeries to inspect? Did they do it at random? No.

"80% were targeted because of known concerns. The remainder were chosen at random."

In other words, this survey was not a survey of all our surgeries, but of the ones that people were already suspicious about. In a sense, it was a survey of the worst of the bunch. When you pick your targets like this, it makes no sense to generalise the result to the rest of the GP surgeries.

What's the true number? Well, we don't know. If we assume that all the dodgy surgeries were included in the batch of 910, the percentage would be about 3.8%. It would be good luck to capture all the dodgy surgeries, though, so the true figure is probably a bit higher than that. Still something to be concerned about, of course - but no crisis. The UK is still internationally leading in quality and cost-effective healthcare, so there's no need to panic...
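To show where that 3.8% comes from (just the back-of-the-envelope arithmetic, using the round numbers quoted above):

surveyed = 910              # surgeries inspected by the CQC
total = 8000                # approximate number of GP surgeries nationally
failing = surveyed / 3.0    # roughly one-third of the inspected sample had failings

# Lower bound on the national failure rate, assuming every failing surgery
# was already in the (mostly targeted) inspected batch
print(100 * failing / total)   # roughly 3.8 (per cent)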

Syndicated 2013-12-17 03:11:39 (Updated 2013-12-17 03:13:11) from Dan Stowell

1 Dec 2013 (updated 1 Dec 2013 at 18:10 UTC) »

Offline cacheing map tiles in Firefox OS, using IndexedDB

I've been at the OpenStreetMap hack weekend. [Photo]. One of the things I wanted to explore was getting offline maps working on my Firefox OS phone.

Firefox OS is not like iPhone or Android - every app has to be written in pure HTML and JavaScript, which means that for full-featured phone apps / web apps we need to use the fancy extra tools that Mozilla and others are providing to beef things up beyond the usual web features. One of these is IndexedDB, which allows an HTML page to store objects longish-term. So let's use that for cacheing map tiles:

The demos are just online maps, so the key thing is testing them with internet access on, and then trying again with internet access off.

I had to make a small tweak to the Leaflet slippy-map code, patching it to grab map tiles from my cache-or-live service. I don't know if there's a more "sustainable" way to do this...

Is this any different from standard browser cacheing? Well yes and no. It lets us have tighter control over what we, as "an app", do or do not remember. We get to say that we want to remember map tiles but we don't necessarily need to remember logo images, for example.

Syndicated 2013-12-01 11:29:53 (Updated 2013-12-01 12:53:45) from Dan Stowell

28 Nov 2013 (updated 28 Nov 2013 at 21:11 UTC) »

The UK Government Response to the BIS Open Access Review

The UK Government's Department of Business Innovation and Skills recently published a review of Open Access research publication. It made a number of really good recommendations, including de-emphasising the "gold" (pay-to-publish) route, and stepping back from the over-extended embargo periods that the publishers seem to have got RCUK to agree to.

The Government has published its response to this review. What is that response? Basically, "Nah, no thanks."

  • The review said "RCUK should build on its original world leading policy by reinstating and strengthening the immediate deposit mandate in its original policy". The Government said "... timely OA ... through mutually acceptable embargo period". There's nothing "mutual" about the choice of embargo period, given that many academics have been asking for the position that the government has just explicitly rejected.
  • The review said "We recommend that the Government and RCUK revise their policies to place an upper limit of 6 month embargoes on STEM subject research and up to 12 month embargoes for HASS subject research, in line with RCUK’s original policy published in July 2012". The Government said "A re-engineering of the research publications market entails a journey not an event" or in other words "No". Note the vacuousness of their statement. It could easily have been "an event", and the committee wasn't even recommending the total removal of embargoes.
  • The review said "We recommend that the Government and RCUK reconsider their preference for Gold open access during the five year transition period, and give due regard to the evidence of the vital role that Green open access and repositories have to play as the UK moves towards full open access." The government said "Government and RCUK policy with an expressed preference for Gold OA [sets the direction of travel]". This is fair enough as a sentiment, but unfortunately the government response also included the publisher's favourite "open access flowchart" which clearly tells researchers that gold open access must be chosen if available. Note that this is not a consensus or objective reading of current RCUK rules, let alone the future. The government is showing no signs of backing away from this weird new competitive problem they're creating right now, where researchers in universities have to compete with their own colleagues (studying completely different disciplines) for the tiny and certainly insufficient institutional pay-to-publish funding pots.
  • The review in fact agrees with the position I just stated: "RCUK’s current guidance provides that the choice of Green or Gold open access lies with the author and the author’s institution, even if the Gold option is available from the publisher. This is incompatible with the Publishers Association decision tree, and RCUK should therefore withdraw its endorsement of the decision tree as soon as possible, to avoid further confusion within the academic and publishing communities." The government says "As discussed above the UK OA Decision Tree sets out clearly the direction of travel." Arrrrrrrrrrrrrrrgh, are you not even listening?

I could go on. Suffice to say that I was so encouraged by the sane voice of the BIS review; yet the government's response appears to be a solid and completely shameless "not for turning".

Syndicated 2013-11-28 14:54:38 (Updated 2013-11-28 15:17:23) from Dan Stowell

10 Nov 2013 (updated 10 Nov 2013 at 22:11 UTC) »

Embedded acoustic environments (Barry Truax)

This weekend I was at the Symposium on Acoustic Ecology. Interesting event, but here I just want to note one specific thing from Barry Truax, who gave a keynote as well as presenting a new composition.

Truax has a pretty nice way of talking about acoustic structure at different scales. As a composer he's been an important proponent of granular synthesis, and as a teacher his way of talking about sound meshes rather neatly with the granular approach.

One issue he brought out in his keynote is how, over the past 100 years, our ways of listening, and our sophistication as listeners, have changed. He's not just talking about professional or arty listeners, but all of us. In the past, our "acoustic environment" was pretty much synonymous with our immediate environment more generally. This (Truax argues) is one of the reasons that people in the 1910s seemed to be fooled by the sound of an opera singer on a phonograph record, a sound which to us comes across as a feeble imitation. But recording technologies have allowed us to abstract the acoustic environment from our immediate environment: we now have a facility with embedded acoustic environments that is so sophisticated as to be casual. We know how to relate to the person sitting next to us on the tube listening to headphones; we understand the voices in the radio, why they have different reverb from the room we're sitting in, and why they can't hear us; we understand what is being hinted at when the narrator in a radio play doesn't seem to be in the same room as the characters.

Later that night, at the concert, there was a great example of embedded acoustic environments. We were listening to a multi-channel electronic concert, in a huge ex-ship-building shed ("No. 3 slip") in a dockyard. This hangar allowed plenty of sound in from outdoors, and so as the music played, it was... ahem... "augmented" by various other sounds: the dockyard's big clock chiming the hours; the firework-like sounds of artillery fire in a naval training ground; and also a heavily-echoed "Call Me" by Blondie!

I don't believe any of this was deliberate ;) but it's a great example of an embedded acoustic environment - and furthermore, of the challenge it presents to electronic composers. Composers need to be aware that the environment they're constructing will usually be played back over speakers which don't form the entirety of the listeners' acoustic environment, but a sub-system of it. (Is this challenge equivalent to a demand to always be site-specific? Not quite, but related.) Some of the composers last night I think did not rise to this challenge, and it showed. But some of them did. Barry Truax was premiering a new piece called "Earth and Steel", written specifically for this place, and it worked great; it was very affecting.

Syndicated 2013-11-10 10:56:48 (Updated 2013-11-10 16:49:12) from Dan Stowell
