Python Consulting
This is an announcement that I will be doing Python
consulting from now on. My expertise covers Python, wxPython,
NumPy and SQLAlchemy, and my primary area of work is
numerical analysis / statistics, though of course you get a
PhD in human-computer interaction thrown in if you want
interfaces made.
If anyone has any Python work they would like help with, I
can offer a discount on open source code. I can work
internationally as long as requirements can be sent
electronically. The best way to contact me is salmoni - at -
gmail.com
Apart from that, all is well here in the Philippines! The
coding on the new project is going well and I'm considering
farming out the database viewer/importer tool as a separate
component for database management. I'm not exactly sure what
functionality would be necessary for this, but suffice it to
say that the basics should be easy to implement (and the
middling / advanced stuff a nightmare!).
Factorial ANOVA of large sets
I've also solved all the problems concerning factorial
analysis of variance for extremely large datasets (i.e.,
those too large to fit into memory). I will crack on with
this code now to get it done and to make an
industrial-quality, heavyweight data analysis tool. This
will be open sourced in time, after testing anyway. The real
problems that I have are a) getting hold of an environment
(i.e., a machine with a massive database on it), and b)
getting comparison results, though SAS should be able to
deliver on this. I understand that SPSS will face problems
if the data are too big for memory, but SAS can work around
this just as my code can. Moore's Law makes this of
decreasing utility, but it's nice to have software that you
know can handle any task.
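The underlying idea can be sketched as streaming accumulation: only per-cell counts and totals need to live in memory, never the raw data. Here is a minimal sketch of that approach for a two-factor design; the function name, the chunk format, and the balanced-design formulas are my own illustration, not the actual code described above.

```python
from collections import defaultdict

def streaming_two_way_ss(chunks):
    """Accumulate per-cell totals over data chunks, then derive the
    sums of squares for a two-factor ANOVA. Memory use is O(cells),
    however large the dataset is."""
    n = defaultdict(int)      # per-cell counts
    t = defaultdict(float)    # per-cell sums
    sumsq = 0.0               # running sum of x**2
    for chunk in chunks:      # each chunk: iterable of (a, b, x) rows
        for a, b, x in chunk:
            n[(a, b)] += 1
            t[(a, b)] += x
            sumsq += x * x
    N = sum(n.values())
    T = sum(t.values())
    cf = T * T / N                                    # correction factor
    ss_total = sumsq - cf
    ss_cells = sum(t[c] ** 2 / n[c] for c in n) - cf
    # marginal totals for each factor, derived from the cell totals
    na, ta = defaultdict(int), defaultdict(float)
    nb, tb = defaultdict(int), defaultdict(float)
    for (a, b), count in n.items():
        na[a] += count; ta[a] += t[(a, b)]
        nb[b] += count; tb[b] += t[(a, b)]
    ss_a = sum(ta[a] ** 2 / na[a] for a in na) - cf
    ss_b = sum(tb[b] ** 2 / nb[b] for b in nb) - cf
    return {"A": ss_a, "B": ss_b, "AxB": ss_cells - ss_a - ss_b,
            "within": ss_total - ss_cells, "total": ss_total}
```

Because each chunk is discarded after its totals are folded in, the chunks could just as easily come from a database cursor as from a list.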
Article
I've also enquired about submitting an article to a Python
journal about how to use the code module to implement an
interactive interpreter and embed it within a Python
program. This comes from work on the statistics program
where I wrote one for quick debugging and found it so good
that I extended it a little to be used as a permanent tool.
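The standard-library route is pleasantly small. A minimal sketch of embedding such an interpreter follows; the namespace contents are illustrative, and in a real tool you would call console.interact() to hand the prompt to the user rather than feeding lines programmatically.

```python
import code

# The host program exposes a namespace to the embedded interpreter.
shared = {"x": 10}
console = code.InteractiveConsole(locals=shared)

# In an application, console.interact() would give the user a live
# prompt; here we push a line of source directly instead.
console.push("y = x * 2")   # executes in the shared namespace
```

Anything the user defines lands in the shared dictionary, so the host program can inspect it afterwards.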
One problem we found is that when declaring and using a
variable, a user would have to write:
x = newvar()
or
newvar("x")
x.data([3,4,5,6])
It would make more sense (to novices) to write:
newvar(x)
x.data(3,4,5,6)
It does this now. What I did was override the
code.InteractiveInterpreter.showtraceback method to catch
NameErrors (which are raised when x is sent to newvar because
x doesn't exist). The code then works out the command and
sends it to the newvar method again, but with the x in
quotes. It's minor stuff, but less annoying for users.
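The mechanism might look roughly like this. This is a sketch only: the newvar function, the regular expression, and the console subclass are illustrative stand-ins for the real code, not the actual implementation.

```python
import code
import re
import sys

created = {}

def newvar(name):
    """Stand-in for the application's variable-creation method."""
    created[name] = []

class ForgivingConsole(code.InteractiveConsole):
    """Retries newvar(x) as newvar("x") when x is undefined."""
    NEWVAR = re.compile(r"^newvar\((\w+)\)\s*$")

    def push(self, line):
        self.lastcmd = line            # remember the raw command
        return super().push(line)

    def showtraceback(self):
        exc = sys.exc_info()[1]
        match = self.NEWVAR.match(getattr(self, "lastcmd", ""))
        if isinstance(exc, NameError) and match:
            # re-issue the command with the bare name quoted
            self.runsource('newvar("%s")' % match.group(1))
        else:
            super().showtraceback()
```

With this in place, pushing "newvar(x)" creates the variable even though x was never defined: the NameError is intercepted instead of being printed as a traceback.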
And if a database has awkward column names that are not
valid variable names in Python, they cannot be used
directly: so I added a catcher to showtraceback that catches
AttributeErrors and tests whether a string has been issued
with a program method:
"Variable 1 (2000)".variance()
This would never work normally within Python without
overriding the string class (which is another possibility).
However, the catcher above can catch this attribute error
and redirect the 'variance()' bit to the proper variable
definition.
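A sketch of that second catcher, assuming a registry that maps awkward database column names to the actual variable objects; the Column class, the registry, and the regular expression are all illustrative, not the real implementation.

```python
import code
import re
import sys

class Column:
    """Stand-in for the application's variable object."""
    def __init__(self, data):
        self.data = list(data)
    def variance(self):
        m = sum(self.data) / len(self.data)
        return sum((x - m) ** 2 for x in self.data) / (len(self.data) - 1)

# maps awkward database column names to variable objects
registry = {"Variable 1 (2000)": Column([3, 4, 5, 6])}

class RedirectingConsole(code.InteractiveConsole):
    """Redirects "name".method() calls on strings to the registry."""
    CALL = re.compile(r'^"(.+)"\.(\w+)\(\)\s*$')

    def push(self, line):
        self.lastcmd = line
        return super().push(line)

    def showtraceback(self):
        exc = sys.exc_info()[1]
        match = self.CALL.match(getattr(self, "lastcmd", ""))
        if isinstance(exc, AttributeError) and match and match.group(1) in registry:
            # call the method on the real variable object instead
            self.result = getattr(registry[match.group(1)], match.group(2))()
        else:
            super().showtraceback()
```

Pushing '"Variable 1 (2000)".variance()' then raises an AttributeError on the string, which the catcher redirects to the registered Column object.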
All this just means that the application is beginning to
work around its users instead of demanding that they work
around it.
I also added lots of alternative names for descriptive
tests so:
x.samplestandarddeviation()
x.standarddeviation()
x.stddev()
x.stdev()
x.sd()
all call the same function. This helps because whenever I've
used a new statistics program, I've had to look up the exact
names of its functions. This way, I don't have to remember
which one: I just pick a common one, and away I go! :-)
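In Python this costs almost nothing: each alias can simply be an extra class attribute bound to the same function. A sketch, where the Variable class is illustrative but the alias names are the ones listed above:

```python
class Variable:
    def __init__(self, data):
        self.data = list(data)

    def samplestandarddeviation(self):
        """Sample standard deviation (n - 1 denominator)."""
        m = sum(self.data) / len(self.data)
        var = sum((x - m) ** 2 for x in self.data) / (len(self.data) - 1)
        return var ** 0.5

    # every alias is just another name bound to the same function
    standarddeviation = samplestandarddeviation
    stddev = samplestandarddeviation
    stdev = samplestandarddeviation
    sd = samplestandarddeviation
```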