-
Still here, still alive, still kicking.
Recap: After having dinner with a colleague, I realized why I couldn't finish my thesis: Having followed my supervisor's advice (read: orders) to take his approach to analysing my data, the results were indeterminate. After 3 years and $200K spent on equipment, salaries, etc, and the only answer I could give was 'we can't find an effect using this approach', I finally accepted that I couldn't submit a thesis that could be easily co-opted for political and other ends. Back to the drawing board.
After my NT box crashed (boom) and took my research db (and two years of my work) with it, I had little choice but to redo all my calculations from scratch using postgres. Relearning everything I thought I knew (from originally using access, natch) is somewhat painful, the kind of 'character building' activity that our parents told us about years ago, but we never truly believed. so far, I've successfully reimported the three years of data from our study catchments, and have ~80% of the data prep queries working properly. The remaining 20% are touchy in that they depend somewhat on the nature of the data returned by the earlier queries, so it's time consuming. On the other hand, I have 1 kloc of beautifully commented SQL. :)
My biggest criticisms of postgres are the lack of left outer join (tho I know it's coming in the next couple of months) and lack of native crosstabulation capabilities (TRANSFORM / PIVOT predictes), crosstabs being queries having aggregation at both the row and column level. That said, some reasearch and more error and trial later,
crosstabulation queries are possible !!!
-
SELECT lakename,
-
COUNT(CASE WHEN year = 1996 THEN depth ELSE NULL END)
AS SY1996,
COUNT(CASE WHEN year = 1997 THEN depth ELSE NULL END) AS SY1997,
COUNT(CASE WHEN year = 1998 THEN depth ELSE NULL END) AS SY1998
GROUP BY lakename
;
lakename|sy1996|sy1997|sy1998 --------+------+------+------ C12a | 0| 0| 1 C23a | 1| 0| 1 C24a | 7| 5| 7 C29a | 2| 0| 3 C2a | 4| 3| 5 C40a | 20| 18| 16 C44a | 11| 9| 10 C48b | 5| 1| 0 C9b | 11| 11| 7 FBP10a | 5| 8| 3 FBP9b | 0| 6| 11 FP15b | 7| 9| 5 FP24a | 4| 4| 6 FP27a | 14| 15| 9 FP2a | 5| 5| 3 FP30a | 6| 7| 6 FP31a | 8| 7| 5 FP32a | 5| 5| 4 N106a | 4| 4| 2 ...
-
TRANSFORM count(depth)
SELECT lakename, year
FROM crosstab
GROUP BY lakename
PIVOT ON year
;
... But even SQL Server 7 doesn't do that either.:)
Still, it works, at least on this limited scale. I'd hate to have used this approach on some species diversity work I did awhile ago, on a few massive datasets (well, massive by ecology standards :) of 50-200k rows, and about 50-80 different species of interest (usually the species names are used as the column heads, so that means a table with with about 50-80 cols and x rows (one row / plot / date). Something's gonna have to give here, cuz I may have another one to do after I finish. Something in either Perl or Python to build the table. Hmmm. <rubs hands>
And in the meanwhile, back to our regularly scheduled data work up.