2 Jan 2011 JoeNotCharles   » (Journeyer)

The Best Albums of 2010, part 2: Making a Ballot

Yesterday I linked to 9 lists of the Best Albums of 2010, from magazines, web sites, and one radio show, and promised to explain how we distilled those lists into one canonical Top 25. The first step is to turn each list into a preferential election ballot.

The standard First Past the Post voting system used in pretty much every political election in North America is… not very good. Its one virtue is that it’s simple to explain: you can vote for one, and only one, candidate, and the candidate with the most votes wins. The problem is that this often makes it really tough to decide who to vote for. To get your preferred result, you need to consider a whole host of things other than “how good this candidate is”. You need to vote tactically. The obvious example is when you don’t think your favourite candidate can win – do you vote for them anyway, or switch your vote to your second choice? (It’s a tough decision if your candidate has ALMOST enough support to be viable.) Another example is if you mainly want one candidate to LOSE – you still need to pick one of their opponents to vote for. There are more serious problems in an election like the Canadian Parliament or US Electoral College, where a bunch of individual elections each elect one winner, and then the team with the most winners gets the grand prize, but let’s consider only elections where everyone votes directly for a single winner, like a city mayor or state governor.

Every voter can, in theory, rank all the candidates in order of preference (although if they don’t have strong opinions or aren’t very informed, that ranking may just have their favored candidate in first and everyone else tied for last…) Judging whether a voting system is any good basically involves measuring how much a voter’s “true preferences” contribute to the election’s outcome. First past the post is a poor system because it forces voters to leave out most of the information about their preferences, and in fact encourages them to “lie” by listing a candidate who isn’t actually their first choice. A better system would give each voter a more complicated ballot in which they could list all their preferences. Say, by putting a 1 next to their favoured candidate, a 2 next to their second candidate, etc. Or with a computer touch screen which removes each candidate’s name as the voter touches it, and keeps track of the order the voter chose them in. Or, although this has obvious practical problems in a large election, by writing down each candidate’s name in order on a sheet of paper (which is exactly what we have with our 9 best-of-2010 lists!) There are many physical ballots we can imagine that could record a voter’s complete preferences. But in order to study or simulate a voting system, it’s helpful to have a standard notation. Whoever counts the votes – human or computer – can start by translating each physical ballot into this standard notation.

The notation commonly used is to give each candidate a symbol (such as the first letter of their name or party), and for each ballot, list the symbols for each candidate on one line, from the most liked on the left to the least liked on the right. The symbols are separated by “>” to show that the candidate to the left is preferred to the candidate on the right, or “=” to show that they’re tied. (And, if not all candidates are listed, all the unlisted candidates are assumed to all be tied for last place.)

So, with the Canadian political parties Conservative (C), Liberal (L), New Democrat (N), and Green (G) – assume we are not in Quebec so the Bloc Quebecois is unavailable – we have examples like the following:

The extreme right-winger would like the Conservatives to win, and at all costs wants the NDP and Green Party to lose – they’d vote “C > L > N = G”.

The extreme left-winger is the opposite – “N = G > L > C” (assuming they can’t decide between NDP and Green).

Or you may have an NDP supporter who thinks the Green Party is also a good left wing choice, but that the Liberals are no better than the Conservatives – “N > G > C = L”.

Or an NDP supporter who thinks that the Green Party are a bunch of upstarts who can’t be trusted – “N > C = L > G”.

Or any weird and wonderful combination of these.

So, we have 9 lists of songs, with over 200 songs between them. The first thing we need to do is get a symbol for each song, and make sure we can turn that symbol back into a name when it’s time to output the results. Save each list into a text file (with just the songs, one per line – no rank numbers). This may take some cutting and pasting. Then go through each list and make sure that each song is spelled EXACTLY the same each time it appears (the Unix commands sort and uniq may help with this).

Now we want to read all the text files and turn them into two data structures: a map of Candidate Number to song name, and a set of ballots in the above format. Since I’m using pyvote, a Python program to count votes, the natural way to do this conversion is to write another Python script.

The script will read each file given on the command line and write out two files. “candidates.txt” is the complete list of all candidate songs, in order of Candidate Number – given a number, the song name is on that line number of candidates.txt. “ballots.txt” is the list of ballots – so for our best-of lists, it will be 9 lines long. Since WordPress removes indentation in code blocks, which is fatal for Python code, I’ve put the script on pastebin:

The Python Script.

It’s pretty simple – read each line of each file, generate a number for the song, and save the name and number in a map called “candidates”. For each file, write the numbers of each song as a string joined by ” > ” to “ballots.txt” – since none of these lists have ties, we don’t need to worry about “=”. Finally, sort the candidate map by number, and write each name in “candidates.txt”.

Save the Python file as “convertLists2Ballots.py” and run it with “convertLists2Ballots.py <list of text files>”.

The candidates.txt file this spits out isn’t very interesting, but ballots.txt looks like this:

41 > 1 > 12 > 35 > 11 > 46 > 52 > 70 > 77 > 122 > 6 > 94 > 9 > 50 > 26 > 20 > 130 > 13 > 132 > 19 > 134 > 136 > 80 > 144 > 42 > 16 > 40 > 67 > 21 > 8 > 28 > 22 > 31 > 60 > 29 > 2 > 91 > 100 > 172 > 176 > 93 > 178 > 179 > 36 > 184 > 95 > 193 > 66 > 196 > 34

1 > 22 > 115 > 30 > 4 > 41 > 16 > 5 > 46 > 12 > 11 > 21 > 3 > 38 > 7 > 42 > 66 > 50 > 19 > 133 > 73 > 137 > 2 > 143 > 53 > 82 > 52 > 26 > 68 > 154 > 158 > 74 > 166 > 54 > 92 > 31 > 77 > 8 > 173 > 95 > 37 > 93 > 181 > 60 > 183 > 9 > 192 > 35 > 6 > 97

69 > 1 > 3 > 2 > 30 > 42 > 55 > 43 > 44 > 74 > 5 > 13 > 105 > 125 > 7 > 19 > 129 > 51 > 12 > 6 > 8 > 36 > 20 > 4 > 26 > 148 > 23 > 150 > 45 > 82 > 18 > 163 > 109 > 10 > 33 > 169 > 170 > 15 > 174 > 14 > 97 > 110 > 92 > 50 > 39 > 46 > 191 > 89 > 63 > 98 > 62 > 108 > 202 > 204 > 207 > 99 > 9 > 88 > 212 > 37 > 24 > 21 > 61 > 219 > 17 > 16 > 222 > 35 > 226 > 102 > 40 > 112 > 68 > 25 > 232

13 > 21 > 69 > 116 > 70 > 118 > 62 > 79 > 48 > 32 > 71 > 26 > 5 > 56 > 3 > 128 > 61 > 88 > 11 > 83 > 1 > 43 > 142 > 145 > 9 > 34 > 51 > 44 > 2 > 41 > 159 > 164 > 63 > 60 > 18 > 99 > 24 > 23 > 175 > 30 > 4 > 35 > 7 > 52 > 186 > 14 > 190 > 33 > 55 > 199 > 200 > 96 > 203 > 59 > 100 > 208 > 20 > 210 > 211 > 213 > 214 > 216 > 90 > 108 > 37 > 220 > 223 > 224 > 227 > 228 > 229 > 230 > 36 > 112 > 233 > 65 > 237 > 238 > 239 > 25 > 242 > 19 > 244 > 246 > 107 > 64 > 39 > 248 > 109 > 110 > 114 > 67 > 252 > 253 > 256 > 257 > 258 > 40 > 38 > 261

10 > 6 > 1 > 2 > 72 > 8 > 19 > 29 > 121 > 17 > 4 > 7 > 15 > 23 > 27 > 47 > 3 > 75 > 45 > 12 > 18 > 78 > 16 > 87 > 106 > 5 > 33 > 28 > 14 > 11 > 49 > 162 > 165 > 167 > 22 > 113 > 57 > 58 > 13 > 91

10 > 11 > 101 > 1 > 72 > 4 > 47 > 22 > 58 > 2 > 54 > 123 > 124 > 17 > 5 > 127 > 3 > 53 > 29 > 28 > 15 > 49 > 141 > 146 > 75 > 7 > 149 > 89 > 31 > 85

10 > 1 > 3 > 25 > 17 > 14 > 4 > 119 > 48 > 6 > 9 > 18 > 2 > 7 > 76 > 5 > 104 > 84 > 24 > 20 > 44 > 139 > 43 > 107 > 147 > 32 > 8 > 86 > 34 > 81 > 51 > 59 > 12 > 27 > 36 > 83 > 96 > 171 > 79 > 33 > 65 > 177 > 15 > 40 > 61 > 187 > 188 > 111 > 198 > 56

1 > 11 > 5 > 117 > 71 > 2 > 4 > 25 > 3 > 10 > 16 > 14 > 90 > 126 > 73 > 64 > 6 > 49 > 8 > 7 > 135 > 140 > 102 > 9 > 78 > 12 > 80 > 151 > 153 > 155 > 157 > 161 > 37 > 15 > 23 > 168 > 67 > 13 > 31 > 53 > 98 > 28 > 180 > 24 > 185 > 54 > 189 > 57 > 197 > 113 > 63 > 201 > 19 > 205 > 206 > 22 > 209 > 27 > 58 > 17 > 215 > 217 > 218 > 18 > 38 > 29 > 221 > 225 > 32 > 68 > 39 > 94 > 62 > 231 > 234 > 235 > 236 > 66 > 240 > 241 > 30 > 243 > 245 > 106 > 103 > 20 > 247 > 249 > 111 > 101 > 104 > 250 > 251 > 254 > 255 > 105 > 259 > 114 > 260 > 21

10 > 2 > 6 > 15 > 3 > 4 > 9 > 120 > 20 > 27 > 1 > 8 > 45 > 24 > 17 > 14 > 13 > 103 > 131 > 65 > 57 > 138 > 81 > 38 > 25 > 59 > 39 > 5 > 152 > 76 > 156 > 160 > 64 > 18 > 16 > 56 > 55 > 85 > 23 > 84 > 86 > 47 > 32 > 182 > 21 > 34 > 194 > 195 > 48 > 87

Incomprehensible to a human, but pyvote will be able to read this and then try lots of different voting systems on it. Tomorrow I’ll show how to make this happen.

(As an aside, instead of a bunch of text files, it’s pretty common to get data you want to turn into votes as a spreadsheet. To process this with Python, save your spreadsheet in CSV format – that’s “comma separated value”, a simple text format that can’t handle formatting or formulas – and then read it into Python using the csv module.)


Syndicated 2011-01-02 06:27:42 from I Am Not Charles

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!