Where can we find the updated mod_virgule code?

**SPOJ**

Phew. This took a lot of work. I exchanged the first place with the guy that's currently second several times until finally changing algorithms and getting to the first place with a decent margin.

Changelog, *ex posto facto*:

- 0.1: trial division followed by Fermat primality test.
- 0.2: Rabin-Miller.
- 0.3: accelerated a bit multiplication followed by
division operations in the modular exponentiation code (from
now on referred to as
`expmod`) with x86-specific inline assembly (it has 32-bit -> 64-bit multiplication and a 64-bit -> 32-bit division and remainder instructions). - 0.4: found a "cheat" that allowed me to go back to Fermat test, at a big performance win.
- 0.5: lots of micro-optimizations. Used a sliding bit
window in
`expmod`. Got to the first place for the first time. - 0.6: realized that the only way to get faster would be
to eliminate the division operation in
`expmod`. Replaced the division with multiplication by the inverse (some call this "Barrett reduction"). Was a teeny weeny bit faster on my machine, but slower at spoj. The reduction involved a 64-bit multiplication (three multiplication instructions on 32-bit x86). :/ - 0.7: Montgomery exponentiation!

For future reference, here's a neat trick to find the multiplicative inverse of a number modulo 2^k, lightning fast (this was needed in the pre-calculation phase of Montgomery). First, two facts:

- a^-1 = y*(2 - a*y) (mod 2^k), where y = a^-1 (mod 2^{k-1}). We can use this fact alone to derive a moderately fast algorithm to get a^-1 mod 2^k.
- if q = 2^{k/2}*a + b (with b odd), then q*(2*b - q) = b^2 (mod 2^k), so q^-1 = (2*b - q)*((b^2)^-1) (mod 2^k).

To get q^-1 (mod 2^k), with k even, first get (b^2)^-1 from a table (for e.g. k=32 it will be a small table) and then use fact #2. With k odd, do that for k-1 and then use fact #1.