More bit-twiddling intrinsics
Today I pushed the changes for 6823354, which add intrinsics for the {Integer,Long}.{numberOfLeadingZeros,numberOfTrailingZeros}() methods. The speedups are quite good:
| CPU | Mode | Integer.numberOfLeadingZeros | Integer.numberOfTrailingZeros | Long.numberOfLeadingZeros | Long.numberOfTrailingZeros |
|---|---|---|---|---|---|
| Intel Nehalem | 32-bit | 3.18 | 3.96 | 1.36 | 1.90 |
| Intel Nehalem | 64-bit | 3.83 | 3.74 | 2.02 | 2.17 |
| AMD Shanghai | 32-bit | 1.94 | 3.55 | 0.98 | 2.44 |
| AMD Shanghai | 32-bit w/ lzcnt | 4.90 | - | 1.46 | - |
| AMD Shanghai | 64-bit | 2.52 | 3.09 | 1.86 | 3.26 |
| AMD Shanghai | 64-bit w/ lzcnt | 6.77 | - | 3.71 | - |
| UltraSparc T2 | 32/64-bit | 2.01 | 2.22 | 1.55 | 1.91 |

"w/ lzcnt" in the table means the numbers were measured using AMD's LZCNT (count leading zeros) instruction, which was introduced alongside SSE4a.
The SPARC intrinsics need a hardware implementation of the POPC instruction.
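One way to see why a population-count instruction is enough: both zero counts can be written in terms of a bit count. The Java sketch below illustrates that general technique. It is an assumption about the approach, not a copy of the actual HotSpot code, and the class and method names are made up for this example.

```java
// Hypothetical illustration: leading/trailing zero counts expressed via
// population count (Integer.bitCount), checked against the library methods.
final class PopcountZeroCounts {

    // Smear the highest set bit into every lower position, then count the
    // zero bits that remain above it.
    static int leadingZeros(int x) {
        x |= x >>> 1;
        x |= x >>> 2;
        x |= x >>> 4;
        x |= x >>> 8;
        x |= x >>> 16;
        return Integer.bitCount(~x);
    }

    // (~x & (x - 1)) sets exactly the bits below the lowest set bit of x,
    // so its population count is the number of trailing zeros.
    static int trailingZeros(int x) {
        return Integer.bitCount(~x & (x - 1));
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 8, 0x00F0, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            boolean ok = leadingZeros(x) == Integer.numberOfLeadingZeros(x)
                      && trailingZeros(x) == Integer.numberOfTrailingZeros(x);
            System.out.println(Integer.toHexString(x)
                    + " nlz=" + leadingZeros(x)
                    + " ntz=" + trailingZeros(x)
                    + (ok ? " ok" : " MISMATCH"));
        }
    }
}
```

Both helpers return 32 for an input of zero, which matches what the Java methods specify for that case.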
I have yet to find a real-world application that uses these methods (or bitCount) extensively, but if anyone knows of one, please let me know.
Syndicated 2009-05-07 11:03:50 (Updated 2009-05-07 18:10:47) from twisti's weblog