# Older blog entries for ssp (starting at number 14)

15 Jul 2011 (updated 4 Sep 2011 at 02:06 UTC) »

Sysprof 1.1.8

A new version 1.1.8 of Sysprof is out.

This is a release candidate for 1.2.0 and contains mainly bug fixes.

Syndicated 2011-07-15 17:48:39 (Updated 2011-09-04 01:42:00) from Søren Sandmann Pedersen

4 Jul 2011 (updated 4 Sep 2011 at 02:06 UTC) »

Fast Multiplication of Normalized 16 bit Numbers with SSE2

If you are compositing pixels with 16 bits per component, you often need this computation:

```uint16_t a, b, r;

r = (a * b + 0x7fff) / 65535;
```

There is a well-known way to do this quickly without a division:

```uint32_t t;

t = a * b + 0x8000;

r = (t + (t >> 16)) >> 16;
```

Since we are compositing pixels we want to do this with SSE2 instructions, but because the code above uses 32 bit arithmetic, we can only do four operations at a time, even though SSE registers have room for eight 16 bit values. Here is a direct translation into SSE2:

```a = punpcklwd (a, 0);
b = punpcklwd (b, 0);
a = pmulld (a, b);
b = psrld (a, 16);
a = psrld (a, 16);
a = packusdw (a, 0);
```

But there is another way that better matches SSE2:

```uint16_t lo, hi, t, r;

hi = (a * b) >> 16;
lo = (a * b) & 0xffff;

t = lo >> 15;
hi += t;
t = hi ^ 0x7fff;

if ((int16_t)lo > (int16_t)t)
lo = 0xffff;
else
lo = 0x0000;

r = hi - lo;
```

This version is better because it avoids the unpacking to 32 bits. Here is the translation into SSE2:

```t = pmulhuw (a, b);
a = pmullw (a, b);
b = psrlw (a, 15);
b = pxor (t, 0x7fff);
a = pcmpgtw (a, b);
a = psubw (t, a);
```

This is not only shorter, it also makes use of the full width of the SSE registers, computing eight results at a time.

Unfortunately SSE2 doesn’t have 8 bit variants of pmulhuw, pmullw, and psrlw, so we can’t use this trick for the more common case where pixels have 8 bits per component.

Exercise: Why does the second version work?

Syndicated 2011-07-03 12:11:20 (Updated 2011-09-04 01:42:00) from Søren Sandmann Pedersen

30 Jun 2011 (updated 30 Jun 2011 at 11:39 UTC) »
6 Aug 2008 (updated 4 Jul 2011 at 18:28 UTC) »
17 Jun 2008 (updated 4 Jul 2011 at 18:29 UTC) »
10 Oct 2007 (updated 4 Jul 2011 at 18:29 UTC) »

9 Sep 2007 (updated 4 Jul 2011 at 18:29 UTC) »
10 Dec 2006 (updated 4 Jul 2011 at 18:29 UTC) »
27 Jul 2006 (updated 4 Jul 2011 at 18:31 UTC) »
25 Feb 2006 (updated 4 Jul 2011 at 18:30 UTC) »

5 older entries...