<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Advogato blog for monniaux</title>
    <link>http://www.advogato.org/person/monniaux/</link>
    <description>Advogato blog for monniaux</description>
    <language>en-us</language>
    <generator>mod_virgule</generator>
    <pubDate>Sun, 7 Sep 2008 05:54:34 GMT</pubDate>
    <item>
      <pubDate>Thu, 19 Jan 2006 14:06:03 GMT</pubDate>
      <title>19 Jan 2006</title>
      <link>http://www.advogato.org/person/monniaux/diary.html?start=0</link>
      <guid>http://www.advogato.org/person/monniaux/diary.html?start=0</guid>
      <description>Compare the following three C expressions (a and b are doubles, and we are on an IEEE-754 compliant system):

&lt;p&gt;   double x = ( (a &amp;lt;= b) ? a : b );
  double y = ( (a &amp;lt; b) ? a : b );
  double z = ( (a &amp;gt; b) ? b : a );

&lt;p&gt; Obviously, for the purpose of comparing real numbers (and also infinities), these expressions all perform the same task: they compute the minimum of a and b.

&lt;p&gt; Are they equivalent? Not if you require strict compliance with the standards.

&lt;p&gt; Consider a "not-a-number" value (NaN):
  double a = 0./0.;
  double b = 1;

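&lt;p&gt; Any ordered comparison involving a NaN is false, so each expression falls through to its last operand: those expressions respectively yield 1, 1, NaN.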

&lt;p&gt; Consider the different +0 and -0 values:
  double a = 0.;
  double b = -0.;

&lt;p&gt; Since +0 and -0 compare equal, only the a &amp;lt;= b test succeeds: those expressions respectively yield 0, -0, 0.
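
&lt;p&gt; As a minimal sketch, the two cases above can be checked with a small test program. The helper names min_le, min_lt, min_gt and show are hypothetical, and the exact text printed for a NaN (for instance nan or -nan) depends on the C library. Compile it without -ffast-math.

&lt;pre&gt;
#include &amp;lt;stdio.h&amp;gt;

/* The three candidate "minimum" expressions, one per helper. */
static double min_le(double a, double b) { return (a &amp;lt;= b) ? a : b; }
static double min_lt(double a, double b) { return (a &amp;lt; b) ? a : b; }
static double min_gt(double a, double b) { return (a &amp;gt; b) ? b : a; }

/* Print the three results for one pair of inputs. */
static void show(double a, double b) {
  printf("%g %g %g\n", min_le(a, b), min_lt(a, b), min_gt(a, b));
}

int main(void) {
  show(0. / 0., 1.);  /* NaN case: expect 1, 1, NaN */
  show(0., -0.);      /* signed zero case: expect 0, -0, 0 */
  return 0;
}
&lt;/pre&gt;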

&lt;p&gt; A compiler that strives to comply with standards, such as gcc, will emit different code for those three expressions.

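&lt;p&gt; On AMD64 (or x86 in SSE mode), gcc compiles the expression for x into a sequence involving an SSE comparison operator and some bit masks (implementing ? : without using jumps). With optimization turned on, however, it compiles y into a single minsd instruction: minsd returns its second operand whenever the comparison is false, including when an operand is NaN or when both operands are zeros, which is exactly the behavior of y.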

&lt;p&gt; -ffast-math turns on "unsafe" optimizations that result in x being compiled the same as y.

&lt;p&gt; Now, this may seem innocuous enough. Unfortunately, we had the code for x inside a loop performing matrix computations, inside a "bottleneck" procedure taking up a significant part of the computation time. Changing it into the y formula yielded a 2.5&#xD7; speedup in a benchmark.</description>
    </item>
  </channel>
</rss>
