Coding Challenge: Increment a date/timestamp string

Posted 2 Nov 2004 at 06:21 UTC by ncm Share This

I encountered a funny problem. I have a file name containing a 14-digit date/timestamp string, like "20041102235959". My code has no interest in what time it indicates, but just needs to transform it into another value that represents one second later. The challenge: write an elegant function to increment a textual timestamp.

You might think, "I can do that in five lines with sscanf, mktime, and sprintf!" And you should think that, although you'd find the lines longish.

But consider the string as a pure 14-digit number, with digits that have very odd add-with-carry relationships. What does the code look like that adds one to such a number without depending on any external library? Such a function could be a lot faster than calling mktime etc., and (more importantly) a lot more fun to code. Curiously, the ASCII (actually, BCD) representation is ideal for leap-year identification, demanding only bitwise operations.

Assume the input string represents a valid 14-digit date/time value, null-terminated, that may be overwritten. I'll post a link to my most elegant (33-line) solution in a few days. Extra credit for skipping over optional non-digit punctuation, e.g. "2004/11/02 23:59:59".


This probably isn't what you meant, posted 2 Nov 2004 at 09:07 UTC by ringbark » (Journeyer)

000000 IDENTIFICATION DIVISION.
000000 PROGRAM-ID. KS-BUMP.
000000 AUTHOR. RINGBARK.
000000 ENVIRONMENT DIVISION.
000000 CONFIGURATION SECTION.
000000 SOURCE-COMPUTER. NCR-IMOS.
000000 OBJECT-COMPUTER. NCR-IMOS.
000000* it's a long time since I last wrote the first three divisions
000000* so please excuse me if I'm not quite right with them
000000 DATA DIVISION.
000000 LINKAGE SECTION.
000000 01  WS-DATE-TIME.
000000     03  WS-YEAR        PIC 9999.
000000     03  WS-YEAR-RDF REDEFINES WS-YEAR.
000000         05  WS-CC      PIC 99.
000000         88  WS-CENTURY-LEAP-YEAR VALUE 20, 24, 28, 32, 36, 40, 44, 48,
000000                                        52, 56, 60, 64, 68, 72, 76, 80,
000000                                        84, 88, 92, 96.
000000* ignores century years before 2000 deliberately
000000         05  WS-YY      PIC 99.
000000         88  WS-LEAP YEAR        VALUE 04, 08, 12, 16, 20,
000000                                       24, 28, 32, 36, 40,
000000                                       44, 48, 52, 56, 60,
000000                                       64, 68, 72, 76, 80,
000000                                       84, 88, 92, 96.
000000         88  WS-CENTURY          VALUE 00.
000000     03  WS-MONTH       PIC 99.
000000     88  WS-30-DAY               VALUE 09, 04, 06, 11.
000000     88  WS-31-DAY               VALUE 01, 03, 05, 07, 08, 10, 12.
000000     88  WS-FEBRUARY             VALUE 02. 
000000     03  WS-DAY         PIC 99.
000000     03  WS-HOUR        PIC 99.
000000     03  WS-MINUTE      PIC 99.
000000     03  WS-SECOND      PIC 99.
000000 PROCEDURE DIVISION USING WS-DATE-TIME.
000000 MAIN SECTION.
000000 BUMP-TIME.
000000    ADD 1 TO WS-SECOND.
000000    IF WS-SECOND NOT EQUAL TO 60
000000       GO TO BUMP-TIME-X.
000000    MOVE ZEROES TO WS-SECOND.
000000    ADD 1 TO WS-MINUTE.
000000    IF WS-MINUTE NOT EQUAL TO 60
000000       GO TO BUMP-TIME-X.
000000    MOVE ZEROES TO WS-MINUTE.
000000    ADD 1 TO WS-HOUR.
000000    IF WS-HOUR NOT EQUAL TO 24
000000       GO TO BUMP-TIME-X.
000000    MOVE ZEROES TO WS-HOUR.
000000    ADD 1 TO WS-DAY.
000000    IF WS-DAY LESS THAN 29
000000       GO TO BUMP-TIME-X.
000000* preceding condition to skip next ugly condition most of the time
000000    IF ( WS-31-DAY AND WS-DAY = 32 )
000000    OR ( WS-30-DAY AND WS-DAY = 31 )
000000    OR ( WS-FEBRUARY AND 
000000         WS-LEAP-YEAR AND WS-DAY = 30 )
000000    OR ( WS-FEBRUARY AND
000000         WS-CENTURY AND WS-CENTURY-LEAP-YEAR AND WS-DAY = 30 )
000000    OR ( WS-FEBRUARY AND
000000         NOT ( WS-LEAP-YEAR OR WS-CENTURY ) AND WS-DAY = 29 )
000000       MOVE 01 TO WS-DAY
000000       ADD 1 TO WS-MONTH.
000000    IF WS-MONTH EQUAL TO 13
000000       MOVE 01 TO WS-MONTH
000000       ADD 1 TO WS-YEAR.
000000* fails after 9999, but field not big enough to handle Y10K problem anyhow.
000000 BUMP-TIME-X.
000000    EXIT PROGRAM.

You're right., posted 2 Nov 2004 at 13:54 UTC by ncm » (Master)

Thanks for the reminder why we all (still) hate COBOL, and for the most graceful possible example of the opposite of elegant.

A sample case, posted 2 Nov 2004 at 16:25 UTC by nymia » (Master)

The algorithm is shown below.

           1  11  11  1    
  2004 11 01  11  59  59
+                      1
========================
  2004 11 02  00  00  00

Bad sample, posted 2 Nov 2004 at 16:51 UTC by ncm » (Master)

For the record, nymia's sample has the hours wrong: they roll over after 23, not 11. Consider, rather:

        1  1  11  11  1
  2000 02 29  23  59  59
+                      1
========================
  2000 03 01  00  00  00

Be sure to get the centennial and quadricentennial Februaries right!

Elegance, posted 3 Nov 2004 at 08:51 UTC by ncm » (Master)

I'm starting to get solutions in the mail. The question came up: what do I mean by elegant?

Every elegant solution iterates or recurses. Recursive solutions are ever so slightly more elegant that iterative ones. Solutions that accommodate punctuation, and reproduce it in the result, are more elegant. Solutions that operate per-character, rather than stepping by twos, are more elegant. Solutions that avoid multiplications and divisions are more elegant. Solutions that avoid conversions between ASCII (or EBCDIC, as the case may be) and binary are more elegant. Solutions that work the same on ASCII and EBCDIC native codesets are more elegant. Solutions that use bitwise operators rather than sequences of comparisons are more elegant. Solutions that check for a leap year only when necessary are more elegant. Solutions that use lookup tables only for truly irregular specifications are more elegant.

I've been refining my solutions; they still in no wise resemble anything else I've seen. The iterative one is 29 lines. The recursive one is also 29 lines, and while it's a bit slower, it accommodates punctuation properly, and is nicer to look at.

not sure, why not do it like this?, posted 3 Nov 2004 at 17:48 UTC by tarzeau » (Observer)

date +%s -d now+1second; date +%s

you should check the source of date for details.

Some solutions, posted 3 Nov 2004 at 22:13 UTC by ncm » (Master)

I've got a few solutions via e-mail. In the interest of stirring up some inspiration, here they are, in order of receipt. Can they be improved upon?

Yoann Aubineau -- (yaubi)
Colin Leroy -- (colinleroy)
Mauro Persano -- (mpr)

They are slightly altered for presentation; edits welcome. I'll save mine out for a couple more days. Some good strings to test with are

  19991231235959
  20000228235959
  20000229235959
  20040228235959
  21000228235959
  19000228235959
  16000228235959

Here's my best !, posted 4 Nov 2004 at 14:59 UTC by freetype » (Master)

that's 31-lines, with skipping over non-digits, and three spacing lines that can be removed to make it a 28-liner.

That made it for a good half-hour fun, but this kind of code tends to be completely unmaintainable. I hope you don't rely on such things for production :-)

void  bump_timestamp( char*  stamp )
{
  int    maxs[14] = { 9, 5, 9, 5, 9, 2, 9, 3, 9, 1, 9, 9, 9, 9 };
  int    vals[14], pos[14], nn, mm, v, month; 
for ( nn = 0, mm = 0; nn < 14; nn++ ) { for ( ; (unsigned int)(v = (stamp[mm]-'0')) >= 10; mm++ ); pos[13-nn] = mm++; vals[13-nn] = v; }
if ( vals[5] == 2 ) maxs[4] = 3; /* adjust hour max */ if ( vals[9] == 1 ) maxs[8] = 2; /* adjust month max */ if ( (month=(vals[8]+vals[9]*10)) == 2 ) { /* handle february */ if ( vals[7] == 2 ) { maxs[6] = 8 + ( ((vals[10]+2*vals[11])&3)==0 && ( vals[10]!=0 || vals[11] != 0 || ((vals[12]+2*vals[13])&3)==0 ) ); maxs[7] = 2; } } else if ( vals[7] == 3 ) { /* handle other months */ maxs[6] = (0x2AD5 >> (month-1)) & 1; }
for ( nn = 0; nn < 14; nn++ ) { if ( vals[nn] < maxs[nn] ) { stamp[pos[nn]] += 1; break; } stamp[pos[nn]] = '0' + (((1 << nn) & 320) != 0 && vals[nn+1] == maxs[nn+1]); } }

Another useless optimization, posted 4 Nov 2004 at 15:05 UTC by freetype » (Master)

The final loop can be written as

  for ( nn = 0; nn < 14 && vals[nn] >= maxs[nn]; nn++ )
    stamp[pos[nn]] = '0' + (((1 << n) & 320) != 0 && vals[nn+1] == maxs[nn+1]);
  stamp[pos[nn]] += 1;

Which reduces the program by 4 lines. We're now at 24 lines, and something really, really, difficult to understand without aspirin :-)

Shame on me for spending time on this thing

Ugh, posted 4 Nov 2004 at 16:06 UTC by mpr » (Journeyer)

Ugh. ncm posted my entry. For the record: I wouldn't call it elegant - "fugly" would be more appropriate. I just tried to follow some of ncm's elegance guidelines (recursive, one digit at a time, and so on). Dang, now I got to improve it.

Hour Segment Logic Circuit, posted 4 Nov 2004 at 19:29 UTC by nymia » (Master)

I have not written the code yet, but I would imagine the hour segment will be coded using the logic circuit shown below.

C = Carry
A,B = Addends
R = Sum
R' = Exception


R0 = A0 + B0 + C0
R1 = A1 +B1 + C1
R0' = R0 - 4 if R1=2
R0' = R0 if R1 in [0,1]
R1' = 0 if R1 = 2
R1' = R1 if R1 in [0,1]

                      A1 B1 C1      A0 B0 C0
                       o  o  o-      o  o  o
                       |  |  | \     |  |  |
                       \  |  /  \    \  |  /
                        =====    \    =====
                        |   |     \    | |         
                        |   |      ----+ |
                        C2  .R1          .R0
                            ..........   .
                            .        .   .
                            .        o   o
                        .....        |   |
                        .            \   /
                        .             ===
                        .               |
                        .               |
                        .               .
                        .   .............
                        .   .           .
                        o   o           .
                        |   |           .
                        \   /           .
                         ===            .
                           |            .
                           |R1'         .R0'
                           |            .

ASCII Art, posted 5 Nov 2004 at 00:56 UTC by ncm » (Master)

nymia: please don't feel obliged to post the diagram for the day-of-month/leap-year part. :-)

By the way, I didn't say so before, but these times are GMT, so there's no Summer Time adjustments to worry about. Entrants so far seem to have figured that out for themselves.

A small bugfix, posted 5 Nov 2004 at 12:09 UTC by freetype » (Master)

Oops, my latest optimization didn't handle correctly the case where the year did overflow (i.e. 9999 => 0000). Here's a new version of the code where I did get rid of the multiplication by 10. That's 27 lines with spacers, or 22 without them.

This is getting meaningless

void  bump_timestamp( char*  stamp )
{
  int    nn, mm, v, vals[14], pos[14], maxs[14] = {9,5,9,5,9,2,9,3,9,1,9,9,9,9};
  
  for ( nn = 0, mm = 0; nn < 14; nn++ ) {
      for ( ; (unsigned int)(v = (stamp[mm]-'0')) >= 10; mm++ );
      pos[13-nn]  = mm++;
      vals[13-nn] = v;
  }
  
  if ( vals[5] == 2 ) maxs[4] = 3;  /* adjust hour max */
  if ( vals[9] == 1 ) maxs[8] = 2;  /* adjust month max */
  
  if ( vals[8]==2 && vals[9]==0 ) {  /* handle february */
    if ( vals[7] == 2 ) {
      maxs[6] = 8 + ( ((vals[10]+2*vals[11])&3)==0 && ( vals[10]!=0 || vals[11] != 0 || ((vals[12]+2*vals[13])&3)==0 ) );
      maxs[7] = 2;
    }
  } else if ( vals[7] == 3 ) {  /* handle other months */
    maxs[6] = ((0x29AA >> vals[8])^vals[9]) & 1;
  }
  
  for ( nn = 0; nn < 14 && vals[nn] >= maxs[nn]; nn++ )
    stamp[pos[nn]] = '0' + (((1 << nn) & 320) != 0 && vals[nn+1] == maxs[nn+1]);
    
  if ( nn < 14 ) stamp[pos[nn]] += 1;
}

Posting solutions, posted 5 Nov 2004 at 15:00 UTC by ncm » (Master)

Probably any more solutions posted should be links to diary entries. (The ">>" link at the top of the entry has the URL.) You can avoid cluttering the "recent diaries" page by immediately putting in another diary entry. Besides keeping this page clean, it means you can edit your entry after posting, as you get new ideas.

It's fun to see people rediscovering ideas I had, and also to see ideas I never would have thought of. I particularly like the "if ((1 << n) & mask)" trick, which I've used in other contexts but didn't think of here. Literal masks should be hex, though, like "0x140" instead of "320" -- this isn't an obfuscation contest! Clarity is elegant too. That means, too, that extra-long lines count against you, and blank lines don't.

Don't be shy about stealing from posted solutions. That's how progress happens, by combining good ideas. The field is far from mined out -- I can see major improvements possible in everything posted so far. My hope is to evolve a "most elegant possible" solution, at the end, with subcategories for iterative, recursive, and "other".

One more point of elegance: tables that are static const are better than writable tables patched during the run.

wow, posted 7 Nov 2004 at 21:01 UTC by cmm » (Journeyer)

it is quite surprising to see how many people can increment a string.

My best so far..., posted 9 Nov 2004 at 06:25 UTC by ncm » (Master)

I have three.

First, here is an iterative version, 22 lines.

Second, here is a recursive version, 21 lines. I like the three cases of indexing into a (different) constant literal string. This one handles one byte of punctuation per digit-pair, but it's a lot slower than the others.

Third, here is an unrolled version, 24 lines. By the rules it's not elegant, but like ringbark's entry, it might be called graceful, if a bit — as Mauro remarked — evil. If you were inclined, like nymia, to implement it in hardware, you might start with this one. (If you prefer to see shorter statements, consider this one instead, 27 lines.)

To notice: In month = (stamp[4] * 10 + stamp[5]) & 0xf; these are character values being operated on, and the "& 0xf" at the end cleans up. The iterative and recursive versions clean up the month and day units-digits after the fact, but from opposite directions. All avoid running the complicated tests on digit 7 if it happens to be a '9'.

Can you improve on any of these?

Improvement, posted 9 Nov 2004 at 09:50 UTC by freetype » (Master)

First, there is a bug in your code, which incorrectly computes 2010 as a leap year. You should replace the (stamp[3] & 3) != 0 with something like (((stamp[2] << 1)+stamp[3]) & 3) != 0) instead. Of course, this will probably mean one more line to each one your routines ;-)

Seeing various solutions posted here, including your own, I suppose you were kidding when you said:

    this isn't an obfuscation contest! Clarity is elegant too

Any contest which values minimal line counts will naturally pick best solutions among the most obfuscated and unmaintainable programs.

And now, here's a 19-lines recursive routine which is capable of skipping over punctuation (remove one line if you don't need to support that). It also doesn't use any multiplication. I suppose it's possible to do better, but I'll let this as an exercise to the reader ;-)

int  bump( char*  s, int  n = 0, char*  y = NULL, char* m = NULL )
{
  int  max = "999919392959590"[n], result = (n >= 14);
  while ( ((1 << n) & 0x1551) && *s && (unsigned)(*s-'0') >= 10 ) s++;
  if ( n < 14 && *s && bump( s+1, n+1, (n==0) ? s : y, (n==4) ? s : m ) ) {
    max -= (6 & -( n==9 && s[-1]=='2' )) + (7 & -( n==5 && s[-1]=='1' ));
    if ( n==7 ) {
      if ( s[-1]=='2' && m[0]=='0' && m[1]=='2' ) {
        max = '8' + ( ((y[3]+(y[2]<<1))&3)==0 && ( (y[2]|y[3])!='0' || \
                      ((y[1]+(y[0]<<1))&3)==0 ) );
        if ( *s >= max ) s[-1] = '3';
      }
      else if ( s[-1]=='3' ) max = '0'+ (((0x29AA >> (m[1]-'0'))^m[0]) & 1);
    }
    result = (*s >= max);
    *s     = !result ? *s+1 : '0'+(((1 << n)&0xA0) != 0);
  }
  return result;
}

PHP Solution, posted 9 Nov 2004 at 14:28 UTC by prozac » (Journeyer)

// timeinc - increment a time/date stamp by N seconds

function timeinc($time, $n) {

return $time + $n; }

This works.

One can verify it by use of mktime() to create two times one second apart, and then get the difference between them for the interval between two seconds. It just so happens that it is 1 in PHP.

PHP Solution doesn't work, posted 9 Nov 2004 at 14:57 UTC by freetype » (Master)

Prozac, I think you misunderstood the problem. The PHP "timestamp" is not what we're trying to increment here. The current competition's input is a human-readable formatted date, not the number of seconds since the Unix epoch. The crux of the problem is to write a routine that can modify the human-readbla estring directly, without using helper routines

Of course, this is totally useless, or I wouldn't dare posting there :-)

Python solution, posted 9 Nov 2004 at 20:32 UTC by eskimoses » (Journeyer)

Here is a concise solution in Python:

def dateinc(s) :
  def carry(list, modulus) :
    if len(list) == 1 : return (list[0],)
    else              : q = carry(list[1:], modulus[1:]); return (list[0] + q[0] // modulus[1],) + q
  s = reduce(lambda x,y:x+y, map(lambda x : (x >= '0' and x <= '9') * x + '', s))
  result = map(lambda x,y:x+y, map(int,(s[0:4],s[4:6],s[6:8],s[8:10],s[10:12],s[12:14])),(0,-1,-1,0,0,1))
  modulus = [10000,12,[31,28,31,30,31,30,31,31,30,31,30,31][result[1]],24,60,60]
  if result[0] % 4 == 0 and (result[0] % 100 != 0 or result[0] % 400 == 0) :
    modulus[2] += 1                               # Account for leap day.
  result = carry(result, modulus)                 # Carry the extra second.
  result = map(lambda x,y:x%y, result, modulus)   # Reduce each element.
  result[1:3] = map(lambda x:x+1, result[1:3])    # Readjust to 1-base.
  return reduce(lambda x,y:x+y, map(lambda x : (x < 10) * '0' + str(x), result))
print dateinc('1999/12/31 23:59:59')
print dateinc('20000228235959')
print dateinc('20000229235959')
print dateinc('20040228235959')
print dateinc('21000228235959')
print dateinc('19000228235959')
print dateinc('16000228235959')

Python solution (redux), posted 9 Nov 2004 at 22:02 UTC by eskimoses » (Journeyer)

Save one line by propagating the carry bit:

def dateinc(s) :
  def carry(list, modulus) :
    if len(list) == 1 : return [list[0] // modulus[0], list[0] % modulus[0]]
    else              : q = carry(list[1:], modulus[1:]); return [(list[0] + q[0]) // modulus[0], (list[0] + q[0]) % modulus[0]] + q[1:]
  s = reduce(lambda x,y:x+y, map(lambda x : (x >= '0' and x <= '9') * x, s))
  result = map(lambda x,y:x+y, map(int,(s[0:4],s[4:6],s[6:8],s[8:10],s[10:12],s[12:14])),(0,-1,-1,0,0,1))
  modulus = [10000,12,[31,28,31,30,31,30,31,31,30,31,30,31][result[1]],24,60,60]
  if result[0] % 4 == 0 and (result[0] % 100 != 0 or result[0] % 400 == 0) :
    modulus[2] += 1                               # Account for leap day.
  result = carry(result, modulus)[1:]             # Carry the extra second.
  result[1:3] = map(lambda x:x+1, result[1:3])    # Readjust to 1-base.
  return reduce(lambda x,y:x+y, map(lambda x : (x < 10) * '0' + str(x), result))

Python solution improved, posted 10 Nov 2004 at 12:47 UTC by freetype » (Master)

It's possible to make it a 9-liner :-) Note that the routine is slightly broken because it destroys the punctuation, it doesn't just skip over it.

def dateinc(s) :
  s = reduce(lambda x,y:x+y, map(lambda x : (x >= '0' and x <= '9') * x + '', s))
  result = map(lambda x,y:x+y, map(int,(s[0:4],s[4:6],s[6:8],s[8:10],s[10:12],s[12:14])),(0,-1,-1,0,0,1))
  modulus = [10000,12,[31,28,31,30,31,30,31,31,30,31,30,31][result[1]],24,60,60]
  modulus[2] += (result[0] % 4 == 0 and (result[0] % 100 != 0 or result[0] % 400 == 0))
  for x in range(len(result)-1): result[-x-2] += (result[-x-1] >= modulus[-x-1])
  result = map(lambda x,y:x%y, result, modulus)   # Reduce each element.
  result[1:3] = map(lambda x:x+1, result[1:3])    # Readjust to 1-base.
  return reduce(lambda x,y:x+y, map(lambda x : (x < 10) * '0' + str(x), result))

Python solution (further reduction), posted 10 Nov 2004 at 13:21 UTC by eskimoses » (Journeyer)

The following uses builtins instead of one-off lambda functions, wherever possible. I've also reworked the carry function into a one-liner that computes the carry in-place.

def dateinc(s) :
  s = reduce(lambda x,y:x.isdigit()*x+y.isdigit()*y, s)
  ans = map(int.__add__, map(int,(s[0:4],s[4:6],s[6:8],s[8:10],s[10:12],s[12:14])),(0,-1,-1,0,0,1))
  mod = [10000,12,[31,28,31,30,31,30,31,31,30,31,30,31][ans[1]],24,60,60]
  if ans[0] % 4 == 0 and (ans[0] % 100 != 0 or ans[0] % 400 == 0) : mod[2] += 1
  ans = map(lambda x:(ans[x] + reduce(int.__mul__, map(lambda y:ans[y]>=mod[y]-1,range(x+1,len(ans))),x!=len(ans)-1)) % mod[x], range(len(ans)))
  ans[1:3] = map(int.__add__,ans[1:3],[1,1])      # Readjust to 1-base.
  return reduce(str.__add__, map(lambda x : (x < 10) * '0' + str(x), ans))

Re: Python solution improved, posted 10 Nov 2004 at 13:27 UTC by eskimoses » (Journeyer)

Thanks, freetype! Looks like we crossed signals. :) Thanks for pointing out the punctuation problem; I'd misread the original specification. It shouldn't be difficult to fixup the result using the original string as a model, but I won't take the time to work that out now.

Not an Obfuscated Code contest, posted 10 Nov 2004 at 20:06 UTC by ncm » (Master)

It'a a fine point whether ("0000010100000"[i]) is better than ('0' + ((0xA0 >> i) & 1)). The first involves an access to memory that is probably in L1 cache; the second just needs a few ALU operations. On a P4, however, a shift (C >> i) takes i cycles because Intel omitted the barrel shifter from the P4. (Evidently they couldn't make it work in just one cycle.) Most other modern CPUs have them, but many minimal ones don't. Given the ambiguity, the indexed solution seems clearer.

I don't think a Python (or Perl, or PHP, or VB) solution can really qualify here. Any of such code necessarily depends on all kinds of library support, so that probably most of the instructions executed during the call cannot be traced to anything in the source text. (That's not to say that these examples won't serve equally well as evidence of too much free time.)

Thanks to freetype for an excellent entry, and for the correction. Wouldn't you consider, though,

  if (s[-1] == "xxxxx1xxx2xxxx"[n]) { max -= 8 - s[-1]; }
clearer than
  max -= (6 & -( n==9 && s[-1]=='2' )) + (7 & -( n==5 && s[-1]=='1' ));
? I freely admit that the (8 - s[i]) relationship is appallingly arbitrary. In a similar vein, to say ('0' + (((0x29AA >> (m[1]-'0'))^m[0]) & 1)) was fiendishly clever (0x1AA would have sufficed) but is it better than ("0101010110"[m[0] + m[1] & 0xf]) or (in the other direction) ('0' + (((m[0] >> 3) + m[0] + m[1]) & 1))?

correction, posted 10 Nov 2004 at 21:37 UTC by ncm » (Master)

Sorry, that should have been max -= ('8' - s[-1]) or, equivalently,

  if (s[-1] == "xxxxx1xxx2xxxx"[n]) { max = s[-1] + 1; }

Simple in 'C' program to deal., posted 11 Nov 2004 at 21:43 UTC by imp » (Master)

Here's a way that destructively does it. Add a malloc for non-destructive. Thread safty could be improved. It is elegant in that it gets libc to do the work for you. This code also assumes UTC or some other time that doesn't have DST. It ignores leap seconds completely. Most of the other solutions posted here likewise ignore these complications. It won't work accross calender conversion dates either (eg, when the UK adopted the Gregoran calender over the earlier Julian calender). Or any other unknown discontinuity in time. Using mktime and localtime could overcome some of these problems if that's what you are trying to solve.

char *
incr_time(char *s)
{
        struct tm tm;
        time_t t;

strptime(s, "%Y%m%d%H%M%S", &tm); t = timegm(&tm) + 1; tm = *gmtime(&t); strftime(s, "%Y%m%d%H%M%S", &tm);

return s; }

From the original article..., posted 12 Nov 2004 at 04:48 UTC by ncm » (Master)

imp: The original article reads, in part, "You might think, 'I can do that in five lines with sscanf, mktime, and sprintf!' ... But ... What does the code look like that [works]... without depending on any external library?" We frown on use of division because on many CPUs it's a subroutine call.

Why is this concept so difficult?

Further adventures, posted 12 Nov 2004 at 22:10 UTC by ncm » (Master)

Using what we've learned, this one is 18 lines, iterates, handles punctuation, operates byte-by-byte, eschews division, multiplication and BCD<->binary conversions, uses only constant literal arrays (except the argument), rolls year 9999 -> 0000, has helpful variable names and formatting, compiles with "gcc-3.4 -Os" to 114 amd64 instructions, and is almost, but not quite, completely useless.

void increment_datetime(char* date)
{
  char digit = 13, *cur, *mo = date + 4 + ((unsigned)(date[4] - '0') > 9);
  for (cur = date + 14; *cur != '\0'; ++cur) {}
  do {
    int skip = (((unsigned)(*--cur - '0') > 9) && --cur);
    if (*cur != 
        ((digit > 4 && cur[-1] == "xxxxx1xxx2xxxx"[digit]) ? cur[-1] + 1
      : ((digit == 6 && (mo[0] | mo[1]) == '2') ? '2'
      : ((digit == 7) ? ((cur[-1] == '3') ? "0101010110"[(mo[0] + mo[1]) & 0xf]
          : '9' - (       cur[-1] == '2' && (mo[0] | mo[1]) == '2' && 
            (((date[2] << 1) ^ date[3]) & 3) || ((date[2] | date[3]) == '0' &&
            (((date[0] << 1) ^ date[1]) & 3))))
      :   "99991939295959"[digit] ))))            { ++*cur; break; }
    *cur = '0'; 
    if ((digit | 2) == 6) { cur[1 + skip] = '1'; }
  } while (digit-- >  0);
}

I'm sure there's still plenty of room for improvement.

p.s. to imp: No offense meant, you probably just missed seeing that paragraph. It's the other people who mystify me. Do people really imagine that Python, PHP, XQuery don't depend on enormous libraries? I doubt that even 1% of the instructions executed in those implementations have anything to do with the problem being solved.

Re: further adventures, posted 14 Nov 2004 at 19:17 UTC by eskimoses » (Journeyer)

ncm, as you didn't mention C, I read "library function" more liberally than you clearly intended. I didn't "import", after all. ;) But mostly, I glossed over all but "a lot more fun to code" and so decided to have fun with Python's functional idiom. Sorry, never intended to represent it would do well on instruction count.

Don't chide me too much, now; you might tempt me to waste some more time attempting it in C. ;)

Not C, necessarily..., posted 14 Nov 2004 at 20:17 UTC by ncm » (Master)

I'm still hoping to see a crushingly elegant O'Caml solution that compiles to fewer instructions than the best C version.

Much more elegant approach, in my opinion, posted 18 Nov 2004 at 06:18 UTC by jab » (Journeyer)

void itime(char * d)
{
  int i, j, dd[7], tmax[] = {-1, 100, 12, -1, 24, 60, 60},
    md[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};

for (i = 0; i < 14; i += 2) dd[i/2] = (d[i] - '0') * 10 + (d[i+1] - '0');

tmax[3] = md[dd[2]-1]; if (dd[2] == 2 && !(dd[1] % 4) && (dd[1] != 0 || !(dd[0] % 10))) tmax[3]++;

dd[6]++; for (i = 5; i >= 0; i--) { while (dd[i+1] >= tmax[i+1]) { dd[i+1] -= tmax[i+1]; dd[i]++; } }

for (i = 0; i < 14; i += 2) { d[i] = (dd[i/2] / 10) + '0'; d[i+1] = (dd[i/2] % 10) + '0'; } }

Follow-up, posted 18 Nov 2004 at 06:34 UTC by jab » (Journeyer)

Just realized eskimoses did the same thing in python. Also, my "!(dd[0] % 10)" should be a "!(dd[0] % 4)" (leap on, off, on thing is every 4, 100, 400; not 4, 100, 1000 as I had originally).

But nonetheless, a demonstration that C code can be concise (ala python) without looking insane in comparison.

My god, posted 18 Nov 2004 at 11:53 UTC by freetype » (Master)

ncm, I'm starting to think that you're taking this small competition much too seriously. :-) I seriously doubt that specific CPU implementation details, memory cache latencies and instruction counts are really that interesting. Isn't this article just a fun way to exercise our "hacker" muscle byexploring fun way to exploit our favorite languages ?

Besides, your last implementation is truly great, except that I don't understand why you're using an exclusive or to compute leap years. I mean, since you're obviously focused on instruction count, you should know that something like a+2*b can be computed by a single instruction on x86.

And I'd be extremely surprised to see a shorter solution in OCaml, both in terms of line count and generated instructions. I know pretty well how this language is implemented, and I don't see how you could implement the same with less code without basically duplicating the C code. And does the memory allocator + GC counts as an external library in such case ?

Anyway, take care.

parameters of the challenge, posted 18 Nov 2004 at 16:22 UTC by brlewis » (Journeyer)

I agree with freetype that fun exploration is the main purpose here. However, there are different measures of elegance. If you think more elegance can be achieved by going outside some of the parameters of the original challenge, do so, but state which ones you're going outside of. Otherwise folks who eschewed a particular solution because it was out of bounds might feel like they're being made to look bad.

Competition?, posted 18 Nov 2004 at 17:17 UTC by ncm » (Master)

I don't see this as a competition. It's a challenge: how much unnecessary computation can be squeezed out of the implementation of a small but necessarily messy function? If you're not interested in how your favorite language arranges to get its semantics executed, you're certain to make habitually bad choices, in cases where performance matters. Yes, it's just for fun, but as in everything fun there's something deadly serious lurking in the background.

(Incidentally, now that Moore's Law is petering out, we're all going to be counting cycles again pretty soon, and cursing those of our predecessors who wrote in languages not suitable for compilation to efficient code.)

I'm not focused on instruction count, any more than on line count. Both are approximate measures of the amount of computation invoked. Certainly any instruction that does multiplication is doing way more computation than a shift-xor-and, regardless of instruction count; treble that for divide or modulus. But maybe I miss your point. Are you reminding me that ((a<<1)+b) can use spare address arithmetic machinery (as exposed by the common LEA instruction family), where ((a<<1)^b) must use a complete ALU? Excellent point, maybe. OTOH, xor is cheaper to compute than addition, it's just that there are more adders scattered around in a typical CPU, and addition and xor both take only one cycle anyhow. (I wonder if we'll see CPUs that take two cycles for an addition because of the additional gate delay of the carry logic.)

nymia might have framed the problem most neatly. Suppose your solution were to be implemented in hardware? What would yield the smallest product of transistor-count times cycles? This measure is not language-dependent, and it emphasizes exploiting observations that may be applied in any language. It's expensive to quantify exactly, but we can do a pretty good job approximating it.

A solution in OCaml need not be shorter in line count if it's prettier. Certainly, though, it should have no need to invoke GC during execution of the function in question. I mentioned OCaml just because it is often (and almost uniquely) cited as producing code that is faster than the equivalent C or C++ code, at least for artificial problems (such as this one). A Common Lisp solution would suffice, too, because CL is commonly compiled to machine code.

My last implementation still has several bits of fat, at least. And you're right, I've been taking this way more seriously than it deserves -- but really, isn't that necessarily true about anything we hope to learn from?

multiplication, division, modulo, posted 19 Nov 2004 at 14:42 UTC by brlewis » (Journeyer)

I would expect a good compiler to optimize multiplication, division and modulo into bitwise operations if one of the operands is a constant power of two.

last digit 0-8, posted 19 Nov 2004 at 14:58 UTC by brlewis » (Journeyer)

Is my optimization-focused solution the first one that actually returns after incrementing the last digit by one if no more operations are necessary? I'm a little surprised nobody showed interest in this optimization.

returns, posted 20 Nov 2004 at 01:21 UTC by ncm » (Master)

brlewis: what does { ++*cur; break; } mean to you? And, mightn't it be better never to compute the month at all than to wait to compute it until circumstances come along where you could use it?

Bitwise operation optimization, posted 23 Nov 2004 at 05:28 UTC by robocoder » (Journeyer)

Just adding a qualifier to brlewis's comment:

Depending on the processor, replacing a multiplication, division, or modulo with bitwise operations may depend not only on one operand being a constant power of two, but also on the other operand being unsigned.

re: last digit 0-8, posted 23 Nov 2004 at 05:39 UTC by robocoder » (Journeyer)

It looks like ringbark beat you to the punch ... his reply happens to contain a solution (albeit COBOL) that avoids unnecessary operations.

re: last digit 0-8, posted 23 Nov 2004 at 15:27 UTC by brlewis » (Journeyer)

Sorry I did not communicate clearly in my last post.

My thought was that if you first test for the last digit being 0-8, you avoid any unnecessary operations. If speed is what you're after, you should order your tests such that the most common case comes first. The fact that most solutions don't work in this way shows that people are more interested in concise code than "way fast" code, which I actually agree with.

ncm, you do a number of tests before falling through to the most common case, and one of your tests has a bug. Try incrementing '2002-04-14 05:15:15 rox!'

I don't know the language myself, but the COBOL solution appears to convert the last two digits to an integer regardless of the value of the final digit, then add one to that digit.

Bit twiddling, posted 24 Nov 2004 at 14:28 UTC by mpr » (Journeyer)

Here's a way to increment digits without conditional branches:

/* XXX: 32-bit-centric */

void bump_hhmm(char *h) { int r, i, m, c = 1; static int max[] = { 6, 10, 6, 10 };

for (i = 3; i >= 0; --i) { r = d[i] - '0' + c; m = ((r ^ max[i]) /* zero if r == max[i] */ + 0x7fffffff) /* will overflow to msb if r != max[i] */ >> 31; /* m = 0 if r == max[i], ~0 otherwise */ d[i] = '0' + (r & m); /* zero out result if it's == max[i] */ c = (1 & m) ^ 1; /* next carry */ } }

This is just for hours and minutes, though. I can't think of a "clean" way to extend it to situations where you need two digits to know the carry (hours, days and months).

re: last digit 0-8, posted 24 Nov 2004 at 22:45 UTC by robocoder » (Journeyer)

Checking the last digit for 0-8 may not be faster. Hypothetically:

(a) If we don't check the last digit for 0-8, then the performance is:

90% x (T.increment) + 10% x (T.increment + T.reset)

vs. (b) If we check the last digit for 0-8, then the performance is:

100% x T.checkdigit + 90% x (T.increment)

So, excluding the check for carry, if each operation (T.increment, T.reset, and T.checkdigit) takes a clock cycle to execute, then over 10 runs, (a) takes 11 clock cycles while (b) takes 19 clock cycles.

As for the implicit integer conversion in the COBOL solution: PICs are how variables are declared in COBOL, as far as fundamental data types are concerned.

Gah, posted 29 Nov 2004 at 11:05 UTC by mpr » (Journeyer)

When I wrote "hours and minutes" and "hhmm", of course I meant "minutes and seconds" and "mmss"...

throw a bone to an XSLT hacker?, posted 7 Jan 2005 at 02:30 UTC by connolly » (Master)

Umm... with all the talk of instruction counts and gate delays, I hesitate to bring up XSLT, but this issue is no longer academic for me.

I wrote schedScrape.xsl which takes a schedule expressed in XHTML (ala tantek's lowercase semantic web) and converts it to an RDF vocabuarly for iCalendar, since I already have code to go from there to actual .ics format. See the travel schedule on my homepage for an example.

The trick is, for a date like 7-8 Apr 2005, the iCalendar spec wants dtend to be 20050409 , i.e. one day after 8 Apr. My code ignores this, and has a bug as a result.

XSLT's standard library doesn't have mktime and the like, so I actually need the algorithm in this challenge.

After skimming all the solutions, I don't actually understand the algorithm. Maybe enlightenment will come if I study some more. Or maybe somebody will throw me a bone? Explain the algorithm, please?

d'oh, posted 7 Jan 2005 at 02:36 UTC by connolly » (Master)

enlightment came from the COBOL version. Boy do I feel silly.

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!

X
Share this page