16 Mar 2013 nikodemus   » (Journeyer)

Efficient Doesn't Equal Performant

This is a bit of a rant, I guess. Sorry about that.

A few years back I needed a binary heap, and I needed one that was fast and thread safe. So I wrote Pileup.

There are other heaps for Common Lisp, and some of them support operations Pileup doesn't implement out of the box, and all of them claim to be efficient.

...and I'm sure that algorithmically they are. However, constant factors matter more often than you might think. I'll single out CL-HEAP primarily because it has such an authoritative name. :)

A tiny benchmark:

(defvar *1M-numbers* (loop repeat 100000
                           collect (random most-positive-fixnum)))

(defvar *4K-numbers* (loop repeat 4000
                           collect (random most-positive-fixnum)))

(defun insert-and-pop (insert pop heap things)
  (declare (function insert pop))
  (dolist (thing things)
    (funcall insert heap thing))
  (loop for thing = (funcall pop heap)
        while thing))

(defun make-insert-and-pop (make insert pop things)
  (declare (function make))
  (insert-and-pop insert pop (funcall make) things))

(defun time-insert-and-pop (insert pop heap things)
  ;; Time 4 runs.
  (time
   (loop repeat 4 do (insert-and-pop insert pop heap things)))
  t)

(defun time-make-insert-and-pop (make insert pop things)
  ;; Time 1000 runs.
  (time
   (loop repeat 1000 do (make-insert-and-pop make insert pop things)))
  t)

(defun cl-heap-make ()
  (make-instance 'cl-heap:priority-queue))

(defun cl-heap-insert (heap thing)
  (cl-heap:enqueue heap thing thing))

(defun cl-heap-pop (heap)
  (cl-heap:dequeue heap))

(defun pileup-make ()
  (pileup:make-heap #'

Results: (median of three runs for each)

;;; CL-HEAP: insert and pop 1M numbers x 4 into a single heap
Evaluation took:
  6.077 seconds of real time
  6.054307 seconds of total run time (5.871448 user, 0.182859 system)
  [ Run times consist of 0.300 seconds GC time, and 5.755 seconds non-GC time. ]
  99.62% CPU
  15,758,370,862 processor cycles
  208,389,696 bytes consed

;;; PILEUP: insert and pop 1M numbers x 4 into a single heap
Evaluation took:
  0.409 seconds of real time
  0.410249 seconds of total run time (0.409089 user, 0.001160 system)
  100.24% CPU
  1,060,531,810 processor cycles
  3,053,296 bytes consed

;;; CL-HEAP: make 1K heaps, insert and pop 4K numbers
Evaluation took:
  38.221 seconds of real time
  38.051652 seconds of total run time (37.799509 user, 0.252143 system)
  [ Run times consist of 0.433 seconds GC time, and 37.619 seconds non-GC time. ]
  99.56% CPU
  99,125,251,225 processor cycles
  1,940,254,144 bytes consed

;;; PILEUP: make 1K heaps, insert and pop 4K numbers
Evaluation took:
  3.468 seconds of real time
  3.476932 seconds of total run time (3.453248 user, 0.023684 system)
  [ Run times consist of 0.021 seconds GC time, and 3.456 seconds non-GC time. ]
  100.26% CPU
  8,991,845,681 processor cycles
  98,563,520 bytes consed

(I was also going to compare parallel performance, but CL-HEAP doesn't appear to be thread-safe, so...)

This is not to disparage CL-HEAP: it supports things which Pileup doesn't, but it clearly isn't written with constant factors in mind, and this shows.

Constant factors matter.

(Admittedly, I tested this only on SBCL, and it might turn out that CL-HEAP does a lot better -- and Pileup a lot worse -- on some other implementation. This does not alter my main contention that you ignore constant factors at your own peril.)

Syndicated 2013-03-16 12:24:17 from Nikodemus Siivola

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!