Older blog entries for nikodemus (starting at number 68)

SBCL Numeric Performance Today

I've been working on and off on a new microbenchmark tool -- primarily for SBCL, but usable for other implementations as well. Last night I finally got around to teaching it how to generate pretty pictures, using Google Visualization API, and wrote a number of microbenchmarks that show the variety of numeric performance in SBCL.

Here it is: Performance of addition variants in SBCL 1.0.53.31 The higher the bar, the better the performance -- the numbers are million iterations of the benchmark per second of run-time.

Each benchmark does a roughly comparable task: adds two numbers. What varies is what the types of the numbers are, and how much the compiler knows about the situation. (In some benchmarks there may be an extra comparison or so per iteration to keep the compiler from getting and flushing out the code as effectless.) There are basically four classes of performance:

  • Superb: Modular inline integer arithmetic. This is performance essentially identical with what you'd expect from C or ASM.
  • Good: Compiler knows the types, the argument types are inline-arithmetic-friendly, the result type is not in doubt (addition of two fixnums can be a bignum), and the function doing the addition is inlined at the site where the results are unboxed and the result is used.
  • Decent: Compiler knows the types, the types are inline-arithmetic-friendly and have an immediate representation, but the function doing the addition is out of line.
  • Bad Generic arithmetic on anything else but fixnums small enough for the result to be a fixnum is just not that great.

What should be of interest to anyone optimizing floating point performance is that type-checking doesn't really cost anything measurable most of the time. All of those benchmarks do full type typechecks except for double-unsafe-sans-result+, and the gain over the safe variant is minuscule.

What matters is that you generate inline arithmetic so that your floating points don't get boxed. On x86-64 SBCL has immediate single-floats, so occastional boxing isn't quite as disastrous (compare single+ and double+), but getting rid of the boxed representations completely is a huge win -- just compare single+ to complex-double-inline+.

Postscript: I know not everyone reading this will be clear on unboxed, boxed, immediate, non-immediate, etc. My apologies. I will try to remedy the situation and write about the different representations and how and why they matter at a later date.

Post-Postscript: I will be publishing the benchmark tool once it settles down, and once I have a chance to test-drive it with something besides SBCL. Could be a while, though. If you urgently need it, get in tough and we'll arrange something.

Syndicated 2011-11-16 11:04:46 from Nikodemus Siivola

What Does Extensible CAS Mean?

Yesterday I said SBCL now had extensible CAS. Was sind paranormale Tonbandstimmen, what is CAS, and why should you care if it's extensible? Turn off the music and I'll tell you.

CAS is short for compare-and-swap. Compare-and-swap is a fairly common atomic operation. It compares the current value of a memory location with another value, and if they are the same it replaces the value of that memory location with a specified new value.

Depending on the language and the exact design of the interface, it might just return true for success, or it might return the old value. In SBCL it does the latter, which is sometimes very convenient, but also means you need to compare the return value to the expected-old-value you specified to know if CAS succeeded.

Because it is atomic, if you have two threads doing CAS on the same memory location in parallel, only one can succeed:

(let* ((x (list nil))
       (a (join-thread (make-thread (lambda () (cas (car x) nil :a)))))
       (b (join-thread (make-thread (lambda () (cas (car x) nil :b))))))
  ;; Because CAS is atomic, we know that exactly one of the threads
  ;; will succeed -- but we can't know which beforehand.
  (cond ((not a)
         ;; A returned NIL, therefore it replaced the CAS with :A
	 ;; and therefore B must return :A and do nothing.
         (assert (and (eq :a (car x)) (eq :a b))))
        ((not b)
         ;; ...and vice versa.
         (assert (and (eq :b (car x)) (eq :b a))))
        (t
         (error "WTF? Broken CAS?"))))

If you have the least bit of threads on your mind, you can imagine how this can be quite useful.

Out of the box current bleeding edge SBCL supports CAS on a number places: car, cdr, first, rest, svref, slot-value, standard-instance-access, funcallable-standard-instance-access, symbol-value, symbol-plist, and defstruct-defined slot accessors with slot types fixnum and t. (Note: slot-value is not currently supported by CAS if slot-value-using-class or friends are involved -- that's still in the works.)

With the exception of slot-value all of those pretty much come down to a single LOCK:CMPXCGH instruction on Intel architectures.

...but what it you have a data structure -- say a queue of some sort -- and want to implement cas-queue-head which does CAS on the first element of the queue. Fine. You can do that without any CAS support from the implementation by using eg. a lock.

...but what if you want to write a macro that operates on a CAS-able place?

(defmacro my-atomic-incf (place &optional (delta 1))
  "Spins till it succeeds in atomically incrementing PLACE by
DELTA. PLACE must be CASable."
  ;; OOPS! We're multiply-evaluating PLACE.
  (with-gensyms (old new n-delta))
    `(let* ((,old ,place)
            (,n-delta ,delta)
            (,new (+ ,old ,n-delta)))
      (loop until (eq ,old
                      (setf ,old (cas ,place ,old ,new)))
            do (setf ,new (+ ,old ,n-delta)))
      ,new))

Now imagine some hapless user doing:

(loop with i = 0
      while (

Where instead of I increasing by 1 each time through the loop and iterating across the whole vector, it could increase I by who-knows-how-many on a single attempt skipping entries and even running out of bounds. Ouch.

Turns out that to write a macro that operates on a CASable place you need something analogous to get-setf-expansion, except for CAS instead of SETF. As of yesterday, SBCL has sb-ext:get-cas-expansion that you can use to write a macro like my-atomic-incf correctly and safely.

(defmacro my-atomic-incf (place &optional (delta 1) &environment env)
  (multiple-value-bind (vars vals old new cas-form read-form)
      (sb-ext:get-cas-expansion place env)
    (with-gensyms (n-delta)
      `(let* (,@(mapcar 'list vars vals)
               (,old ,read-form)
               (,n-delta ,delta)
               (,new (+ ,old ,n-delta)))
          (loop until (eq ,old (setf ,old ,cas-form))
                do (setf ,new (+ ,old ,n-delta)))
          ,new))))

What's more, we've now have the notion of a generalized CASable place, just like Common Lisp has the notion of a generalized SETFable place.

This means that the person writing cas-queue-head can use defcas, define-cas-expander, or even just:

(defun (sb-ext:cas queue-head) (old new queue)
  (cas-queue-head queue old new))

to make their CASable place a first-class citizens on equal footing with the baked-in ones -- so that

(my-atomic-incf (queue-head queue))

will Just Work. (Assuming your cas-queue-head works, of course.)

I think that's pretty nifty. I'm still looking at adding support for (cas slot-value-using-class), which will be even niftier. Who says there's no innovation in open source? (Maybe I'm feeling a bit hubristic right now. I'll come down soon enough when the first bug-reports hit the fan.)

Feel free to turn Laurie Anderson back on now.

Syndicated 2011-11-13 09:52:28 from Nikodemus Siivola

Extensible CAS and Timeouts All Over The Place

More IndieGoGo funded goodies have landed in the SBCL repository:

  • COMPARE-AND-SWAP is now user-extensible, and also supports SLOT-VALUE STANDARD-INSTANCE-ACCESS, and FUNCALLABLE-STANDARD-INSTANCE-ACCESS out of the box.
  • All blocking functions in the threading API and the SB-CONCURRENCY contrib support timeouts and deadlines.

More to come...

Syndicated 2011-11-12 13:02:19 from Nikodemus Siivola

SBCL Threading Work, First Merge

I just merged the first part of the IndieGoGo funded SBCL threading work. This entailed:

  • GRAB-MUTEX now supports timeouts on all platforms.
  • CONDITION-WAIT now supports timeouts.
  • Killing lutexes, replacing them with userspace synchronization. This main affects non-Linux platforms. Performance implications are a mixed bag, depending on what you're doing. Some things perform an order of magnitude better, some things about the same, some things somewhat worse. The stability improvements on Darwin are well worth the costs, though and I will try to address the cases where performance suffers in due course.

There's more to come---stay tuned. ...and please report any regressions to SBCL bug-tracker on Launchpad or on the mailing lists.

Syndicated 2011-11-09 22:09:23 from Nikodemus Siivola

SB-TEXINFO

Docstring processors are thirteen to a dozen, and those derived from SBCL's docstrings.lisp are a legion of their own.

I've been guilty of copying SBCL's docstrings.lisp to various projects myself, hand tweaking it to suit their needs... till finally I found myself with a half a dozen divergent copies. Eugh.

So, SB-TEXINFO was born: it should contain tweaks from all my copies, merged back on top of SBCL's upstream copy.

I plan to merge it back to SBCL as a contrib once I'm happy with it, but the schedule is still open on that.

Syndicated 2011-11-03 18:13:18 from Nikodemus Siivola

Crowdfunding Success, Work In Progress

My SBCL crowdfunding campaign at IndieGoGo is over, with $16,507 raised and all goals met! (There were some folks who wanted to donate either after the deadline or via a different channel who aren't included in that sum, so the actual number is closer to $17k before IndieGoGo's, PayPal's, and taxman's cuts. I expect my final share to be around $12-13k after dust settles.)

Thank You, Everyone. I can't say this often enough. Your support has been overwhelming, and is greatly appreciated.

Sorting out all the perks is probably going to take most of today and tomorrow, then it's back to work on the project.

To summarize, here are the things that are going to happen, in the approximate order I expect to commit them. (This is a superset of the list at IndieGogo, because it includes some intermediate steps which are likely to get their own commits.)

  • SB-THREAD:MUTEX outside futex-platforms to be built on top of userspace spinlocks. CONDITION-WAIT is also getting a userspace implementation when futexes are not available. This will fix interrupt-safety issues with locks on non-futex platforms such as Darwin and Solaris. (Done, merge pending cleanup.)
  • CONDITION-WAIT, WAIT-ON-SEMAPHORE, and JOIN-THREAD in SB-THREAD, and RECEIVE-MESSAGE and RECEIVE-PENDING-MESSAGES in SB-CONCURRENCY to support timeouts and deadlines on all platforms. (Work in progress.)
  • SB-THREAD:MUTEX to be renamed SB-THREAD:LOCK with similar renamings in the functions and macros. This is mainly motivated by the need for incompatible API-changes to lock acquisition and release. Old names will be supported for a fair while, being deprecated very gradually. Feature :SB-THREAD-LOCK can be used to detect the presence or absence of the new API.(Work in progress.)
  • SB-THREAD::SPINLOCK to be replaced with the new LOCK object. Old names will be supported for a while, being deprecated gradually. (Done.)
  • Locks outside futex-platforms are to be mostly fair by default. (Mostly done, bughunt in progress.)
  • Read/write locks to be added to SB-THREAD or SB-CONCURRENCY. (Not started yet.)
  • Sempahore notification objects to be added to SB-THREAD or SB-CONCURRENCY. (Not started yet.)
  • Allegro-style gates to be added to SB-THREAD or SB-CONCURRENCY. (Not started yet.)
  • CAS support will be extended to at least STANDARD-INSTANCE-ACCESS, and made user-extensible. (Work in progresss.)
  • Other threading API change/enhancement, to be specified later.
  • A new portability library called Madeira will be implemented. I'll blog more on it once I'm starting to work on it in earnest. (Not started yet.)
  • SBCL's big compiler lock will be removed, so that parallel compilation is possible. Again, I'll blog more on this when it's time. (Not started yet.)

I will update this post as work progresses.

Syndicated 2011-09-01 11:03:05 from Nikodemus Siivola

IndieGoGo & The Big Compiler Lock

My SBCL crowdfunding at IndieGoGo has continued to be successful beyond my wildest expectations: it has reached $12k funding so far, which means that Madeira has been fully funded in addition to the earlier SBCL specific goals.

This lead me to a bit of a conundrum. Should I add another goal and try for more funding while there's time left in the campaign (can't adjust the duration after the fact), or leave more than good enough alone? At the end, since I lit upon a good goal that fits well with the previous ones, I opted to add it and see if I could reach all the way to the moon...

If funding reaches $16k, that will be enough to allow us to kiss the big compiler lock goodbye.

What is the big compiler lock? SBCL grabs a lock when it compiles anything. The same lock is also used for some non-compiler activities like modifying the class graph. This lock prevents parallel compilation. Even if you don't care about parallel compilation as such, there are other unfortunate consequences: CLOS dispatch (before caches settle down) can involve compiling code. Most of the time BCL is a tiny annoyance, but sometimes it can be a real headache.

If $16k is not reached, any monies over $12k will be split between general SBCL work and work leading to curtailing the effects of the big compiler lock. (Eg. making CLOS dispatch optionally work without the compiler.)

Finally, a big Thank You to all contributors.

Syndicated 2011-08-27 10:45:24 from Nikodemus Siivola

How To Fix A Compiler Bug

I've been asked for an introduction or examples in working on SBCL.

Let's see if we can make sense of lp#738464. The issue is that code such as

(flet ((foo ()))
  (declare (ftype non-function-type foo))
  (foo))

complains about "failed AVER: (NULL (FUNCTIONAL-ENTRY-FUN LEAF))" instead of giving a sensible error.

  1. Replicate the issue.

    Entering the form in Slime lands me in the debugger with the expected error. This is good. It means that the bug is still there, and that the description is valid. If that didn't happen, it doesn't necessarily mean that the bug is gone -- platform specific bugs, bugs appearing only under specific build options, and bugs contingent on specific optimize settings are relatively common.

  2. Study the issue.
    1. What should have happened?

      We should have gotten an error that says "cannot use a non-function type with FTYPE" or something along those lines.

    2. Why did the thing that happened, happen?

      The bad type apparently propagated quite a ways into the system and caused bad things to happen. (No point overanalyzing things.) It is obvious to me that this error does not occur when the system is dealing with the declaration, but much later -- and it's obvious because I've spent a lot of time working on SBCL. If you know nothing about SBCL, here's how you could figure this out:

      • Notice that the backtrace at the point of error says nothing about dealing with declarations, but is full of stuff like SB-C::CHANGE-REF-LEAF, SB-C::LOCALL-ANALYZE-FUN-1, SB-C::REFERENCE-ENTRY-POINT, etc.
      • Figure out where declarations are handled. You could scan through a bunch of code starting from COMPILE and meta-dotting your way forward. You could ask on #sbcl, #lisp, or sbcl-devel. You could use APROPOS or "git grep -e " with various likely things and look if you find something that looks right.

      Let's assume I git-grepped a bit and finally hit upon "git grep -e def.*ftype". One of the matching lines is:

      ir1tran.lisp:(defun process-ftype-decl (spec res names fvars context)

      which looks like it might just have to have something to do with processing FTYPE declarations...

      Before you go and take a look at it, understand this: you don't need to read all of it just now. What you need to do is to make sure it's the thing that processes FTYPE declarations. TRACE is often insanely useful here: hit C-c C-t on top of the function name in Slime, go to REPL, and try the original form from the bugreport again.

      Apropos: M-. inside SBCL sources (for various reasons that would be nice to fix) is sometimes off-by-one-or-two toplevel forms. In this particular case you're likely to end up on process-type-decl (type, not ftype!) So notice that and find the right toplevel form -- or be terribly confused for a while because trace output looks strange. This happened to me just now...

      Having traced the correct definition we see it indeed being called with our NON-FUNCTION-TYPE, so our diagnosis is confirmed: the place that processes the declarations doesn't notice that it's not a function type.

    3. Do I have any idea how to fix this?

      Making process-ftype-decl complain sounds about right. Looking at the code we see that it uses compiler-specifier-type to parse the type. So we need to check that the type returned is a subtype of function.

      If you don't know yet how to do the equivalent of SUBTYPEP on SBCL internal type objects, you can use the same process as earlier to figure this out: read more source, ask around, etc. (The answer is: use CSUBTYPEP).

    4. What if you have no idea how to fix it?

      Spend a while -- 15 minutes or 15 days, whatever -- trying to understand the issue well enough to fix it. Maybe you'll figure it out, maybe you won't. Repeat this excercise over various bugs for a few years, and you'll be an SBCL expert. Understand that sometimes you might need to look for information outside the sources: textbooks, papers, original implementors, etc.

  3. Fix the issue.

    Because I have every intention of actually fixing this just now, I'll assign the bug to myself and mark it as "In Progress". This helps avoid duplicate work.

    We already know the general shape of the fix we're going for:

    (unless (csubtypep ...) ...complain...)

    Now's the time to figure out just what our complaint is. I opt for reporting the invalid type and then ignoring it.

    (unless (csubtypep type (specifier-type 'function))
      (compiler-style-warn "ignoring declared FTYPE: ~S (not a function type)" spec)
      (return-from process-ftype-decl res))

    I edit to above code into the right place in the definition, recompile it with C-c C-c and go to the REPL to test it. Looking good.

    Note on hotpatching the compiler: sure, you can break things easily. It is, however, much faster than doing a full rebuild. Just set up a persistent slime-scratch buffer, and save your work before doing risky stuff. Even if you break something, most of the time you can fix things without restarting -- the rest of the time you do M-x slime-restart-inferior-lisp, or at the very worst you have to go to the terminal to "kill -9" the process. The thing to note is that even if you did do a full build, you'd still run the exact same risks! All that said, there are changes that are hard or impossible to hotpatch: sometimes you do need to do a full rebuild.

  4. Finish up

    I add a test-case for the bug into tests/compiler.pure.lisp:

    (with-test (:name :bug-738464)
      (multiple-value-bind (fun warn fail)
          (compile nil `(lambda ()
                          (flet ((foo () 42))
                            (declare (ftype non-function-type foo))
                            (foo))))
        (assert (eql 42 (funcall fun)))
        (assert (and warn (not fail)))))

    I commit my changes to the local repository with an appropriate commit message. This is a pretty trivial change, so it doesn't need much of an explanation.

    ignore non-function FTYPEs
    
      Fixes lp#738464.
    
      Give a style-warning and ignore the bad type.

    Now I do a full rebuild and run all tests to make sure I didn't break anything.

  5. Send to upstream.

    At this point I would normally write a NEWS entry for the bug, amend the last commit with it, and push to the official repo. However, SBCL is in its montly freeze right now, so I'm holding up for now and just pushed to the pending branch on my tree at Github.

    Non-commiters should at this point just do "git format-patch -1" and add their patch to the Lauchpad bug report, and add the tag "review" to the bug. (NEWS is conflict magnet, so it's easier all around if the person who commits the patch writes the entry.)

Obviously most any point here could be expanded a lot -- but hopefully this provides a starting point of sorts to people interested in digging into SBCL. Fixing tiny buglets like this is an excellent way to get your bearings.

Syndicated 2011-08-17 07:36:35 from Nikodemus Siivola

New crowdfunding goal: MADEIRA

My crowdfunding campaign has been a roaring success: all goalposts I initially set have been reached in 3 days out of 22!

So, time for a new goal.

If the campaign reaches $12k, in addition to the SBCL specific work I will also implement Madeira, a new portability layer for Common Lisp implementations, focusing on threading and parts not covered by CFFI or USOCKET.

The current plan is for Madeira to support:

  • Basic things like access to command-line arguments, environment, EXIT, etc.
  • Bordeaux-Threads -like functionality, but better defined, with sharper edges.
  • CAS. Remains to be seen how general I can make the support, though.
  • Running external programs. (Probably somewhere between TRIVIAL-SHELL and SBCL's RUN-PROGRAM in functionality.)

Another way to describe Madeira would be to say that it should cover most everythingtaht I keep having to #+sbcl when writing portable code.

Gentle hacker, spread the word: it's not just SBCL users who stand to benefit now.

Thanks You again, Everyone.

Syndicated 2011-08-13 09:27:18 from Nikodemus Siivola

Flabbergasted

I'm having my morning cup of black, and just had a look at the SBCL crowdfunding campaign I put up yesterday.

The initial funding goal was reached in less than a day! I'm awed and humbled.

This means crowdfunding looks like a very viable model for funding future SBCL development, though of course the real question is how repeatable this is. I'm sure I'm at least to a degree cashing in on work I've already done as opposed to what the campaign pitch actually says, but the future will tell...

In the meanwhile, please keep spreading the word! The campaign is now at the point where locking improvements and adding timeouts to all blocking functions in the threading API have been funded.

If it reaches $6000, it means getting read/write locks and semaphore notification objects as well.

Syndicated 2011-08-11 06:24:30 from Nikodemus Siivola

59 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!