Older blog entries for tampe (starting at number 113)

The logic at my fingertips


Guile log is an exploration of logic programming where one can choose to maintain the state in a stack or on a translation list like in kanren. Actually guile-log contains an implementation of the kanren interface that are as fast or faster as the original kanren distribution compiled on chicken ( a scheme environment).

The bad thing is that one need to track the state in more advanced application for which a state which is a combination of a mutual structure and the old cons list. Also this means some overhead and makes things complicated!


The good part is that one can combine the speed of the stack based solution found for example in gprolog with the generality and thread friendly solution that kanren provides.


If one like to just stick to the kanren approach one can take advantage to the stack in many ways which makes the coding experience richer. The basic principle is that you get a notion of going back and forth in time between different states and this walk can be traced to understand more about the program and enable some features

One of the features is to code with meta global variables (my naming). They behave as global variables with respect to computation e.g. storing values over backtracking cycles which means that one can enter sub backtracking regions that need to store for example a list of all results and use that list
as a value. The meta comes from the fact that these variables will behave as normal stack variables / kanren variables in the higher level backtracking where the list of results are used. Also when storing the state and restoring the state these variables will follow and be restored correctly.

Guile log is therefor ideal to use in an interactive proof solver for example because you can work from the guile prompt all the time and have all of guile log and guile scheme at your fingertips and e.g. be able store states try a branch go back, go forth and so on.

maybe one day I will port the coq prover to guile-log or something similar.

Aschm! not a cold just a combination of assembler and scheme

But most of my current time has been on hacking on the internals of guile. Wingo has started coding on a new VM for guile based on a register layout e.g. using local variable index in stead of pop and push to get and set local variable data. I on the other hand have ported SBCL's assembler (see aschm on gitorious) and implemented translations of the original VM and the new RTL VM to native code. Quite fun indeed and can show the speed benefits and overhead of using this.

maybe you can by a factor of 2x going to the RTL VM and another 3x compiling to native. To note here is that much of the overhead comes from the fact that all operations usually have some overhead, for example trying looping around and summing an incrementor e.g. calculating 1 + 2 + 3 ... would mean maybe 250Million additions per seconds for nativly compiled version. This is adding guile fixnums and there is some overhead of managing these datatypes, also the RTL feature is transfered to the native one as well e.g. the natively compiled code does not use the registers as much as one can do.

native logically yours

Stefan

17 Aug 2012 (updated 17 Aug 2012 at 21:35 UTC) »
Its raining cats and cpu instructions

Right now I work with taking a specification of virtual operations, the guile rtl vm, and translate them to assembler code. The idea is to stack the resulting assembler instructions and compile to machine code in stead of using named gotos in C. Using this we will get less execution overhead but more code.


Some tests shows that simple loops like incrementing a counter and sum it as well as simple list operations, get a boost by 3-4 times by doing this. Of cause if expensive vm operations are used most of the time you will not gain much by such an approach.

The drawback of the method is that for example a simple addition may look like,


(define-vm-inst vm-add 79 ((U8_U8_U8_U8 dst x y))
(inst mov call-1 (local-ref x))
(inst mov call-2 (local-ref y))
(inst test call-1 2)
(inst jmp #:z slow:)
(inst test call-2 2)
(inst jmp #:z slow:)
(inst add call-2 call-1)
(inst jmp #:o slow:)
(inst sub call-2 2)
(inst mov (local-ref dst) call-2)
(inst jmp out:)
slow:
(inst mov call-2 (Q rsp))
(c-call scm_sum call-1 call-2)
(inst mov (local-ref dst) rax)
out:)


Disclaimer not debugged yet, but you get the picture 14 instructions!

The problem with verbose code like this is that it increases the compilation time of the code, and the size of the resulting programs. What I'd would like to have is the possibility to have macro instructions which get translated
to the specified code directly in hardware and just need to write


(inst mov call-1 (ref rbp 12))
(inst mov call-2 (ref rbp 13))
(inst macrocall 134)
(inst mov (ref rbp 14) rax)


e.g. just like a c call in amd64(Linux) but the macrocall will
be a hardware expansion on the cpu to the expansion found in slot 134. In all a good 3x decrease in instruction length and of much less complexity (no jumps).

A problem with this setup is that if you would want to use the machine registers effectively you may want to specify which registers you shall use in the macrocall like


(inst macrocall 134 call-1 call-2)

Another issue with this is that the same specific expansion code is located in the cpu for all processes and different processes might want to have their own expansion code in there. Not sure how to solve this, maybe each process have an id that describes which set of expansion it uses and when a new process start to execute on the core that key can be used to recognize if a different macro code should be loaded into the cpu when asked for. Of cause this can lead to expensive context switches. But I find it an interesting feature.

Have fun









8 Aug 2011 (updated 8 Aug 2011 at 20:49 UTC) »

A memory trick


I don't remember where I saw it but someone wrote that using CPS style prolog implementation is costly. And It is a little indeed. It will punish you with a factor of about 3x and the irregularities of garbage collection will hit you with a factor of 6x or more if it is done in passes. At least this is the timings I get on a prolog example written in C using a tail call trampoline and proper closures allocated from the heap. Now the same limiting feature would be present in a implementation in a Lisp or any other higher level language with proper closures and a working GC. The cool thing with the closures needed to do backtracking are that they don't need to allocate the closures on the heap. In stead one can maintain a stack to do the trick. I did implement a quick version using this method and the figures above came as a conclusion. So in principle there is a 4-5 fold of savings possible to go from the VM to Naitive compilation. A note here are that the overhead of the VM masks GC issues and there is not much gained from trying to squeeze extra vm instructions or compiler directives into generating special op-codes to allocate the closures from the stack in a special manner.

So how did I implement this. Well actually I have a small compiler written in scheme that output's c-code from a scheme like language, taking advantage of Alex Shinn's excellent fmt library. In this package there are element's to hide tail call trampoline semantics and closure generation to achieve close to scheme semantics. So in order to allocate the closere from a stack I needed an allocated array and companion pointer lam-stack, I added two extra macros, s-lambda, f-lambda. s-lambda allocates just the closure from the next bytes from the stack and are used in closures typically generated by (%and a b ...). f-lambda are the lambda that defines the next continuation at a junction. for this closure we can allocate it again from the stack and then when the lambda is run we simply set the stack pointer lam-stack to the next byte of the
end of the chunk of environment data for the closure that it needs to express the continuation. (that data is copied over, a strategy that works because we do not set! variables) In all, f-lambda semantics imply that we do not consume the stack and a quite small stack is enough for
many implementations.


The careful reader will recognize that this strategy will consume much more stack then needed. f-lambda sematics can be improved to do much less copying by noticing that the variables carried over to the next continuation follows a total order on the set of subsets of variables needed in the sequence of closures at a junction. This means that one can copy the variables only ones and then reuse the stackframe (so it is apparent that the current versions will need quite a lot of stack space at each junction. Also for the s-lambdas one can save some space by similar techniques. This can decrease the amount of copying and needed stack space.

How does the prolog macro package work? Here is %and%:


(define-syntax :and:
(syntax-rules ()
((_ w x) (Y w x))
((_ (cc cut fail s) (:cut:) . l)
(:and: (cc cut cut s) . l))
((_ (cc cut fail s) x . l)
(:let: ((ccc (:s-lambda: s (fail) (:and: (cc cut fail s) . l))))
(Y (ccc cut fail s) x)))))


and %or%:


(define-syntax :or:
(syntax-rules ()
((_ w x) (Y w x))
((_ (cc cut fail s) x . l)
(:let*: ((P (:scm-call: c_newframe))
(f (:f-lambda: s ()
(:scm-call: c_unwind P)
(or-work P (cc cut fail s) . l))))
(Y (cc cut f s) x)))))


Note here that cc is the continuation cut is a failure of an earlier junction. And here is how one defines a prolog function.


(:define: lam-stack (queens3 UnplacedQs SafeQs Qs)
;(:pp: "queens3 Unplaced ~a -> ~a" UnplacedQs SafeQs)
(:match: (UnplacedQs)
( _ (:var: (Q UnplacedQs1)
(:and: (:call: selectq Q UnplacedQs UnplacedQs1)
(:not: (:call: attack Q SafeQs))
(:call: queens3 UnplacedQs1
(:scm-call: c_cons Q SafeQs) Qs))))
(() (:=: SafeQs Qs))))


And we see that there are overhead and some carefulness in the coding but it's a lot of help over trying to write c-code that consists of 2000 lines of code compared to the 80 lines used to write the application. N.B. I didn't find a way to format the code snippets in a good way, sorry!

There are two more way's that can save time. Inlining and using fixnum arithmetics in stead of SCM based arithmetics. Also many new bottlenecks start to appear like the output printing routine. Anyway the conclusion are that a CPS + tail call trampoline is not to bad as a tool to do backtracking searches if done with the right compiler.

Have fun

25 Jul 2011 (updated 25 Jul 2011 at 21:44 UTC) »

Ordering the functional Ilands

note I use the word continuation below for a closures. This is not exactly what is meant by a continuation in scheme but the used closure function as a sort of delayed computation. Then
this computation object is passed as an argument to various functions until it under the right circumstances is evaluated. So call CPS semantics. Continuations can be implemented to a large degree with the help of this technique though.

Using the guile-2.0 virtual machine and adding a few opcodes targeting the evaluation of prolog programs yields quite good performance (from a few tests maybe less then 2x of gprologs VM). This is accomplished although continuations allocated from the heap are used. The conclusion is that this overhead is not too visible compared to the VM overhead. Actually the version I use combine setjmp like sematics for backtracking with continuation that represents a delayed computation and hence lead to slightly less need of heap compared to using continuations for the backtracking as well.

How are these features used? let's consider the quest of a tree search to solve for example the n-queens puzzle. What features do we have? Well for one thing we need variables that we can set to values that should be visible down the tree as we go out on a limb. If then a failure in the search is signaled we would like to go back and the movie for the values of a variable should be played backwards in order to restore a state at a branch point further up so the search can continue. Another thing is the need to sequence facts, first A should be true, then B and so on. lets write this as


(%and% A B ...)


Another basic building block is to model a branch point e.g. try A, if fail then try B etc. e.g.


(%or% A B ...)


finally we need a way to express that we fail if the fact is true e.g.


(%not% A)


In code %and% could be translated as


(compute (%and% A B) CC Fail)
=
(compute A (/.(Fail2) (compute B CC Fail2)) Fail)


where CC represents what should be computed after (%and% A B) have been computed (/.(arguments ...) code ...) can be looked at like a definition of a function with associated variables inside A B etc. captured e.g. the variable addresses is stored in them. This happens indirectly as in scheme or If you look at /. the creation as a creation of a object from a compute (used variables would then have to be passed explicitly on in the constructor arguments). Fail is the backtracker and is a continuation or can be a setjmp like backtracking. Let's assume that it is the continuation computation that represents whats need to be done if the computation fails. Now over to %or%


(compute (%or% A B) CC Fail)
=
(compute A CC (/.() (unwind) (compute B CC Fail)))


Note here that we need to unwind the variables int the failur computation ((/. ()) e.g. run the movie backwards before we restart computing the next branch. Also the failure computation does not have any arguments.
And finally %not%,

(compute (%not% A) CC Fail)
=
(compute A (/.(Fail2) (Fail)) (/.() (unwind) (CC Fail)))


Note here as well if we success e.g. A fails, we will undo any changes to the variables.

This is a closer look at some of the elements needed to construct prolog like semantics. There are something left like unifications and matching But this is a description to better understand the thoughts flowing out on my posts.


Now I have made a little macro package that outputs from a scheme like language pure c-code. I added closures without a correct set! semantics, allocating the closures from the heap. Also a tail call trampoline framework was added. The generated code is three times slower then compiled gprolog and also it's quite obvious that a lot time is lost due to a hefty use of gc and the heap. So this system did not have a significant improvement compared to the VM approach with a setjmp semantics replacing the failure continuation computation. My next quest will therefore be to modify the macro package to allow allocating the closures from a stack instead. Here the key fact to note are that the Failure continuation should be able to reset to the same position on the stack at a branch. In all there should be hope that memory consumption can be bounded enough to make it practical.

Have fun

A parallel brain is more expressive then an unparallel one

Assume that we have deduced a state S and there are two possible choices S1 and S2 to go from there how to zip them together?

One idea is to assume S1 outputs z1 and S2 outputs z2 and just combine these outputs e.g.


(zip-1 (z1 z2)
     (subshell S1 z1)
     (subshell S2 z2))
Here we create z1 and z2 on an outer stack, then run S1, it produces a value into z1 that are deduced on an inner stack. Then it will keep z1 and revert back to the initial state but store a continuation that are reinstated at back-tracking. This will be able to mimic generator behavior through back tracking. It is safe code because the interface is clear and has a potential performance bottleneck in that we need to reinstall the state of S1 and S2 at back tracking.

The other idea is to consider the stack as a DAG and use two branches of the stack describing the deduction S1 and then deduction S2. First S1 is done, then S2. and the state will be done on-top of each other but the action will be stored in branches e.g. a storage of a variable pointer and a value. But at backtracking, then we will just first backtrack S2 and by doing this not touch the other stack branch. Then we will backtrack S1 and not do the undos on the S2 stack branch. If there are a variable touched in both S1 and S2, with backtracking touching it only in S1 or only in S1 unpredictable things can happen. So this method is faster but can produce strange bugs.

Again parallel generation of sequences can be simulated using this trick.

In all, zip patterns call's for a more delicate stack engine then the common stack implementation one need to manage a forest of DAG's that should support continuation style of separation and storage. linked structures are attractive here due to it's simplicity in these situations, on the other hand the speed of allocations from a stack represented by an array is also attractive. Maybe a combination.

Life on a t-shirt

Oh and now over to that jolly song from all of yo together on that crucifix hill in easter time, one two three ...

One of my favorite t-shirt ideas is to have all the religious celebrities together on crucifixes, (mohammed pained as a cloud with perheps lips showing that he sings along) and all of them singing and smiling :-) And the song should of cause be the one from the life of brian - Always look on the bright side ...)

This picture tell the simple minded theo/philo-egoists in the world to fuck off to some deep cave and keep us hard working people to continue with what we are good at, or, Hey teacher, We don't need your silly religious war, leave them kids alone, or simply C.U. cooperate stupid.

Over to some cool mathematical play. Did you know that the fundamental law's of force physics is non-linear? This have a kind of cool direction, let's call it the twilight direction. The thing is that a common theme to express solution to non-linearity, especially when the nonlinear part is weak, is to first just solve for the linear solution X, then solve for the twighlight solution T, which we get by solving the linear equations again but with a source that represent the nonlinear part applied to X and construct the solution as X + T. Similarly this technique can under a fix-point theorem result in an infinite expansion that lead to a solution to the nonlinear equation.

So, basically T is small ripples that lives on top of the real fat and dirty reality called X. And if the connection is really weak we would never detect T. We should have seen it already if that was the case right! Now over to crystallography. What's cool with this technology is that if you can build a crystal of say hemoglobin, (e.g. the same molecule is reproduced over and over again in a periodic lattice), then you can sort of overlay picture signals of all entities in the crystal and by statistical laws the details get many times more clear then by just looking at one single entity with some kind of microscope. So consider some kind of periodic twighlight ripple T. and consider sensors in billions of places throught a volume, wouldn't that magnify the probability of detection considerable, I wonder if humans have build any similar device to try detect the twilight zone, I know the homo crackpottus have tried it in another universe somewhere.

Oh, I put a little "easter egg" somewhere in _this_ universe so that you can perhaps understand my math (yes a little prose attached to the math) ;-) ... happy reading

functionasm

Alex Shinn, has written a library for scheme that does formatting tasks in a functional style, the fmt library. In this library there is a package for outputting c code. Let's play with that.

A next step might be to be able schemify c code. e.g. introduce <define>, <let*>, <begin>, <if> and <recur> + the operators. The innovation needed to do this is to make sure that everything return values and are able to be composed together with some kind of "sewing" mechanism.

The idea of attacking this is to try keep everything functional as far as possible and then at the final step add syntactic sugars in the form of macros. When trying to do this one realizes that we need two phases of evaluation. The reason are this, consider a function application:


int f(int),
----------------
f((<let*> ((a (+ b c))) (<if> q a b)) + b)?
Now the idea of modelling is to let the statments evaluate to a lambda that takes as an argument #f or a symbol. The lambda applied to a symbol will mean that the result of that expression will generate c-code where the tail expressions will set the symbol to the return result. so we do stuff like,

(define (f-if p x y)
    (lambda (ret)
       (let ((pred (gensym "pred"))
         (c-block
           (c-var 'int pred)   ; define pred integer
           (p pred)            ; execute p expression
                               ; pred is the result
           (c-if pred (x ret) (y ret))))))
This will illustrate the technique (we are using autoquoting features in the macro package). Over to the function application. Here we conclude that in order to be general we need to use

   (int arg)
   ((((<let*> ((a (+ b c))) (<if> q a b)) + b)
    arg)
   (c= ret `(f ,arg))
E.g. we introduce an arg symbol to be able to set it. Doing this we need to know about the function signature. Designing the <define> as a function means that we are not able to register the function signature before the body is evaluated and the function signature needed in case of recursion. So one need one extra lambda dereference. I thught that was too complicated though and used macros in stead to accomplish correct registration order.

The function example lack some wow factor. But for the loop construct in clambda, the <recur> statement, it's a must. Here is how the idea of this works


(<recur> loop ((int n 100) (int s 0)
    (<if> (< n 0)
          s
          (<next> loop (+ s 314))))
====================================
Aproximate translation
====================================
 int n = 100;
 int s = 0
loop:
 if (n < 0)
    ret = s;
 else
 {
    s = s + 314;
    goto loop;
  }

This is kind of handy loop construct and again one need to register the loop symbols this time in order to be able to expand the <next> statement later on.

This is about 300 lines of code. For 600 I guess you would be able to introduce a good enough c to produce workable code with syntax line numbers introduced as well to be able to debug after gcc has found your bugs.

Now this is not that complicated and everything is just a quite simple idea of syntactic sugars. But this will enable scheme macro facilities to your c-code (for one thing you will be able to track source code line numbers to be able to debug your macros correctly. Also going between clambda and pure scheme is a smaller step then between c and scheme this means that you can keep a large part of say guile c-code both in c and scheme at the same time.

Have fun

o.o

Can strange things happen, that matters. I don't know, but strange indeed are the day's I live in. To keep sane is a big struggle and without ears plugged with music. I would not know where I would be.

Anyhow here is a spooky story.

I met an old woman a couple a weeks back and she talked about many things, fools and queens, high and low. But what I previously did not remember was how she spoke about a book, her exact words I don't know, but she talked about an o and a dot.

Then I forget about her, and a week later I sat down and thought a poem on a piece of math would be so perfect. Thinking about my old ford and the day's back when I was young I got into the right mood of writing the poem. And the dance went on. At the end I saw that it was not perfect. I wanted to move the text to the right and wrote "o." at the beginning without knowing why, I just did it cause writing a poem is to express feelings and then I just relax and let the hand walk across the keyboard. So, focus! It struck me, just struck me, don't ask me why,..., why don't write "o.o" instead of "o." and so I did. And so I did.

And so I did, I need to keep that python syntax correct, don't I.

And time passes, life went boring and I took a walk to the library. I stood there just feeling empty and pointed the finger at a shelf of books, letting the finger wander from book to book, reading the titles without thinking, and then I picked, as it felt, randomly, the Illusionist by Fowler.

This is a kind of cool book and in the beginning the character of importance move to a lonely place, becomes depressed and decide to shoot himself with a shotgun. And at the moment the trigger is supposed to go off, Fowler writes the characters o and dot. Probably, to make an illusion of end of life.

The effect of this happening to me is tremendous. I had a brother you know, and his life did not end with an illusion. I took a long walk and did not calm down until I met a family of deers that just stood there about 10m in front of me, relaxing, and reminded me about how life goes on.

o.o

8 Apr 2011 (updated 8 Apr 2011 at 15:24 UTC) »
When thoughts flow too fast

Fasten your seat-belts, earplugs in and here we go. The city flows silently and the leaves dance together with plastic bags in an urban eddy. Music in the blood and so much feelings pressing when you take in everything. Silent shadows with faces that pass by and I think, oh well I need to freeze this moment and carve it into a stone cake, the ones you find in the book: uncle tungsten.

I'm planning to be much more moveable in the future and are preparing a smaller laptop with linux for guile-unify development. I had a stress breakthrough and need to take a brake from work. And of cause the most relaxing thing for me to do is to code like an artist and not like a monkey. Probably the resulting code will be too poetic, but I don't care. I hunt for pieces of ideas. Not running them through the fingers like the end is near. In the end I will return to normal productivity but that's for later. Most important thing for universe minus me in the near future is to shelve out a release of this stuff. Then people who would like to play a tune of two of that weird poetry can do that with some guidance. Or at least not need to read the mind of a sole on the other side of the Atlantic see.

Oh, I'm also trying to push out a physics paper, but that have to take the time it takes. What's cool here is that I will try to make it as a combination of poetry and science - if they dare.

So, for me life can be stated with the one liner:

Play it again Sam!

Dream on folks!

6 Jan 2011 (updated 20 Jan 2011 at 05:53 UTC) »
A flow of thoughts

So, I start as usually with my bad prefix.

I haven't had so much fun for ages. The Yang Mill theory is a really great work of illusion. My bet is that people who analyze the equations will not realize the true nature and continue with their hopeless task. You really have to think out of the box in order to solve this.

The world may look straight. But in reality looks queer. (freely translated from Copernicus with help of Einstein)

A huge hug to you all

104 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!