Lightweight scripting/extension languages

Posted 22 Dec 2003 at 02:25 UTC by atai

Extension languages are designed to be embedded in applications to support customization of the application's behavior. Common scripting languages, like Perl and Python, are fairly "large", with powerful run-time engines and libraries. They are widely available, and script writers usually assume they exist stand-alone in the deployment environment.

However, if one is looking for a language that's small enough that its source can be embedded in the distribution of the application and built as part of it, Python and Perl may be "overweight." For the really lightweight choices there are Lua and TinyScheme. Are there others? What are people's preferences and opinions regarding lightweight extension languages?


Tcl!, posted 22 Dec 2003 at 07:46 UTC by davidw » (Master)

Well... sure, why not? Not my fault if the answer is the same for both articles!

In all seriousness, Tcl is pretty good for what you want, at least down to a certain point, especially if you are willing to do some hacking. For instance, in the Tcl CVS tree there is a reduced-footprint branch that was prepared for Cisco - they use it as an embedded scripting language on some of their fancier routers. ETLinux is also worth a look, although they use a custom Tcl of their own which is based on an older version of the language. Tcl is a good candidate in general for this space because of its nice C API, medium-sized footprint, and very simple and flexible syntax (you can write control structures in Tcl itself, kind of like Scheme). Looking at the negatives, Tcl is not as small as Lua (it does more), and it may have too many things built into the core that you would have to hack out, depending on what you do or don't want, even though there is a bit of code there already to assist you in this.

Other possibilities that I'm aware of include elastiC and Forth, although the latter isn't really a "scripting language" in the sense that the others are. It can be extremely small, though. Ficl is a reasonably well implemented version to play with - I ported it to run on the eCos embedded operating system as a means to interact with the OS. Speaking of which, Lua runs on eCos too.

I think, though, the conclusion that you will find is that it really depends on what you are trying to do. If you want to interact with the user, something "command oriented" like Tcl is great. If you want the smallest thing possible, Forth is probably the way to go. Lua is really fast and looks reasonably powerful. So, tell us what you need and we'll see what we can come up with.

Not Tcl!, posted 22 Dec 2003 at 10:19 UTC by etrepum » (Journeyer)

Tcl is extremely slow and has some evil syntax. How can you recommend that to someone? I've seen Lua used quite a bit for the purpose that is described in the article (i.e. stuff like Celestia, games, etc.).

In any case, "small enough that its source can be embedded" seems to be a pretty stupid goal unless you actually have some serious memory/disk constraints such as with a cell phone, PDA, or game console. I can't imagine why one would use Perl to extend an application, but Python (from experience) or Ruby (just guessing) should be good candidates.

Scriptix, posted 22 Dec 2003 at 15:58 UTC by elanthis » (Journeyer)

I actually wrote my own scripting language for this purpose. Mainly because I write in C++, and needed a language that made extending and using pre-existing C++ classes as painless as possible.

Scriptix website

The language is sort of Java-like, is rather fast (although there are plenty of cases where I can make it even faster), uses the Boehm GC for memory management (which your C++ app needs to use if you mix classes from your app with Scriptix data types), supports multiple scripted threads of execution, and so on.

Language FUD, posted 22 Dec 2003 at 20:19 UTC by davidw » (Master)

So, let's start with last things first in the anti-Tcl post. The article's author does indeed need to specify what he wants, or we can't really help him much. I'm willing to give him credit for having a reason for asking for something that doesn't require much in the way of resources for his embedding/extension language.

Which brings us back to Tcl. It was designed to be embedded. It has an extensive C API that gives you access to all kinds of internal goodies. If you don't mind crossing that barrier, the rest of the internals are also pretty easy to get a grasp of. It's well written, well documented C code, despite the fact that it has, of course, grown over the years and there are warts here and there.

As to its syntax, saying that it is "evil" is of course a matter of taste. Personally, I like what Tcl offers because it's simple and clean - everything is a command. Of course, it does bow to practicality, which is the reason why it's not another Lisp, but I like that - they seem to have found a happy medium (at least for me) between Algol-style syntax and Lisp's purity (which I find unwieldy). For those interested, Tcl's rules can be summed up in one man page: http://www.tcl.tk/man/tcl8.4/TclCmd/Tcl.htm

As far as Tcl's speed, or presumed lack thereof - well, it's not the fastest thing out there, but it can certainly compete with the likes of Ruby and PHP according to the "Great Language Shootout". In any case, "extremely slow" is certainly an overstatement.

But to each his own - I find a lot to like in Tcl, especially when I started looking "under the hood" at the underlying C code. It's good, solid stuff with solid engineering processes behind it.

As for the original poster, we'll have to hear more to decide what might work best in that situation...

JavaScript, posted 22 Dec 2003 at 20:23 UTC by judge » (Master)

JavaScript is great. mozilla.org's JS engine is very portable and appears to be quite fast. Furthermore, JS is easy to integrate into existing code, has a sane syntax, and lots of people know it.
Apparently KJS is quite good too.

Re: Not Tcl!, posted 22 Dec 2003 at 22:15 UTC by patthoyts » (Master)

The myth that Tcl is slow seems to have an unholy reluctance to go away. There is no doubt that once-upon-a-time Tcl was indeed slow. However, in common with all other modern script languages, the interpreter core now includes a byte compilation stage which speeds things up quite dramatically.

Speed of execution is not, however, the aim of using script languages. Tcl encourages very fast development - especially when writing graphical interfaces. Furthermore, Tcl is by far the simplest of the major script languages to extend with compiled modules. Only VBScript/JScript, which can be extended using COM objects, come close for easy extension (if you consider writing COM objects simple). The ability to easily extend the language means that a Tcl program never need be slow. The few bottlenecks in your application can be fixed by writing a small amount of C, leaving the rest of your app easy to maintain and fast enough.

Easy to extend.., posted 23 Dec 2003 at 01:15 UTC by etrepum » (Journeyer)

You can use unmodified Objective C libraries directly from Python (via PyObjC), which is far easier than COM, but you can do that too. You can also use unmodified C libraries directly from Python (via ctypes), but you do have to know the headers in order to properly call the functions.

At the moment, PyObjC only works with Apple's Objective C runtime, but GNUstep support is being worked on.
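
As a rough illustration of the ctypes route, here is a minimal sketch that calls the C library's strlen from Python. It assumes a POSIX-like system where the C library can be located or is already loaded into the process. The prototype is declared by hand, which is the "know the headers" part; the variable names are only for illustration.

```python
import ctypes
import ctypes.util

# Locate the C library. If find_library returns None, CDLL(None) falls back
# to the symbols already loaded into the running process (POSIX assumption).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the prototype by hand, mirroring `size_t strlen(const char *)`.
c_strlen = libc.strlen
c_strlen.argtypes = [ctypes.c_char_p]
c_strlen.restype = ctypes.c_size_t

# Call the unmodified C function with a Python bytes object.
length = c_strlen(b"advogato")
print(length)
```

Without the argtypes/restype declarations ctypes falls back to default conversions, which can silently give wrong results for anything that is not a plain C int.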

'Lightweight' is no realistic goal, posted 23 Dec 2003 at 11:47 UTC by tjansen » (Journeyer)

Keeping a language 'lightweight' is not a realistic goal. If you integrate a language into a product and it is successful, sooner or later there will be a bunch of people who are using this language a lot, many of them >8h per day. They are your most important target group and they don't care whether the language is 'lightweight', but will demand all functionality that can make their life easier. Every attempt to keep a language simple is doomed to fail; as an example, look at Java's evolution from 1.0 to what is proposed for 1.5. The only approach that can work is to avoid duplication and make the syntax so powerful that you can move functionality from the core language into libraries.

What are the goals, considerations, requirements?, posted 25 Dec 2003 at 20:08 UTC by robocoder » (Journeyer)

Examples:

  • Memory footprint (static and dynamic)
  • Speed (interpreter, byte-code execution, or compiled object code)
  • Extensibility of the libraries
  • Extensibility of the language itself
  • Expressiveness of the language
  • When to catch syntax errors (pre- or post-deployment)

Picking a language (or rolling your own) also involves guesswork and personal bias because the decision is limited by our ability to foresee future uses (or user expectations).

A while back I looked at CSL (an embeddable scripting language with C-like syntax) and C-Smile (ability to save compiled byte-code for later execution).

Why I use Tcl in Altogether, posted 27 Dec 2003 at 07:56 UTC by brouhaha » (Journeyer)

I use Tcl in Altogether, a microcode-level Xerox Alto simulator (work in progress). I use it both for scripting and as a debugger command language. It was pretty easy to replace my original hand-crafted argc/argv-style "parser" with Tcl.

When I had some trouble with embedding Tcl (long since worked out), I asked some questions in the newsgroup. I was told that I was doing things "wrong". Instead of adding Tcl to an application as an extension language (as Tcl was originally intended), I was told that the model has changed, and that now the "correct" paradigm is to start with Tcl and build your application by adding onto it. I was extremely unimpressed with this concept, coming as it did from the Tcl "experts". I solved the technical problems on my own, and it's not clear to me that I can reasonably expect any degree of support from those experts.

Because of this, and because I think Lua is a smaller and (IMNSHO) more elegant language, I started to switch from Tcl to Lua. There's actually a Lua-based branch in the Altogether Subversion repository, but it is not functional. I ran into two problems, one aesthetic and one technical in nature.

  1. Since I'm using the extension language to parse all commands the user enters, I don't like the need for parentheses around arguments (like a function call). For instance, if the Altogether user wants to examine 37 words of memory starting at location 017020, he should type the command "examine 017020 37", not "examine(017020,37)". I want an extension language that looks like a traditional command interpreter. Tcl does that. This is the reason I rejected Scheme, Guile, and FORTH. If a Smalltalk-like extension language were readily available, I might be willing to consider that, although perhaps it would be too verbose (e.g., "memory examineAt: 017020 count: 37").
  2. The interface for adding C functions to Lua is more sophisticated than that of Tcl, exposing a stack that contains objects of various types. While in general I think that is a win, converting Altogether to use Lua was going to require rewriting all of my original command functions to use that rather than the simple argc/argv approach.
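
To make the argc/argv style concrete, here is a minimal sketch of a command interpreter in that shape. It is not Altogether's code: the handler name, the dispatch table, and the choice to read the address as octal and the count as decimal are all assumptions made for illustration.

```python
import shlex

def cmd_examine(argv):
    """argv-style handler: argv[0] is the command name, the rest are strings."""
    if len(argv) != 3:
        raise ValueError("usage: examine ADDRESS COUNT")
    start = int(argv[1], 8)  # assumption: addresses are written in octal
    count = int(argv[2])     # assumption: the word count is decimal
    # Stand-in for reading simulator memory: just list the addresses.
    return [start + i for i in range(count)]

# Hypothetical dispatch table mapping command names to handlers.
COMMANDS = {"examine": cmd_examine}

def run_line(line):
    """Split a command line into words and hand them to the matching handler."""
    argv = shlex.split(line)
    if not argv:
        return None
    handler = COMMANDS.get(argv[0])
    if handler is None:
        raise KeyError("unknown command: " + argv[0])
    return handler(argv)

words = run_line("examine 017020 37")
print(len(words), oct(words[0]))
```

Every handler sees the same flat list of strings, which is why porting to an interface built around a typed value stack means touching each command function.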

I was one of the software engineers at Telebit (R.I.P.) working on the NetBlazer router. We used Tcl as a scripting language, though I didn't ever use it much at the time. I thought it was fine for that, but new router features started getting implemented in Tcl that IMNSHO should have been native code.

I wasn't aware of the reduced-footprint version of Tcl mentioned by davidw, but it probably would suit Altogether just fine.

Tcl extension vs embedding, posted 27 Dec 2003 at 10:01 UTC by davidw » (Master)

There are certainly Tcl experts who will give you the advice you mention above, but there are plenty of others, myself included, who don't really agree, or at least think that there are plenty of situations where Tcl should function as an embedded language. This wiki page: http://mini.net/tcl/9303 contains some discussion amongst Tclers about the issue. I use Tcl to extend Apache, so I can't abandon that model myself, and in reality I think it works pretty well.

I think it's a good idea to be wary of "the one true way" with anything, and I certainly disagree with it for Tcl, which is by its very nature supposed to be flexible enough to support a variety of programming paradigms.

I agree with your desire for a very simple syntax, and recommend that you continue with Tcl if it works for you. Normally, the people on comp.lang.tcl are quite friendly - it is in fact one of the few newsgroups I still find useful. A little bit of gentle explanation on your part ought to be enough to make people understand that you are not going to rewrite your program to fit their way of doing things.

Shell syntax, posted 27 Dec 2003 at 18:50 UTC by demoncrat » (Master)

Not to push Scheme at you or anything (I wouldn't argue against Tcl if that's what you're comfortable with), but you can have a Scheme shell that accepts "examine 017020 37" just by omitting the outermost pair of parentheses. A guy I knew at Caltech was scripting the Magic chip layout tool that way and he found it very comfortable. You can also avoid Lua's explicit stack management in your code by using a Scheme implementation with a conservative garbage collector.
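
The "omit the outermost parentheses" trick can be sketched in a few lines. This is only a toy reader, not any particular Scheme implementation: it wraps each shell line in one pair of parentheses and parses the result into nested lists, leaving evaluation out entirely. The function names are made up for the example.

```python
def tokenize(text):
    """Split s-expression text into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    """Parse one expression from the front of a token list, consuming it."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(read(tokens))
        if not tokens:
            raise SyntaxError("missing )")
        tokens.pop(0)  # drop the closing ")"
        return items
    if token == ")":
        raise SyntaxError("unexpected )")
    return token  # atoms are kept as plain strings

def read_shell_line(line):
    """Read a shell line as if it had one outer pair of parentheses."""
    tokens = tokenize("(" + line + ")")
    expr = read(tokens)
    if tokens:
        raise SyntaxError("trailing input")
    return expr

simple = read_shell_line("examine 017020 37")
nested = read_shell_line("examine (+ 017000 020) 37")
print(simple)
print(nested)
```

The user types a plain command line, while nested expressions are still available when an argument needs to be computed.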

No single path, posted 27 Dec 2003 at 22:10 UTC by mx » (Journeyer)

I've embedded several scripting engines, into several types of commercial applications. Experiences overall have been mixed, and the best results came from two company-brewed little-languages. The public engines I've worked with included the Moz JS engine, Tcl/Tk, Perl, Java, and a Small-C variant.

The JS engine was the least suitable of the bunch, and proved to be very difficult to port, mostly because it was well-optimised for 32-bit architectures (and the optimisations were not documented). We did port it, but it's been a poor experience overall. The documentation is very light, and its threading model was a thorn in our threading model (especially because it uses the Netscape portable library, which added yet another dependency).

Small-C was effective, but didn't really fit the users. This is an important point, and is one we missed a few times (Perl, JS, Java also didn't fit well).

Tcl, Java and Perl are larger engines (different sizes, but all suffered similar problems). Each engine has its own data model, and proxying to application code tended to be expensive, especially for our quasi-real-time systems (it was OK for user interfaces, but users didn't really want to type in code in the UI applications). The dual-data-model problem is common with public engines, unless you build your application around the data accessible from within the VM. The glue work for Java and Perl was tedious, though we were able to automate most of it.

Speaking of glue, one of the best engines I've seen was a Scheme variant, though I've never used it in production code. The engine I reviewed had a simple mechanism for connecting data and callbacks, much simpler than any other I've seen. I suspect that the simpler interface may have lost some power though, but I never hit the ceiling in my short time with the engine.

The custom little-languages I've built had the advantage of sharing a data model with the application. One of the engines was embedded in a data-collection framework, and was distinctly event based (almost to the point of escaping a procedural definition). The language was received well by the people who used it, and has lasted for several years (still in use). And, the implementation was under 1k lines, including all of the glue.

The second custom language was co-developed by a coworker, and is a simple stack-based VM. The syntax is a clean XML, which allows for a simple user interface (XML is easy to machine-generate). The script is never hand-edited (outside of unit testing), and the users love the GUI script builder. The engine performs well too, mainly because it's optimised for a very narrow set of tasks.
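
For readers who haven't seen one, a stack-based VM driven by an XML script can be very small. The following is a hypothetical sketch, not the engine described above: the element names (script, push, add, mul) and the value attribute are invented for the example.

```python
import xml.etree.ElementTree as ET

def run(script_xml):
    """Execute each child element of the root as one stack instruction."""
    stack = []
    for op in ET.fromstring(script_xml.strip()):
        if op.tag == "push":
            stack.append(int(op.get("value")))
        elif op.tag == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op.tag == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError("unknown instruction: " + op.tag)
    return stack

# A script that a GUI builder could generate: computes (2 + 3) * 4.
SCRIPT = """
<script>
  <push value="2"/>
  <push value="3"/>
  <add/>
  <push value="4"/>
  <mul/>
</script>
"""

result = run(SCRIPT)
print(result)
```

Because the script is just a flat list of elements, a GUI builder only has to emit one element per step, and the interpreter needs no parser of its own beyond the XML library.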

We developed one custom engine that was a failure from the start. It was based on the concepts of a previous in-house engine, but we tried to make it generic (and for no good reason). Generic was one of those pure engineering goals that had nothing to do with what people wanted. The engine took many months to build, and was a failure on every level (it was a de facto second system). The XML-based script was the replacement, which brought back the focus, though as a tiny subset of the failed engine. Smaller is better!

I've learned that scripting languages are good for applications. While re-use is good, it usually results in using a broad engine, which has implications for performance and user grokability. It's important to know the people using the application, and what they really want (not just what is technically viable or desirable). GUI-based languages seem moronic (from a geek point of view), but can really simplify support problems related to scripting, and are self-evident for nearly any user. Custom languages can really benefit an application, as you can control the language and performance focus, which beats out re-use any day. And, custom languages don't have to be a large effort, though they can easily become that if you lose focus.

Here's mine, posted 31 Dec 2003 at 16:59 UTC by sej » (Master)

comterp is a stand-alone scripting language with full-strength C expression syntax (minus the ternary operator) and Lisp-like semantics. Plus a few extras like dotted attribute lists (property lists) and APL-like streaming (or vectorization). I've embedded it in a drawing editor, but there is a stand-alone MANIFEST if you want to strip it out of the ivtools source tree.

Extend, don't embed!, posted 1 Jan 2004 at 04:38 UTC by nelsonrn » (Master)

This is the wrong philosophy entirely! I will tell you exactly what will happen if you try to embed a lightweight programming language into your application. People will like your application and will use it. They will seek to extend it beyond the capabilities of the language. You will be put in the uncomfortable position of having to make your programming language heavyweight, or telling these users that they are out of luck (been there, done that).

What you should do instead is write your program in Python, and when you find that you need more than Python can do, write an extension to Python in C. That's what you were planning to do anyway. What that will do is 1) force you to find out what part of your program REALLY needs to be written in C, and 2) make it available to other people to do cool things.

By doing this, people will have the full extensibility of Python available to them. For example, there's a quip that every program gets extended to the point where it can send email. By extending Python, your program starts OFF able to send email. :-)

Onyx, posted 4 Jan 2004 at 09:54 UTC by trs80 » (Apprentice)

Lambda the Ultimate linked to Onyx today. It's stack based, so probably best if you're targeting ex-PostScript programmers ;-). Slightly more seriously, it apparently can be configured to be as lightweight as required (you probably don't need regexes in your bootloader), has a syntax good for data files (a feature of Lua), and is threaded (via pthreads).

A couple more interpreters for perusal, posted 11 Jan 2004 at 05:15 UTC by MisterBad » (Master)

The Onyx page mentions a few more interpreters that it resembles. Two I thought interesting were:

  • FICL: the FORTH-inspired command language. Mmm, FORTH.
  • rep: a Lisp derivative, notable for being embedded in the Sawfish window manager.

Figured I'd post them here, for review.
