A Metric for Computer Language Viability

Posted 10 Oct 2002 at 00:18 UTC by itamar

Both Sourceforge and Freshmeat track software, including which programming language said software was implemented in. Freshmeat tracks released software, while Sourceforge contains many projects that have never been released, or in some cases don't have any code at all. This difference should allow us to see how successful a programming language is in moving a project from an idea (defined as "SF project") into a real, functioning program (defined as "program posted to Freshmeat"), and what percentage of projects have died along the way.

The Language Mortality Ratio for a given language will be defined as the ratio between the number of projects on Sourceforge and the number of projects on Freshmeat for the given language. The lower the number, the better.

Some sample ratios, based on the Freshmeat list and the SF list:

  • C: 1.93
  • PHP: 3.49
  • Python: 2.35
  • Perl: 1.71
  • C#: 23
  • Modula: 0.25
  • Logo: 50/0 (infinite, undefined?)
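As a rough illustration of the definition above, the ratio can be computed as follows. This is only a sketch: the function name `mortality_ratio` is made up for this example, and treating Logo's 50/0 case as infinite (rather than undefined) is an assumption, since the post leaves that question open.

```python
import math


def mortality_ratio(sf_projects, fm_projects):
    """Language Mortality Ratio: SourceForge count divided by Freshmeat count.

    Lower is better. The handling of a zero Freshmeat count is an assumption
    of this sketch, not something the original definition settles.
    """
    if fm_projects == 0:
        # Logo's 50/0 case: infinite if any SF projects exist,
        # undefined (NaN) if both counts are zero.
        return math.inf if sf_projects > 0 else math.nan
    return sf_projects / fm_projects


# Logo's counts as given in the list above.
print(mortality_ratio(50, 0))
```

Returning `math.inf` keeps such languages sortable at the bottom of a ranking, which seems to match the "lower is better" reading.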

Some obvious conclusions can be drawn from this list. First of all, C# is far from being a viable solution for the needs of the enterprise. Secondly, the language with the lowest (and thus the best) Language Mortality Ratio, Modula, is by far the most successful language in terms of project success (more real projects than projects in progress!). The next time you consider which language to use for a new project, Modula should be a serious contender.


Many thanks to all the people on #python, posted 10 Oct 2002 at 00:28 UTC by itamar » (Master)

Many thanks to #python people for helping with suggestions, a name for the ratio, and doing some of the calculations. I couldn't have done it without you!

Tcl does pretty well too, posted 10 Oct 2002 at 00:36 UTC by davidw » (Master)

Looks like lack of publicity helps, in some cases :-)

573 / 301 = 1.90

Hard to know what's being measured exactly, posted 10 Oct 2002 at 00:45 UTC by walken » (Master)

I mean, I know how you come up with the number, but is it a property of the language itself or only of the user community?

Not enough, posted 10 Oct 2002 at 03:07 UTC by djm » (Master)

"First of all, C# is far from being a viable solution for the needs of the enterprise"

I don't think it is possible to infer this conclusion from this investigation:

1. C# developers are unlikely to use SourceForge or Freshmeat. This skews the results.

2. A casual glance at freshmeat's front page seems to indicate an abundance of "PHPmyMP3play" and "myPERLripeer" type scripts. This skews the results and lessens their relevance for serious projects/applications.

3. SourceForge is jovially referred to as SourceForget for a reason. This further skews all the results to the negative.

For the record, posted 10 Oct 2002 at 03:51 UTC by itamar » (Master)

This was very much tongue-in-cheek (although the numbers are accurate). If the enterprise bit wasn't enough to tip you off, the Modula bit should have - there are only 4 Modula projects on Freshmeat...

No conclusion can be drawn, posted 10 Oct 2002 at 13:56 UTC by pfremy » (Journeyer)

I don't think any conclusion can be drawn from your study, because:

- some projects are on sourceforge and not on freshmeat, and vice versa.

- some projects have not released yet but are on sourceforge. The low number of C# projects on freshmeat simply means that, and not that the language is dead.

- some projects are indeed dead but they show up both on freshmeat and sourceforge.

While a study like yours would be interesting, you need far more figures to support it. For example, you need to take the time into account.

What would interest me is: how many projects on sourceforge reach the stable state? How many are dead?

freshmeat, posted 15 Oct 2002 at 02:51 UTC by Liedra » (Journeyer)

Speaking as an editor at freshmeat, pfremy, we don't often approve projects that use C# unless they are actually known to work under Linux implementations (as we're a Unix/PalmOS software site, not a Windows one). Sourceforge hosts Windows projects too, so that could skew things a little :-) Perhaps if a more serious look at something like this were made, this should be taken into account.

A better metric, posted 15 Oct 2002 at 05:35 UTC by Mysidia » (Journeyer)

Well, the ratio of total sourceforge projects to total freshmeat projects per language isn't useful, since being on freshmeat is not a necessary condition for being an active sourceforge project. Moreover, since the metric is based only on totals, it includes projects that are on freshmeat but not sourceforge.

Use just one of them and apply a weighted average:

let p = the set of sourceforge projects in the given language that have existed for at least 6 months

let s = the set of projects in p with source code downloadable from their project page, either as file releases or cvs entries

let r = the set of projects in s with at least one file release

let n = the set of projects in r with a release in the past 6 months

P=|p|, S=|s|, R=|r|, N=|n|

Scaled metric value = 10 × (P + 2×S + 3×R + 4×N) / (9×P)

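The result is a number on a scale of roughly 1-10 (with these weights it runs from 10/9 ≈ 1.11, when no project has any source available, to 100/9 ≈ 11.11, when every project has a recent release). So, for example, if you have 5000 projects (P, the size of p, is 5000), of which 2000 have some source code available, 1000 have made releases, and 500 have made a release in the past 6 months, then you have: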

M = 10 × (5000 + 2×2000 + 3×1000 + 4×500)/(9×5000) = 3.11
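The formula is easy to check mechanically. Here is a minimal sketch, assuming the four counts are already known; the function name `scaled_metric` and the input validation are additions for this example and are not part of the proposal itself.

```python
def scaled_metric(P, S, R, N):
    """Scaled metric value = 10 * (P + 2*S + 3*R + 4*N) / (9*P).

    P, S, R, N are the sizes of the nested sets p, s, r, n, so they must
    satisfy 0 <= N <= R <= S <= P, with P > 0 to avoid dividing by zero.
    """
    if P <= 0:
        raise ValueError("P must be positive")
    if not (0 <= N <= R <= S <= P):
        raise ValueError("expected 0 <= N <= R <= S <= P")
    return 10 * (P + 2 * S + 3 * R + 4 * N) / (9 * P)


# The worked example from the text: 5000 / 2000 / 1000 / 500.
print(round(scaled_metric(5000, 2000, 1000, 500), 2))  # 3.11
```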

And then you can throw in the complexity of having different time intervals, and averaging in other subsets like "number of projects either younger than 6 months or in phases beyond I planning", but that's not necessary to be more useful than the metric of sf to freshmeat projects.

Once you've got the metric all figured out, however, finding a way to collect the information you need could be a problem. You clearly need to be able to collect more information than a comparison of totals to decide if a particular project is dead or not (you can really only decide that it's not dead and assume that what you don't determine to be alive is dead).

You can search for projects by language, but there's probably no option provided by the sourceforge system for "include only projects with file releases or cvs entries of XX size or greater in results" and "include only projects with file releases XX size or greater in results", so this would seem to need some kind of automaton, but it is quite possible there's a simpler way that I haven't thought of.
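Collecting the data is the hard part, but once per-project records exist, the tallying itself is simple. The sketch below assumes such records have already been gathered somehow. The `Project` record layout, its field names, and the 182-day approximation of "6 months" are all hypothetical choices made for this example and do not correspond to any real sourceforge export.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Assumption: "6 months" is approximated as 182 days.
SIX_MONTHS = timedelta(days=182)


@dataclass
class Project:
    # Hypothetical record layout, invented for this sketch.
    name: str
    registered: date
    has_source: bool  # file release or cvs entries available
    release_dates: list = field(default_factory=list)


def tally(projects, today):
    """Return (P, S, R, N), the sizes of the nested sets p, s, r, n."""
    # p: projects that have existed for at least 6 months
    p = [x for x in projects if today - x.registered >= SIX_MONTHS]
    # s: projects in p with downloadable source
    s = [x for x in p if x.has_source]
    # r: projects in s with at least one file release
    r = [x for x in s if x.release_dates]
    # n: projects in r whose latest release is within the past 6 months
    n = [x for x in r if today - max(x.release_dates) <= SIX_MONTHS]
    return len(p), len(s), len(r), len(n)
```

Because each set is filtered from the previous one, the counts are guaranteed to be nested (N <= R <= S <= P), which is what the weighted formula expects.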

Avoiding /0, posted 4 Nov 2002 at 14:54 UTC by realblades » (Journeyer)

To have a metric that lasts and to avoid dividing by zero, I would rather use a value between 0 and 1 that is the probability of a project written in a certain language living or dying, based on statistical "evidence".

Something like (I'll say it in scheme):

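(define (lang_viability tries successes)
  ; Fraction of announced projects (tries) that are successful or non-dead.
  ; Returns 0 when nothing was announced, to avoid dividing by zero.
  (if (= tries 0)
      0
      (* (/ 1 tries) successes)))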

I believe that returns the correct value. To put it roughly into text:

 
P(s,t) = s ( 1 / t )
Where s is successful or non-dead projects and t is projects announced.

The following should also be true:

s <= t 
0 <  t
0 <= s
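For comparison, here is a rough Python counterpart of the Scheme function above, with the stated constraints checked explicitly. It is only a sketch: raising `ValueError` on bad input is a choice made for this example, and the zero guard is kept to mirror the Scheme version even though the constraints say t should be positive.

```python
def lang_viability(tries, successes):
    """Return successes / tries, a value between 0 and 1.

    tries is the number of projects announced (t); successes is the number
    of successful or non-dead projects (s).
    """
    # Enforce 0 <= s <= t from the constraints listed above.
    if tries < 0 or successes < 0 or successes > tries:
        raise ValueError("expected 0 <= successes <= tries")
    if tries == 0:
        # Mirrors the Scheme version: no division by zero.
        return 0.0
    return successes / tries


print(lang_viability(4, 1))  # 0.25
```

Unlike the sf-to-freshmeat ratio, this value stays bounded, so a language with no successes scores 0 instead of producing an infinite or undefined result.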
