CPAN Ratings for Perl: Possible Problems

Posted 22 Mar 2004 at 14:43 UTC by scrottie

Just as with this and a thousand other whiny articles, users are now able to post reviews of Perl modules on CPAN. CPAN is the well-known, well-used repository of modules for Perl. Anyone may contribute, and many contributions ultimately become part of Perl or very popular extensions. Dealing with quality and redundancy has been a struggle, but the open environment has let it grow into perhaps the largest library of reusable code known to man. Large code repositories have an interesting set of problems, and CPAN has lessons to teach. I argue that openness is critical to success and that its opposite is easy to fall prey to by accident. This article should be interesting to anyone using high-level languages or anyone interested in code reuse.

Perl's CPAN, the Comprehensive Perl Archive Network, contains tens of thousands of modules by thousands of authors. A dozen windowing toolkits, dozens of database interfaces, too many network protocols and file formats, interfaces to other programs and languages, and a gross assortment of oddities make it a staple of serious Perl programmers. Its size and usefulness spurred other languages to emulate it, but it is the philosophy behind it that is responsible for its success. Anyone may contribute a module. The powers that be decide which modules to include in indices, but all are searchable. Redundant modules are tolerated: there are often several modules that attempt to solve the same problem. The only signs of administrative input are automated testing of included test suites on various platforms, naming requirements for inclusion in an index, the removal of anything malicious, and the manual application process to be granted a userid on the system. Documentation from modules is online, formatted nicely for viewing, and modules may take advantage of the bug tracking system.

Many intermediate programmers have gone on to become advanced programmers from the feedback and suggestions they've gotten from novices and gurus alike. Writing a module and releasing it to the world is a growth experience.

This openness is responsible for the explosive growth of the system, but as a result the half-arsed attempts of intermediate programmers, not to mention no-longer-maintained code and inferior "me too" re-implementations, litter the site. Discussions of how to cope with things done in poor style, long broken, or overly redundant keep popping up. No single plan seems to fit. If old modules are expired, then mature, popular, stable code is thrown away. Some of the best modules haven't changed in years, even though they may have gone through years of growth and bug fixes before that. Whether something is redundant or not is subjective and can't be automatically tested. Some important popular modules are written in poor style because style has changed over the years, and, again, style is hard to quantify (in general, quality is hard to quantify).

So a system of ratings was introduced. Users can rate a module and explain why they do or don't like it. As alluring as it sounds, this is a form of closedness. Closedness might be worthwhile if it solved the problems at hand, but I for one don't think it does, and I think it causes harm.

1. People write bad reviews as a way of asking for help. Complex but good modules tend to get bad reviews because people become frustrated with them. Some things are inherently complex and even a brilliant object model can't save them. People tend to voice an opinion when they have a complaint rather than a compliment - we expect things to work, and we scream when they don't.

2. Previously, module authors got their feedback privately, or at least tactfully, in the form of email or reports on the bug tracker. This feedback helped them grow into better programmers. It was addressed to the author of the software rather than to the public, so it read like "you might consider doing X to avoid Y problem". When phrased as an address to the public it sounds like a reprimand - "Joe should do X to avoid the Y problem this module has". This is humiliating and makes CPAN authorship competitive rather than cooperative.

3. While the feedback is clearly a system of opinion, it is aggregated into a number of stars that psychologically seems authoritative. It is damaging for an author to look at the display for his module and see only one star because the single user who reviewed it happened to give it a bad review. Our first attempts are always lacking and this encourages people to give up rather than try again.
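To make the third point concrete, here is a rough sketch in Python. It assumes the stars are nothing more than a rounded mean of the submitted ratings; I don't know how the ratings site actually computes them, and `star_display` and the sample ratings are made up for illustration.

```python
from statistics import mean


def star_display(ratings):
    """Collapse individual 1-5 ratings into one rounded star count.

    Assumption: a plain rounded mean, not the real site's formula.
    """
    if not ratings:
        return None  # no reviews yet, so nothing to display
    return round(mean(ratings))


# Hypothetical data: one frustrated reviewer decides the whole display.
single_review = [1]
# Hypothetical data: the same frustrated reviewer among four others.
several_reviews = [1, 4, 5, 4, 5]

print(star_display(single_review), "star(s) from", len(single_review), "review(s)")
print(star_display(several_reviews), "star(s) from", len(several_reviews), "review(s)")
```

The star count looks equally authoritative in both cases, even though the first rests on a single opinion.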

There are other solutions. Make the existing discussion lists more prominent, let people off the street chime in with opinions, and encourage module authors to ask for help. Make the bug tracking system handle feedback as well as bugs, in a separate category. Even making the ratings more statistically pure, by requiring all users to vote on the quality of a module or else accepting no votes, would be an improvement, and a trust metric system could reduce the noise associated with random people chiming in. Taking a page from Freshmeat and Sourceforge and simply reporting on vitality, number of contributors, number of open bug reports, and so forth would let people decide for themselves whether the module meets their criteria without hurt feelings or confusingly terse information.
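The last suggestion might look something like the following Python sketch. Everything in it is hypothetical: the `DistributionFacts` record, its field names, and the sample numbers are invented for illustration and do not describe any real CPAN, Freshmeat, or Sourceforge interface.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DistributionFacts:
    """Plain, checkable facts about a distribution (hypothetical fields)."""
    name: str
    last_release: date
    contributors: int
    open_bugs: int


def describe(facts, today):
    """Report the facts as they are, with no score or verdict attached."""
    days = (today - facts.last_release).days
    return (f"{facts.name}: last release {days} days ago, "
            f"{facts.contributors} contributor(s), "
            f"{facts.open_bugs} open bug report(s)")


# Made-up example data.
example = DistributionFacts("Example-Dist", date(2004, 1, 1), 3, 7)
print(describe(example, today=date(2004, 3, 22)))
```

Nothing here passes judgment on the author; the reader weighs the numbers against his own criteria.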

C, Perl, Python, Ruby, Java, and numerous other languages are finding real strength in code sharing in the form of libraries, objects, and modules rather than just entire applications. Especially with server-side languages and scripting it is common to take on dependencies readily. Coping with code sharing is a relatively new frontier, one with a lot of lessons still to be learned and problems to be solved. It is part of a world where programmers cater primarily to other programmers and open source projects scale beyond what one core team can do. It is a sign of a whole culture and commerce rising behind the scenes, with niches, specialization, channels, and all that. It is cool and exciting =)


Ratings? Where?, posted 23 Mar 2004 at 05:18 UTC by forrest » (Journeyer)

Maybe I'm missing something obvious, but I just went poking around the CPAN site and found no mention of ratings.

I'm dubious about a rating system as well, although I don't see the issue with public "maybe you could do X" comments. I see that on mailing lists, and no one seems to take it badly. Of course, the suggestions are being directed to experienced programmers who presumably don't have fragile egos about their coding. Usually there are many experienced developers on the list, and "you could do X" generates more suggestions, often resulting in a better solution.

RE: Ratings? Where?, posted 23 Mar 2004 at 05:59 UTC by brondsem » (Journeyer)

The ratings are of "distributions", not "modules". So for example, for Template::Extract it's on the distribution page, not the module page.

bug reports, posted 23 Mar 2004 at 15:04 UTC by dan » (Master)

Taking a page from Freshmeat and Sourceforge and simply reporting on vitality, number of contributors, number of open bug reports, and so forth would let people decide for themselves whether the module meets their criteria

I think I read that in the Debian packaging system, the number of open bug reports against a package was to a first approximation proportional to the number of people using it, rather than having anything to do with the quality or bugginess of the package.

When I'm assessing suitability of a library or module for my own use, my primary recourse (if it has more than one developer, at least) is to the project mailing lists to see what impression of the development process I get. It's subjective and can't be reduced to a single metric, but works pretty well for all that.

Is this common practice?, posted 22 Sep 2004 at 03:44 UTC by MartySchrader » (Journeyer)

Do we see the same thing happening in the PHP, ASP, JSP, Javascript, etc. worlds? Are there getting to be enough "sets" of code floating around that we need some kind of rating or trust metric system to separate the men from the boys, so to speak? I thought that PHP was going to take over where Perl left off. What's happening in the PHP library world? Sorry, folks -- I can't follow all of this stuff for myself.
