a thread on the c programming language

Posted 18 Jan 2001 at 20:36 UTC by apgarcia

ModernRonin started an interesting thread on C that I would like to see continue a while longer, rather than fall into the merciless bit bucket of not-so-recent diary entries.

2001-01-17 13:30:00 ModernRonin

I recently took a job in which the primary language of the development group is C++. This has not been a terribly happy thing for me because while I can code and read C++ decently enough, I do not like the language very much. C++ has a lot of good ideas underlying it, but it's just terribly implemented. The syntax is god-awful, the rules for inheritance of code and data members are convoluted as hell, and the compiler does things behind your back that can easily cause fits for even an experienced programmer.

I much prefer C. As someone said in a discussion post on Kuro5hin, "If you must use the wrong language for the job, I'd rather see you use C than C++. It's true that C gives you enough rope to hang yourself. But so does C++, and it also comes with a premade gallows and a book on knot tying." That said, though, C ain't perfect either...

Another thing that catalyzed my thinking about the faults of various programming languages is that I've been reading Writing Compilers and Interpreters (second ed): An Applied Approach using C++. I've been trying to come up with a simplified C-esque language I could write a compiler and/or interpreter for that would attempt to eliminate some of the more glaring flaws of C.

What are some of those flaws? Well, just off the top of my head:

  • Assignment vs. comparison. Hasn't this tripped up many a novice and even occasionally an experienced programmer? We need to differentiate between assignment and equality. Using the same or similar symbols for both operations is just asking for headaches.

  • Pointers. Pointers get programmers in a lot of trouble. And not just in obvious ways, either. Ever assigned one struct to another that had a string pointer inside? How long did it take you to realize what was really going on? (C++ has this problem even worse, thanks to its amazingly over-wrought object architecture.)

  • Memory management. malloc() and free() cause a lot of trouble, even for experienced programmers. C has the honor of having its memory allocation scheme so badly designed that a whole company (Pure Software) makes a very comfortable living selling a third-party library (Purify) to help us track down our memory leaks. This is ridiculous.

  • Keystroke Efficiency. On a more general note, I find that many programming languages these days require the use of the shift key and special symbols more often than I'd like. My idea of a good identifier is one that you don't have to hit the shift key for. Parentheses, curly brackets, asterisks and ampersands should be infrequently used characters, not the mainstays of the language syntax.
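The struct-assignment trap mentioned under "Pointers" can be sketched roughly as follows. This is only an illustration: the type `struct person` and the function `shallow_copy_shares_buffer` are made-up names, not anything from the original post.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type, used only for this illustration. */
struct person {
    char *name;   /* points at heap memory owned by the struct */
    int   age;
};

/* Returns 1 if editing the copy also changed the original,
   or -1 if the allocation failed. */
int shallow_copy_shares_buffer(void)
{
    struct person a, b;
    int shared;

    a.name = malloc(16);
    if (a.name == NULL)
        return -1;
    strcpy(a.name, "alice");
    a.age = 30;

    b = a;              /* copies the pointer, not the characters */
    b.name[0] = 'A';    /* writes through the shared pointer */

    shared = (a.name == b.name) && (a.name[0] == 'A');

    free(a.name);       /* freed once; b.name would now dangle */
    return shared;
}
```

Assigning one struct to the other copies only the `name` pointer, so both structs end up referring to the same characters, and freeing through one of them leaves the other pointing at released memory.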

How do we address these weaknesses? Well, to quote the Perl programmer's motto, "There's more than one way to do it." However, here are some of my proposed solutions to the items mentioned above: (Please, pick on these and tear 'em up! I'm submitting this precisely so people will tell me what's wrong with my ideas...)

  • Assignment vs. Comparison. Assignment gets a new symbol: "<-", as in "A <- B + C". Read as: "A gets B plus C." Comparison continues to use "=". In practice, "<-" is a real pain in the ass to type. So either a macro will need to be made or another symbol will need to be substituted. Perhaps ":", in the tradition of Pascal.

  • Pointers. Most of the problem with pointers comes from a pointer not pointing at what it should, either because it was never pointed there in the first place, or because it got re-pointed (perhaps set to NULL). So I propose that a pointer can only be set once, and is immutable once set. This will get rid of a lot of ugly crap, not the least of which is the horrible practice of casting between different pointer types. Also, before a pointer is set, it is in a "non-initialized" state, and dereferencing a non-initialized pointer will be prohibited. I believe both these conditions can be enforced at compile time. Those conditions said, however, altering the thing pointed to by a pointer is still allowed. Without this functionality, there'd be no point! (If you'll pardon the pun. ;])

  • Memory management. This is the 21st century, folks. We have gigahertz CPUs and hundreds or thousands of megabytes of memory. Almost no task that a modern PC's main CPU undertakes is hard realtime critical. (If someone has such a task, I recommend they not use my little theoretical language here.) Garbage collection algorithms have come a long way and no longer require unbounded time to operate. We've already tossed the bad features of pointers out the door, let's continue in the spirit of keeping what we like and having the machine deal with the ugly stuff and specify that a well-optimized, bounded-run-time garbage collector be part of the language.

  • Keystroke Efficiency. The first obvious one to me is to substitute "[" and "]" for "{" and "}". Assuming an otherwise C-like syntax, this change should take all of 30 seconds for C programmers to get used to. (What we do about arrays, I don't know - maybe my little toy language here won't have them.) Function calls can have a new syntax, funcName:arg1,arg2, etc... This eliminates having to type parentheses every time you want to call a function, a wrist killer if there ever was one. Strings will get Pascal-style '' delimiters, 'like so', though the backslash notation for escaping characters can stay. We'll keep ; for the end of a statement too, that's actually one I like.
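For comparison, existing C can already approximate part of the "set once" pointer proposal with a const-qualified pointer. The sketch below is only an approximation of the idea: C does not track the proposed "non-initialized" state, and a cast can still defeat the qualifier. The function name `write_through_fixed_pointer` is made up for the example.

```c
/* Returns the final value of the target after writing through a
   pointer that cannot be re-pointed. */
int write_through_fixed_pointer(void)
{
    int target = 1;
    int other  = 2;
    int *const p = &target;   /* set in the declaration; p itself is read-only */

    *p = 42;                  /* allowed: changes the thing p points at */
    /* p = &other; */         /* rejected by the compiler: p is const */

    (void)other;              /* silence the unused-variable warning */
    return target;
}
```

Uncommenting the assignment to `p` turns the mistake into a compile-time error, which is the kind of enforcement the proposal asks for.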

Now, a quickie example of the last point...

Original C:

int main(void)
{
  printf("Hello, world!");
  exit(0);
}
  

New style whatever it is:

int main.void
[
  printf.'Hello, world!';
  exit.0;
]

Try typing these two and see which one feels quicker. I think the new function syntax is a real win, anyway.

That's all I have at the moment. So... Questions? Comments? Additions? Flaming rants about my shoe-size IQ? ;]

2001-01-17 23:45:16 bagder

ModernRonin wrote a long "explanation" of where the flaws in the C language are, in his (first) diary entry. I found them interesting but not very accurate.

pointers. By saying pointers are a design flaw of the language, you just killed the whole language C. How are you gonna be able to fiddle low-level without pointers? Set them once you say. I say you're not real.

memory management. You can't create a 'memory management model' that is anything else but function calls for C to be what it is. Sure, you could add more or different functions but the model would remain the same. Also, Purify is owned by Rational these days if I'm not mistaken. If the existence of a tool points out a flaw, then my God there aren't many non-flawed things these days... Your phrase "we have gigahertz CPUs and hundreds or thousands of megabytes of memory" also fails to recognize the importance of C, which is not as much in your everyday bloatware in a GB-memory workstation PC, but in embedded systems, real-time purposes, kernels, tiny machines etc.

keystroke efficiency sounds very much like religion. I doubt many C-programmers find the use of the shift key very annoying. A language has to be readable and really quick to those who're experts as well, and as a C expert I don't have the slightest problem with *&[]{}->.

Yes, you have "solutions" to these problems but I have one single solution to all of your identified problems: change language. You apparently would be better off with something like Java, Python, Delphi or even Perl.

2001-01-18 08:02:51 mobius

ModernRonin:
While I don't agree with you on most of your points (pointers _need_ to change what they're pointing at, otherwise they are exactly the same as non-dynamic variables), I admit that keystroke efficiency is an issue. However, I don't think it's the fault of the language, but rather the fault of the keyboard layout. A programming-specific keyboard layout (think Dvorak, but optimized for {}[]:; etc) would be the obvious solution.

I've been playing with a crazy/stupid idea regarding keyboard layouts, but I won't go into it until I have some actual code. ;)

I started [re]learning Windows programming yesterday. I do the coding in a DOS port of Emacs, compile in cmd.exe, and run the executable on Win2k. It's surreal. [grin]

2001-01-18 09:51:46 graydon

ModernRonin raised the "how could C be improved" issue (without sacrificing its simplicity and minimalism, I assume), and the language geek in me can't resist commenting:

  1. hygienic macros or at least inline functions (some in c99)
  2. explicit sized integer types (in c99)
  3. namespaces & imports rather than #include
  4. pointer restrictions (some in c99)
  5. runtime-variable length arrays without resorting to alloca (in c99)
  6. a backtick operator
  7. a jump convention without resorting to setjmp/longjmp
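A few of the C99 items on this list (inline functions, explicitly sized integers, restrict-qualified pointers, and variable-length arrays) can be sketched together. The function names below are invented for the illustration, and the block assumes a compiler in C99 mode or later with VLA support.

```c
#include <stddef.h>
#include <stdint.h>

/* Item 1: an inline function instead of a function-like macro. */
static inline int32_t square(int32_t v)
{
    return v * v;
}

/* Item 4: restrict promises the compiler that dst and src do not overlap. */
static void copy_squares(int32_t *restrict dst, const int32_t *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = square(src[i]);
}

/* Items 2 and 5: fixed-width integers and variable-length arrays.
   n must be greater than zero and small enough to fit on the stack. */
int64_t sum_of_squares_up_to(size_t n)
{
    int32_t in[n], out[n];   /* sized at run time, no alloca needed */
    int64_t total = 0;

    for (size_t i = 0; i < n; i++)
        in[i] = (int32_t)(i + 1);
    copy_squares(out, in, n);
    for (size_t i = 0; i < n; i++)
        total += out[i];
    return total;
}
```

The remaining items (namespaces, a backtick operator, a jump convention) have no direct C99 counterpart, so they are not shown.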

since I'm quoted..., posted 18 Jan 2001 at 22:26 UTC by mobius » (Master)

I might as well reply. :)

Yes, the difference between = and == is a PITA. I agree that any new language should probably have a less typo-prone syntax. However, C is not the only language with problems like this. VHDL uses the <= operator to either assign a value or to represent LTE, depending on context. The syntax you proposed, <-, would need similar context checking to differentiate it from a comparison to a negative number. Similarly, replacing { } with [ ] would need further special case checking. I think some of the most intuitive yet flexible syntax is in MOO coding.

Regarding memory leaks: check out Debauch and memleak (which Debauch is based on). You don't need to buy a library when free ones exist. :)

As is obvious from my diary entry, the issue of keystroke efficiency interests me. If it is really an issue to the programmer, I think a language-specific keyboard layout would be beneficial. Perhaps move { and } to an unshifted key, put 'z' and 'x' somewhere out of the way, bring the () keys out of exile... I don't know if any of those would be worthwhile, but they might be. The tools exist for making keyboard layouts; there's no reason to write a new language to save a few keystrokes.

Constructive comments: One thing that I would love to see in ANSI C is deep breaks, i.e. being able to break out of multiple nested loops with one command. Hmm, the syntax break (n); comes to mind, where n is the number of loops to break out of. As well, yes, variable length arrays would be nice.

Keystroke efficiency is bogus, posted 18 Jan 2001 at 22:57 UTC by jbuck » (Master)

How many keystrokes to type the code in is much, much less significant than how comprehensible the code is, because any code that matters is going to be read, by humans, far more often than it is written. Arguing about == vs = is pointless; there are millions of people who know enough C to immediately know the difference, while any new syntax you invent, however clear, will have to be learned.

What is more important is a related notion of efficiency: write things once. Ideally, each design decision is reflected in only one place in the code. When this can be achieved, all kinds of good things follow; most importantly that this one place in the code can be changed if needed, revising the design decision, and the code still works. This is the very basic concept of information hiding, which is part of, but which predates, object-oriented programming. Information hiding is important enough, for any software project that you expect to evolve, even to sacrifice some efficiency for. And all software that isn't such crap that you toss it immediately will evolve.

Good C programmers can achieve this property in their code, but it's easier in C++. But some of the worst C++ code I've seen has been produced by people motivated by keystroke efficiency: they overload operators in unnatural ways so they don't have to type as much, and no one can understand their program. Both C and C++ are rich enough languages to allow for elegant or for horrible programming. A new general-purpose programming language isn't going to make people suddenly write only good code.

C, posted 18 Jan 2001 at 23:52 UTC by nymia » (Master)

Can't resist replying so here goes...

I like C and I like it a lot. But I think introducing new features on top of it is probably stretching it too far. Why? Because C abstracts the machine and the instruction set, and that's about all the nice features one will get from it: an escape from assembly. People who are used to coding in assembly and are aware of the runtime can easily relate to C because it provides the abstraction. Like the runtime stack, which maps the calling convention of procedures nicely. Parameters are pushed and popped on the local stack. Also, code blocks provide lexical scope like:

{
   int x;
   x = 1;
   {
     int x;
     x = 2;
   }
   {
     int x;
     x = 3;
   }
}

What C does is allocate three symbols on the local stack, one for the outer block and two for the inner ones. The {} symbols mean something, and they're used to define a region or code block in which symbols are defined to exist. Moreover, memory is treated as cells, making C a more compelling choice for peeking and poking memory.
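The nested blocks above can be wrapped in a function to check that the three variables really are distinct. This is a small sketch with an invented function name; whether a given compiler actually reserves three separate stack slots (rather than reusing one for the two inner blocks) is an implementation detail the language does not promise.

```c
/* Returns the value of the outermost x after both inner blocks ran. */
int outer_x_after_inner_blocks(void)
{
    int x;
    x = 1;
    {
        int x;      /* a new object that shadows the outer x */
        x = 2;
        (void)x;
    }
    {
        int x;      /* a third, unrelated x */
        x = 3;
        (void)x;
    }
    return x;       /* the inner assignments never touched the outer x */
}
```

Each inner `x` exists only between its own braces, so the outer `x` keeps the value assigned before the blocks were entered.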

But we can't just use C forever. With that argument I agree, and I'm certainly hoping there will be many languages coming soon that will provide the things we need. We'll just have to keep on designing and developing until they arrive.

My point is let's keep C the way it is and move on and make other languages.

deep breaks, posted 19 Jan 2001 at 01:14 UTC by jlbec » (Master)

mobius: deep breaks are already in ANSI C.

for ()
{
    for ()
    {
        if ()
            goto DEEP;
    }
}
;
DEEP:

A break is merely a strictly defined goto.
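A compilable version of the same pattern might look like the sketch below. The function `find_first` and its parameters are assumptions made for the example; only the goto-out-of-nested-loops shape comes from the post.

```c
/* Searches a rows x cols grid (stored row by row) for `wanted`.
   Returns the flat index of the first match, or -1 if absent. */
int find_first(const int *grid, int rows, int cols, int wanted)
{
    int r, c;
    int found = -1;

    for (r = 0; r < rows; r++) {
        for (c = 0; c < cols; c++) {
            if (grid[r * cols + c] == wanted) {
                found = r * cols + c;
                goto done;      /* leaves both loops at once */
            }
        }
    }
done:
    return found;
}
```

Placing the label directly before a statement (here the `return`) also satisfies the rule that a label must be followed by a statement.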

Other languages to think about..., posted 19 Jan 2001 at 03:17 UTC by adubey » (Journeyer)

You have many interesting ideas here :) But Isaac Newton got it right when he said, "If I have seen further, it is by standing on the shoulders of giants".

Thing is, at Newton's time, those "giants" were heretics :) There are many interesting ideas in programming language design that people look over because they come from heretical languages.

A "nice" version of C already exists. It's called Ada :) Ada has many of the features of C, but is vastly cleaned up (using some of the suggestions you have here), but has a slightly more verbose syntax. (OK, who am I fooling? The syntax is *much* more verbose.)

But syntax isn't semantics. I wonder how well Ada would have fared if it had a more C-like syntax? It was around by '95, perhaps half of the lustre of Java would be gone...

Some other interesting languages to look at are CWEB, Haskell and Python.

CWEB and Haskell because they use an often-ignored but very useful idea called "literate programming". The key to literate programming, invented by CS great Donald Knuth, is that English is easier to read than code. By letting the programming language take this to its natural conclusion, you get much better programs.

Haskell and Python because they take your idea of keystroke efficiency to their natural conclusion: the "{" and "}" are there for the compiler, not the programmer. Usually indentation is there for the programmer (but not the compiler). As it turns out, it's just as easy for compiler writers to look at the indentation rather than the braces. More keystroke efficient, and fewer bugs as it turns out. (You can miss a brace, but it's dead obvious if you miss a tab).

Deep breaks, posted 19 Jan 2001 at 03:17 UTC by Pseudonym » (Journeyer)

mobius: The RenderMan shading language has deep breaks (and deep continues) pretty much as you have described them. For relatively small pieces of code such as shaders (they almost never get above 1000 lines or so), it works well enough, but for something like C it could be a major maintenance headache when the loop gets restructured. You really want named break/continue targets (e.g. those of Perl) so you can say "break OUTERLOOP" instead of "break 3".

the bigger picture, posted 19 Jan 2001 at 03:30 UTC by beppu » (Journeyer)

Let's change topics and consider Perl for a moment. What makes it so ridiculously popular? There are a lot of factors at work here, but what is something that Perl has, that no one else has?

CPAN! What other language can boast a repository of code as large and as diverse as cpan? What's even better is that almost all the modules can be installed with the following sequence:

perl Makefile.PL
make
make test
make install

One might say that automake+autoconf can provide something similar, but those are a bitch to set up. A typical Makefile.PL is between 6 and 10 lines of Perl code. It's so refreshingly easy. So now, not only do you have a large repository of code; you also have a standardized build (and test) framework that almost every module adheres to. It gets even better -- nearly everything in cpan.org is under some kind of Free license.

Let's review. cpan.org provides:

  • a large repository of perl modules
  • that voluntarily adhere to a standardized build and test framework
  • and it's predominantly Free

Cooperation was never this easy, before. People are working together without even knowing it. It's beautiful. The sad thing is.....

No one else is doing this. No one. I'm surprised Python hasn't copied this aspect of Perl. I'm even more surprised that Ruby doesn't have something like cpan set up, yet. It would benefit them so much.

Every once in a while, a discussion pops up about how C could be improved, and the same kinds of suggestions as made here are repeated -- little syntax issues or the dangers of pointers, for example. These are certainly legitimate concerns, but I feel they miss the bigger picture.

I think that if you search a bit, you will find that the kinds of problems the article poses have already been solved in some form or another in languages that exist right now. Has it helped? Maybe a little, but a language does not live by its technical merits alone. A language must also exist within the context of a culture. If a language tries to foster a healthy culture, a community will begin to form around it. The community strengthens the language which then strengthens the community which strengthens the language ... and it doesn't stop.

Perl's lesson to the world is this: COOPERATE.

[jsb]

Still a skeptic on these old arguments., posted 19 Jan 2001 at 05:08 UTC by dto » (Journeyer)

Assignment/comparison. The "right answer" in this one is usually taken to be Pascal (see ModernRonin's recent diary). But I haven't seen any convincing arguments as to why

:= =

is such a big improvement over

= == .

In both cases one operator is used for assignment, and one for comparison, with the two having a symbol in common and one being twice as large as the other. I think that if one fails to distinguish = and == in C, one might similarly confound Pascal's operators. One just needs to pay attention, if not during coding then during compilation when the machine will warn you if it sees something suspect.

But focusing on the characters is barking up the wrong tree. My opinion, based in part on experience, is that mistakes are usually not related to our operators looking too similar. It is because assignment and equality are closely related concepts: after an assignment, the corresponding equality holds true. One operator makes it true, the other tests that truth. I think it would be possible to confuse them on occasion even if they were as different as # and ^^.

Namespaces. This is probably not a bad idea. However, I disagree with ModernRonin again in that there would have to be some kind of escape hatch or "import" mechanism, so that you could refer to something in another file/package without qualifying the name. Otherwise you get silly redundancies like math.cosine(), which would also cancel out that Keystroke Efficiency goal. :-)

Moreover, a module should be able to import names because its task may have a certain focus. A geometry module will likely be doing a lot of math calls, and I think it's better to just specify this with an import command than to repeat some prefix again and again.

Pointers. The suggestion is to turn pointers into the equivalent of C++ "references" where you must initialize them, and cannot assign them thereafter. (I will for now ignore that this can make the distinction between assigning a reference and copying an object very vague, an issue that was brought up in the example about assigning one struct to another where one of them owned storage.)

This will prevent wild pointers, but it will also prevent many valid uses of them. There are many instances in which you will want to reassign the referent of a pointer/reference variable. The implementation of almost any data structure would qualify as such a situation, unless you are crazy about implementing linked lists the way they used to do them in Fortran with parallel arrays.

This is less important for languages with built-in data structures, of course. And if you're designing a language such as this, then leaving out pointers is no problem as long as you provide library or language support for the things people would otherwise be using them for.

Garbage Collection. I wish this argument were cast like this a bit less often. There are situations in which the disadvantages of manual memory management are acceptable and those of GC are not, and there are situations in which the disadvantages of GC are acceptable and those of manual management are not. I do not see how progress demands one or the other. Both are useful at times.

Not really, posted 19 Jan 2001 at 06:57 UTC by ali » (Apprentice)

A few quick notes, which of course do not say anything about your shoe-size IQ, unless you have really large feet:

Memory management

A GC *may* sometimes be a nice thing, but it isn't suitable for everything. Not only does it slow down applications, it also brings another problem: you are introducing the GC to help beginners, which is the same thing Sun did with Java. The result is simply that beginners have no feeling for memory size and do new's like hell in their programs.

Keystroke Efficiency

On a German keyboard, []{} are on Right-Alt-7 to Right-Alt-0. Swapping those doesn't give me anything, it's still a finger-bone-breaking issue. :)

Your language proposal isn't even a usable language. Consider these overloaded functions...

   int func.int,int;
  
   int func.int;

... and this statement ...

  a = func.func.3,4;

... now which function is called for the inner call and which for the outer? It could be either "func( func(3), 4 )" or "func( func(3,4) )".

Re: Not Really, posted 19 Jan 2001 at 09:27 UTC by jaz » (Journeyer)

ali: Your take on garbage collection indicates a lack of familiarity. GC is neither intrinsically slow (often it is, in fact, faster than manual memory management), nor (and this is the important point) is it something "for beginners." GC is most useful for large, complex pieces of software-- you know, the kind where no one ever gets the memory management issues right when they do it manually. GC has, unfortunately, gotten a bad reputation because of a lot of "toy" implementations. But it proved its value a long time ago, in the old LISP systems. Of course, not every type of program should use GC. But many should-- many that don't.

Re: Re: Not Really, posted 19 Jan 2001 at 10:00 UTC by ali » (Apprentice)

jaz: It may, of course, be that I've only seen such "toy GC's" so far, but my impression is that removing the need for manual deallocation immediately leads to bloated programs which are both slow and memory-suckers. (Lisp, for example, is something I never touched).

I also agree that there are languages which couldn't exist without a GC, but in this thread we were talking about C (or whatever the proposed language dialect would be called).

To be honest, I have never seen a program where manual memory management was nearly impossible to implement (including a bunch of really complex optimization programs). In all cases, it turned out that some kind of "ownership" mentality did it all right, that is, every allocated memory structure is owned by some other structure, function, or module, and is deallocated whenever the owner's lifetime is over (function exit, structure deallocation, or module's end). So far, no program I ever wrote or saw broke this idea.
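The "ownership" discipline described here might be sketched like this. The `struct message` type and its two functions are hypothetical names chosen for the example; the point is only that each allocation has exactly one owner whose destruction releases it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical owner type: a message owns its text buffer. */
struct message {
    char *text;
};

/* Creates a message that owns a private copy of `text`.
   Returns NULL if either allocation fails. */
struct message *message_create(const char *text)
{
    struct message *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->text = malloc(strlen(text) + 1);
    if (m->text == NULL) {
        free(m);
        return NULL;
    }
    strcpy(m->text, text);
    return m;
}

/* Destroying the owner releases everything it owns. */
void message_destroy(struct message *m)
{
    if (m == NULL)
        return;
    free(m->text);
    free(m);
}
```

Callers pair every `message_create` with one `message_destroy` and never free `text` themselves, so the lifetime of the buffer follows the lifetime of its owner.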

But, still, I'd gladly bow to your greater wisdom. Perhaps my horizon is just too close. Who knows?

Sounds like the goals of Java, posted 19 Jan 2001 at 12:31 UTC by ztf » (Apprentice)

ModernRonin is right, C and C++ have some features that are powerful yet dangerous and tough to use right. It's probably best to think of C as "portable assembly language." I say this as a longtime C hacker. :^)

But it sounds to me like this "improved C/C++" is really Java, minus the kitchen-sink class libraries. Because fundamentally, Java is just pointer-safe C/C++ with garbage collection and some cleaned-up syntax. Oh, and a specified abstract virtual machine to run in. [donning asbestos suit now ...]

My opinion: C is a great language as-is for building operating systems, embedded systems, and libraries and components. And, if you know C well, you will be OK using it to attack other problems as well.

But other problem spaces are also served quite well by "safer" languages, and we have those in abundance: Perl, Python, Tcl, Scheme/Guile, etc. Oh, and there's this Java thing that fits in somewhere as well.

I guess I'm not enough of a computer language nerd to want to go tweaking C. Like many other things, it's very easy to write a bad computer language. If C/C++ is not a good fit, pick a language that fits the problem better.

Taken to the next level... you have Java., posted 19 Jan 2001 at 12:53 UTC by burtonator » (Master)

Regardless of political and Free Software/Open Source arguments, Java is awesome. I would suggest that everyone with an interest in language design read "The Java Language Specification" by James Gosling, Bill Joy and Guy Steele. They did an awesome job.

The point is that Sun's implementation of all this has been pathetic. Java is Closed/Proprietary and they won't even send it to a standards committee! That said, the GNU community will have an awesome Java compiler (the GNU Compiler for Java) in GCC 3.0. This should be largely JDK 1.1.8 compliant and ready for prime time.

The issues you bring up WRT Java and C++ are something that I haven't had to think about for a number of years now :).

....

Minor nitpicks like these won't result in any clearer code, posted 19 Jan 2001 at 15:32 UTC by Ricdude » (Journeyer)

Assignment/comparison confusion is rampant in C and C++. However, gcc (and I suspect other compilers) can catch this condition, and at least issue a warning to you, so you know to take a closer look at your code. Substituting ":" for "=" will just hurt the human parsers. You can reduce your foul rate in this area by getting into the habit of putting a constant for comparison first, e.g. "if ( 0 = x ) { ... }". The compiler won't let you get away with that assignment. Granted, this doesn't work for variables, but healthy usage of assertions will point these problems out early on.
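A minimal sketch of the constant-first habit, with the two typo variants left as comments so the block still compiles. The function name `is_zero` is made up for the illustration.

```c
/* Intended test: "is x zero?", written with the constant first. */
int is_zero(int x)
{
    /* if (x = 0)   compiles, assigns 0 to x, and is always false */
    /* if (0 = x)   does not compile: 0 is not assignable         */
    if (0 == x)
        return 1;
    return 0;
}
```

With the constant on the left, dropping one "=" becomes a hard compile error instead of, at best, a warning.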

Pointers are extremely confusing in C and C++. Which is why I believe that everyone should code a generic doubly linked list implementation, just to make sure they do understand the concept. The majority of my programming errors have been due to missing my target by one level of dereferencing. This is just a hazard of the C/C++ programming language, and is learned by experience. C++ also offers references as a language feature, but if you don't understand pointers, you're not likely to understand when references are better suited to your task.

Memory management is the reigning champion of C/C++ programming mistakes. I personally believe that most of this is due to a woefully inadequate string implementation in C that forces the programmer to check everything twice. C++ at least gives you a reasonable string class so you don't have to micromanage the memory allocations for each string operation. There are also garbage collectors for C and C++, as well as free memory integrity checking libraries (dmalloc, Electric Fence, etc.).

Keystroke efficiency is a red herring. If you are spending most of your time at the keyboard, you're not spending enough time doing design work. Not to mention that you'll spend ten times the amount of time trying to figure out why a piece of code isn't doing what you want it to. A decent choice of symbols makes it much easier for the human parser to do its thing properly. Code a few hours a day for a year, and whether or not you have to hit the shift key won't make a big difference when reaching for those funny symbols.

All in all, it sounds like you might want to try taking Java, Python, or Tcl for a spin. If you take the time to learn what C++ has to offer, you will realize that the biggest flaws in the language are there for compatibility with existing C code. If those are your biggest complaints about the language, you'll probably be happier programming in one of those other languages.

Try this with garbage collection, posted 19 Jan 2001 at 17:21 UTC by hanwen » (Journeyer)

I have never seen a program where manual memory management was nearly impossible to implement

Unfortunately, that says more about your experience than about garbage collection in general.

Here is something that is useful in C, but very hard to do without proper GC: linked lists with shared tails. For example,

  node* prepend (node *p)
  {
    node* n = make_node ();
    n->next = p;
    return n;
  }

  ...

  verylonglist = ...
  l1 = prepend (verylonglist);
  l2 = prepend (verylonglist);

l1 and l2 share their tail, and the tail may only be freed once both l1 and l2 have gone out of scope. It is hard to tell when that happens, especially if the lifetimes of l1 and l2 are different.

Constructions like these abound in functional languages (where values are generally read-only, and consing onto list tails is a very natural thing to do), but they do have an application in C: sharing tails can save a lot of memory in the right circumstances.
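A self-contained variant of the shared-tail example is sketched below. It departs from the snippet above in labeled ways: `prepend` takes an extra `value` argument and calls malloc directly instead of the unspecified make_node(), and `tails_are_shared` is an invented driver function.

```c
#include <stdlib.h>

typedef struct node {
    int value;
    struct node *next;
} node;

/* Allocates a node holding `value` in front of the list `p`. */
node *prepend(int value, node *p)
{
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->value = value;
    n->next = p;
    return n;
}

/* Builds two lists over one tail; returns 1 if the tail is shared,
   or -1 if an allocation failed. */
int tails_are_shared(void)
{
    node *tail = prepend(3, NULL);
    node *l1 = prepend(1, tail);
    node *l2 = prepend(2, tail);
    int shared;

    if (tail == NULL || l1 == NULL || l2 == NULL) {
        free(l1);
        free(l2);
        free(tail);
        return -1;
    }

    shared = (l1->next == l2->next);

    /* Freeing "all of l1" at this point would also free the tail that
       l2 still uses. The tail must be freed exactly once, and only
       after both heads are gone. */
    free(l1);
    free(l2);
    free(tail);
    return shared;
}
```

Here the lifetimes are trivially known because everything lives in one function; the difficulty described above appears once l1 and l2 are handed to different parts of a program.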

Anyways, if you think GC is a Bad Idea, I urge you to read The GC FAQ and Paul R. Wilson's survey of uniprocessor GC techniques (which debunks many common myths about GC).

A shoddy workman blames his tools, posted 20 Jan 2001 at 06:49 UTC by mfleming » (Master)

...and that's pretty much all I have to say :)...

I shouldn't be so blunt, but I've worked with so many languages and so many systems. Each one of them has had its fatal flaws. Except in rare cases, each abstraction system has difficulties representing every practical task. The key is to let your mind discover the strengths and weaknesses of the system you are using, and see how to adapt those to the task at hand.

coelacanth, posted 20 Jan 2001 at 22:42 UTC by apgarcia » (Journeyer)

two of modernronin's points are related to a more general idea, one that the butt-ugly fish book discusses:

One problem is that C is so terse. Just adding, changing, or omitting a single character often gives you a program that is still valid but does something entirely different. Worse than that, many symbols are "overloaded" -- given different meanings when used in different contexts. Even some keywords are overloaded with several meanings, which is the main reason that C scope rules are not intuitively clear to programmers. Table 2-1 shows how similar C symbols have multiple different meanings.

Table 2-1, Symbol Overloading in C

Symbol     Meaning

static     Inside a function, retains its value between calls
           At the function level, visible only in this file

extern     Applied to a function definition, has global scope (and is redundant)
           Applied to a variable, defined elsewhere

void       As the return type of a function, doesn't return a value
           In a pointer declaration, the type of a generic pointer
           In a parameter list, takes no parameters

*          The multiplication operator
           Applied to a pointer, indirection
           In a declaration, a pointer

&          Bitwise AND operator
           Address-of operator

=          Assignment operator

==         Comparison operator

<=         Less-than or equal-to operator

<<=        Compound shift-left assignment operator

<          Less-than operator
           Left delimiter in #include directive

( )        Enclose formal parameters in a function definition
           Make a function call
           Provide expression precedence
           Convert (cast) a value to a different type
           Define a macro with arguments
           Make a macro call with arguments
           Enclose the operand of the sizeof operator when it is a typename

[ ... ]

The more work you make one symbol do, the harder it is for the compiler to detect anomalies in your use of it. It's not just the kind of people who sing along with the Tiki birds at Disneyland who have trouble here. C does seem to be a little further out on the ragged edge of token ambiguity than most other languages.

-Peter van der Linden, Expert C Programming

With the abundance of symbols we can have in our systems today, there is no good reason why some of this should not be remedied.

What is C?, posted 21 Jan 2001 at 08:06 UTC by moshez » (Master)

C is a language, which tries to be something very specific -- portable assembler. It is quite good at that. I will even go further, and say it is wonderful at that. Not that there are no competitors (FORTH comes to mind), but it pretty much dominates the market.

If you want to use a language which gives you better mechanisms, do not program directly to the CPU. Use Java, use Common Lisp or (my personal favourite) use Python.

Of course, the big advantage of Python is that it is close enough to C to make extending in C a very easy thing so you can do what all of humanity has been doing for ages: OPTIMIZE IN ASSEMBLER, code in high level languages.

A bad workman blames his tools, posted 21 Jan 2001 at 17:56 UTC by dan » (Master)

mfleming writes "A bad workman blames his tools"

A good workman invests in and takes responsibility for his tools. If all you have is a cigarette lighter it doesn't matter how good you are at plumbing; you still shouldn't be soldering pipes for central heating systems with it.

just so we don't get caught up in this......, posted 22 Jan 2001 at 17:41 UTC by dto » (Journeyer)

This is a response to apgarcia and his "coelacanth" bit above, about C's set of operators.

I don't agree with the author of your quote. Normally C gets criticized for its set of operator symbols being too large; it's even weirder to think we might be better off using Unicode operators, so that there will be no ambiguity about symbols.

C does use the same ASCII characters for more than one purpose. But I do not think we automatically need more symbols just because we have encodings for them. (I will ignore the obvious issue that keyboards don't have easy ways to type Unicode characters.)

Where he may be right is with static, extern, and the & operator. Another symbol could have been used for taking the address of an object, one that did not lend itself to easy confusion with bitwise AND (which does deserve the ampersand in the absence of a key for the real mathematical symbol).

Dollar sign would have been fine. I don't know what programmer or compiler will really confuse this one binary operator with the other unary one, but I will grant the point that it could be more clear.

We should not allow these irregularities to mask the fact that for the most part, C's set of operators and keywords has a strong internal logic and regularity. (If you want to discuss the precedence and associativity table that is another thing.)

The bit about the less-than and greater-than signs is rather wacky. Programmers don't confuse preprocessor directives with logic expressions. The compiler never sees #include lines so it is not working harder there either.

Compound operators. The point of having these is that they are regular in structure and can be built without having to memorize all of them. You will occasionally run into old compilers that do them backwards, but in any case the similarity of the compound assignment operators to their counterparts is an aid to clarity and not a hindrance.

The discussion of the void keyword seems to ignore the logic behind C's declaration syntax, and its correspondence with that of expressions. Declaration mirrors use, and understanding this shows you why C's uses of the void keyword are completely consistent.

  • Void means "nothing." As a return type this means the function returns "nothing."
  • As a parameter list it means that the function accepts "nothing" as a parameter.
  • A declaration like void *p; makes total sense if you realize that declaration is supposed to mimic expressions. What is the type of (*p)? You can't dereference generic pointers, so the answer is "nothing". The right half is "of the type" left-half.

I work as a CS tutor at university, and this reasoning always helps students figure out one of C's most sticky syntax points: function pointers. Take int *(*fp)(void);. What is the type of the expression *(*fp)();? This calls the function and indirects the pointer returned by that function, yielding int, the left half of the declaration. ANSI C provides a bit of syntactic sugar for the use of function pointers (you can just do fp()) but otherwise it is completely consistent.
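
A small sketch of that reading, assuming a made-up function get_answer and variable answer to point fp at. Both call forms are written out so the explicit one can be compared with the declaration.

```c
#include <assert.h>

int answer = 42;

/* A function taking no parameters and returning a pointer to int. */
int *get_answer(void)
{
    return &answer;
}

/* fp is a pointer to such a function. Read the declaration as an
   expression: *(*fp)() has type int. */
int *(*fp)(void) = get_answer;

int call_both_ways(void)
{
    int explicit_form = *(*fp)();   /* mirrors the declaration exactly */
    int sugared_form = *fp();       /* shorthand: fp() calls through the pointer */

    assert(explicit_form == sugared_form);
    return explicit_form;
}
```

In both forms the inner call yields an int *, and the leading * indirects it to reach the int.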

To sum up... the quoted text shows that C makes many symbols do double-duty, but fails to establish its being a problem in most of those cases. IMO it's better for many things to have an internal logic whose grokking has benefits than for everything to be intuitively obvious but irregular.

matter of judgement, posted 22 Jan 2001 at 17:58 UTC by apgarcia » (Journeyer)

well, i didn't really feel like typing in much more than i did. in the end, though, it's a matter of judgement, and you're perfectly entitled to that assessment.

nice example, posted 22 Jan 2001 at 18:31 UTC by apgarcia » (Journeyer)

this section, called "the $20 million bug", is from the introduction of the same book (i.e., not in the same part of the book as the other excerpt i posted).

In Spring 1993, in the Operating System development group at SunSoft, we had a "priority one" bug report come in describing a problem in the asynchronous I/O library. The bug was holding up the sale of $20 million worth of hardware to a customer who specifically needed the library functionality, so we were extremely motivated to find it. After some intensive debugging sessions, the problem was finally traced to a statement that read:

x==2;

It was a typo for what was intended to be an assignment statement. The programmer's finger had bounced on the "equals" key, accidentally pressing it twice instead of once. The statement as written compared x to 2, generated true or false, and discarded the result.

C is enough of an expression language that the compiler did not complain about a statement which evaluated an expression, had no side-effects, and simply threw away the result. We didn't know whether to bless our good fortune at locating the problem, or cry with frustration at such a common typing error causing such an expensive problem. Some versions of the lint program would have detected this problem, but it's all too easy to avoid the automatic use of this essential tool.
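
A minimal sketch of the same typo, not taken from the book: the functions set_mode_buggy and set_mode_fixed are hypothetical stand-ins, since the actual library source is not shown in the excerpt. It is valid C, though current compilers will typically warn about the unused comparison when warnings are enabled.

```c
#include <assert.h>

/* Hypothetical stand-in for the faulty code. */
int set_mode_buggy(int x)
{
    x == 2;      /* typo: compares x with 2 and throws the result away */
    return x;
}

int set_mode_fixed(int x)
{
    x = 2;       /* the intended assignment */
    return x;
}

void typo_demo(void)
{
    assert(set_mode_buggy(0) == 0);   /* x is left untouched */
    assert(set_mode_fixed(0) == 2);
}
```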

declarations, posted 22 Jan 2001 at 18:59 UTC by apgarcia » (Journeyer)

i think i've pretty much exhausted my quota of fair use. after this, if you're interested in hearing what van der linden has to say, please read the book.

The idea that a declaration should look like a use seems to be original with C, and it hasn't been adopted by any other languages. Then again, it may be that declaration looks like use was not quite the splendid idea that it seemed at the time. What's so great about two different things being made to look the same? The folks from Bell Laboratories acknowledge the criticism, but defend this decision to the death even today. A better idea would have been to declare a pointer as

int &p;

which at least suggests that p is the address of an integer. This syntax has now been claimed by C++ to indicate a call by reference parameter.

The biggest problem is that you can no longer read a declaration from left to right, as people find most natural. The situation got worse with the introduction of the volatile and const keywords with ANSI C; since these keywords appear only in a declaration (not in a use), there are now fewer cases in which the use of a variable mimics its declaration. Anything that is styled like a declaration but doesn't have an identifier (such as a formal parameter declaration or a cast) looks funny. If you want to cast something to the type of pointer-to-array, you have to express the cast as:

char (*j)[20]; /* j is a pointer to an array of 20 char */
j = (char (*)[20]) malloc( 20 );

If you leave out the apparently redundant parentheses around the asterisk, it becomes invalid.

A declaration involving a pointer and a const has several possible orderings:

const int * grape;
int const * grape;
int * const grape_jelly;

The last of these cases makes the pointer read-only, whereas the other two make the object that it points at read-only; and of course, both the object and what it points at might be constant. Either of the following equivalent declarations will accomplish this:

const int * const grape_jam;
int const * const grape_jam;

The ANSI standard implicitly acknowledges other problems when it mentions that the typedef specifier is called a "storage-class specifier" for syntactic convenience only. [...]
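
A short sketch of the two declaration forms quoted above, not taken from the book. The function names and the variables a, b and k are made up for illustration, and the lines that would be rejected by the compiler are left as comments.

```c
#include <assert.h>
#include <stdlib.h>

/* Pointer to array: returns the size of what j points at. */
size_t pointer_to_array_demo(void)
{
    char (*j)[20];   /* j is a pointer to an array of 20 char */
    char *k[20];     /* no parentheses: k is an array of 20 pointers to char */
    size_t pointee_size;

    j = (char (*)[20]) malloc(sizeof *j);
    if (j == NULL)
        abort();

    (*j)[0] = 'x';          /* use mirrors declaration: (*j)[i] is a char */
    k[0] = *j;              /* *j is the array itself, which decays to char * */
    assert(*k[0] == 'x');   /* likewise, *k[i] is a char */

    pointee_size = sizeof *j;   /* 20, the whole array */
    free(j);
    return pointee_size;
}

/* const placement: which of pointer and pointee is read-only. */
int const_pointer_demo(void)
{
    int a = 1, b = 2;

    const int *grape = &a;             /* pointee is read-only through grape */
    int *const grape_jelly = &a;       /* the pointer itself is read-only */
    const int *const grape_jam = &a;   /* both are read-only */

    grape = &b;                /* fine: grape itself may be re-pointed */
    /* *grape = 5; */          /* rejected: pointee is const through grape */

    *grape_jelly = 10;         /* fine: writes a through the constant pointer */
    /* grape_jelly = &b; */    /* rejected: grape_jelly itself is const */

    /* grape_jam allows neither, but still sees changes made elsewhere. */
    assert(*grape_jam == 10);

    return *grape + *grape_jelly;   /* 2 + 10 */
}
```

Dropping the parentheses in the cast gives (char *[20]), which names an array type, and C does not allow casting to an array type.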

You probably want FullPliant instead, posted 23 Jan 2001 at 15:57 UTC by Malkuse » (Apprentice)

Considering the problems discussed, I'd recommend taking a look at Pliant (http://helio.pliant.cx/). It solves the pointer/reference trouble by having explicit reference types (in C, pointers are not actually data types but rather just crude memory segment locators). These references also solve the memory management trouble through reference counting. When something gets pointed to, its reference count goes up. When a pointer goes out of scope (or is explicitly changed to point elsewhere or to NULL), the reference count of the something goes down. When the count reaches zero, the something is deallocated by the runtime system. Anything pointed to by (members of) the deallocated something will have its reference counter decreased as well.

Reference counting is much faster and much more precise than GC. This is because the something can be deallocated at once when no one points to it anymore. GC is inherently bad at this, and if your something is a thread, it could still get a lot of runtime and produce endless nonsense before the GC gets to run and kill it. Reference counting is also faster than GC since the runtime system does not have to search for stuff to deallocate: the reference counter knows which memory segment it is coupled with.
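
A rough sketch of that counting scheme in plain C. This is not Pliant's runtime: the type object and the functions object_new, retain and release are hypothetical, and C has no scopes that drop references for you, so the calls are explicit.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object. */
struct object {
    int refs;
    struct object *member;   /* counted reference to another object, or NULL */
};

int live_objects = 0;   /* bookkeeping so the sketch can be checked */

struct object *object_new(struct object *member)
{
    struct object *o = malloc(sizeof *o);
    if (o == NULL)
        abort();
    o->refs = 1;          /* the caller holds the first reference */
    o->member = member;   /* takes over the reference passed in */
    live_objects++;
    return o;
}

/* Something new points at o: the count goes up. */
struct object *retain(struct object *o)
{
    if (o != NULL)
        o->refs++;
    return o;
}

/* A reference goes away: the count goes down. At zero the object is
   freed at once, and the object its member pointed at is released too. */
void release(struct object *o)
{
    while (o != NULL) {
        assert(o->refs > 0);
        if (--o->refs > 0)
            return;
        struct object *next = o->member;
        free(o);
        live_objects--;
        o = next;
    }
}
```

Note that release frees as soon as the count hits zero, with no search over the heap, which is the property the post is arguing for.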

Pliant also has a very clean syntax with quite few meta-shift-double-bucky characters.

In fact, if I could make one single change to the C ANSI standard it would be to bring over the reference/pointer system from Pliant. The */& shit would stay but could then be reserved for strange hardware driver stuff where it may actually make sense.

harebra / klasa

Reference Counting, posted 24 Jan 2001 at 08:20 UTC by ali » (Apprentice)

Reference counting is rarely used since it has a major flaw. Consider this:

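#define MUCH 4096   /* placeholder for "much"; any large size will do */

struct thing
{
  struct thing *anotherthing;
  char muchmemory[MUCH];
};

void function(void)
{
  struct thing th;

  th.anotherthing = &th;   /* the thing now points at itself */
}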

After function() has returned, its thing has a reference count of 1, since the pointer in the thing still points to itself. That will never be fixed (since the application has lost access to the structure), and the structure will exist forever.
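
A sketch of the same cycle with the thing allocated on the heap, so that only the count decides when it is freed. The counting helpers thing_new and thing_release and the counter things_alive are made up for illustration. The sketch keeps an uncounted pointer around purely so it can break the cycle by hand afterwards; a real program would have lost that pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counted object. */
struct thing {
    int refs;
    struct thing *anotherthing;
};

int things_alive = 0;

struct thing *thing_new(void)
{
    struct thing *t = malloc(sizeof *t);
    if (t == NULL)
        abort();
    t->refs = 1;
    t->anotherthing = NULL;
    things_alive++;
    return t;
}

void thing_release(struct thing *t)
{
    if (t == NULL || --t->refs > 0)
        return;
    thing_release(t->anotherthing);
    free(t);
    things_alive--;
}

/* Returns how many things are still allocated after the last external
   reference has been dropped. */
int cycle_demo(void)
{
    struct thing *th = thing_new();
    struct thing *peek = th;   /* uncounted, kept only for the cleanup below */
    int stranded;

    th->anotherthing = th;     /* self-reference ... */
    th->refs++;                /* ... which counts like any other: refs == 2 */

    thing_release(th);         /* drop the external reference: refs == 1 */
    stranded = things_alive;   /* still 1: unreachable in principle, not freed */

    peek->anotherthing = NULL; /* break the cycle by hand */
    thing_release(peek);       /* refs == 0, now it is freed */
    assert(things_alive == 0);
    return stranded;
}
```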

I guess you're opting for a bus error, posted 24 Jan 2001 at 19:17 UTC by Malkuse » (Apprentice)

Hi ali

In this case it seems that use of your reference, after the function returns, should generate a bus error or a segmentation fault. When allocating something so that it becomes local to a function in this way, the memory should be reserved on the stack by the language's runtime system. When th goes out of scope (on return), the memory stack should roll back and deallocate the struct. Returning a pointer to stack-allocated material like this should generate a compile-time error in any sane compiler (with the possible exception of C).

Even if you had allocated th on the heap you should be ok in the case of Pliant, since it has some means of detecting circular references. Check the docs.

harebra / klasa

Garbage collection, posted 26 Jan 2001 at 02:03 UTC by Omnifarious » (Journeyer)

I wanted to post a couple of points about C and C++ not having garbage collection:

  • It's the programmer's responsibility to use an appropriate resource control mechanism.
  • Garbage collection is only the 'magic' solution for memory. What about other resources? Is the runtime system smart enough to run the GC when you're out of filehandles? How about when you're out of foos?
  • Do you really want garbage collection to be an integral feature of a language you use to write an OS?

In my opinion, a language that gives you choice and flexibility in resource control techniques is a big win over a language that doesn't.

In my StreamModule system, I use reference counting for a data structure (the structure representing a sequence of bytes) that can only be a DAG by the rules under which it is built. This is exactly the appropriate technique. The reference counting overhead is incurred in a very predictable and controllable manner.

I even have a small infrastructure (smart pointer classes and a mixin class that has the counter) built up to make it easy for a certain data structure to participate in reference counting where appropriate. In fact, the combining list tails problem that someone mentions is a perfect candidate for using this infrastructure.

If ever I wanted cycle-resistant mark & sweep style garbage collection, I could find it or implement it and use it for exactly those data structures it was appropriate for. With a little discipline in how you use pointers and a little programmer-prodded compiler assistance, you can operate a non-conservative mark & sweep collector in an environment that doesn't enforce one from the top down.

In Python, Perl, and other such 'scripting' languages, garbage collection is the right answer. In Java, it's a decision I strongly question. In C, C++, Ada, FORTRAN, and other such 'systems' languages, it's the wrong answer.

Oops, posted 26 Jan 2001 at 02:06 UTC by Omnifarious » (Journeyer)

I cut and pasted from a 'view source' window, and it stuck in extra <p> tags at all the line breaks.
