We have often been asked why TINA has not been re-written in C++. Our answer to this is that we have simply never seen the need, but in order to understand why we take this apparently aloof attitude it is necessary to understand the common criticisms of the C language. In order to cover what are considered to be all of the relevant points I have taken a popular book (Deep C Secrets), written at the time of the emergence of C++, and summarised its main criticisms of C while giving my own particular view of the language from the point of view of research programming. I have refined these comments in discussion with others in our group (Tony Lacey (AJL) and Patrick Courtney (PC)). This way I hope to give a balanced review of the language and its use which covers the majority of the criticisms. The final conclusions stand as a fair reflection of our view on the potential costs and benefits of continuing to use C for research programming. We hope that this is enough to convince the sceptical that we are not just continuing to use C out of laziness or ignorance; we really don't think that C++ is any better for what we require.

Deep C Secrets, (The C++ Tsunami).

NAT 2/8/2000

Deep C Secrets starts with several chapters which make a strong criticism of the complex syntax of the C programming language. As reviews go this is rather thorough, probably as the author was a member of the team which wrote the SUN C compiler and is therefore very familiar with all (and I mean ALL) of the intricacies of the compiler. However, reading the book I got the impression that many of the criticisms were slightly unbalanced. Arguments were made regarding the author's opinions of ways that the current syntax was inadequate, but little was said to explain why it had been done this way in the first place. Though, as the author says himself at one point, C was defined over a period of time and benefited from real practical experience.

Precedences
-----------

At one stage, some criticism is made of the apparently illogical ordering of parsing precedences (p 45). The author provides a list of what he thinks are the worst offenders and on the whole I would have to agree. But, and this is a big BUT, what is not said is something like the following:

Precedences are defined in the C language so that there are default options for interpretation of code where the user programmer has not been explicit. Most of these precedences are obvious, some are not so obvious and some are completely subjective. There will always be some subjective definitions because such priorities are often needed to broker non-commensurate data (ie: apples and oranges). The illogicalities pointed out here are the more subjective ones.

Clearly, if the programmer is not explicit regarding the syntax then the compiler will have to arbitrate. However, just because the compiler can do this does not mean you should rely on it. Good programmers invariably use brackets to remove obscurity from the syntax, for their own benefit, for rapid coding and for future code maintenance (p 47). Only bad programmers (and often computer scientists) consider it clever and appropriate to write lines of code with the minimum number of characters and so produce unreadable code. (This could be regarded as a good point if it is used as a mechanism to weed out hackers from serious programmers.)

Once you start including prioritisation in a language you might as well finish, and it would probably have been considered shoddy by any standards body (ANSI, ISO) not to. The alternative (ie: that there are no precedences and all statements must be explicit) would probably be considered too pedantic for most programmers.

Finally, you can always use lint and style checkers which will tell you not to do such things.
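As a small illustration of the kind of default the compiler has to arbitrate, the sketch below tests a flag bit with and without brackets. The names `FLAG_READY`, `is_clear_implicit` and `is_clear_explicit` are invented for this example; they come neither from the book nor from TINA.

```c
#define FLAG_READY 0x04   /* hypothetical flag bit, for illustration only */

/* Relies on the default precedence: == binds more tightly than &, so
   the expression is read as  status & (FLAG_READY == 0),  that is
   status & 0,  which is 0 whatever the value of status. */
int is_clear_implicit(int status)
{
    return status & FLAG_READY == 0;
}

/* The brackets state the grouping that was actually intended. */
int is_clear_explicit(int status)
{
    return (status & FLAG_READY) == 0;
}
```

With `status` equal to 0x01 (flag not set) the first function returns 0 and the second returns 1. Many compilers, and lint, will warn about the first form, which is rather the point made above.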
You should certainly debug software as you write it to confirm that the software does what it's supposed to, and not rely on naive expectations of the definition of syntax. With typical daily programming productivity estimated at 10 lines of debugged code for the average programmer (in any language) you have plenty of time for this.

While we are on the subject, C is criticised for not having lint as an integral part of the compiler (p 59). Once again this is a strange criticism. At the time that C was being specified both cc and lint probably took a reasonably lengthy time to run; lint's report syntax was obscure for a long time, while cc's error reporting was more understandable. Thus it was natural for the two to be kept separate. I don't believe that this had anything to do with programmers assuming they would always be right, as stated in the book. More recently you could argue that compilers are now sufficiently fast that there is time for lint. However, many code editors now include lint-like syntax checking along with colour highlighting. If anything, the argument for not including lint is probably swinging back the other way. I accept that the book is a few years old and may not have witnessed this trend.

Overloading
-----------

There is also a relatively large section on symbol overloading (p 43) (eg: various uses of const, >> and >, & and && etc.). Again this is complete and accurate and one can only guess at why this was done in the first place, or even why it persisted into ANSI C, particularly as it is so easy to fix if you need to; eg:

    /* Demystify.h or the syntax of C+ by NAT */
    #define readonly const   /* eliminates confusion with read only data */
    #define intern static    /* eliminate confusion with static function */
    #define nopar void       /* eliminate empty parameter list ambiguity */

However, such practices are ruled out in the first chapter of this book as bad programming style. Funny, that wouldn't have been my first choice for attack!
If an agreed standard could be reached for such fixes, code could be updated and maintained in a completely backward compatible fashion. Code would be more readable, and that's what we want, isn't it? It's fun to consider how many of these might be necessary and how far they could go to correctly interlock with existing syntax. Of course the more of these you define the more reserved constructs you must have in the language.

So if it's so easy, why hasn't it been done? For one thing, once you start putting ideas down on paper it is surprisingly difficult to improve on the existing system. Saying that this is the wrong starting point is not a valid response, as all languages will require logical constructs. Another answer might just be that it simply has not been considered an important criticism. And here's why. Here is one of the examples of poor syntax given in the text (p 53):

    p+++q

Is this p+(++q) or (p++)+q ? Answer: wrong question! Why wasn't it written as p+(++q) or (p++)+q ? Or... where did this computer scientist do his degree?

Once again the poor language syntax is illustrated by poor programming style. (Anyone who programmes this way really deserves what they get (PC)). Blaming the language at this point is the same as criticising English for the work of poor poets (or Bernard Manning).

Having criticised overloading of characters the author also complains about a case where it wasn't done (p 23): NUL and NULL. You just can't please some people, but if we must;

    #define END NUL       /* is this ever going to be used? */
    #define inca(x) (x++) /* once again there could be several of these */
    #define incb(x) (++x)

But is `inca(p)+q' really any better than `(p++)+q' ?
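For the record, the `p+++q' question does have a definite answer: the C tokeniser always takes the longest valid token, so the text is split as `p ++ + q'. The sketch below simply records what happens; the struct and function names are hypothetical, chosen only for this illustration.

```c
/* Holds the outcome of evaluating p+++q so that it can be inspected. */
struct munch_result {
    int sum;      /* value of the expression          */
    int p_after;  /* value of p after the expression  */
    int q_after;  /* value of q after the expression  */
};

struct munch_result munch_demo(int p, int q)
{
    struct munch_result r;

    r.sum = p+++q;      /* tokenised as p ++ + q, i.e. (p++) + q */
    r.p_after = p;      /* p has been incremented                */
    r.q_after = q;      /* q is untouched                        */
    return r;
}
```

Calling `munch_demo(1, 10)` gives a sum of 11 with p left at 2 and q still 10. Writing `(p++) + q` says the same thing and needs no knowledge of the tokeniser.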
A large source of errors in programming does occur in `if' statements, however, so if we are to remove overloading anywhere, perhaps it should be in the logical tests;

    #define GT >
    #define GE >=
    #define AND &&

The real problem here is that books like this, which criticise the language rather than just explaining simple programming methodology, effectively legitimise poor code. This gives computer scientists the excuse they need to continue defending the search for a language which will transform a Determined moron into an expert programmer, while more can be achieved with just a small amount of effort put into learning safe programming methods. C is simple to learn and flexible to use; that is the power of the language. Once you have appreciated that this is useful you cannot honestly criticise the same flexibility when it results in poor code in the hands of the careless or bloody minded. Furthermore, I simply don't believe that it is possible to specify a language which cannot be mis-used by the Determined. After basic syntax there follow: indentation, function and variable naming, file organisation, functional specification, library design, programming paradigm (eg: Object Orientation, recursion) and debugging, all still in the hands of that Determined programmer whom you can't trust with the syntax and who equates complexity with genius.

Encapsulation
-------------

Another issue which is mentioned in several places in the book is the issue of encapsulation. Clearly, with hindsight, it was a mistake to make all functions have global scope by default. But I get the feeling that programmes were a lot shorter when C was first developed. Once again, however, there is a mechanism, ``static'' (``intern'' in my C+), which can be used to isolate functions within a file, and not to use it is just poor programming. One might hope that the advent of function prototypes in ANSI C is a step towards explicit confinement in future versions of the language.
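A minimal sketch of the file-level confinement that ``static'' already provides, written as a hypothetical single-file module. The names `count`, `clamp` and `counter_add` are invented for illustration.

```c
/* counter.c: a hypothetical module with one public function. */

static int count = 0;            /* visible only within this file */

/* Private helper: internal linkage, so no other file can call it. */
static int clamp(int value, int limit)
{
    return value > limit ? limit : value;
}

/* The only name exported to the rest of the programme. */
int counter_add(int n)
{
    count = clamp(count + n, 100);
    return count;
}
```

Another source file that declares `clamp` and tries to call it will fail at link time, because the symbol is never exported from this file. Two successive calls of `counter_add(60)` return 60 and then 100.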
The current assumption that implicit global functions return an `int' is clearly only exploitable in a rather limited number of scenarios and most good programmers take the time to provide all prototypes. The move over to encapsulation could go quite smoothly.

A further point is made that all functions once made global are available everywhere. I can see the point, but as I understand it, this is not strictly true, as these functions will only be available if the function prototype is present in a file. A common mechanism to tie prototype files to source files may one day fix remaining problems in the linker. The system could work like this:

For each C file `my_prog.c' the linker looks for a header file with the name `my_prog.h'. If it finds one it checks the contents for function prototypes and compares them with the functions in the C file.

Then for each program which includes the header file `prototypes.h' the linker looks for the C file `prototypes.c'. If it finds one it searches this file first for functions with the prototypes specified.

The first comparison checks that functions are defined the same way that they will be used and the second provides encapsulation at the file level. Both of these are completely consistent with existing use of the language and would provide enough encapsulation to keep even a PASCAL programmer happy. The first can already be achieved by including the header file directly at the top of the source C programme, though this requires the rather redundant inclusion of `my_prog.h' at the top of every `my_prog.c'.

In the meantime user programmers quite rightly assess the simplicity of any language usage by the number of files they must modify in order to change a function. (Prototypes are odd and you need to understand why you want them. Do you want them to prevent the programmer from misusing a function? Alternatively they may just be used to exemplify the function call.
Either way, the requirement of a separate file away from the original definition is an overhead for the programmer. (AJL)). In the move from K and R to ANSI we went from a minimum of 2 file modifications to 3. Until prototypes are used fully it would have to be said that this was a backwards step.

Returning to the book, the comment is also made that in other languages (such as ADA) functions can be defined within functions, and this provides better encapsulation. This, I think, is just sloppy thinking: if it is within a function it is also within the file, and if it is within the file it could be declared ``static'' in C. However, this is not the first time that I have come across this argument, so perhaps I have missed something. Parameter sharing is a useful possibility, but this is not necessary for encapsulation and even limits code reuse.

Finally, if you really really REALLY want true encapsulation, why not write another programme? The whole point of UNIX is that you can do this simply and easily and it will work! Of course it does rely on a true multi-tasking operating system in order to run the programmes concurrently. (In some ways object brokers such as CORBA and COM are designed for just this task (AJL)).

Best of the Rest ....
----------------

There were some parts of the book which I agreed with unreservedly, the fall through default of switch statements for example (p 38). But if we are using the numbers game to justify language syntax why don't we drop the `;' terminator and use line continuation instead, as in FORTRAN?

The chapter on pointers and arrays was also accurate though familiar. I have also felt for a long time that many of the inconsistencies, for example with the way that two dimensional arrays are treated in functions, could be better handled by the compiler. The safest option in the meantime is always to allocate your own arrays.
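One common way to allocate your own two dimensional arrays is sketched below, assuming that a table of row pointers into a single contiguous block is acceptable. The names `grid_alloc`, `grid_free` and `grid_trace` are hypothetical and are not the TINA allocation routines.

```c
#include <stdlib.h>

/* Allocate a rows x cols array of doubles, zero filled, as a table of
   row pointers into one contiguous block. Returns NULL on failure. */
double **grid_alloc(int rows, int cols)
{
    double **a;
    int i;

    if (rows <= 0 || cols <= 0)
        return NULL;
    a = malloc((size_t) rows * sizeof *a);
    if (a == NULL)
        return NULL;
    a[0] = calloc((size_t) rows * (size_t) cols, sizeof **a);
    if (a[0] == NULL) {
        free(a);
        return NULL;
    }
    for (i = 1; i < rows; i++)
        a[i] = a[0] + (size_t) i * (size_t) cols;
    return a;
}

/* Release an array obtained from grid_alloc; NULL is accepted. */
void grid_free(double **a)
{
    if (a == NULL)
        return;
    free(a[0]);
    free(a);
}

/* One signature serves any size, unlike a parameter such as
   double a[][3], which fixes the row length at compile time. */
double grid_trace(double **a, int n)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < n; i++)
        sum += a[i][i];
    return sum;
}
```

Elements are still addressed as `a[i][j]`, and the dimensions travel with the call as ordinary arguments rather than being built into the parameter type.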
The author did, however, manage to completely miss the important use of interpositioning in library development, mentioning it only in passing as an annoying linker artefact. Neither was any mention made of the much maligned pointer syntax, probably because the author knew that this is both powerful and useful. And I'm sorry (not wanting to shock or offend) but pointers really are quite useful, particularly in algorithmic programming, both for efficiency and flexibility, and they remain the core basis for most C programme libraries.

I learned two things from the book which may be of future use:

1/ The way to ensure that function prototypes and library routines define equivalent routines is to include the function prototype in the source file (as discussed above).

2/ How to pass strings around without having to explicitly copy them, by wrapping them in a struct (p 70).

But many of the other suggestions I would not use due to the non-transparency of the methods, and I will continue to use typedefs (p 83) using the convention of a leading capital to signify a struct.

The many anecdotes relating to various programming projects were amusing and held my attention at times when I was so agitated with the balance of content that I might otherwise have given up.

Is C++ any Better?
------------------

In the final chapter the author gives a very brief description of C++ and seems to rather begrudgingly accept that widespread use of this language is inevitable. He does, however, also say that good languages will stay around and poor ones will be forgotten, and seems to hint that this assessment still needs to be made. Now, several years later, I would say that such an assessment is being made and it is not all good news for C++.

Practical evaluations of C++ have shown that the language does not deliver improved code re-use [1]. Unfortunately, this is not guaranteed by better encapsulation, and the unrestrained use of Classes is a hindrance.
The author of Deep C Secrets already suspected that library level re-use of code was less important than assumed by the C++ developers. The author notes that C++ had not managed to deal with any of the problems of C which could have been fixed, but had only proliferated an even more confusing language which destroys the ``nothing hidden'' philosophy. This observation predicted the practical reality that C++ is harder to maintain and debug, with recently confirmed consequences for programmer effort of a factor of 3 [2]. This is not offset by the slight improvement in initial code generation, which is anyway quickly regained with the use of a few simple support C libraries.

It is also not helpful that, from my observations, the only person who can really use a class library effectively is the programmer who designed it. If you do need to use other people's software you had just better hope that it never needs modification, as the much loved feature of Object Oriented programming, being non-sequential, results in an inability to assess the likely effects of changes to Classes. Many hardened C++ programmers will freely admit that many of the special features of the language are better not used (Templates, Overloading and Inheritance); they may be fun to use but they result in poor maintainability. If you put Classes in this list too then there is little left to commend C++. It also provides far more potential for mis-use than the potential ambiguity of syntax in C. Though when polled the majority of C++ programmers actually like the language and intend to continue using it [3] (equally, programmers who produce software which has to be relied upon may refuse to use C++ altogether).

The other two features of the language which promised to improve on C, automatic memory management and data streaming, also have real problems in practical situations. As mentioned here, the Destructors and Constructors break the ``nothing hidden'' philosophy of C, with predictable consequences.
However, what is not said is that the use of streaming to output anything other than simple data structures guarantees that the stored data cannot be read by anything other than another C++ program using the same Class library (compiled with the same compiler). This inhibits mixing of C++ with existing software, as does the practical inefficiency of trying to pass pointer based data structures constructed in a C library into C++. Though C++ fans are often dismissive of this problem and might say that everyone should re-write all software in C++, this is not a practical or even justifiable alternative. First it would need to be proven that C++ can practically achieve something USEFUL that C cannot do. Also, as even C++ requires a C compiler, most ardent C programmers feel quite secure.

Of course C++ has had its successes: it works quite well with databases and user interfaces, which are two applications where you would probably predict that the Object Oriented paradigm might be appropriate. However, it also has some notable failures: use with hardware, parallelism and incremental software development, where pre-compilation and Object Oriented programming get in the way. If you can't follow the logic behind this and don't believe it, try asking someone with EXPERIENCE of the alternatives whose opinions you trust.

Meanwhile the Object Oriented fanatics who started this off now recommend JAVA and, if OO is what you want, they may even be right!

Conclusions
-----------

Overall, I think the weaknesses of the early chapters of Deep C Secrets are (paradoxically) due to the strengths of the author. As a compiler designer he has had exposure to a wide variety of abuses of the language, each of them equally important in terms of constructing an ANSI compliant compiler. However, this has focused his attention on marginal use of the language.
For example, in 12 years of C programming and with 50,000 lines of code under my control I cannot remember a single example of the +++ construct or any of the other poor syntax examples given in this text. I do know good programming style when I see it, however, and it's a pity this book didn't dwell a bit more on this issue. Though it could have been difficult to stretch this out when the message is so simple. Keep it simple and readable! Use lint! Use the debugger!

If my analysis of the content of this book is correct then, barring a few simple changes, there really isn't anything seriously wrong with C that common sense programming style doesn't address. That's quite reassuring after years of hearing comments like ``it's not a serious language like C++ and Pascal''. It is for this reason that I hope that in the rush to define C++ (to restrict code style to one that cannot be abused by the Determined) the standards bodies have not forgotten C and what might be done to genuinely improve it.

It's a shame that the presentation of the first few chapters of this book is sufficiently negative to give a poor impression of the language to any naive programmer who might have hoped for (or even expected) more guidance. But perhaps, if he could see the C++ tsunami on the horizon, he was already writing for a different audience.

footnote 1:

    grep +++ */*/*.c

..... on the TINA libraries gives

    tools/coreg/csf_count.c: format(" ++- %f +++ %f \n",count[6],count[7]);

footnote 2:

Purely in the interest of fun I have attempted to put together my own simple solution to syntax overloading in the C language. As it actually solves most of the problems without modifying use of the language, whether you can ever bring yourself to use it will be determined by how genuinely important you consider the problem to be.
    /* demystify.h or the syntax of C+ NAT 2/8/2000 */

    #define readonly const  /* eliminates confusion with read only data */
    #define intern static   /* eliminate confusion with static function */
    #define nopar void      /* eliminate empty parameter list ambiguity */

    #define max(x, y) ((x) >= (y) ? (x) : (y)) /* majority use of constructs */
    #define min(x, y) ((x) <= (y) ? (x) : (y))
    #define choose(test,x,y) ((test) ? (x) : (y))
    #define bit(mask,num) ((mask)&(1<<(num)))
    #define binand(mask1,mask2) ((mask1)&(mask2))

    #define EQ ==
    #define NE !=
    #define AND &&
    #define OR ||           /* this for consistency with AND. */
    #define GE >=           /* to avoid overloading of = */
    #define LE <=
    #define GT >            /* now for consistency with GE */
    #define LT <
    #define NULL 0

    /* this eliminates easily mistyped double characters, eg: &&,==,>>,|| */
    /* and makes a cleaner distinction between assignments and assertions. */

    /* the above assumes that the remaining common constructs are unambiguous */
    /* +=, -=, /=, *=, |=, =, *, / , %, +, -, ++, --, !, &d, *d, d->x, d.x */

example:

    intern Tstring *edge_getloop(Edgel * edge)
    {
        Ddlist *start = NULL;
        Ddlist *end = NULL;
        Edge_conn *cptr;
        Edgel *eptr;
        Edgel *from;

        if (edge EQ NULL OR !binand(edge->type , EDGE_GET_CONN_MASK))
            return (NULL);

        cptr = (Edge_conn *) prop_get(edge->props, EDGE_CONN);
        if (cptr->count NE 2)
            return (NULL);

        start = end = dd_link_alloc((void *) edge, EDGE);
        from = edge;
        edge->type |= EDGE_LOOP;

        for (eptr = cptr->c1; eptr NE edge; eptr = cptr->c1)
        {
            end = dd_link_addtoend(end, dd_link_alloc((void *) eptr, EDGE));
            cptr = (Edge_conn *) prop_get(eptr->props, EDGE_CONN);
            eptr->type |= EDGE_LOOP;
            if (cptr->c2 NE from)
                SWAP(Edgel *, cptr->c1, cptr->c2)
            from = eptr;
        }

        start->last = end;      /* to complete the loop */
        end->next = start;
        return (str_make(LOOP, start, end));
    }

Not a vast change in programme content, but notice particularly the improved clarity of EQ vs = , and |= vs NE. Note also, >> and GT, & and AND, ||
and OR, all of which are potential typographical syntax errors which are no longer possible.

References

[1] W.B.Frakes and C.J.Fox, Sixteen Questions About Software Reuse, Communications of the ACM, 38, 6, pp 75-87, 1995.

[2] L.Hatton, Does OO Sync with How we Think? IEEE Software, pp 46-54, May/June 1998.

[3] M.Cartwright and M.Shepperd, Maintenance: the Future of Object-Orientation, CSM '95, CSM, Durham, England 1995.