Dealing with your compiler

Editing source files

Every compiler system has a different way to do this. IDEs (Integrated Development Environments) typically comprise an editor, compiler, and debugger all rolled into one. These all-in-one systems are convenient, if often less flexible than separate tools. (Think of it as a Swiss Army Knife compiler. It will handle small cooking or whittling jobs, but for really big ones you might need a real kitchen or a workshop. On the other hand, it is generally a lot more convenient than a whole shop-full of individual tools.)

Once you know how to create several separate files, you will probably need to learn your compiler's naming conventions. Many C compilers these days come bundled with additional compilers for entirely different languages like Fortran, Ada, Pascal, and C++. The ‘C compiler’ is often just a front-end that dispatches to the correct compiler based on the source file name. Files whose names end with a ‘.c’ extension will be passed to the C compiler, those with ‘.f77’ to the Fortran compiler, and those with ‘.cpp’, ‘.c++’, and even ‘.C’ (uppercase) may be given to a C++ compiler. Be particularly careful with the last of these -- although C++ strongly resembles C, the languages are often subtly different, so that a valid C program is sometimes also a valid C++ program that does something quite different.

Invoking the compiler

Again, each system will have a different method of doing this. Many systems have now adopted something like the Unix(R) ‘make’ utility, with which you can define a set of rules for building your program out of a number of separate source files, or create a ‘project’ file with similar rules. If one is available, you should almost certainly learn and use it. A good program-builder will return the effort spent learning it within just a few programming projects.

In many cases, the compiler will, by default, run in ‘extensions enabled’ mode. That is, it will not detect violations of the C standard that it will find in ‘strict’ mode. If some or all of your code is supposed to be portable, you may want to change this. For instance, the GNU C compiler, by default, compiles a rather different language with all kinds of useful and debatable extensions; to get strict conformance, you must invoke it as gcc -ansi -pedantic (both flags are required).

Your compiler will probably also run, by default, with less than the maximal warning set. On the whole, warnings are usually good -- most reasonable compilers have some reasonable set of warnings that will tell you when you have made a mistake, even if the C standard does not require a diagnostic. Different people have different ideas as to exactly which warnings are reasonable, however, so you will probably want to tune these to your particular tastes. If you have not yet developed any particular taste, you might try turning on all warnings. With gcc this requires multiple -W flags: the obvious -Wall actually means ‘turn on many of the ones the gcc developers thought were especially good’.

Warnings, Errors, and Diagnostics

While C compilers typically print ‘errors’ -- messages that cause compilation to halt -- and ‘warnings’ that allow compilation to proceed anyway, the C standards do not distinguish between these, or even define the two terms. Instead, the standards use the phrase ‘constraint violation’. Given a program -- strictly speaking, a ‘translation unit’ -- that contains one or more ‘constraint violations’, any C compiler must produce ‘at least one diagnostic’. The standards do not mandate the form of that diagnostic; while an ‘error’ is often appropriate, a ‘warning’ will do as well. Worse, the standards do not require a compiler to distinguish between required diagnostics and any other, non-required messages. Indeed, a compiler can simply emit one message for every program, regardless of whether or not that program contains any constraint violations.

In other words, a hypothetical ‘evil’ compiler could, every time you run it, print the message:

warning: this translation unit may or may not contain a constraint violation

and then never print another word. This would satisfy the ‘letter of the law’ in either the C89 or C99 standard, yet be entirely useless. If any code requires a diagnostic, the compiler has already produced one, and if not, the diagnostic that came out was one of the allowed, spurious diagnostics.

Any compiler that did this would be unlikely to catch on, of course; and in fact, compilers generally do a good job of emitting diagnostics for each constraint violation, pinpointing (or at least coming close to) the problem area, and continuing on and identifying any additional problems. Of course, compilers are not omniscient -- indeed, many compilers are particularly stupid -- and the problem pointed out may in fact not be where the real problem is at all. Over time you will learn to recognize ‘misleading’ diagnostics, such as the ones that point to a perfectly good line, because, e.g., a previous line was missing a semicolon. The set of misleading diagnostics will also vary from one compiler to the next.

Keep in mind that many compilers differ as to whether any particular constraint violation is an ‘error’ or a ‘warning’. Either one satisfies the need for a diagnostic, and in most cases, compilers do not label each warning as to whether it is required, or simply one of those things this particular compiler tries to be helpful about. In other words, a message like: ‘warning: integer to pointer conversion without a cast’ will usually mark a constraint violation, while ‘warning: integer constant so large that it is unsigned’ is not a constraint violation. There is no way to tell, just by looking at the message, whether your code will work on some other compiler. (The integer-to-pointer conversion may be considered an error on a stricter compiler, for instance.)

When you do get a warning, it is a good idea to examine the code carefully for constraint violations or any other problems. If there is nothing actually wrong with the code, or if you need to do something inherently ‘wrong’ (but which works on your system and achieves something that cannot be done ‘right’), you may be able to disable the warning individually, or put the code in a file that is compiled with that warning suppressed. In some cases, there may be a trivial way of rephrasing the construct that avoids the warning, and perhaps even makes the code clearer or better. For instance, if you want to use the integer constant 65535, which is too large for a 16-bit int (so that its type varies from one C compiler to the next), you can write 65535U to force the number to be an unsigned int on all C compilers.

Debugging and Undefined Behavior

Even a program that contains no errors obvious to the compiler -- no constraint violations and nothing that triggers a warning -- is not necessarily correct. Programs that contain what the standards call ‘undefined behavior’ are particularly tricky. Undefined behavior is, in effect, a ‘hole’ in the language.

This ‘hole in the language’ has both good and bad aspects. The ‘good aspect’ is that it provides a ‘hook’ for an implementation to extend the language. For instance, a translation unit could contain the directive:

#include <graphics.h>

This directive actually triggers undefined behavior. That means the implementation is free to do anything -- including, now, to be able to do graphics, which are otherwise impossible in Standard C. Once you invoke undefined behavior, the implementor is released entirely from the ‘contract’ in the C standard, so now he can provide all kinds of useful features that the standard would otherwise prohibit. The undefined behavior is an escape clause.

On the other hand, the ‘bad aspect’ is pretty much the same thing. The undefined behavior releases the implementor from his ‘contract’, so now you have only yourself to blame if anything goes wrong. Moreover, the implementor gets to pull out this same escape clause if you do anything that violates a ‘shall’ that is outside a ‘constraints’ section of the applicable standard. This can free him from what would otherwise be an onerous task. For instance, using a pointer that contains no valid value, or adding two signed integer values where the result overflows, gives rise to undefined behavior. If all implementors were required to do something special, such as catch the error at runtime and give you a chance to debug it, this would slow down all uses of pointers, and every signed integer add, on some systems. Every time you went to use a pointer, the implementor would have to first check the validity of that pointer. Every time you went to add two integers, the implementor would first have to check for overflow.

Some might object that modern computers generally ignore the integer overflow entirely (giving a wrong, but predictable, result for the sum of two such integers), so the C standards could mandate that such overflow is ignored. In fact, however, some modern computers do, or can, check for such overflow, and it is hard to argue that it is better to compute the wrong answer, even if that can be done as fast, or to require the wrong answer even when finding the wrong answer is slower.

By allowing the implementor this escape clause -- and putting the onus of avoiding unwanted undefined behavior on the programmer -- the C standards allow implementors to produce lean, fast code. By having a ‘hook’ on which to hang extensions, the C standards allow programmers to write code that is deliberately machine-specific, so as to do something that might otherwise require resorting to assembly language.

In any case, once something goes wrong -- often from undefined behavior -- your development system will probably give you some way to debug your code. This may take the form of post-mortem debugging, where the system writes out everything it knows about the state of the program at the time it went awry, or interactive debugging. Your debugger should have some way of examining variables, looking at active function calls, and so on.

Debuggers tend to be very different and there is not too much that can be said about them in general, save for one thing: if your compiler does much code optimization, the code you debug may be different from the code you wrote. The essence of optimization lies in deciphering the meaning of some source-code construct, and coming up with machine code that produces the same answer, yet does not necessarily use the same steps.

It is tempting to turn off optimization when debugging. Sometimes this is helpful, but sometimes it will make the problem appear to go away, without actually fixing the real problem. For instance, suppose the problem has to do with undefined behavior and a bad pointer that was never given a value. When optimization is on, the ‘bad’ pointer happens to share a machine register with other code. The other code sets the register to a value that causes the runtime error. When you turn optimization off, the ‘bad’ pointer holds some other value, and the program does not abort when you use the pointer -- but since the pointer is in fact bad, using it causes some other data to be overwritten. Some time -- perhaps many minutes -- later, in some other source file far from the point of the bug, the program may misbehave in some other way.

Also, the process of optimization involves program analysis that can identify mistakes, such as the use of a variable that has a garbage initial value. (Standard dataflow analysis techniques do this.) This means that sometimes a compiler can only produce certain warnings when optimizing. Debugging unoptimized code is still a valuable technique, of course.

The best debugger of all, however, is the one between your own ears. Read the code; see what you expect to happen; and compare that to what actually does happen. Pretend the misbehavior of the program is a murder mystery, with various clues. Variable x became 7 when you expected it to be 10. What might explain that? Was Mr Green hiding in the conservatory, hitting it with the lead pipe? Or was it Professor Plum, in the library, with the stray pointer? If you can explain what did happen, and find a piece of code that not only seems wrong, but also would do exactly that, you can change that code, recompile, and re-test -- and if the program now runs (or gets past that particular problem), you can be fairly sure that you have indeed fixed it.
