A lot of programmers believe that goto statements are the best way to do error handling in the C programming language. I've even seen that sentiment expressed as a meme. I strongly disagree. Why should I care, given that I'm on holiday within sight of the beach? I suppose I've had this argument with so many programmers over the years that I'm hoping putting my thoughts down will be somehow cathartic.
I'm aware that in espousing this point of view, I may appear sophomoric. On the contrary, I consider myself a hardened veteran. "Learn the rules well, so you know when to break them" and "rules are made to be broken" are ever-popular maxims. They have some truth, but I also think they are popular in part because they flatter the ego. We should all beware of the fallacy of the converse. (I break the rules, therefore I am a pro.)
Programming is as much an art as it is a science, and much of art is governed by rules. Pointillism is a fine example. There are discernible movements in computer science as well as in the art world. One such movement is structured programming. Even if you continue to use goto freely in your programs, I think there's value in examining alternatives.
My aim is to show that C provides ample support for error handling without recourse to goto. My examples use functions from <stdio.h> because everyone should be familiar with them.
"For a number of years I have been familiar with the observation that the quality of programmers is a decreasing function of the density of go to statements in the programs they produce."

That is the opening of Dijkstra's famous 1968 letter, "Go To Statement Considered Harmful". The C Programming Language (1978, 1988) is arguably the most important book about C, written by Kernighan and by C's creator, Ritchie. They did not use goto in their book and explicitly advise against its use:
"With a few exceptions like those cited here, code that relies on goto statements is generally harder to understand and to maintain than code without gotos. Although we are not dogmatic about the matter, it does seem that goto statements should be used rarely, if at all."

Those two sources alone should persuade anyone susceptible to arguments from authority, yet usage of goto seems to be as widespread as ever, despite a widely publicised "goto fail" bug in Apple's SSL code. (A bug caused by missing braces, but Barnum was wrong about there being no such thing as bad publicity.)
In the last twenty years, I've written code in many styles. One employer had a rule of no more than one label per function; another had a coding standard based on MISRA C, which forbade not only goto but also any kind of early exit (although almost everyone ignored it). I also went through a misguided phase of scrutinising all of the object code generated by an old-fashioned C compiler, to ensure that it resembled the assembly language I would have written myself. That level of fastidiousness entailed quite a few goto statements. Currently, like Laplace, I have no need of goto in my code.
Many coding standards deprecate goto; I've never seen one mandate it. Consequently, for most programmers, using goto is an active choice. Somewhere along the line, they see code that is riddled with goto statements, then decide to copy that style. Some may have been entirely self-taught and were never able to imagine anything better. Some learn not to use GOTO in BASIC and then "unlearn" that lesson when they begin writing C. Others might still be writing code in a style that they adopted when C compilers were much worse than today. Regardless of how the habit was acquired, it is naturally reinforced by the mere-exposure effect.
The two exceptions cited by K&R in the paragraph that I quoted above are:

1. To abandon processing in some deeply nested structure when an error occurs, such as breaking out of two or more loops at once (the break statement cannot be used directly, since it only exits the innermost loop).

2. To branch to common code when a search within nested loops succeeds, as in their example of determining whether two arrays have an element in common.
Perhaps inspired by point 1, many programmers not only use goto exclusively for error-handling, but also handle errors exclusively using goto! Unfortunately, error checks may constitute 90% of the control flow changes in many C programs, depending on what the programmer arbitrarily designated as an 'error'. This convention whereby every error check is a conditional goto falls so far short of "rarely, if at all" that it would be funny if it weren't so tragic.
Every function I find with broken error-handling is a rat's nest of gotos. I'm not saying that it's impossible to write correct code using goto; I think my anecdotal observation is more likely a reflection of the fact that the kind of programmers who think that goto simplifies their code don't exercise sufficient care to make sure it is correct. I'm inclined to believe so because I see a lot of arguments for goto which boil down to "I don't want to think". Frankly, if you don't want to think then you shouldn't be writing C programs.
A function in which goto is over-used often has the same control flow as an equivalent function that contains nested if blocks (the so-called 'arrow anti-pattern'), except that the actual structure of the function has been flattened and obfuscated using arbitrarily-invented (and possibly misleading) label names:
void test(void)
{
    FILE *fmin = fopen("testmin", "rb");
    if (!fmin) {
        fputs("Open testmin failed\n", stderr);
        goto fmin_fail;
    }
    FILE *fsub = fopen("testsub", "rb");
    if (!fsub) {
        fputs("Open testsub failed\n", stderr);
        goto fsub_fail;
    }
    FILE *fdiff = fopen("testdiff", "wb");
    if (!fdiff) {
        fputs("Open testdiff failed\n", stderr);
        goto fdiff_fail;
    }
    int const minuend = fgetc(fmin), subtrahend = fgetc(fsub);
    if (minuend == EOF || subtrahend == EOF) {
        fputs("Read failed\n", stderr);
    } else if (fputc(minuend - subtrahend, fdiff) == EOF) {
        fputs("Write failed\n", stderr);
    }
    if (fclose(fdiff)) {
        fputs("Close testdiff failed\n", stderr);
    }
fdiff_fail:
    fclose(fsub);
fsub_fail:
    fclose(fmin);
fmin_fail:
    return;
}
I heartily resent checking the order and naming of such labels when reviewing code. Nested if blocks aren't ideal either, but at least they are honest and align visually.
void test(void)
{
    FILE *fmin = fopen("testmin", "rb");
    if (!fmin) {
        fputs("Open testmin failed\n", stderr);
    } else {
        FILE *fsub = fopen("testsub", "rb");
        if (!fsub) {
            fputs("Open testsub failed\n", stderr);
        } else {
            FILE *fdiff = fopen("testdiff", "wb");
            if (!fdiff) {
                fputs("Open testdiff failed\n", stderr);
            } else {
                int const minuend = fgetc(fmin), subtrahend = fgetc(fsub);
                if (minuend == EOF || subtrahend == EOF) {
                    fputs("Read failed\n", stderr);
                } else if (fputc(minuend - subtrahend, fdiff) == EOF) {
                    fputs("Write failed\n", stderr);
                }
                if (fclose(fdiff)) {
                    fputs("Close testdiff failed\n", stderr);
                }
            }
            fclose(fsub);
        }
        fclose(fmin);
    }
}

(I'll propose a third alternative that actually simplifies this function later.)
Some programmers hate indentation: it makes them feel anxious. To some extent, this is rational, since the amount of indentation is one indicator of the complexity of a function. On the other hand, monitors aren't restricted to 80 columns anymore, and hiding the structure of a function doesn't simplify it. If you're willing to indent Python code then you should extend the same courtesy to C; if not, you should probably consider a career writing assembly language instead.
I've come to believe that the real issue with banning goto is that programmers like it — not that they need it, or that it makes programs better. Like the dark side of the Force, it is quicker, easier, and more seductive. It's like beer for programmers. Of course someone tried banning that once too. One of the main reasons for repealing Prohibition in the U.S.A. was that tax revenues could be raised by taxing beer. If usage of goto were taxed then programmers might think twice about littering their code with it.
When the Taliban proudly announced to the world that "Women's rights will be respected — within the limits of Islam", I was tempted to announce my own reformed, pragmatic and compassionate approach to C programming, which respects everyone's right to use goto — within the limits of K&R.* (That is to say, rarely, if at all.) In reality, I tend to agree with Stroustrup that persuasion is more desirable and effective than prohibition.
* A joke about fanaticism — not a religious opinion, which I am unqualified to give.
My first preference for an error-handling pattern is not to branch on error at all, given that the efficiency of a program that fails doesn't matter. This strategy is easy to implement when writing serialization code, because a FILE object stores the error state of a stream:
typedef struct {
    uint32_t count;
    unsigned char data[100];
} foo_t;

static unsigned char const magic[] = {'S', 'O', 'U', 'L'};

static bool save_file(foo_t const *const obj, char const *const filename)
{
    FILE *const f = fopen(filename, "wb");
    if (NULL == f) {
        fprintf(stderr, "Open %s failed\n", filename);
        return false;
    }
    fwrite(magic, sizeof(magic), 1, f);
    for (size_t i = 0; i < sizeof(obj->count); ++i) {
        fputc((obj->count >> (CHAR_BIT * i)) & UCHAR_MAX, f);
    }
    fwrite(obj->data, obj->count, 1, f);
    bool err = ferror(f);
    if (fclose(f)) {
        err = true;
    }
    if (err) {
        fprintf(stderr, "Write to %s failed\n", filename);
    }
    return !err;
}
It's easy to support this usage pattern when designing your own interfaces too. Another example of it is OpenGL's glGetError function.
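As a minimal sketch of what such an interface might look like (the writer_t type and its functions are hypothetical, invented for this example), a 'writer' can record the first failure in a sticky flag, so that callers make a whole sequence of calls and test for errors once at the end, exactly as ferror allows:

#include <stdbool.h>
#include <stdio.h>

typedef struct {
    FILE *stream;
    bool failed; /* sticky error flag, like a stream's error indicator */
} writer_t;

void writer_init(writer_t *const w, FILE *const stream)
{
    w->stream = stream;
    w->failed = false;
}

void writer_put_u16(writer_t *const w, unsigned const value)
{
    /* Once a failure has been recorded, do nothing: callers need not
       check the result of every call. */
    if (!w->failed &&
        (fputc(value & 0xff, w->stream) == EOF ||
         fputc((value >> 8) & 0xff, w->stream) == EOF)) {
        w->failed = true;
    }
}

bool writer_failed(writer_t const *const w)
{
    return w->failed;
}

A caller can then emit a whole record with consecutive writer_put_u16 calls and branch on writer_failed exactly once, in the same way that save_file defers its ferror check until just before fclose.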
My second preference is to follow the allocate-call-free pattern. The main reason that programmers goto a label near the end of a function is to avoid resource leaks. Had the same resource(s) instead been allocated in the calling function, then the callee could have returned directly without leaking. Consequently, I've come to believe that early exit (i.e. return) is the most powerful mechanism for structuring a C program, even though it is a deviation from structured programming in its purest form:
typedef enum {
    ERROR_NONE,
    ERROR_READ_FAIL,
    ERROR_BAD_MAGIC,
    ERROR_TOO_BIG,
} error_t;

static error_t deserialize(foo_t *const obj, FILE *const f)
{
    unsigned char hdr[sizeof(magic)];
    if (fread(hdr, sizeof(hdr), 1, f) != 1) {
        return ERROR_READ_FAIL;
    }
    if (memcmp(hdr, magic, sizeof(magic))) {
        return ERROR_BAD_MAGIC;
    }
    obj->count = 0;
    for (size_t i = 0; i < sizeof(obj->count); ++i) {
        int const c = fgetc(f);
        if (c == EOF) {
            return ERROR_READ_FAIL;
        }
        obj->count |= (uint32_t)c << (CHAR_BIT * i);
    }
    if (obj->count > sizeof(obj->data)) {
        return ERROR_TOO_BIG;
    }
    if (fread(obj->data, obj->count, 1, f) != 1) {
        return ERROR_READ_FAIL;
    }
    return ERROR_NONE;
}

static bool load_file(foo_t *const obj, char const *const filename)
{
    FILE *const f = fopen(filename, "rb");
    if (!f) {
        fprintf(stderr, "Open %s failed\n", filename);
        return false;
    }
    error_t const err = deserialize(obj, f);
    switch (err) {
    case ERROR_READ_FAIL:
        fprintf(stderr, "Read from %s failed\n", filename);
        break;
    case ERROR_BAD_MAGIC:
        fprintf(stderr, "Bad magic values in %s\n", filename);
        break;
    case ERROR_TOO_BIG:
        fprintf(stderr, "Too much data in %s\n", filename);
        break;
    }
    fclose(f);
    return err == ERROR_NONE;
}
Note that the deserialize function doesn't even know the name of the file from which it is reading. That is a good thing, because it might not be a file at all — it might be stdin. Nor does deserialize depend on reporting errors via stderr, which might not be appropriate in every situation.
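Because deserialize depends only on a FILE *, reading from standard input really does require no changes. A minimal sketch of such a caller, reusing the foo_t and error_t definitions above:

foo_t obj;
error_t const err = deserialize(&obj, stdin);
if (err != ERROR_NONE) {
    /* report the failure in whatever way suits this caller */
}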
In a real program, the code to report any error_t value (except ERROR_NONE) to the user would typically be in a separate function to allow reuse. This function might be very different for a command line tool when compared to an interactive application with a GUI.
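For illustration, such a reporting helper might be as simple as mapping each code to a string (this error_message function is my invention, not part of the earlier example). A command-line tool could print the string to stderr, while a GUI application could display the same string in a dialogue box:

static char const *error_message(error_t const err)
{
    switch (err) {
    case ERROR_READ_FAIL: return "Read failed";
    case ERROR_BAD_MAGIC: return "Bad magic values";
    case ERROR_TOO_BIG:   return "Too much data";
    case ERROR_NONE:      break;
    }
    return "No error";
}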
Separating resource allocation from processing can also benefit performance, since it encourages reuse of resources. The following excerpt is from a 3D object mesh format converter. The process_object function doesn't allocate anything that isn't attached to varray or groups, nor does it open or close the models or out files. Consequently, any memory allocated for each object is recycled for the next, and neither memory nor file handles are leaked:
Group groups[Group_Count];
for (int g = 0; g < Group_Count; ++g) {
    group_init(groups + g);
}
VertexArray varray;
vertex_array_init(&varray);
int object_count;
for (object_count = 0; !stop && success; ++object_count) {
    success = process_object(models, out, object_name, object_count,
                             &varray, &groups, &vtotal, &list_title,
                             thick, data_start, flags);
}
for (int g = 0; g < Group_Count; ++g) {
    group_free(groups + g);
}
vertex_array_free(&varray);
This idea of attaching resources to an object passed by the calling function is widely applicable, and it makes testing for leaks very boring because there never are any. Someone once commented that they thought my code was 'based'. You too can write 'based' code simply by making better use of functions.
When tempted to write multiple goto statements to a single cleanup label, consider the alternative of a single-iteration do...while loop:
bool subtractor(char const *const filename)
{
    FILE *const f = fopen(filename, "rb");
    if (!f) {
        fprintf(stderr, "Open %s failed\n", filename);
        return false;
    }
    bool success = false;
    do {
        unsigned char minuends[32];
        if (1 != fread(minuends, sizeof(minuends), 1, f)) {
            continue;
        }
        unsigned char subtrahends[sizeof(minuends)];
        if (1 != fread(subtrahends, sizeof(subtrahends), 1, f)) {
            continue;
        }
        for (size_t i = 0; i < sizeof(minuends); ++i) {
            printf("%d - %d = %d\n", minuends[i], subtrahends[i],
                   minuends[i] - subtrahends[i]);
        }
        success = true;
    } while (0);
    if (!success) {
        fprintf(stderr, "Read from %s failed\n", filename);
    }
    fclose(f);
    return success;
}
This idiom can be useful in cases where you do not want to allocate resources in the calling function (perhaps because an excessive number of arguments would need to be passed). When compared to goto, it has the advantage that you don't need to invent a label name and type it repeatedly; besides which, it makes the structure of the code obvious at a glance, because the block within which early exit may occur (equivalent to a try block) is indented.
A nice way of remembering this idiom is Yoda's famous pronouncement: "Do or do not. There is no try."
It makes no difference whether break is used to exit the loop immediately, or continue is used to jump to the end of the loop body. As a general rule, continue may be preferable because it can also be used within any switch statements nested in the loop. On the other hand, break is more self-explanatory.
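A contrived sketch of that difference (this function is invented for illustration): inside a switch nested in the loop, break merely exits the switch, whereas continue still jumps to the end of the single-iteration loop:

#include <stdbool.h>
#include <stdio.h>

bool copy_one_char(FILE *const in, FILE *const out)
{
    bool success = false;
    do {
        int const c = fgetc(in);
        switch (c) {
        case EOF:
            continue; /* jumps to the end of the do...while loop */
        default:
            if (fputc(c, out) == EOF) {
                continue; /* likewise, abandoning the iteration */
            }
            break; /* exits the switch only, not the loop */
        }
        success = true;
    } while (0);
    return success;
}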
Either way, this idiom does not allow early exit from nested loops, such as would be required in the deserialize function above. In such cases, it's better to refactor into separate functions and use return instead of break or continue.
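For instance, K&R's own search example (determining whether two arrays have an element in common) needs goto or extra flags to escape its nested loops, but not once the search is a function of its own. A minimal sketch:

#include <stdbool.h>
#include <stddef.h>

/* Return whether arrays a (length n) and b (length m) have an element
   in common. */
bool have_common_element(int const *const a, size_t const n,
                         int const *const b, size_t const m)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < m; ++j) {
            if (a[i] == b[j]) {
                return true; /* early exit from both loops, no label */
            }
        }
    }
    return false;
}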
If a function allocates multiple resources of the same type, then it may be better to iterate over an array of that type instead of using nested function calls or nested if blocks. An enumeration is a good way to name the array indices:
void test(void)
{
    enum { FILE_MIN, FILE_SUB, FILE_DIFF, FILE_COUNT };
    static struct {
        char const *name, *mode;
    } const files[FILE_COUNT] = {
        [FILE_MIN] = {"testmin", "rb"},
        [FILE_SUB] = {"testsub", "rb"},
        [FILE_DIFF] = {"testdiff", "wb"}
    };
    FILE *f[FILE_COUNT];
    size_t nopen;
    for (nopen = 0; nopen < FILE_COUNT; ++nopen) {
        f[nopen] = fopen(files[nopen].name, files[nopen].mode);
        if (!f[nopen]) {
            fprintf(stderr, "Open %s failed\n", files[nopen].name);
            break;
        }
    }
    if (nopen == FILE_COUNT) {
        int const minuend = fgetc(f[FILE_MIN]),
                  subtrahend = fgetc(f[FILE_SUB]);
        if (minuend == EOF || subtrahend == EOF) {
            fputs("Read failed\n", stderr);
        } else if (fputc(minuend - subtrahend, f[FILE_DIFF]) == EOF) {
            fputs("Write failed\n", stderr);
        }
    }
    while (nopen-- > 0) {
        if (fclose(f[nopen])) {
            fprintf(stderr, "Close %s failed\n", files[nopen].name);
        }
    }
}
Often, there is no need to name the array indices:
#define ARRAY_SIZE(array) (sizeof(array) / sizeof((array)[0]))

static struct {
    int event_code;
    WimpEventHandler *handler;
} const wimp_handlers[] = {
    { Wimp_ERedrawWindow, redraw_window },
    { Wimp_EOpenWindow, open_window },
    { Wimp_ECloseWindow, close_window },
    { Wimp_EMouseClick, mouse_click },
};

static void deregister_wimp_handlers(EditWin *const edit_win, size_t i)
{
    while (i-- > 0) {
        event_deregister_wimp_handler(edit_win->window_id,
                                      wimp_handlers[i].event_code,
                                      wimp_handlers[i].handler, edit_win);
    }
}

static bool register_wimp_handlers(EditWin *const edit_win)
{
    for (size_t i = 0; i < ARRAY_SIZE(wimp_handlers); i++) {
        if (E(event_register_wimp_handler(edit_win->window_id,
                                          wimp_handlers[i].event_code,
                                          wimp_handlers[i].handler,
                                          edit_win))) {
            deregister_wimp_handlers(edit_win, i);
            return false;
        }
    }
    return true;
}
Freeing resources in a separate function allows the same code to be used for both error handling and normal object destruction.
Often, different types of resources are allocated by a single function. In such cases, a simple allocation loop cannot be used.
I normally use a combination of techniques already described to limit the nesting depth of conditional blocks within a single function.
On rare occasions, a state machine to manage control flow may be justified. This extends the concept of an allocation loop: instead of using enumerators as array indices, they are used as states in a switch statement:
typedef struct {
    void *buffer;
    FILE *file;
} object_t;

typedef enum {
    INIT_STATE_FIRST,
    INIT_STATE_BUFFER = INIT_STATE_FIRST,
    INIT_STATE_FILE,
    INIT_STATE_LAST
} init_state_t;

static object_t *partial_destructor(object_t *const o, init_state_t state)
{
    /* state is the failed initialization step, so start destruction at
       the previous step. */
    while (state-- > INIT_STATE_FIRST) {
        switch (state) {
        case INIT_STATE_BUFFER:
            free(o->buffer);
            break;
        case INIT_STATE_FILE:
            fclose(o->file);
            break;
        }
    }
    free(o);
    return NULL;
}

void destructor(object_t *const o)
{
    partial_destructor(o, INIT_STATE_LAST);
}

object_t *constructor(char const *const filename, size_t const buf_size)
{
    object_t *const o = malloc(sizeof(*o));
    if (!o) {
        fprintf(stderr, "Memory allocation of object failed\n");
        return NULL;
    }
    for (init_state_t state = INIT_STATE_FIRST;
         state < INIT_STATE_LAST; ++state) {
        switch (state) {
        case INIT_STATE_BUFFER:
            o->buffer = malloc(buf_size);
            if (!o->buffer) {
                fprintf(stderr, "Memory allocation of %zu failed\n",
                        buf_size);
                return partial_destructor(o, state);
            }
            break;
        case INIT_STATE_FILE:
            o->file = fopen(filename, "rb");
            if (!o->file) {
                fprintf(stderr, "Open %s failed\n", filename);
                return partial_destructor(o, state);
            }
            break;
        }
    }
    return o;
}
A common variation of this idiom is to eliminate the loop in the destructor and instead rely on fall-through between case statements:
static object_t *partial_destructor(object_t *const o, init_state_t state)
{
    if (state-- > INIT_STATE_FIRST) {
        switch (state) {
        case INIT_STATE_FILE:
            fclose(o->file);
            /* fallthrough */
        case INIT_STATE_BUFFER:
            free(o->buffer);
        }
    }
    free(o);
    return NULL;
}
I don't favour this variation because it makes the destructor fragile: reordering the enumeration no longer changes the order in which resources are freed, which can cause leaks or attempts to free resources which were never allocated. It doesn't even have the advantage of brevity because linters require fall-through to be explicitly annotated. The efficiency of object destruction is rarely significant and the compiler might unroll a destructor loop anyway.
This approach uses the facilities that C provides to give the program structure. It's true that the structure reflects the constraints of programming in a language that lacks automatic destructors, but that is the difference between a well-written C program and a C program written in the same style as another language that has fewer constraints. This is also why C programming is a craft, not simply bashing out code.
But C does have a stack... and nesting... and function calls.
Then you have probably structured your program wrongly, especially if you are likely to return in many places and want to perform the same clean-up. It's equally possible that you don't always want to do clean-up (e.g. in the example I gave, where the FILE * could be stdin), which is why programs should be composed of functions, not goto statements and labels.
"Kicking the can" is the essence of stepwise refinement, abstraction, and everything good about software engineering.
I don't think any plausible interpretation of structured programming agrees with this. If anything, the formal definition of structured programming is stricter than what I advocate.
It's true that break, continue and return are deviations from structured programming in its strictest form, but persuading people is a balance of 'carrot' and 'stick'. I wouldn't describe constraining branches to the end of a loop, the end of a function, or the statement immediately after the end of the current loop as a 'slight' restriction, therefore I wouldn't describe removing those constraints as a 'slight' relaxation either.
Not knowing where execution of a program jumps to without reading the entire program is practically the definition of an unstructured program. A function is simply a sub-program.
Some coding standards allow this, but the person reading code doesn't necessarily know what standard was in force when it was written, or whether the author adhered to that standard. In contrast, there is no ambiguity about whether or not break and continue branch forwards, because they always do.
Just because C++ allows you to avoid thinking about the consequences of acquiring a resource, that doesn't mean you should adopt the same habits in C. A structured C program does not have the same appearance as a C++ program with a load of goto statements and labels thrown in. (Or calls to longjmp, if trying to emulate C++ exceptions too.)
This is already a common idiom for macro definitions; the only difference here is that the loop isn't hidden by the pre-processor. After I became accustomed to it, using do in place of try seemed natural to me. In any case, it's wrong to assume that any loop has more than one iteration (or more than zero iterations, in the case of while or for loops).
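For anyone unfamiliar with the macro idiom: wrapping a multi-statement macro body in do ... while (0) makes the expansion behave as a single statement, so it composes safely with an unbraced if. A minimal sketch (the macro and function are invented for this example):

#include <stdio.h>

#define REPORT_AND_CLEAR(err) \
    do { \
        fprintf(stderr, "Error %d\n", (err)); \
        (err) = 0; \
    } while (0)

void check(int err)
{
    /* Expands safely even without braces around the if branch: */
    if (err)
        REPORT_AND_CLEAR(err);
    else
        puts("No error");
}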
The obvious answer is "Don't write code like that". It's almost always better to group resource allocations by type. Failing that, use a state machine.
This argument doesn't apply to do...while loops, where the end of the loop is obvious. In other cases, the start and end of the containing loop (or switch) should be visually aligned.
This is an interesting criticism of C's syntax (and that of most modern languages) which has nothing to do with error handling. Taken to an absurd conclusion, one should always use goto and never use break or continue. I don't think the ability to search outweighs the advantage of knowing that the control flow of a program deviates from its apparent structure in limited and predictable ways.
It is a massively overengineered solution to the toy problem I presented. However, when you've seen functions that contain hundreds of labels jumbled up with preprocessor logic, you might feel differently. It's useful to have a tool in your arsenal which scales to an unlimited number of initialisations (unlike goto, which requires strict reverse ordering of termination without providing any means of validating that beyond giving yourself eyestrain).
Although use of the post-decrement operator in a while statement may look confusing at first, this is a common idiom, so you might as well learn it (just as you once learned to write an idiomatic for loop).