goto hell;

Christopher Bazley, September 2022 (revised October 2022, April-May 2023 & November-December 2023)

Distracted boyfriend (C developers) ogles a passing woman (goto) to the disgust of his girlfriend (else, break, continue, return)

Introduction

A lot of programmers believe that goto statements are the best way to do error handling in the C programming language. I've even seen that sentiment expressed as a meme. I strongly disagree. Why should I care, given that I'm on holiday within sight of the beach? I guess I've had this argument with so many programmers over the years, that I'm hoping putting my thoughts down will be somehow cathartic.

I'm aware that in espousing this point of view, I may appear sophomoric. On the contrary, I consider myself a hardened veteran. "Learn the rules well, so you know when to break them" and "rules are made to be broken" are ever-popular maxims. They have some truth, but I also think they are popular in part because they flatter the ego. We should all beware of the fallacy of the converse. (I break the rules, therefore I am a pro.)

Programming is as much an art as it is a science, and much of art is governed by rules. Pointillism is a fine example. There are discernable movements in computer science as well as in the art world. One such movement is structured programming. Even if you continue to use goto freely in your programs, I think there's value in examining alternatives.

My aim is to show that C provides ample support for error handling without recourse to goto. My examples use functions from <stdio.h> because everyone should be familiar with them.

Background

As long ago as 1968, eminent Dutch computer scientist Edsger Dijkstra infamously observed:
For a number of years I have been familiar with the observation that the quality of programmers is a decreasing function of the density of go to statements in the programs they produce.
The C Programming Language (1978, 1988) is arguably the most important book about C, written by its inventors: Kernighan & Ritchie. They did not use goto in their book and explicitly advise against its use:
With a few exceptions like those cited here, code that relies on goto statements is generally harder to understand and to maintain than code without gotos. Although we are not dogmatic about the matter, it does seem that goto statements should be used rarely, if at all.
Those two sources alone should persuade anyone susceptible to arguments from authority, yet usage of goto seems to be as widespread as ever, despite a widely publicised "goto fail" bug in Apple's SSL code. (A bug caused by missing braces, but Barnum was wrong about there being no such thing as bad publicity.)

My own view

In the last twenty years, I've written code in many styles. One employer had a rule of no more than one label per function; another had a coding standard based on MISRA C, which forbade not only goto but also any kind of early exit (although almost everyone ignored it). I also went through a misguided phase of scrutinising all of the object code generated by an old-fashioned C compiler, to ensure that it resembled the assembly language I would have written myself. That level of fastidiousness entailed quite a few goto statements. Currently, like Laplace, I have no need of goto in my code.

Many coding standards deprecate goto; I've never seen one mandate it. Consequently, for most programmers, using goto is an active choice. Somewhere along the line, they see code that is riddled with goto statements, then decide to copy that style. Some may have been entirely self-taught and were never able to imagine anything better. Some learn not to use GOTO in BASIC and then "unlearn" that lesson when they begin writing C. Others might still be writing code in a style that they adopted when C compilers were much worse than today. Regardless of how the habit was acquired, it is naturally reinforced by the mere-exposure effect.

The two exceptions cited by K&R in the paragraph that I quoted above are:

  1. To abandon processing in some deeply nested structure, such as breaking out of two or more loops at once. They comment "This organization is handy if the error-handling code is non-trivial, and if errors can occur in several places."
  2. To abandon processing when it has been determined that two arrays have an element in common (i.e. breaking out of two loops on success instead of on failure).

Perhaps inspired by point 1, many programmers not only use goto exclusively for error-handling, but also handle errors exclusively using goto! Unfortunately, error checks may constitute 90% of the control flow changes in many C programs, depending on what the programmer arbitrarily designated as an 'error'. This convention whereby every error check is a conditional goto falls so far short of "rarely, if at all" that it would be funny if it weren't so tragic.

Every function I find with broken error-handling is a rat's nest of gotos. I'm not saying that it's impossible to write correct code using goto; I think my anecdotal observation is more likely a reflection of the fact that the kind of programmers who think that goto simplifies their code don't exercise sufficient care to make sure it is correct. I'm inclined to believe so because I see a lot of arguments for goto which boil down to "I don't want to think". Frankly, if you don't want to think then you shouldn't be writing C programs.

A function in which goto is over-used often has the same control flow as an equivalent function that contains nested if blocks (the so-called 'arrow anti-pattern') except that the actual structure of the function has been flattened and obfuscated using arbitrarily-invented (and possibly misleading) label names:

void test(void)
{
  FILE *fmin = fopen("testmin", "rb");
  if (!fmin) {
    fputs("Open testmin failed\n", stderr);
    goto fmin_fail;
  }

  FILE *fsub = fopen("testsub", "rb");
  if (!fsub) {
    fputs("Open testsub failed\n", stderr);
    goto fsub_fail;
  }

  FILE *fdiff = fopen("testdiff", "wb");
  if (!fdiff) {
    fputs("Open testdiff failed\n", stderr);
    goto fdiff_fail;
  }

  int const minuend = fgetc(fmin),
            subtrahend = fgetc(fsub);

  if (minuend == EOF || subtrahend == EOF) {
    fputs("Read failed\n", stderr);
  } else if (fputc(minuend - subtrahend, fdiff) == EOF) {
    fputs("Write failed\n", stderr);
  }

  if (fclose(fdiff)) {
    fputs("Close testdiff failed\n", stderr);
  }
fdiff_fail:
  fclose(fsub);
fsub_fail:
  fclose(fmin);
fmin_fail:
  return;
}

I heartily resent checking the order and naming of such labels when reviewing code. Nested if blocks aren't ideal either, but at least they are honest and align visually.

void test(void)
{
  FILE *fmin = fopen("testmin", "rb");
  if (!fmin) {
    fputs("Open testmin failed\n", stderr);
  } else {
    FILE *fsub = fopen("testsub", "rb");
    if (!fsub) {
      fputs("Open testsub failed\n", stderr);
    } else {
      FILE *fdiff = fopen("testdiff", "wb");
      if (!fdiff) {
        fputs("Open testdiff failed\n", stderr);
      } else {
        int const minuend = fgetc(fmin),
                  subtrahend = fgetc(fsub);

        if (minuend == EOF || subtrahend == EOF) {
          fputs("Read failed\n", stderr);
        } else if (fputc(minuend - subtrahend, fdiff) == EOF) {
          fputs("Write failed\n", stderr);
        }

        if (fclose(fdiff)) {
          fputs("Close testdiff failed\n", stderr);
        }
      }
      fclose(fsub);
    }
    fclose(fmin);
  }
}
(I'll propose a third alternative that actually simplifies this function later.)

Some programmers hate indentation: it makes them feel anxious. To some extent, this is rational, since the amount of indentation is one indicator of the complexity of a function. On the other hand, monitors aren't restricted to 80 columns anymore, and hiding the structure of a function doesn't simplify it. If you're willing to indent Python code then you should extend the same courtesy to C; if not, you should probably consider a career writing assembly language instead.

I've come to believe that the real issue with banning goto is that programmers like it — not that they need it, or that it makes programs better. Like the dark side of the Force, it is quicker, easier, and more seductive. It's like beer for programmers. Of course someone tried banning that once too. One of the main reason for repealing prohibition in the U.S.A. was that tax revenues could be raised by taxing beer. If usage of goto were taxed then programmers might think twice about littering their code with it.

When the Taliban proudly announced to the world that "Women's rights will be respected — within the limits of Islam", I was tempted to announce my own reformed, pragmatic and compassionate approach to C programming, which respects everyone's right to use goto — within the limits of K&R.* (That is to say, rarely, if at all.) In reality, I tend to agree with Stroustrup that persuasion is more desirable and effective than prohibition.

* A joke about fanaticism — not a religious opinion, which I am unqualified to give.

Deferred error handling

My first preference for an error-handling pattern is not to branch on error at all, given that efficiency of a program that fails doesn't matter. This strategy is easy to implement when writing serialization code, because a FILE object stores the error state of a stream:

typedef struct {
  uint32_t count;
  unsigned char data[100];
} foo_t;

static unsigned char const magic[] = {'S', 'O', 'U', 'L'};

static bool save_file(foo_t const *const obj, char const *const filename)
{
  FILE *const f = fopen(filename, "wb");
  if (NULL == f) {
    fprintf(stderr, "Open %s failed\n", filename);
    return false;
  }

  fwrite(magic, sizeof(magic), 1, f);

  for (size_t i = 0; i < sizeof(obj->count); ++i) {
    fputc((obj->count >> (CHAR_BIT * i)) & UCHAR_MAX, f);
  }

  fwrite(obj->data, obj->count, 1, f);

  bool err = ferror(f);
  if (fclose(f)) {
    err = true;
  }
  if (err) {
    fprintf(stderr, "Write to %s failed\n", filename);
  }
  return !err;
}

It's easy to support this usage pattern when designing your own interfaces too. Another example of it is OpenGL's glGetError function.

Error handling by early exit from a function

My second preference is to follow the allocate-call-free pattern. The main reason that programmers goto a label near the end of a function is to avoid resource leaks. Had the same resource(s) instead been allocated in the calling function, then the callee could have returned directly without leaking. Consequently, I've come to believe that early exit (i.e. return) is the most powerful mechanism for structuring a C program, even though it is a deviation from structured programming in its purest form:

typedef enum {
  ERROR_NONE,
  ERROR_READ_FAIL,
  ERROR_BAD_MAGIC,
  ERROR_TOO_BIG,
} error_t;

static error_t deserialize(foo_t *const obj, FILE *const f)
{
  unsigned char hdr[sizeof(magic)];

  if (fread(hdr, sizeof(hdr), 1, f) != 1) {
    return ERROR_READ_FAIL;
  }

  if (memcmp(hdr, magic, sizeof(magic))) {
    return ERROR_BAD_MAGIC;
  }

  obj->count = 0;
  for (size_t i = 0; i < sizeof(obj->count); ++i) {
    int const c = fgetc(f);
    if (c == EOF) {
      return ERROR_READ_FAIL;
    }
    obj->count |= (uint32_t)c << (CHAR_BIT * i);
  }

  if (obj->count > sizeof(obj->data)) {
    return ERROR_TOO_BIG;
  }

  if (fread(obj->data, obj->count, 1, f) != 1) {
    return ERROR_READ_FAIL;
  }

  return ERROR_NONE;
}

static bool load_file(foo_t *const obj, char const *const filename)
{
  FILE *const f = fopen(filename, "rb");
  if (!f) {
    fprintf(stderr, "Open %s failed\n", filename);
    return false;
  }

  error_t const err = deserialize(obj, f);
  switch (err) {
  case ERROR_READ_FAIL:
    fprintf(stderr, "Read from %s failed\n", filename);
    break;
  case ERROR_BAD_MAGIC:
    fprintf(stderr, "Bad magic values in %s\n", filename);
    break;
  case ERROR_TOO_BIG:
    fprintf(stderr, "Too much data in %s\n", filename);
    break;
  }
  fclose(f);
  return err == ERROR_NONE;
}

Note that the deserialize function doesn't even know the name of the file from which it is reading. That is a good thing because it might not be a file at all — it might be stdin. Nor does deserialize depend on reporting errors via stderr, which might not be appropriate in every situation.

In a real program, the code to report any error_t value (except ERROR_NONE) to the user would typically be in a separate function to allow reuse. This function might be very different for a command line tool when compared to an interactive application with a GUI.

Separating resource allocation from processing can also benefit performance, since it encourages reuse of resources. The following excerpt is from a 3D object mesh format converter. The process_object function doesn't allocate anything that isn't attached to varray or groups, nor does it open or close the models or out files. Consequently, any memory allocated for each object is recycled for the next, and neither memory nor file handles are leaked:

Group groups[Group_Count];
for (int g = 0; g < Group_Count; ++g) {
  group_init(groups + g);
}

VertexArray varray;
vertex_array_init(&varray);

int object_count;
for (object_count = 0; !stop && success; ++object_count) {
  success = process_object(models, out, object_name, object_count,
    &varray, &groups, &vtotal, &list_title,
    thick, data_start, flags);
}

for (int g = 0; g < Group_Count; ++g) {
  group_free(groups + g);
}

vertex_array_free(&varray);

This idea of attaching resources to an object passed by the calling function is widely applicable, and it makes testing for leaks very boring because there never are any. Someone once commented that I thought my code was based. You too can write 'based' code simply by making better use of functions.

Error handling by means of a dummy loop

When tempted to write multiple goto statements to a single cleanup label, consider the alternative of a single-iteration do...while loop:

bool subtractor(char const *const filename)
{
  FILE *const f = fopen(filename, "rb");
  if (!f) {
    fprintf(stderr, "Open %s failed\n", filename);
    return false;
  }

  bool success = false;
  do {
    unsigned char minuends[32];
    if (1 != fread(minuends, sizeof(minuends), 1, f)) {
      continue;
    }

    unsigned char subtrahends[sizeof(minuends)];
    if (1 != fread(subtrahends, sizeof(subtrahends), 1, f)) {
      continue;
    }

    for (size_t i = 0; i < sizeof(minuends); ++i) {
      printf("%d - %d = %d\n", minuends[i], subtrahends[i], minuends[i] - subtrahends[i]);
    }
    success = true;
  } while(0);

  if (!success) {
    fprintf(stderr, "Read from %s failed\n", filename);
  }
  fclose(f);
  return success;
}

This idiom can be useful in cases where you do not want to allocate resources in the calling function (perhaps because an excessive number of arguments would need to be passed). When compared to goto, it has the advantage that you don't need to invent a label name and type it repeatedly, besides which it makes the structure of the code obvious at a glance because the block within which early exit may occur (equivalent to a try block) is indented.

A nice way of remembering this idiom is Yoda's famous pronouncement: "Do or do not. There is no try."

It makes no difference whether break is used to exit the loop immediately, or continue is used to jump to the end of the loop body. As a general rule, continue may be preferable because it can also be used within any switch statements nested in the loop. On the other hand, break is more self-explanatory.

Either way, this idiom does not allow early exit from nested loops such as would be required in the deserialize function above. In such cases it's better to refactor into separate functions and use return instead of break or continue.

Error handling by early exit from an allocation loop

If a function allocates multiple resources of the same type, then it may be better to iterate over an array of that type instead of using nested function calls or nested if blocks. An enumeration is a good way to name the array indices:

void test(void)
{
  enum {
    FILE_MIN,
    FILE_SUB,
    FILE_DIFF,
    FILE_COUNT
  };

  static struct {
    char const *name, *mode;
  } const files[FILE_COUNT] = {
    [FILE_MIN] = {"testmin", "rb"},
    [FILE_SUB] = {"testsub", "rb"},
    [FILE_DIFF] = {"testdiff", "wb"}
  };

  FILE *f[FILE_COUNT];
  size_t nopen;

  for (nopen = 0; nopen < FILE_COUNT; ++nopen) {
    f[nopen] = fopen(files[nopen].name, files[nopen].mode);
    if (!f[nopen]) {
      fprintf(stderr, "Open %s failed\n", files[nopen].name);
      break;
    }
  }

  if (nopen == FILE_COUNT) {
    int const minuend = fgetc(f[FILE_MIN]),
              subtrahend = fgetc(f[FILE_SUB]);

    if (minuend == EOF || subtrahend == EOF) {
      fputs("Read failed\n", stderr);
    } else if (fputc(minuend - subtrahend, f[FILE_DIFF]) == EOF) {
      fputs("Write failed\n", stderr);
    }
  }

  while (nopen-- > 0) {
    if (fclose(f[nopen])) {
      fprintf(stderr, "Close %s failed\n", files[nopen].name);
    }
  }
}

Often, there is no need to name the array indices:

#define ARRAY_SIZE(array) (sizeof(array) / sizeof((array)[0]))

static struct {
 int event_code;
 WimpEventHandler *handler;
} const wimp_handlers[] = {
  { Wimp_ERedrawWindow, redraw_window },
  { Wimp_EOpenWindow, open_window },
  { Wimp_ECloseWindow, close_window },
  { Wimp_EMouseClick, mouse_click },
};

static void deregister_wimp_handlers(EditWin *const edit_win, size_t i)
{
  while (i-- > 0) {
    event_deregister_wimp_handler(edit_win->window_id,
                                  wimp_handlers[i].event_code,
                                  wimp_handlers[i].handler,
                                  edit_win);
  }
}

static bool register_wimp_handlers(EditWin *const edit_win)
{
  for (size_t i = 0; i < ARRAY_SIZE(wimp_handlers); i++) {
    if (E(event_register_wimp_handler(edit_win->window_id,
                                      wimp_handlers[i].event_code,
                                      wimp_handlers[i].handler,
                                      edit_win))) {
      deregister_wimp_handlers(edit_win, i);
      return false;
    }
  }

  return true;
}

Freeing resources in a separate function allows the same code to be used for both error handling and normal object destruction.

Error handling using a state machine

Often, different types of resources are allocated by a single function. In such cases, a simple allocation loop cannot be used.

I normally use a combination of techniques already described to limit the nesting depth of conditional blocks within a single function.

On rare occasions, a state machine to manage control flow may be justified. This extends the concept of an allocation loop: instead of using enumerators as array indices, they are used as states in a switch statement:

typedef struct {
  void *buffer;
  FILE *file;
} object_t;

typedef enum {
  INIT_STATE_FIRST,
  INIT_STATE_BUFFER = INIT_STATE_FIRST,
  INIT_STATE_FILE,
  INIT_STATE_LAST
} init_state_t;

static object_t *partial_destructor(object_t *const o, init_state_t state)
{
  // state is the failed initialization step, so start destruction at the previous step.
  while (state-- > INIT_STATE_FIRST) {
    switch (state) {
    case INIT_STATE_BUFFER:
      free(o->buffer);
      break;
    case INIT_STATE_FILE:
      fclose(o->file);
      break;
    }
  }
  free(o);
  return NULL;
}

void destructor(object_t *const o)
{
  partial_destructor(o, INIT_STATE_LAST);
}

object_t *constructor(char const *const filename, size_t const buf_size)
{
  object_t *const o = malloc(sizeof(*o));
  if (!o) {
    fprintf(stderr, "Memory allocation of object failed\n");
    return NULL;
  }
  for (init_state_t state = INIT_STATE_FIRST; state < INIT_STATE_LAST; ++state) {
    switch (state) {
    case INIT_STATE_BUFFER:
      o->buffer = malloc(buf_size);
      if (!o->buffer) {
        fprintf(stderr, "Memory allocation of %zu failed\n", buf_size);
        return partial_destructor(o, state);
      }
      break;
    case INIT_STATE_FILE:
      o->file = fopen(filename, "rb");
      if (!o->file) {
        fprintf(stderr, "Open %s failed\n", filename);
        return partial_destructor(o, state);
      }
      break;
    }
  }
  return o;
}

A common variation of this idiom is to eliminate the loop in the destructor and instead rely on fall-through between case statements:

static object_t *partial_destructor(object_t *const o, init_state_t state)
{
  if (state-- > INIT_STATE_FIRST) {
    switch (state) {
    case INIT_STATE_FILE:
      fclose(o->file);
      // fallthrough
    case INIT_STATE_BUFFER:
      free(o->buffer);
    }
  }
  free(o);
  return NULL;
}

I don't favour this variation because it makes the destructor fragile: reordering the enumeration no longer changes the order in which resources are freed, which can cause leaks or attempts to free resources which were never allocated. It doesn't even have the advantage of brevity because linters require fall-through to be explicitly annotated. The efficiency of object destruction is rarely significant and the compiler might unroll a destructor loop anyway.

Objections

I feel that refactoring load_file into two functions was done somewhat arbitrarily.

It uses the facilities that C provides to give the program structure. It's true that the structure reflects the constraints of programming in a language that lacks automatic destructors, but that is the difference between a well-written C program, and a C program written in the same style as another language that has fewer constraints. This is also why C programming is a craft, not simply bashing out code.

I like labels named out1, out2 and out3 to make my error handling code feel like a stack.

But C does have a stack... and nesting... and function calls.

But what if I want to do clean-up before returning?

Then you have probably structured your program wrongly, especially if you are likely to return in many places and want to perform the same clean-up. It's equally possible that you don't always want to do clean-up (e.g. in the example I gave, where the FILE * could be stdin), which is why programs should be composed of functions, not goto statements and labels.

Refactoring code into another function doesn't really tackle complexity, it just kicks the can somewhere else.

"Kicking the can" is the essence of stepwise refinement, abstraction, and everything good about software engineering.

C's version of goto does not contradict structured programming because it is only allowed within functions.

I don't think any plausible interpretation of structured programming agrees with this. If anything, the formal definition of structured programming is stricter than what I advocate.

Goto is just a slightly relaxed version of break, continue and return.

It's true that break, continue and return are deviations from structured programming in its strictest form, but persuading people is a balance of 'carrot' and 'stick'. I wouldn't describe constraining branches to the end of a loop, the end of a function, or the statement immediately after the end of the current loop, as a 'slight' restriction, therefore I wouldn't describe removing those constraints as a 'slight' relaxation either.

We typically read functions from top to bottom. While reading a function, we can keep track all the labels we read.

Not knowing where execution of a program jumps to without reading the entire program is practically the definition of an unstructured program. A function is simply a sub-program.

Is it OK to use goto exclusively for forward branches?

Some coding standards allow this, but the person reading code doesn't necessarily know what standard was in force when it was written, or whether the author adhered to that standard. In contrast, there is no ambiguity about whether or not break and continue branch forwards because they always do.

Goto is the best we can hope for in the absence of automatic destructors.

Just because C++ allows you to avoid thinking about the consequences of acquiring a resource, that doesn't mean you should adopt the same habits in C. A structured C program does not have the same appearance as a C++ program with a load of goto and labels thrown in. (Or calls to longjmp, if trying to emulate C++ exceptions too.)

Using do...while for error handling is misleading because it implies iteration

This is already a common idiom for macro definitions; the only difference here is that the loop isn't hidden by the pre-processor. After I became accustomed to it, using do in place of try seemed natural to me. In any case, it's wrong to assume that any loop has more than one iteration (or more than zero iterations, in the case of while or for loops).

What if I had to initialize a mutex in between two other types of initialization?

The obvious answer is "Don't write code like that". It's almost always better to group resource allocations by type. Failing that, use a state machine.

I prefer labels because I don't know where you're breaking to without reading the rest of the function

This argument doesn't apply to do...while loops, where the end of the loop is obvious. In other cases, the start and end of the containing loop (or switch) should be visually aligned.

At least I can search for labels by name.

This is an interesting criticism of C's syntax (and that of most modern languages) which has nothing to do with error handling. Taken to an absurd conclusion, one should always use goto and never use break or continue. I don't think the ability to search outweighs the advantage of knowing that the control flow of a program deviates from its apparent structure in limited and predictable ways.

The state machine seems like massive overengineering.

It is a massively overengineered solution to the toy problem I presented. However, when you've seen functions that contain hundreds of labels jumbled up with preprocessor logic, you might feel differently. It's useful to have a tool in your arsenal which scales to an unlimited number of initialisations (unlike goto, which requires strict reverse ordering of termination without providing any means of validating that beyond giving yourself eyestrain).

I wouldn't trust myself to get the reverse-while loop right.

Although use of the post-decrement operator in a while statement may look confusing at first, this is a common idiom so you might as well learn it (just as you once learned to write an idiomatic for loop).