Tutorial: Generics in C

Christopher Bazley, October-December 2022

A teething toy monkey reclining on a building of toy wooden bricks of different colours and shapes

Introduction

In my opinion piece, 'C versus C++: fight!', I mentioned that it's possible to do generic programming using the C preprocessor. This tutorial provides examples to show why such techniques are useful and how to implement them.

The word generic is from the Latin root 'genus', simply meaning "kind" or "sort". Dictionary definitions of the word typically say something like "relating to a group of things" or "characteristic of a class of things".

Wikipedia defines generic programming as:

a style of computer programming in which algorithms are written in terms of types to-be-specified-later that are then instantiated when needed for specific types provided as parameters.

For example, genericized code to implement an abstract data type such as a stack could be instantiated differently for an integer type, a floating-point type, or a user-defined type. This style of programming is also known as parametric polymorphism.

Hopefully this tutorial will be of use to people learning C programming, or those unsure of how such a theoretical concept applies to a low-level language such as C.

Parametric what?

In programming theory, polymorphism means a single interface to entities of different types. Parametric simply means that something is defined in terms of parameters.

Different kinds of polymorphism are applicable to different types of problem. It's important to consider which variety of polymorphism fits your use case (and whether you need polymorphism at all).

Parametric polymorphism can be contrasted with ad hoc polymorphism and subtype polymorphism:

In general, parametric polymorphism is used to specialize data structures for specific types, whereas subtype polymorphism is used to make business logic more reusable and composable. I've yet to find a use for ad hoc polymorphism beyond gimmicks and conveniences. (How hard is it really to call the appropriate function for a given data type?)

In the C programming language:

Aren't macros considered harmful?

A former colleague of mine (for whom I have great respect) used to say:

Every time a programmer says, 'I don't believe in macros,' a bug dies.

Of course this is simply a variation of a more famous statement about children and fairies in J.M.Barrie's classic story, ' Peter Pan'.

It's true that macros can be bad for code correctness, readability and debuggability, but C is not a complete language without its preprocessor. You could choose to use only #include and abjure other preprocessor commands such as #define, but in so doing, you would lose many useful language facilities. Let's not forget that bool, NULL, INT_MAX and assert() are all macros, as are indispensable pseudo-functions such as ARRAY_SIZE() and container_of().

I prefer external linkage of functions over fine-grained use of #if or #ifdef, enumeration constants over #define constants, and inline functions over function-like macros. However, the foregoing credo should not be taken to mean that the C preprocessor has no legitimate uses.

Stroustrup dismissed "the ugly and low-level semantics of Cpp" in 'The Design and Evolution of C++' (1994), in which he also wrote:

The key activity in determining what we needed from a parameterized type facility was to write programs using macros to fake parameterized types.

I often see this attitude from people who wish to dismiss C as unsuitable for a given purpose. When presented with a solution, they tend to dismiss it as 'fake' because it doesn't fit their preconceived notions. I'll let you judge whether the parameterization that I am about to present is real or not.

Digression into recursion

Let's imagine that we want to implement a flood fill algorithm for an image editor. Firstly, the pixel at a starting location is filled. Then, each pixel neighbouring a previously filled pixel is likewise filled if it has the same colour as the initial pixel originally had.

A simple solution to this problem might use recursion. A recursive function calls itself (directly or indirectly), for example in order to explore a branch of a tree-like data structure without losing track of its previous position. A flood fill algorithm interprets an image as just such a tree.

Here's a C program which implements the flood fill algorithm recursively:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum {
    IMG_SIZE = 8
};

struct image {
    char pixels[IMG_SIZE][IMG_SIZE];
};

static void flood_fill(struct image *const img,
                       int x, int y,
                       char find, char replace)
{
    if (x < 0 || y < 0 ||
        x >= IMG_SIZE || y >= IMG_SIZE ||
        img->pixels[y][x] != find) {
        return;
    }

    img->pixels[y][x] = replace;
    flood_fill(img, x+1, y, find, replace);
    flood_fill(img, x, y+1, find, replace);
    flood_fill(img, x-1, y, find, replace);
    flood_fill(img, x, y-1, find, replace);
}

int main(int argc, char const *argv[])
{
    struct image img = {{
        {' ', ' ', ' ', '#', '#', ' ', ' ', ' '},
        {' ', '#', ' ', ' ', ' ', ' ', '#', ' '},
        {' ', ' ', '#', ' ', '#', '#', ' ', ' '},
        {'#', '#', '#', ' ', ' ', '#', '#', '#'},
        {'#', ' ', '#', '#', ' ', '#', ' ', '#'},
        {' ', ' ', '#', ' ', ' ', '#', ' ', '#'},
        {'#', ' ', '#', ' ', '#', '#', ' ', ' '},
        {'#', '#', ' ', ' ', '#', ' ', '#', '#'},
    }};

    for (int y = 0; y < IMG_SIZE; ++y) {
        printf("%.*s\n", IMG_SIZE, img.pixels[y]);
    }

    puts("Enter coordinates (x,y):");
    char input[16];
    int start_x, start_y;

    if (fgets(input, sizeof(input), stdin) == NULL ||
        strchr(input, '\n') == NULL ||
        sscanf(input, "%d,%d", &start_x, &start_y) != 2 ||
        start_x < 0 || start_y < 0 ||
        start_x >= IMG_SIZE || start_y >= IMG_SIZE)
    {
        fputs("Bad input\n", stderr);
        return EXIT_FAILURE;
    }

    char const find = img.pixels[start_y][start_x];
    flood_fill(&img, start_x, start_y, find, '.');

    puts("New image:");
    for (int y = 0; y < IMG_SIZE; ++y) {
        printf("%.*s\n", IMG_SIZE, img.pixels[y]);
    }

    return EXIT_SUCCESS;
}

Recursion has one major drawback: storage for automatic variables used within a function cannot be freed until it exits. Too many nested function calls can cause stack overflow. If overflow occurs, the program is terminated abruptly without first having been given a chance to recover (unlike a failed call to malloc(), for example).

Given that the C runtime system uses a stack to store variable values for active function calls, the obvious solution is to implement a stack data type within the program itself. This explicit stack can be used to keep track of the progress of the flood fill operation, instead of using the call stack implicitly via recursion.

A stack data type

The code below implements a stack data type, including functions to push and pop items, and another function to check whether the stack is empty:

#include <stdbool.h>
#include <stdlib.h>
#include <assert.h>

struct coord {
    int x, y;
};

struct coord_stack {
    size_t nitems, limit;
    struct coord *items;
};

bool coord_stack_init(struct coord_stack *const stack,
                      size_t const limit)
{
    *stack = (struct coord_stack){
        .nitems = 0,
        .limit = limit,
        .items = malloc(sizeof(struct coord) * limit)
    };
    return stack->items != NULL;
}

void coord_stack_term(struct coord_stack *const stack)
{
    free(stack->items);
}

bool coord_stack_push(struct coord_stack *const stack,
                      struct coord item)
{
    if (stack->nitems >= stack->limit) {
        return false;
    }
    stack->items[stack->nitems++] = item;
    return true;
}

struct coord coord_stack_pop(struct coord_stack *const stack)
{
    assert(stack->nitems > 0);
    return stack->items[--stack->nitems];
}

bool coord_stack_is_empty(struct coord_stack *const stack)
{
    return stack->nitems == 0;
}

Nothing in this stack implementation cares what the item type struct coord actually represents, therefore this code is a good candidate for being genericized in future.

Here's the flood_fill() function rewritten not to be recursive, using the new data type:

enum {
    STACK_LIMIT = 128
};

static void flood_fill(struct image *const img,
                       int x, int y, char find, char replace)
{
    struct coord_stack stack;
    if (!coord_stack_init(&stack, STACK_LIMIT)) {
        fputs("Out of memory\n", stderr);
        return;
    }

    coord_stack_push(&stack, (struct coord){x,y});

    while (!coord_stack_is_empty(&stack)) {
        struct coord p = coord_stack_pop(&stack);

        if (p.x < 0 || p.y < 0 ||
            p.x >= IMG_SIZE ||
            p.y >= IMG_SIZE ||
            img->pixels[p.y][p.x] != find)
        {
            continue;
        }

        img->pixels[p.y][p.x] = replace;

        if (!coord_stack_push(&stack, (struct coord){p.x+1,p.y}) ||
            !coord_stack_push(&stack, (struct coord){p.x,p.y+1}) ||
            !coord_stack_push(&stack, (struct coord){p.x-1,p.y}) ||
            !coord_stack_push(&stack, (struct coord){p.x,p.y-1}))
        {
            fputs("Stack overflow\n", stderr);
            break;
        }
    }

    coord_stack_term(&stack);
}

Yet more stacks

Now that we have a flood fill algorithm for our image editor, we can think about making the editor more user-friendly. It's easy to accidentally spoil an image by filling the wrong set of pixels.

Support for undo and redo requires an editor to store a list of previous edits that can be undone (in reverse order), and a list of undone edits that can be redone (in their original order).

One possible implementation uses two stacks:

The 'undo' stack also needs to store a (full or partial) snapshot of the state of the image before each edit, so that it can restore that state.

Here is the main() function of the image editor, modified to support undo and redo:

#include <ctype.h>

struct undo_record {
    struct coord start;
    struct image before;
};

struct redo_record {
    struct coord start;
};

static bool process_user_input(struct undo_stack *const undos,
                               struct redo_stack *const redos,
                               struct image *const img)
{
    puts("'U'ndo, 'R'edo, 'Q'uit, or coords (x,y):");
    char input[16];

    if (fgets(input, sizeof(input), stdin) == NULL ||
        strchr(input, '\n') == NULL)
    {
        fputs("Bad input\n", stderr);
        return true; // ready for next input
    }

    switch (tolower(input[0])) {
    case 'u':
        if (!undo_stack_is_empty(undos)) {
            struct undo_record undo =
                undo_stack_pop(undos);

            struct redo_record redo = {
                .start = undo.start,
            };

            if (!redo_stack_push(redos, redo)) {
                fputs("Out of memory\n", stderr);
                undo_stack_push(undos, undo);
                break;
            }
            *img = undo.before;
        }
        break;

    case 'r':
        if (!redo_stack_is_empty(redos)) {
            struct redo_record redo =
                redo_stack_pop(redos);

            struct undo_record undo = {
                .start = redo.start,
                .before = *img,
            };

            if (!undo_stack_push(undos, undo)) {
                fputs("Out of memory\n", stderr);
                redo_stack_push(redos, redo);
                break;
            }

            char const find =
                img->pixels[undo.start.y][undo.start.x];

            flood_fill(img, undo.start.x, undo.start.y, find, '.');
        }
        break;

    case 'q':
        return false; // no more user input

    default:
        {
            int start_x, start_y;

            if (sscanf(input, "%d,%d",
                        &start_x, &start_y) != 2 ||
                start_x < 0 || start_y < 0 ||
                start_x >= IMG_SIZE || start_y >= IMG_SIZE)
            {
                fputs("Bad input\n", stderr);
                break;
            }

            struct undo_record undo = {
                .start = {start_x, start_y},
                .before = *img,
            };

            if (!undo_stack_push(undos, undo)) {
                fputs("Out of memory\n", stderr);
                break;
            }

            while (!redo_stack_is_empty(redos)) {
                redo_stack_pop(redos);
            }

            char const find = img->pixels[start_y][start_x];
            flood_fill(img, start_x, start_y, find, '.');
        }
        break;
    }

    return true; // ready for next input
}

int main(int argc, char const *argv[])
{
    struct image img = {{
        /* ... As before ... */
    }};

    struct undo_stack undos;

    if (!undo_stack_init(&undos, STACK_LIMIT)) {
        fputs("Out of memory\n", stderr);
        return EXIT_FAILURE;
    }

    int exit_code = EXIT_SUCCESS;
    struct redo_stack redos;

    if (!redo_stack_init(&redos, STACK_LIMIT)) {
        fputs("Out of memory\n", stderr);
        exit_code = EXIT_FAILURE;
    } else {
        do {
            for (int y = 0; y < IMG_SIZE; ++y) {
                printf("%.*s\n", IMG_SIZE, img.pixels[y]);
            }
        } while (process_user_input(&undos, &redos, &img));

        redo_stack_term(&redos);
    }

    undo_stack_term(&undos);
    return exit_code;
}

When compiling this code, you'll see errors caused by missing declarations of functions such as undo_stack_init() and types such as struct undo_stack. These names are assumed to follow the same pattern as the stack data type already implemented for the flood_fill() function.

The next question is how to go about adding the missing definitions.

Not-so-generic solutions

The simplest way forward would be to make two copies of the existing stack code and amend them for use as undo and redo stacks. The resultant code would be type-safe, easy to debug and read, but any future changes would need to be duplicated in three places. This could be costly to maintain.

To avoid this code duplication we could instead define a union to represent all three stack item types (coordinates, undo and redo). Every stack item would use as much memory as required by the largest item type:

union stack_item {
  struct coord coord;
  struct undo_record undo;
  struct redo_record redo;
};

Effectively, this is subtype polymorphism, and would therefore require a (conceptual) downcast to the appropriate subtype after popping an item from the stack. That isn't type-safe, although the union could be wrapped in a struct to bundle it with an indication of which member is valid.

Another alternative would be to store a void * pointer in each stack item instead of a struct type. That isn't type-safe either and would require separate storage to be allocated and freed for every stack item, which seems error-prone and inefficient.

Type-generic macros

I wrote earlier that generic code is written using 'types to-be-specified-later'. Sometimes, a viable alternative is never to specify types at all!

Kernighan & Ritchie offered the following example of a function-like macro that "will serve for any data type" in 'The C Programming Language' (1988):

#define max(A, B) ((A) > (B) ? (A) : (B))

This kind of macro works because the operators provided by the C language ([], ->, *, ++, and others) are like inbuilt generic functions that work on any compatible type (within reason).

Our stack data type is a lot more complex than a simple macro to select the maximum of two values, but if we could harness the same technique for stack methods then they could be invoked for all three stack types.

Here are the stack methods redefined as function-like macros:

#define STACK_INIT(stack, new_limit) \
    ((stack)->nitems = 0, \
     (stack)->limit = (new_limit), \
     ((stack)->items = malloc(sizeof((stack)->items[0]) * \
     (new_limit))) != NULL)

#define STACK_TERM(stack) \
    free((stack)->items)

#define STACK_PUSH(stack, item) \
    (((stack)->nitems >= (stack)->limit) ? \
     false : \
     ((stack)->items[(stack)->nitems++] = (item), true))

#define STACK_POP(stack) \
    ((stack)->items[--(stack)->nitems])

#define STACK_IS_EMPTY(stack) \
    ((stack)->nitems == 0)

Here's the flood_fill() function rewritten to use type-generic macros:

static void flood_fill(struct image *const img,
                       int x, int y, char find, char replace)
{
    struct coord_stack stack;
    if (!STACK_INIT(&stack, STACK_LIMIT)) {
        fputs("Out of memory\n", stderr);
        return;
    }

    STACK_PUSH(&stack, ((struct coord){x,y}));

    while (!STACK_IS_EMPTY(&stack)) {
       struct coord p = STACK_POP(&stack);
        if (p.x < 0 || p.y < 0 ||
            p.x >= IMG_SIZE ||
            p.y >= IMG_SIZE ||
            img->pixels[p.y][p.x] != find)
        {
            continue;
        }

        img->pixels[p.y][p.x] = replace;

        if (!STACK_PUSH(&stack, ((struct coord){p.x+1,p.y})) ||
            !STACK_PUSH(&stack, ((struct coord){p.x,p.y+1})) ||
            !STACK_PUSH(&stack, ((struct coord){p.x,p.y+1})) ||
            !STACK_PUSH(&stack, ((struct coord){p.x,p.y-1})))
        {
            fputs("Stack overflow\n", stderr);
            break;
        }
    }

    STACK_TERM(&stack);
}

It would be a bit misleading to describe these macros as not type-safe, but the compiler can only check usage of the underlying struct type (defined elsewhere), so it is hard to interpret any errors:

stacks.c:106:43: error: incompatible types when assigning to type
'struct redo_record' from type 'struct coord'
  106 |      ((stack)->items[(stack)->nitems++] = (item), true))
      |                                           ^
stacks.c:123:5: note: in expansion of macro 'STACK_PUSH'
  123 |     STACK_PUSH(&stack, ((struct coord){x,y}));
      |     ^~~~~~~~~~

This use of function-like macros also has other drawbacks:

However, if you're happy with this solution, then good luck to you!

Parameterized stack data type

What we really want is to be able to specify the stack item type as a parameter, to be used when instantiating variations of the original struct and function definitions.

The obvious way to parameterize anything at compile time is to use macros. Instead of expanding directly to in-line method code (as in the previous example), the following function-like macros expand to the definitions required to instantiate the stack data type for a given item_type:

#define STACK_STRUCT(tag, item_type) \
struct tag { \
    size_t nitems, limit; \
    item_type *items; \
};

#define STACK_PUSH(tag, item_type) \
bool tag ## _push(struct tag *const stack, \
                  item_type item) \
{ \
    if (stack->nitems >= stack->limit) { \
        return false; \
    } \
    stack->items[stack->nitems++] = item; \
    return true; \
}

#define STACK_POP(tag, item_type) \
item_type tag ## _pop(struct tag *const stack) \
{ \
    assert(stack->nitems > 0); \
    return stack->items[--stack->nitems]; \
}

// etc.

#define STACK_DECLARE(tag, item_type) \
    STACK_STRUCT(tag, item_type) \
    STACK_INIT(tag, item_type) \
    STACK_TERM(tag, item_type) \
    STACK_PUSH(tag, item_type) \
    STACK_POP(tag, item_type) \
    STACK_IS_EMPTY(tag, item_type)

The preprocessor's ## operator is used to concatenate each stack method name with the struct tag of the instantiated type, in order to create a unique function name with the expected prefix. This allows more than one stack type to be instantiated in the same source file.

Parameterizing the other stack function definitions in the same style as 'push' and 'pop' has been left as an exercise for the reader.

The three stack types required by the image editor can now be instantiated as follows:

STACK_DECLARE(coord_stack, struct coord)
STACK_DECLARE(undo_stack, struct undo_record)
STACK_DECLARE(redo_stack, struct redo_record)

If a stack method is accidentally called with the wrong type of argument then we get a more helpful error than we did when using type-generic macros:

stacks.c:129:22: warning: passing argument 1 of 'coord_stack_push'
from incompatible pointer type [-Wincompatible-pointer-types]
  129 |     coord_stack_push(&stack, (struct coord){x,y});
      |                      ^~~~~~
      |                      |
      |                      struct redo_stack *
stacks.c:80:37: note: expected 'struct coord_stack * const' but
argument is of type 'struct redo_stack *'
   80 | bool tag ## _push(struct tag *const stack, \
      |                   ~~~~~~~~~~~~~~~~~~^~~~~
stacks.c:107:5: note: in expansion of macro 'STACK_PUSH'
  107 |     STACK_PUSH(tag, item_type) \
      |     ^~~~~~~~~~~
stacks.c:111:1: note: in expansion of macro 'STACK_DECLARE'
  111 | STACK_DECLARE(coord_stack, struct coord)
      | ^~~~~~~~~~~~~~

The new stack method definitions can be read and written more like ordinary functions, but I still think it's possible to improve on this solution:

But I just want to write normal code

C actually provides two mechanisms of substituting parameterized code into a program. The obvious one is macros, which we already looked at; the other is the humble #include directive.

An #include line is normally used to include a header file in which functions with external linkage are declared, but it is equally useful for pasting arbitrary code snippets into a program.

This is a pathway to many abilities some consider to be unnatural.

—Chancellor Palpatine, Revenge of the Sith

Unlike a macro, an #include line cannot have parameters (apart from the filename). That doesn't matter because any macros defined before reaching the #include line are also expanded while processing the included file.

Here's some of the original stack code, parameterized using two macros that must be defined before including this file:

// generic_stack.h
#if !defined(STACK_TAG) || !defined(STACK_ITEM_TYPE)
#error Missing type or tag definition
#endif

#define STACK_CONCAT(tag, method) tag ## _ ## method
#define STACK_METHOD2(tag, method) STACK_CONCAT(tag, method)
#define STACK_METHOD(method) STACK_METHOD2(STACK_TAG, method)

struct STACK_TAG {
    size_t nitems, limit;
    STACK_ITEM_TYPE *items;
};

bool STACK_METHOD(push)(struct STACK_TAG *const stack,
                        STACK_ITEM_TYPE item)
{
    if (stack->nitems >= stack->limit) {
        return false;
    }
    stack->items[stack->nitems++] = item;
    return true;
}

STACK_ITEM_TYPE STACK_METHOD(pop)(struct STACK_TAG *const stack)
{
    assert(stack->nitems > 0);
    return stack->items[--stack->nitems];
}

// etc.

#undef STACK_TAG
#undef STACK_ITEM_TYPE
#undef STACK_CONCAT
#undef STACK_METHOD2
#undef STACK_METHOD

Parameterizing the missing function definitions has once again been left as an exercise for the reader.

Unlike an ordinary header file, "generic_stack.h" must not contain the customary #ifndef, #define and #endif lines to guard against it being included more than once. This is to allow a stack to be instantiated for different types in the same source file.

The necessity for the STACK_METHOD2() macro is somewhat obscure, but is explained by Appendix A of K&R (1988):

Unless the parameter in the replacement sequence is preceded by #, or preceded or followed by ##, the argument tokens are examined for macro calls, and expanded as necessary, just before insertion.

In other words, had STACK_TAG been used directly as an argument to STACK_CONCAT() then the resultant function name would have been something like STACK_TAG_push()!

The three stack types required by the image editor can now be instantiated as follows:

#define STACK_TAG coord_stack
#define STACK_ITEM_TYPE struct coord
#include "generic_stack.h"

#define STACK_TAG undo_stack
#define STACK_ITEM_TYPE struct undo_record
#include "generic_stack.h"

#define STACK_TAG redo_stack
#define STACK_ITEM_TYPE struct redo_record
#include "generic_stack.h"

Food for thought

That's all folks

I hope you found this tutorial useful; I had fun writing it. If not, why not comment and let me know what other C programming topics you would like to read about?

Thanks for reading.