In my opinion piece, 'C versus C++: fight!', I mentioned that it's possible to do generic programming using the C preprocessor. This tutorial provides examples to show why such techniques are useful and how to implement them.
The word generic is from the Latin root 'genus', simply meaning "kind" or "sort". Dictionary definitions of the word typically say something like "relating to a group of things" or "characteristic of a class of things".
Wikipedia defines generic programming as:
a style of computer programming in which algorithms are written in terms of types to-be-specified-later that are then instantiated when needed for specific types provided as parameters.
For example, genericized code to implement an abstract data type such as a stack could be instantiated differently for an integer type, a floating-point type, or a user-defined type. This style of programming is also known as parametric polymorphism.
Hopefully this tutorial will be of use to people learning C programming, or those unsure of how such a theoretical concept applies to a low-level language such as C.
In programming theory, polymorphism means a single interface to entities of different types. Parametric simply means that something is defined in terms of parameters.
Different kinds of polymorphism are applicable to different types of problem. It's important to consider which variety of polymorphism fits your use case (and whether you need polymorphism at all).
Parametric polymorphism can be contrasted with ad hoc polymorphism and subtype polymorphism:
int
) to be used with an
interface; subtype polymorphism instead requires
user-defined types.
In general, parametric polymorphism is used to specialize data structures for specific types, whereas subtype polymorphism is used to make business logic more reusable and composable. I've yet to find a use for ad hoc polymorphism beyond gimmicks and conveniences. (How hard is it really to call the appropriate function for a given data type?)
In the C programming language:
_Generic()
expressions.
A former colleague of mine (for whom I have great respect) used to say:
Every time a programmer says, 'I don't believe in macros,' a bug dies.
Of course this is simply a variation of a more famous statement about children and fairies in J.M.Barrie's classic story, ' Peter Pan'.
It's true that macros can be bad for code correctness,
readability and debuggability, but C is not a complete
language without its preprocessor. You could choose to use
only #include
and abjure other preprocessor
commands such as #define
, but in so doing, you
would lose many useful language facilities. Let's not forget
that bool
, NULL
,
INT_MAX
and assert()
are all macros, as
are indispensable pseudo-functions such as
ARRAY_SIZE()
and container_of()
.
I prefer external linkage of functions over fine-grained use
of #if
or #ifdef
, enumeration
constants over #define
constants, and inline
functions over function-like macros. However, the foregoing
credo should not be taken to mean that the C preprocessor has
no legitimate uses.
Stroustrup dismissed "the ugly and low-level semantics of Cpp" in 'The Design and Evolution of C++' (1994), in which he also wrote:
The key activity in determining what we needed from a parameterized type facility was to write programs using macros to fake parameterized types.
I often see this attitude from people who wish to dismiss C as unsuitable for a given purpose. When presented with a solution, they tend to dismiss it as 'fake' because it doesn't fit their preconceived notions. I'll let you judge whether the parameterization that I am about to present is real or not.
Let's imagine that we want to implement a flood fill algorithm for an image editor. Firstly, the pixel at a starting location is filled. Then, each pixel neighbouring a previously filled pixel is likewise filled if it has the same colour as the initial pixel originally had.
A simple solution to this problem might use recursion. A recursive function calls itself (directly or indirectly), for example in order to explore a branch of a tree-like data structure without losing track of its previous position. A flood fill algorithm interprets an image as just such a tree.
Here's a C program which implements the flood fill algorithm recursively:
#include <stdio.h> #include <stdlib.h> #include <string.h> enum { IMG_SIZE = 8 }; struct image { char pixels[IMG_SIZE][IMG_SIZE]; }; static void flood_fill(struct image *const img, int x, int y, char find, char replace) { if (x < 0 || y < 0 || x >= IMG_SIZE || y >= IMG_SIZE || img->pixels[y][x] != find) { return; } img->pixels[y][x] = replace; flood_fill(img, x+1, y, find, replace); flood_fill(img, x, y+1, find, replace); flood_fill(img, x-1, y, find, replace); flood_fill(img, x, y-1, find, replace); } int main(int argc, char const *argv[]) { struct image img = {{ {' ', ' ', ' ', '#', '#', ' ', ' ', ' '}, {' ', '#', ' ', ' ', ' ', ' ', '#', ' '}, {' ', ' ', '#', ' ', '#', '#', ' ', ' '}, {'#', '#', '#', ' ', ' ', '#', '#', '#'}, {'#', ' ', '#', '#', ' ', '#', ' ', '#'}, {' ', ' ', '#', ' ', ' ', '#', ' ', '#'}, {'#', ' ', '#', ' ', '#', '#', ' ', ' '}, {'#', '#', ' ', ' ', '#', ' ', '#', '#'}, }}; for (int y = 0; y < IMG_SIZE; ++y) { printf("%.*s\n", IMG_SIZE, img.pixels[y]); } puts("Enter coordinates (x,y):"); char input[16]; int start_x, start_y; if (fgets(input, sizeof(input), stdin) == NULL || strchr(input, '\n') == NULL || sscanf(input, "%d,%d", &start_x, &start_y) != 2 || start_x < 0 || start_y < 0 || start_x >= IMG_SIZE || start_y >= IMG_SIZE) { fputs("Bad input\n", stderr); return EXIT_FAILURE; } char const find = img.pixels[start_y][start_x]; flood_fill(&img, start_x, start_y, find, '.'); puts("New image:"); for (int y = 0; y < IMG_SIZE; ++y) { printf("%.*s\n", IMG_SIZE, img.pixels[y]); } return EXIT_SUCCESS; }
Recursion has one major drawback: storage for automatic
variables used within a function cannot be freed until it
exits. Too many nested function calls can cause stack
overflow. If overflow occurs, the program is terminated
abruptly without first having been given a chance to recover
(unlike a failed call to malloc()
, for example).
Given that the C runtime system uses a stack to store variable values for active function calls, the obvious solution is to implement a stack data type within the program itself. This explicit stack can be used to keep track of the progress of the flood fill operation, instead of using the call stack implicitly via recursion.
The code below implements a stack data type, including functions to push and pop items, and another function to check whether the stack is empty:
#include <stdbool.h> #include <stdlib.h> #include <assert.h> struct coord { int x, y; }; struct coord_stack { size_t nitems, limit; struct coord *items; }; bool coord_stack_init(struct coord_stack *const stack, size_t const limit) { *stack = (struct coord_stack){ .nitems = 0, .limit = limit, .items = malloc(sizeof(struct coord) * limit) }; return stack->items != NULL; } void coord_stack_term(struct coord_stack *const stack) { free(stack->items); } bool coord_stack_push(struct coord_stack *const stack, struct coord item) { if (stack->nitems >= stack->limit) { return false; } stack->items[stack->nitems++] = item; return true; } struct coord coord_stack_pop(struct coord_stack *const stack) { assert(stack->nitems > 0); return stack->items[--stack->nitems]; } bool coord_stack_is_empty(struct coord_stack *const stack) { return stack->nitems == 0; }
Nothing in this stack implementation cares what the item type
struct coord
actually represents, therefore this
code is a good candidate for being genericized in future.
Here's the flood_fill()
function rewritten not
to be recursive, using the new data type:
enum { STACK_LIMIT = 128 }; static void flood_fill(struct image *const img, int x, int y, char find, char replace) { struct coord_stack stack; if (!coord_stack_init(&stack, STACK_LIMIT)) { fputs("Out of memory\n", stderr); return; } coord_stack_push(&stack, (struct coord){x,y}); while (!coord_stack_is_empty(&stack)) { struct coord p = coord_stack_pop(&stack); if (p.x < 0 || p.y < 0 || p.x >= IMG_SIZE || p.y >= IMG_SIZE || img->pixels[p.y][p.x] != find) { continue; } img->pixels[p.y][p.x] = replace; if (!coord_stack_push(&stack, (struct coord){p.x+1,p.y}) || !coord_stack_push(&stack, (struct coord){p.x,p.y+1}) || !coord_stack_push(&stack, (struct coord){p.x-1,p.y}) || !coord_stack_push(&stack, (struct coord){p.x,p.y-1})) { fputs("Stack overflow\n", stderr); break; } } coord_stack_term(&stack); }
Now that we have a flood fill algorithm for our image editor, we can think about making the editor more user-friendly. It's easy to accidentally spoil an image by filling the wrong set of pixels.
Support for undo and redo requires an editor to store a list of previous edits that can be undone (in reverse order), and a list of undone edits that can be redone (in their original order).
One possible implementation uses two stacks:
Here is the main()
function of the image editor,
modified to support undo and redo:
#include <ctype.h> struct undo_record { struct coord start; struct image before; }; struct redo_record { struct coord start; }; static bool process_user_input(struct undo_stack *const undos, struct redo_stack *const redos, struct image *const img) { puts("'U'ndo, 'R'edo, 'Q'uit, or coords (x,y):"); char input[16]; if (fgets(input, sizeof(input), stdin) == NULL || strchr(input, '\n') == NULL) { fputs("Bad input\n", stderr); return true; // ready for next input } switch (tolower(input[0])) { case 'u': if (!undo_stack_is_empty(undos)) { struct undo_record undo = undo_stack_pop(undos); struct redo_record redo = { .start = undo.start, }; if (!redo_stack_push(redos, redo)) { fputs("Out of memory\n", stderr); undo_stack_push(undos, undo); break; } *img = undo.before; } break; case 'r': if (!redo_stack_is_empty(redos)) { struct redo_record redo = redo_stack_pop(redos); struct undo_record undo = { .start = redo.start, .before = *img, }; if (!undo_stack_push(undos, undo)) { fputs("Out of memory\n", stderr); redo_stack_push(redos, redo); break; } char const find = img->pixels[undo.start.y][undo.start.x]; flood_fill(img, undo.start.x, undo.start.y, find, '.'); } break; case 'q': return false; // no more user input default: { int start_x, start_y; if (sscanf(input, "%d,%d", &start_x, &start_y) != 2 || start_x < 0 || start_y < 0 || start_x >= IMG_SIZE || start_y >= IMG_SIZE) { fputs("Bad input\n", stderr); break; } struct undo_record undo = { .start = {start_x, start_y}, .before = *img, }; if (!undo_stack_push(undos, undo)) { fputs("Out of memory\n", stderr); break; } while (!redo_stack_is_empty(redos)) { redo_stack_pop(redos); } char const find = img->pixels[start_y][start_x]; flood_fill(img, start_x, start_y, find, '.'); } break; } return true; // ready for next input } int main(int argc, char const *argv[]) { struct image img = {{ /* ... As before ... */ }}; struct undo_stack undos; if (!undo_stack_init(&undos, STACK_LIMIT)) { fputs("Out of memory\n", stderr); return EXIT_FAILURE; } int exit_code = EXIT_SUCCESS; struct redo_stack redos; if (!redo_stack_init(&redos, STACK_LIMIT)) { fputs("Out of memory\n", stderr); exit_code = EXIT_FAILURE; } else { do { for (int y = 0; y < IMG_SIZE; ++y) { printf("%.*s\n", IMG_SIZE, img.pixels[y]); } } while (process_user_input(&undos, &redos, &img)); redo_stack_term(&redos); } undo_stack_term(&undos); return exit_code; }
When compiling this code, you'll see errors caused by missing
declarations of functions such as
undo_stack_init()
and types such as struct
undo_stack
. These names are assumed to follow the same
pattern as the stack data type already implemented for the
flood_fill()
function.
The next question is how to go about adding the missing definitions.
The simplest way forward would be to make two copies of the existing stack code and amend them for use as undo and redo stacks. The resultant code would be type-safe, easy to debug and read, but any future changes would need to be duplicated in three places. This could be costly to maintain.
To avoid this code duplication we could instead define a union to represent all three stack item types (coordinates, undo and redo). Every stack item would use as much memory as required by the largest item type:
union stack_item { struct coord coord; struct undo_record undo; struct redo_record redo; };
Effectively, this is subtype polymorphism, and would
therefore require a (conceptual) downcast to the appropriate
subtype after popping an item from the stack. That isn't
type-safe, although the union
could be wrapped
in a struct
to bundle it with an indication of
which member is valid.
Another alternative would be to store a void *
pointer in each stack item instead of a struct
type. That isn't type-safe either and would require separate
storage to be allocated and freed for every stack item, which
seems error-prone and inefficient.
I wrote earlier that generic code is written using 'types to-be-specified-later'. Sometimes, a viable alternative is never to specify types at all!
Kernighan & Ritchie offered the following example of a function-like macro that "will serve for any data type" in 'The C Programming Language' (1988):
#define max(A, B) ((A) > (B) ? (A) : (B))
This kind of macro works because the operators provided by
the C language ([]
, ->
,
*
, ++
, and others) are like inbuilt
generic functions that work on any compatible type (within
reason).
Our stack data type is a lot more complex than a simple macro to select the maximum of two values, but if we could harness the same technique for stack methods then they could be invoked for all three stack types.
Here are the stack methods redefined as function-like macros:
#define STACK_INIT(stack, new_limit) \ ((stack)->nitems = 0, \ (stack)->limit = (new_limit), \ ((stack)->items = malloc(sizeof((stack)->items[0]) * \ (new_limit))) != NULL) #define STACK_TERM(stack) \ free((stack)->items) #define STACK_PUSH(stack, item) \ (((stack)->nitems >= (stack)->limit) ? \ false : \ ((stack)->items[(stack)->nitems++] = (item), true)) #define STACK_POP(stack) \ ((stack)->items[--(stack)->nitems]) #define STACK_IS_EMPTY(stack) \ ((stack)->nitems == 0)
Here's the flood_fill()
function rewritten to
use type-generic macros:
static void flood_fill(struct image *const img, int x, int y, char find, char replace) { struct coord_stack stack; if (!STACK_INIT(&stack, STACK_LIMIT)) { fputs("Out of memory\n", stderr); return; } STACK_PUSH(&stack, ((struct coord){x,y})); while (!STACK_IS_EMPTY(&stack)) { struct coord p = STACK_POP(&stack); if (p.x < 0 || p.y < 0 || p.x >= IMG_SIZE || p.y >= IMG_SIZE || img->pixels[p.y][p.x] != find) { continue; } img->pixels[p.y][p.x] = replace; if (!STACK_PUSH(&stack, ((struct coord){p.x+1,p.y})) || !STACK_PUSH(&stack, ((struct coord){p.x,p.y+1})) || !STACK_PUSH(&stack, ((struct coord){p.x,p.y+1})) || !STACK_PUSH(&stack, ((struct coord){p.x,p.y-1}))) { fputs("Stack overflow\n", stderr); break; } } STACK_TERM(&stack); }
It would be a bit misleading to describe these macros as not
type-safe, but the compiler can only check usage of the
underlying struct
type (defined elsewhere), so
it is hard to interpret any errors:
stacks.c:106:43: error: incompatible types when assigning to type 'struct redo_record' from type 'struct coord' 106 | ((stack)->items[(stack)->nitems++] = (item), true)) | ^ stacks.c:123:5: note: in expansion of macro 'STACK_PUSH' 123 | STACK_PUSH(&stack, ((struct coord){x,y})); | ^~~~~~~~~~
This use of function-like macros also has other drawbacks:
,
) instead of splitting
code into logical blocks.
new_limit
stored by STACK_INIT()
might
differ from the value used to allocate memory for items.
limit
was both a
member of struct coord_stack
and an argument
to the original coord_stack_init()
).
STACK_PUSH(&stack, (struct
coord){x,y})
.
However, if you're happy with this solution, then good luck to you!
What we really want is to be able to specify the stack item
type as a parameter, to be used when instantiating variations
of the original struct
and function definitions.
The obvious way to parameterize anything at compile time is
to use macros. Instead of expanding directly to in-line
method code (as in the previous example), the following
function-like macros expand to the definitions
required to instantiate the stack data type for a given
item_type
:
#define STACK_STRUCT(tag, item_type) \ struct tag { \ size_t nitems, limit; \ item_type *items; \ }; #define STACK_PUSH(tag, item_type) \ bool tag ## _push(struct tag *const stack, \ item_type item) \ { \ if (stack->nitems >= stack->limit) { \ return false; \ } \ stack->items[stack->nitems++] = item; \ return true; \ } #define STACK_POP(tag, item_type) \ item_type tag ## _pop(struct tag *const stack) \ { \ assert(stack->nitems > 0); \ return stack->items[--stack->nitems]; \ } // etc. #define STACK_DECLARE(tag, item_type) \ STACK_STRUCT(tag, item_type) \ STACK_INIT(tag, item_type) \ STACK_TERM(tag, item_type) \ STACK_PUSH(tag, item_type) \ STACK_POP(tag, item_type) \ STACK_IS_EMPTY(tag, item_type)
The preprocessor's ##
operator is used to
concatenate each stack method name with the
struct
tag of the instantiated type, in order to
create a unique function name with the expected prefix. This
allows more than one stack type to be instantiated in the
same source file.
Parameterizing the other stack function definitions in the same style as 'push' and 'pop' has been left as an exercise for the reader.
The three stack types required by the image editor can now be instantiated as follows:
STACK_DECLARE(coord_stack, struct coord) STACK_DECLARE(undo_stack, struct undo_record) STACK_DECLARE(redo_stack, struct redo_record)
If a stack method is accidentally called with the wrong type of argument then we get a more helpful error than we did when using type-generic macros:
stacks.c:129:22: warning: passing argument 1 of 'coord_stack_push' from incompatible pointer type [-Wincompatible-pointer-types] 129 | coord_stack_push(&stack, (struct coord){x,y}); | ^~~~~~ | | | struct redo_stack * stacks.c:80:37: note: expected 'struct coord_stack * const' but argument is of type 'struct redo_stack *' 80 | bool tag ## _push(struct tag *const stack, \ | ~~~~~~~~~~~~~~~~~~^~~~~ stacks.c:107:5: note: in expansion of macro 'STACK_PUSH' 107 | STACK_PUSH(tag, item_type) \ | ^~~~~~~~~~~ stacks.c:111:1: note: in expansion of macro 'STACK_DECLARE' 111 | STACK_DECLARE(coord_stack, struct coord) | ^~~~~~~~~~~~~~
The new stack method definitions can be read and written more like ordinary functions, but I still think it's possible to improve on this solution:
\
.
STACK_DECLARE()
look like
function calls, which are nonsensical outside of a
function.
STACK_DECLARE()
invocation, but standard C does not
allow an extra ;
outside of a function.
C actually provides two mechanisms of substituting
parameterized code into a program. The obvious one is macros,
which we already looked at; the other is the humble
#include
directive.
An #include
line is normally used to include a
header file in which functions with external linkage are
declared, but it is equally useful for pasting arbitrary code
snippets into a program.
This is a pathway to many abilities some consider to be unnatural.
—Chancellor Palpatine, Revenge of the Sith
Unlike a macro, an#include
line cannot have
parameters (apart from the filename). That doesn't matter
because any macros defined before reaching the
#include
line are also expanded while processing the
included file.
Here's some of the original stack code, parameterized using two macros that must be defined before including this file:
// generic_stack.h #if !defined(STACK_TAG) || !defined(STACK_ITEM_TYPE) #error Missing type or tag definition #endif #define STACK_CONCAT(tag, method) tag ## _ ## method #define STACK_METHOD2(tag, method) STACK_CONCAT(tag, method) #define STACK_METHOD(method) STACK_METHOD2(STACK_TAG, method) struct STACK_TAG { size_t nitems, limit; STACK_ITEM_TYPE *items; }; bool STACK_METHOD(push)(struct STACK_TAG *const stack, STACK_ITEM_TYPE item) { if (stack->nitems >= stack->limit) { return false; } stack->items[stack->nitems++] = item; return true; } STACK_ITEM_TYPE STACK_METHOD(pop)(struct STACK_TAG *const stack) { assert(stack->nitems > 0); return stack->items[--stack->nitems]; } // etc. #undef STACK_TAG #undef STACK_ITEM_TYPE #undef STACK_CONCAT #undef STACK_METHOD2 #undef STACK_METHOD
Parameterizing the missing function definitions has once again been left as an exercise for the reader.
Unlike an ordinary header file, "generic_stack.h" must
not contain the customary #ifndef
,
#define
and #endif
lines to guard against
it being included more than once. This is to allow a stack to
be instantiated for different types in the same source file.
The necessity for the STACK_METHOD2()
macro is
somewhat obscure, but is explained by Appendix A of K&R
(1988):
Unless the parameter in the replacement sequence is preceded by #, or preceded or followed by ##, the argument tokens are examined for macro calls, and expanded as necessary, just before insertion.
In other words, had STACK_TAG
been used directly
as an argument to STACK_CONCAT()
then the
resultant function name would have been something like
STACK_TAG_push()
!
The three stack types required by the image editor can now be instantiated as follows:
#define STACK_TAG coord_stack #define STACK_ITEM_TYPE struct coord #include "generic_stack.h" #define STACK_TAG undo_stack #define STACK_ITEM_TYPE struct undo_record #include "generic_stack.h" #define STACK_TAG redo_stack #define STACK_ITEM_TYPE struct redo_record #include "generic_stack.h"
#define
?
STACK_METHOD()
?
STACK_TAG
macro merely specify a
struct
tag, whereas
STACK_ITEM_TYPE
specifies a whole type?
I hope you found this tutorial useful; I had fun writing it. If not, why not comment and let me know what other C programming topics you would like to read about?
Thanks for reading.