Consistency and precision in Zig's type system
One of the articles that first got me interested in Zig is Kevin Lynagh’s article on writing keyboard firmware in Zig which frequently emphasizes the consistency of the language:
The language is so small and consistent that after a few hours of study I was able to load enough of it into my head to just do my work. … It feels like Zig is a language that I’d be able to master; to fully internalize such that I can use it without thinking about it. This feels super exciting and empowering.
Around this time I was in the process of learning Rust. I had a fair grasp on the concepts of ownership and lifetimes, along with the syntax of the language, but I didn’t yet feel comfortable creating my own projects. Reading this enthusiastic praise from Kevin motivated me to try Zig.1
Now over half a year later I have spent quite a bit of time with the Zig programming language. I find the language to be small and simple. Like Kevin, I am very enthusiastic about using Zig, and I found myself comfortable and productive early in the learning process. Many aspects of the language excite me, especially the capabilities of compile-time metaprogramming, but for this post I will focus Zig’s type system.
Because I am familiar with C, and Zig is also a systems language, many of the examples I share will compare Zig to C.2
# Simple Types
I find that Zig’s syntax for declaring types is very readable and consistent. To start simple, compare declaring an int between C and Zig.
// C
int value = 0;
// Zig
var value: i32 = 0;
There are a couple differences between these statements. C places the type on
the left of the identifier, while Zig requires a type annotation appended to the
name. The types of most variables can be inferred in Zig, but in this statement
the type must be specified to determine how much storage to allocate on the
stack for value
. Zig also uses var
to introduce a variable declaration. With
such a simple example there isn’t anything else worth noting about the
differences between the languages, so let’s move on to array syntax.
The following code snippets both declare a zero-filled integer array of size 2.
// C
int values[2] = { 0, 0 };
// Zig
var values: [2]i32 = .{ 0, 0 };
Here we begin to see more notable differences between C and Zig. With the introduction of arrays, C begins to fragment the type information between the two sides of the array name; the datatype on the left, and the size of the array on the right. This has been called the Clockwise/Spiral Rule.
Zig stays consistent with the single integer example by prefixing the type
annotation with [2]
to specify an array size 2. The data is initialized with
an anonymous list
literal (the
.{}
) which infers its type from context. Here Zig starts to show one of my
favorite consistencies of the language: types read from left to right. To
illustrate this further, here’s an example of a more complex type:
var to_make_a_point: ?*[]?std.ArrayList(u8) = undefined;
This reads from left to right as an optional pointer to a slice of optional ArrayLists of u8. I can’t think of an example of where this would be used, but it is a valid type!
On the subject of pointers, Zig breaks away from conventional syntax for
dereferencing. As a teaching assistant for a C++ introduction course, I have
seen many students confused by the overloaded use of *
for both declaring and
dereferencing pointers. Zig removes the possibility of ambiguity by appending
.*
to dereference a pointer.
var number: i32 = 32;
var ptr: *i32 = &number;
ptr.* += 10;
This is another example of a small change that adds to the consistent experience in Zig.
# Precision in Types
Zig’s type system also affords more precision than C’s. Take for example the
type of a string in C, const char *
. This could be a pointer to a single char,
it could be NULL
, or it could point to a sequence of any number of char
s,
potentially NULL
-terminated. Because C’s type system cannot encode anything
beyond a pointer to memory, the compiler cannot prevent invalid access, like
attempting to index a pointer to a single char.
Zig’s type system on the other hand gives more tools to express these types with
precision. A pointer to a single char (or u8
) is written as *u8
and cannot
be null
. An attempt to index a single-item pointer is a compile error.
pub fn main() !void {
var char: u8 = 'a';
const ptr: *u8 = &char;
_ = ptr[0];
}
> zig run main.zig
./main.zig:6:12: error: index of single-item pointer
_ = ptr[0];
^
All many-item pointer types
are created with variations of []
.
// a pointer to an unknown number of u8
const ptr: [*]u8 = undefined;
// a pointer to a runtime-known number of items (slice)
const ptr: []u8 = undefined;
// a 0-terminated multi-item pointer of u8
const ptr:[*:0]u8 = undefined;
These types are all slightly different, and their uses will be checked by the compiler. This is especially useful when interfacing with C code by encoding more information about the properties of a pointer.
# Struct Types
Now let’s take a look at defining new types.
// C
typedef struct Point {
int x,
int y,
} Point;
// Zig
const Point = struct {
x: i32,
y: i32,
};
Superficially there isn’t much difference between C and Zig besides syntax, but
there is one important distinction. Unlike C where the struct keyword requires a
name, all Zig structs are
anonymous until bound to an identifier. Storing the type as a constant with
const Point = struct { ... }
allows for reuse.
This is because Zig types are first-class values at compile time. In the above
code the struct { x: i32, y: i32 }
describes a composite type of two signed
integers. The assignment to const Point
is really a compile-time assignment
of data.
This has some interesting side effects. For example, the following function is valid.
fn getFieldA(x: struct { a: i32 }) i32 {
return x.a;
}
pub fn main() !void {
var field_a = getFieldA(.{ .a = 500 });
std.debug.print("{}\n", .{field_a});
}
While writing this post I hadn’t ever tried using a anonymous struct {}
as a
function parameter’s type. But fitting with the theme of consistency in Zig, I
tried this and it worked as expected which was satisfying!
# Generic Types
Zig’s first-class types also lead to a very simple and consistent method of
declaring generic types. Languages like Rust and C++ introduce templating syntax
with <>
characters to mark a struct as generic. In Zig, generics are
implemented with compile-time function calls.3 This is a generic
Pair
struct.
fn Pair(comptime A: type, comptime B: type) type {
return struct {
a: A,
b: B,
};
}
pub fn main() !void {
var pair: Pair(bool, i32) = .{ .a = true, .b = 301 };
std.debug.print("pair.a={}, pair.b={}\n", .{ pair.a, pair.b });
}
Rather than modifying the struct syntax to support generics, Zig stays
remarkably consistent regarding struct {}
as a value at compile time. The
Pair
function4 is evaluated at compile time which returns a new type
defined by the arguments to the function.
Continuing the pattern of consistency, the result of the Pair()
function can be
stored as a constant for reuse, creating a new type that can be used anywhere.
const Vec2d = Pair(i64, i64);
pub fn main() !void {
var vec: Vec2d = .{ .a = -1, .b = 2 };
std.debug.print("({}, {})\n", .{ vec.a, vec.b });
}
Looking only at the syntax for declaring types, Zig shows how a simple concept used consistently reduces the surface area of a programming language. Once you understand how to declare a struct and define a function, you can easily create generic types by combining the two.
As someone who enjoys programming in C, Zig has quickly become my favorite programming language, and I am excited to see where Zig goes in the coming years. From what I understand, Zig is relatively stable at this point, though there are still breaking changes planned before the 1.0 release. Some changes, like the proposal to make function definitions expressions will increase the consistency of the language. I hope that whatever else changes will stay true to the simple consistency of Zig.
-
Please don’t think I’m bashing Rust here. I love the goals and capabilities of the language! But with its guarantees and features Rust is a larger, more complex language than C or Zig. For me Zig hits the sweet spot of fixing major problems in C, while staying small, consistent, and easy to fit the language inside my head.↩︎︎
-
In this comparison I find it important to recognize that Zig is only almost 6 years old, while C is approaching 50 years. In comparing the two languages I find it encouraging to see the progress made over the decades. It is very easy to view C negatively through the lens of the present. I hope we do not forget the historical significance of C, while also recognizing that mistakes were made and can be improved on in new languages.↩︎︎
-
All compile time function calls are memoized, so two separate calls to a generic type function will result in the same type.↩︎︎
-
An interesting side-effect of Zig’s use of functions for type generics is seen in the naming conventions of the language. A function name written in camelCase is a typical function, but a function in TitleCase is a function that returns a type. The same applies to variable names, snake_case for typical variables, and TitleCase for variables that store types. While this isn’t required for code to compile, syntax highlighters do examine the casing to determine what part of the language a given identifier represents. I’ve written about Zig’s naming conventions at length in another post.↩︎︎