Hare's advances compared to C February 9, 2021 by Drew DeVault
The Hare programming language makes a number of improvements compared to the language from which it draws much of its inspiration. Like C, Hare has a very small runtime, manual memory access, direct access to pointers, and dangerous features which you are free to point at your feet and shoot, albeit only after signing a waiver.
Despite the veneration with which we look upon C, those who look upon it with scorn do have some valid concerns. A complete lack of memory-safe features and a miserable error handling experience both make it pretty easy to make mistakes in C that are not possible elsewhere. Some of this also makes it possible to do desirable things which are not possible in those other languages, but it would be nice to have options for when you don’t need to do those things and just want to write reliable code.
Tagged unions for error handling
Tagged unions are the key innovation of Hare. Every function which could return
errors returns a tagged union of either the happy result (or results), or the
possible error types. Each possible error is described as part of the function’s
API, rather than tossed into errno
and ambiguously mixed in with every other
kind of error. We also make it easier for you to handle these errors in your own
code thanks to the match
statement, which forces you to think about and
provide some kind of handling for each failure case.
fn io::write(s: *stream, buf: const []u8) (size | io::error);
// ...
sum += match (io::write(s, buf)) {
case let err: io::error =>
match (err) {
case unsupported =>
abort("Expected write to be supported");
case =>
return err;
};
case let n: size =>
process(buf[..n]);
yield n;
};
The match statement is required to be exhaustive (this is also true of
switch
), so it’s not possible to forget an error case by mistake. The compiler
helps you identify what errors are possible and make sure that they’re covered.
Nullable and non-nullable pointer types
null
is an oft-derided feature of C and other programming languages. The real
issue is not that you can represent the absence of a value, but that you can
treat the absence of a value as if it existed, leading to segfaults and other
broken behaviors. Hare fixes this! In Hare, a pointer type can only be null
if
it has the nullable
attribute, and nullable
pointers cannot be dereferenced
without first testing if they’re null
.
let x = 10;
let y: *int = &x; // Guaranteed to be non-null
let z: nullable *int = y; // May be null!
*y; // Valid
*z; // Error: main.ha:6:19: Cannot dereference nullable pointer type
match (z) {
case null =>
abort();
case let z: *int =>
yield *z; // Valid
};
Again, however, if you think you’re smarter than the compiler, it believes you.
You can cast null
to a non-nullable pointer type explicitly. This is
occasionally necessary, for example, to initialize pointer globals during
@init
.
No uninitialized values
Uninitialized values are another big source of frustration for C programmers, and a source of undefined behavior. In Hare, every time a variable is initialized, you are required to provide a value for it.
let x: int; // Syntax error: unexpected ';' at main.ha:2:19, expected '='
If you need to do some logic to derive the appropriate value, then you can take advantage of Hare’s expression-based syntax to do so:
let x: int = if (foo) {
let results = do_work();
yield results.x;
} else 42;
We also offer the ...
operator for making it easier to initialize data in
bulk for arrays and structs. Some types define a default value, e.g. 0
for
int
, and you can initialize struct fields to those defaults:
let x = coords { x = 1337, ... };
And you can fill out large arrays with ...
as well:
let x: [4096]int = [0...];
Checked arrays and slices
Every array indexing operation in Hare generates a boundary test.
export fn main() void = {
let x = [1, 2, 3];
let y = x[42];
};
// $ hare run main.ha
// Abort: slice or array access out of bounds
Of course, sometimes you don’t want this, so we let you use [*]
to define an
array of undefined length, which has no boundary checks.
export fn main() void = {
let x = [1, 2, 3];
let y = &x: *[*]int;
y[42] = 1337;
};
// $ hare run main.ha
// Segmentation fault
Slicing also makes some kinds of operations easier and safer for free. For instance, let’s say you want to copy part of a slice to another. You can assign to a slice:
x[5..10] = y[5..10];
In C, we’d probably use memcpy instead:
memcpy(&x[5], &y[5], 5 * sizeof(x[0]));
This introduces another opportunity for error: you can forget to use sizeof
,
or use sizeof
on the wrong object. This is a common cause of buffer overflows.
We also introduce a len
operator, which removes the need for distinguishing
between array size and array length.
Getting a handle on undefined behavior
Okay, we have a little bit of undefined behavior. The word “undefined” has a
special meaning in our specification, and is almost always used to describe a
situation that raises a compiler error. For example, a non-nullable pointer has
an “undefined” default value, and any type with an “undefined” default value
causes an error when used with the ...
operator. For example:
type my_struct = {
x: *int,
y: *int,
};
export fn main() void = {
let foo = my_struct { ... };
// Error: main.ha:8:27: field 'x' has no default value
};
The only other example is this quote from section 5.4.4 of the Hare specification:
The name and signature of the program entry point function is undefined in the freestanding environment.
The freestanding environment simply does not define the entry point as main
.
What Hare definitely does not have is the same kind of undefined behavior that
C compilers use as a license to do whatever they want to your program, often
with optimization as an excuse. On the subject of optimization, to quote the
specification again:
If the implementation is able to determine that the evaluation of part of an expression is not necessary to compute the correct value and cause the same side-effects to occur in the same order, it may rewrite or re-order the expressions or sub-expressions to produce the same results more optimally.
The interpretation of this constraint shall be conservative. Implementations should prefer to be predictable over being fast. Programs which require greater performance shall prefer to hand-optimize their source code for this purpose.
There are many areas that C leaves undefined that we’ve decided to define. An byte is always 8 bits. Shifting greater than the width of a value is defined. Signed overflow and underflow is defined. Hare programs always have predictable behavior.
Of course, we still allow you to shoot yourself in the foot, because any system
which prevents you from doing dangerous things also prevents you from doing
clever things. If you cast null
to a non-nullable pointer and write to it,
something bad will probably happen.
Better strings
POSIX locales are a nightmare. Hare improves upon this with the following
invariant: all strings are always valid UTF-8. We also store the length
alongside the string, allowing NUL
to appear in the string contents, and for
len(str)
to be an O(1) operation. Instead of char
, we have rune
, which is
a single UCS-32 Unicode codepoint. The Hare standard library also provides a
number of useful convenience functions which natively grok Unicode and are less
fraught with footguns when compared to POSIX.
If you know better than us, though, then again you have options. We provide the
ascii
module for asking questions only of the ASCII subset of Unicode, such as
ctype.h
equivalents like ascii::isalnum
. You can also use the (O(1)!)
strings::toutf8
to convert a str
into a byte slice []u8
, which you can
index and manipulate to your heart’s content.
Strongly-typed variadism
stdarg.h
is a convenient feature of C, but also one which is a common source
of errors. Hare instead supports variadism as a kind of syntax sugar over slices
and tagged unions. For example, consider fmt::printf
:
// Tagged union of all types which are formattable.
export type formattable = (...types::numeric | uintptr | str | rune | nullable *void);
// Formats text for printing writes it to [os::stdout].
export fn printf(fmt: str, args: formattable...) (io::error | size) =
fprintf(os::stdout, fmt, args...);
The args
parameter here is a slice of formattable
values. You can use
len(args)
and index one with args[3]
. Each value is strongly typed and
checked for validity at the call-site. Another function, format
, takes
advantage of this:
fn format(
out: *io::stream,
arg: formattable,
mod: *modifiers,
) void = match (arg) {
case let s: str =>
io::write(out, strings::to_utf8(s));
case let r: rune =>
io::write(out, utf8::encode_rune(r));
case let p: uintptr =>
let s = strconv::uptrtos(p);
io::write(out, strings::to_utf8(s));
case let v: nullable *void =>
match (v) {
case let v: *void =>
let mod = modifiers { base = base::LOWER_HEX, ... };
format(out, v: uintptr, &mod);
case null =>
format(out, "(null)", mod);
};
case let n: types::numeric =>
// ...
};
This is how our fmt
doesn’t have to distinguish between %s
and %d
and
%c
: because it’s baked into the type which you pass to the function. Another
footgun removed! And, of course, if you’re interoperating with C code, you can
declare a prototype which uses C-style variadism and call it normally from Hare
code, albeit with the obvious lack of type safety guarantees.
In summary
Hare makes a number of conservative improvements on C’s ideas, the biggest bet of which is the use of tagged unions. Here are a few other improvements:
- A context-free grammar
- Less weird type syntax
- Language tooling in the stdlib
- Built-in and semantically meaningful static and runtime assertions
- A lightweight system for dependency resolution
- defer for cleanup and error handling
- An optional build system which you can replace with
make
and standard tools
Even with these improvements, Hare manages to be a smaller, more conservative language than C, with our specification clocking in at less than 1/10th the size of C11, without sacrificing anything that you need to get things done in the systems programming world.