An introduction to the Hare standard library
This tutorial introduces you to the Hare standard library. It assumes familiarity with most of the fundamental language concepts, which you can learn from the language introduction tutorial. You are also encouraged to make liberal use of the standard library’s reference documentation, available in your terminal via the “haredoc” tool or online at docs.harelang.org.
We will not cover the entire standard library in this tutorial, but we will introduce you to its most important parts and give you an idea of its general design and use.
Input and output
The Hare standard library offers support for file I/O, to access the host filesystem, read and write files, and work with pipes. A userspace I/O abstraction is provided that allows the user to wrap I/O sources in various processing tools to deal with compression, encryption, hashing, and so on.
Hare’s I/O abstraction
The essential resource for I/O in Hare is io::handle. This type is a tagged union which can store either a native file handle, io::file (i.e. a Unix file descriptor), which is backed by the host operating system and provides access to files, network sockets, and other resources; or an io::stream, which implements I/O operations in userspace.
Resources for creating an io::file are varied throughout the standard library, for instance os::open creates an io::file by creating or opening a file on the host filesystem, while net::tcp::connect opens a TCP connection and returns the file descriptor associated with the socket.
Additionally, various implementations of io::stream are provided for different purposes. Buffered I/O is implemented via bufio, for example; other examples include the hash module and the modules which implement specific hash functions (such as hash::fnv or crypto::sha256). These extend the I/O abstraction to support additional tasks which may be accomplished with I/O operations, often improving performance in the process.
use crypto::sha256;
use encoding::hex;
use fmt;
use fs;
use hash;
use io;
use os;
export fn main() void = {
	if (len(os::args) != 2) {
		fmt::fatalf("Usage: {} <input>", os::args[0]);
	};
	const path = os::args[1];
	const file = match (os::open(path)) {
	case let file: io::file =>
		yield file;
	case let err: fs::error =>
		fmt::fatalf("Error opening {}: {}",
			path, fs::strerror(err));
	};
	defer io::close(file)!;

	const hash = sha256::sha256();
	io::copy(&hash, file)!;

	let sum: [sha256::SZ]u8 = [0...];
	hash::sum(&hash, sum);
	fmt::println(hex::encodestr(sum))!;
};
This program computes the sha256 hash of a file using various I/O features from the Hare standard library.
Performing I/O operations
Support for standard file operations is provided by the standard library, such as:
- io::read: read from a file
- io::write: write to a file
- io::close: close a file
- io::seek: change the position in a file
- io::copy: efficiently copy all data between files
Generally speaking, I/O takes the form of obtaining an I/O object (such as by opening a file) and performing read or write operations against that resource. Read operations have three outcomes:
// Reads up to len(buf) bytes from a [[handle]] into the given buffer, returning
// the number of bytes read.
fn read(
	h: handle,
	buf: []u8,
) (size | EOF | error);
Reads may return the number of bytes read (which may be less than the size of the provided buffer), an end-of-file condition (indicating there is no further data to read; this is not considered an error), or an error.
Write operations have two outcomes; writing some number of bytes and returning that number (which also may be less than the amount provided in the buffer), or an error.
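As a minimal sketch of handling all three read outcomes explicitly (reading from standard input here, purely for illustration):

```hare
use fmt;
use io;
use os;

export fn main() void = {
	// Read from stdin in small chunks, handling each outcome of
	// io::read explicitly rather than asserting with "!".
	let buf: [32]u8 = [0...];
	let total = 0z;
	for (true) {
		match (io::read(os::stdin, buf)) {
		case let n: size =>
			total += n; // n may be less than len(buf)
		case io::EOF =>
			break; // no further data; not an error
		case let err: io::error =>
			fmt::fatalf("read: {}", io::strerror(err));
		};
	};
	fmt::printfln("read {} bytes", total)!;
};
```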
Unless otherwise documented by the interface that provides an io::handle, the caller should close the resource once they are finished with it via io::close. In the case of an io::file, this operation generally closes the underlying file descriptor, and the host operating system will clean up state associated with the file. In the case of an io::stream, the close operation performs domain-specific clean-up associated with the stream, such as freeing any memory that was allocated for the stream’s operation. Closing a file can fail, but generally only in the case of programmer error, so the use of ! to assert errors on io::close is common (as seen in the sample above).
Most systems limit how many files may be open at once; failing to close files and manage file lifetimes properly will exhaust this resource and lead to program failure. If appropriate to your use-case, it is recommended to defer io::close(object)! shortly after creating the resource.
Additional support is provided for io::file objects to integrate with the host operating system, such as the use of io::mmap for memory-mapped I/O, or io::readv et al for vectored I/O. Additional support for system features that utilize file descriptors makes use of io::file throughout the standard library.
Access to the host filesystem
Access to the host filesystem is provided by the os module, through functions like os::open. os::open accepts a set of flags to tune the operation, some of which are not implemented by all platforms, such as flag::APPEND to open files in append mode. os::create is also provided to create new files and accepts an additional parameter to specify the desired file mode; fs::mode is provided to make the construction of the desired mode, and interpretation of file modes, easier.
Manipulating the filesystem itself is also supported:
- os::chmod, os::chown: change access and ownership information for a file
- os::iter: efficiently iterate over entries in a directory
- os::mkdir: create a new directory
- os::move: move a file from one location to another
- os::rename: rename a file
- os::stat: retrieve details about a file, such as ownership and type
These functions are all designed to be portable, but access to Unix-specific functionality is also provided by this module, including:
- The use of directory file descriptors
- Manipulating symbolic and hard links
- Working with block device, character device, and FIFO nodes
The use of Unix-style non-blocking I/O is supported, though generally advised against in favor of I/O multiplexing (described below). To use Unix-style non-blocking I/O, open a file with fs::flag::NONBLOCK and test for errors::again to detect I/O operations that would otherwise have blocked.
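To illustrate a few of these functions together, here is a hedged sketch which creates a file, writes to it, renames it, and removes it. The numeric-mode cast (0o644: fs::mode) and the os::remove call are assumptions worth verifying against haredoc:

```hare
use fmt;
use fs;
use io;
use os;

export fn main() void = {
	// Create (or truncate) a file; the cast from an octal literal to
	// fs::mode is an assumption about how modes may be expressed.
	const file = match (os::create("example.tmp", 0o644: fs::mode)) {
	case let f: io::file =>
		yield f;
	case let err: fs::error =>
		fmt::fatalf("Error creating example.tmp: {}",
			fs::strerror(err));
	};
	fmt::fprintln(file, "hello")!;
	io::close(file)!;

	// Move the file to a new name, then delete it.
	os::move("example.tmp", "example.txt")!;
	os::remove("example.txt")!;
};
```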
Buffered I/O
Many I/O operations are most efficient when performed in bulk, by reading or writing large amounts of data at a time. However, it is often much more convenient to write your code with many small reads or writes rather than buffering it yourself to perform I/O in large batches.
The “bufio” module aims to make it easier to batch I/O into fewer, larger operations. It also includes a number of tools for working with buffers and I/O efficiently, such as scanning lines of text from an input.
bufio::init accepts an arbitrary I/O handle, as well as a buffer for reads and a buffer for writes (one can omit either for a read-only or write-only stream), and then will batch reads and writes using these buffers. The following example illustrates its use and the performance advantages of the approach; comment out the “bufio::init” line to see the difference in performance.
use bufio;
use fmt;
use fs;
use io;
use os;
use time;
export fn main() void = {
	const input = match (os::open(os::args[1])) {
	case let file: io::file =>
		yield file;
	case let err: fs::error =>
		fmt::fatalf("Error opening {}: {}",
			os::args[1], fs::strerror(err));
	};
	defer io::close(input)!;

	// Create a buffered stream
	let rdbuf: [os::BUFSZ]u8 = [0...];
	let input = &bufio::init(input, rdbuf, []);

	const start = time::now(time::clock::MONOTONIC);

	// Read entire file one byte at a time
	let buf: [1]u8 = [0];
	for (!(io::read(input, buf)! is io::EOF)) void;

	const stop = time::now(time::clock::MONOTONIC);
	const elapsed = time::diff(start, stop);
	const sec = elapsed / time::SECOND;
	const nsec = elapsed % time::SECOND;
	fmt::printfln("Took {}.{:.09}s to read file", sec, nsec)!;
};
The standard file descriptors os::stdin and os::stdout are buffered. You can access the underlying files via os::stdin_file and os::stdout_file. stderr is unbuffered.
Token scanners
The bufio module also provides a “scanner”, which scans an input stream for certain kinds of tokens, such as new lines, and internally manages a buffer to batch smaller reads into fewer I/O operations. You can create a new scanner with bufio::newscanner, then use the various scanner functions, such as:
- bufio::scan_byte: reads a single byte from the input
- bufio::scan_rune: reads a single UTF-8 rune from the input
- bufio::scan_bytes: reads a number of bytes up to a provided delimiter
- bufio::scan_string: reads a UTF-8 string up to a provided delimiter
- bufio::scan_line: reads a UTF-8 string up to the next newline
The scanner can be configured to allocate and resize its own internal buffers, up to a limit specified in the bufio::newscanner call, or you can supply your own fixed-size buffer with bufio::newscanner_static.
Note that the scanner reads data from the underlying source ahead of the last value returned from each of the scan_* calls, so if you abandon the scanner and resume reading directly from the underlying file, you will miss any data which was read ahead. To mitigate this, you can access the read-ahead buffer via bufio::scan_buffer.
Here is an example program which efficiently reads lines of text from a file and numbers them:
use bufio;
use fmt;
use fs;
use io;
use os;
use types;
export fn main() void = {
	const input = match (os::open(os::args[1])) {
	case let file: io::file =>
		yield file;
	case let err: fs::error =>
		fmt::fatalf("Error opening {}: {}",
			os::args[1], fs::strerror(err));
	};
	defer io::close(input)!;

	const scan = bufio::newscanner(input, types::SIZE_MAX);
	for (let i = 1u; true; i += 1) {
		const line = match (bufio::scan_line(&scan)!) {
		case io::EOF =>
			break;
		case let line: const str =>
			yield line;
		};
		fmt::printfln("{}\t{}", i, line)!;
	};
};
Memory I/O
It is often useful to use I/O operations to work with buffers of data. For instance, one might wish to prepare a []u8 or a str for some operation by using io::write, fmt::fprintf, etc. You may have a buffer of data from some source as a []u8 and wish to pass it to functions which use I/O semantics; imagine you have a tarball in a memory buffer and wish to process it with format::tar. The memio module is designed to facilitate these use-cases.
The two main entry points to this module are memio::dynamic and memio::fixed, which respectively create an io::stream which performs reads and writes against an internally managed, dynamically allocated buffer and a user-managed fixed-length buffer. One can obtain the buffer as a []u8 with memio::buffer or as a string with memio::string.
use fmt;
use io;
use memio;
export fn main() void = {
	const sink = &memio::dynamic();
	defer io::close(sink)!; // Frees the underlying buffer

	const username = "Drew";
	fmt::fprint(sink, "Hello, ")!;
	fmt::fprint(sink, username)!;
	fmt::fprint(sink, "!")!;

	fmt::println(memio::string(sink)!)!;
};
memio does not hold any dynamically allocated state aside from the buffer itself, so you can skip io::close and use memio::buffer to claim ownership of the buffer (freeing it yourself later with free()) without leaking memory.
I/O multiplexing
If you have several sources of I/O to read from or write to, you may wish to know which operations can be performed without blocking. Programmers familiar with NONBLOCK usage on Unix systems can take advantage of it as described in an earlier section, but the recommended approach on Unix uses unix::poll, which is a wrapper around the portable poll syscall. Be aware that this module only works with io::file, rather than io::stream.
If you are familiar with the syscall, you will already have a strong understanding of how the Hare module is used. Otherwise, examples of unix::poll usage will be covered in the networking section later in this tutorial.
For more complex use-cases (those covered by non-portable tools such as epoll(2) on Linux or kqueue on *BSD), see the hare-ev project from the extended libraries, which provides more comprehensive event loop support.
Custom I/O streams
It is often useful to create custom implementations of the I/O abstraction, writing I/O objects that provide your own implementations of read, write, etc, which can be passed into any function that expects an I/O object. io::stream is provided for this use, which is used to implement many userspace I/O operations throughout the standard library, but which can also be used in your own code to implement custom streams.
One must define the implementation using an io::vtable, filling in whichever I/O operations you wish to support, then place an io::stream (initialized as a pointer to this table) at the start of an object to create a custom stream. You can fill in the remainder of the object with your custom state.
A simple illustrative example of such a stream is provided by the standard library’s io::limitreader, which only allows a user-defined number of bytes to be read from an underlying source of input. The implementation is concise:
export type limitstream = struct {
	vtable: stream,
	source: handle,
	limit: size,
};

const limit_vtable_reader: vtable = vtable {
	reader = &limit_read,
	...
};

// Create an overlay stream that only allows a limited amount of bytes to be
// read from the underlying stream. This stream does not need to be closed, and
// closing it does not close the underlying stream. Reading any data beyond the
// given limit causes the reader to return [[EOF]].
export fn limitreader(source: handle, limit: size) limitstream = {
	return limitstream {
		vtable = &limit_vtable_reader,
		source = source,
		limit = limit,
	};
};

fn limit_read(s: *stream, buf: []u8) (size | EOF | error) = {
	let stream = s: *limitstream;
	if (stream.limit == 0) {
		return EOF;
	};
	if (len(buf) > stream.limit) {
		buf = buf[..stream.limit];
	};
	match (read(stream.source, buf)) {
	case EOF =>
		return EOF;
	case let z: size =>
		stream.limit -= z;
		return z;
	};
};
I/O utilities
io::limitreader is an example of a simple I/O utility provided by the standard library to facilitate common I/O usage scenarios. io::limitwriter is similar. Additional useful utilities provided include:
- io::tee: a stream which copies reads and writes from an underlying source to a secondary handle
- io::empty: a stream which always reads EOF and discards writes
- io::zero: a stream which always reads zeroes and discards writes
- io::drain: reads an entire I/O object into a []u8 slice
- io::readall & io::writeall: reads or writes an entire buffer, without underreads
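As a brief sketch combining a few of these utilities, using memio as an in-memory I/O source (the exact conversions and implicit handle casts are assumptions worth checking against haredoc):

```hare
use fmt;
use io;
use memio;
use strings;

export fn main() void = {
	// Use a fixed in-memory buffer as an I/O source...
	const src = &memio::fixed(strings::toutf8("hello, world"));
	// ...cap it at five bytes with a limitreader...
	const lim = &io::limitreader(src, 5);
	// ...and drain whatever may be read into a newly allocated slice.
	const data = io::drain(lim)!;
	defer free(data);
	fmt::println(strings::fromutf8(data)!)!;
};
```

Given the limit of five bytes, this should print “hello”.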
Working with strings
The string type in Hare is deliberately limited in its language-level utility, and most string operations are deferred to the standard library. A number of standard library modules are provided to assist with these operations, including, among others:
- ascii: working with ASCII text and character classes
- fmt: formatting values as strings
- fnmatch: simple pattern matching (wildcards)
- regex: regular expression support (covered in detail later)
- strconv: converting numbers to strings and vice-versa
- strings: various string-related operations
We’ll cover a subset of this functionality in this part of the tutorial.
Formatting text
The fmt module provides support for formatting various kinds of values as text, and writing this text to strings, buffers, or I/O handles. This family of functions generally accepts a “format string”, which is a constant string that has a series of “format specifiers” describing how to represent values as formatted text in their output.
You’ve already seen some simple uses of fmt throughout the tutorial, from the very first “hello world” program.
fmt::println("Hello world!")!; // Write "Hello world!" to stdout, then a newline
Let’s explain how it works in more detail.
A format string contains characters that are represented literally in the output, as well as format sequences that are replaced with formatted values from the parameters to the fmt function. A format sequence begins with “{” and ends with “}”, and characters between the braces can be used to customize the behavior of the formatting operation. The simplest option is re-ordering parameters by including an index between the braces:
fmt::printfln("{2} + {1} = {0}", 15, 10, 5)!; // Prints "5 + 10 = 15"
Additional format specifiers may be added by the addition of a : character, followed by a sequence of characters describing the desired format. For example, to print values in hexadecimal:
fmt::printfln("My favorite number is 0x{:x}", 4919)!; // "My favorite number is 0x1337"
Additional format modifiers can be used for other bases (octal and binary), to specify leading zeroes or spaces, floating point precision, aligning values to the left or right of a column, and so on. Consult the module documentation for the complete list of features.
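For instance, one value can be printed in several bases at once (assuming that :o and :b select octal and binary, by analogy with :x — worth confirming in the fmt module documentation):

```hare
use fmt;

export fn main() void = {
	// 42 rendered in hexadecimal, octal, and binary
	fmt::printfln("{:x} {:o} {:b}", 42, 42, 42)!;
};
```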
String manipulation
The strings module provides a number of general-purpose utilities for working with strings. Your attention is drawn to some of the highlights:
- strings::concat: concatenate strings
- strings::contains: test for substrings
- strings::cut: split a string in two by a delimiter
- strings::split, strings::splitn: split a string by a delimiter
- strings::dup: duplicate a string on the heap
- strings::fromutf8, strings::toutf8: convert to/from UTF-8 byte slices
- strings::hasprefix, strings::hassuffix: test for prefix/suffix
- strings::index: identify the first instance of a substring or rune
- strings::iter: iterate over the runes in a string
- strings::join: join several strings by a delimiter
- strings::ltrim, strings::rtrim, strings::trim: remove prefixes and suffixes
- strings::replace: replace instances of a sub-string in a larger string
- strings::sub: extract a sub-string
- strings::tokenize: iterate over tokens in a string by some delimiter
strings::cut can be used to turn “key=value” into (“key”, “value”). strings::ltrim can remove the spaces from the start of a string. strings::tokenize can turn “x:y:z:q” into successive tokens of “x”, “y”, “z”, and “q”. Many other functions are available for a variety of string operations: consult the module documentation for the complete list.
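A small sketch of the examples above (the tuple-unpacking form of strings::cut and the strings::next_token iteration are assumptions worth confirming with haredoc):

```hare
use fmt;
use strings;

export fn main() void = {
	// strings::cut splits at the first occurrence of the delimiter.
	const (key, value) = strings::cut("key=value", "=");
	fmt::printfln("{} = {}", key, value)!;

	// strings::tokenize yields successive tokens without allocating.
	let tok = strings::tokenize("x:y:z:q", ":");
	for (const t => strings::next_token(&tok)) {
		fmt::println(t)!;
	};
};
```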
Note that efficient use of strings in Hare requires careful attention to memory usage. Each of these functions documents its memory semantics: for instance, using strings::tokenize may be more desirable than strings::split in many situations, given that the latter must heap-allocate its return value. Often the most efficient means of building a complex string is via memio and fmt.
More filesystem utilities
Working with paths
The path module provides utilities for normalizing and modifying filesystem paths. The paradigm of this module is centered around the path::buffer, which represents a normalized path. A path buffer can be converted to or from a string at any time, but the advantage of using a buffer is that it is mutable, and none of the functions in the path module will ever perform heap allocation (unlike many string manipulations). Additionally, the path buffer will ensure that the right path separators for the system are used, and that the buffer does not exceed the system’s maximum path length.
Most effective use of the path module involves creating one buffer and passing around a pointer to that buffer, only converting it to a string when a string is required. This prevents excessive copying and re-normalizing of the buffer.
Below is a simple recursive filetree traversal program.
use fmt;
use fs;
use os;
use path;
fn walk(buf: *path::buffer) void = {
	let iter = os::iter(path::string(buf))!;
	defer os::finish(iter);

	for (const d => fs::next(iter)!) {
		if (d.name == "." || d.name == "..") {
			continue;
		};

		path::push(buf, d.name)!;
		fmt::println(path::string(buf))!;
		if (fs::isdir(d.ftype)) {
			walk(buf);
		};
		path::pop(buf);
	};
};

export fn main() void = {
	const root = if (len(os::args) <= 1) "." else os::args[1];
	let buf = path::init(root)!;
	walk(&buf);
};