Learn You a Rust for Great Good!

This is the start of a very-much-work-in-progress Rust tutorial/explainer. I doubt it’ll be of much use considering the material already out there, but here you are anyways.

[Amended 2018-09-15 due to this comment]

I’ve come to appreciate the beauty of systems programming languages. For the uninitiated: a systems programming language, like C or C++, is a language you could conceivably write an operating system or device driver in; it’s a language that operates at a very low level.

Because they operate close to the hardware, programs written in these languages tend to be quite fast compared to something written in, say, JavaScript on Node.js, which adds a certain amount of abstraction (that is, layers that make things easier) to make writing programs less painful. Speed is nice.

However, one key aspect of traditional systems languages is that they rely on manual memory management (ooh, alliteration!). This entails telling the computer, “I want a piece of memory that I can store a number in that’s at most this large”, and the computer answering back with a pointer to that memory. Pointers are often described as the bane of every new C programmer’s existence: you have to free the memory they point to once you’re done with it (telling the computer that this memory may be reused), you can perform mind-bending pointer arithmetic with them, and if you somehow use one that points to an invalid memory location, you’ve got a problem on your hands, as you may have just clobbered someone else’s data.

Rust solves all this. As their website states:

Rust is a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety.

We’ve already covered the blazingly fast part. “Prevents segfaults” means that it stops you from reading or writing memory that isn’t yours (on most systems, doing so triggers a protective error called a segmentation fault, or segfault). Guaranteed thread safety means that you can have multiple bits of code (threads) running at the same time, and Rust will ensure they don’t modify the same data at the same time (a so-called data race), something C programmers start to really get annoyed about after a while.
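To make the thread-safety claim a bit more concrete, here’s a minimal sketch of several threads updating one shared counter. It leans on two standard-library types this post doesn’t otherwise cover, `Arc` (shared ownership across threads) and `Mutex` (only one thread touches the value at a time), and `count_with_threads` is just a hypothetical helper name. If you instead tried to have the threads mutate a shared integer through a plain reference, rustc would reject the program rather than let the threads race.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: spawns `n_threads` threads that each add 1 to a
/// shared counter `increments` times, then returns the final total.
fn count_with_threads(n_threads: u32, increments: u32) -> u32 {
    // Arc lets several threads share ownership of the counter;
    // Mutex makes sure only one thread changes it at a time.
    let counter = Arc::new(Mutex::new(0u32));
    let mut handles = Vec::new();

    for _ in 0..n_threads {
        // Each thread gets its own handle to the same shared counter.
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..increments {
                *counter.lock().unwrap() += 1;
            }
        }));
    }

    // Wait for every thread to finish before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", count_with_threads(4, 1000)); // prints 4000
}
```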

This data is MINE! ALL MINE!!

“How does Rust do its job, then?” you might ask. Well, to demonstrate the magic powers of Rust, I’m going to show you a few code snippets. Here’s some unsuspecting JavaScript code:

function do_a_thing() {
    var x = {
        some: "data",
        clearly: {
            very: "important",
        },
    };
    var y = x;
    console.log(x.some); /* prints "data" */
    console.log(y.clearly.very); /* prints "important" */
}

It’s fairly simple to guess what’s going on here. In the variable x, we store a clearly very important bit of data. Since JavaScript is pretty relaxed, we’re allowed to give the same data two names: var y = x doesn’t copy the object, it just makes y refer to the same one. We then get stuff out of both variables and log it to the console.

All simple and easy, right? Now let’s look at Rust:

struct ImportantData(u32, u32);

fn do_a_thing() {
    let x = ImportantData(0, 1);
    let y = x;
    println!("{}", x.0);
}

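Here, we have a structure (struct) called ImportantData, which holds two unsigned 32-bit integers (u32). We put one in the variable x, try to assign the data to y as before, and then print out data from x. However, if we add this code to a Rust project and compile it, rustc (the Rust compiler) will get mad at us. (The output below comes from an older version of rustc; current versions lay the message out differently, but it’s the same error, E0382.)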

fail.rs:6:20: 6:23 error: use of moved value: `x.0` [E0382]
fail.rs:6     println!("{}", x.0);

fail.rs:6:20: 6:23 help: run `rustc --explain E0382` to see a detailed explanation


fail.rs:5:9: 5:10 note: `x` moved here because it has type `ImportantData`, which is moved by default
fail.rs:5     let y = x;
                  ^
fail.rs:5:9: 5:10 help: if you would like to borrow the value instead, use a `ref` binding as shown:
fail.rs:      let ref y = x;
error: aborting due to previous error

This hints at something very deep and meaningful. Rust, unlike most other languages, has an ownership system, which means that only one variable can own a given piece of data at a time. So, when we look at our code again:

struct ImportantData(u32, u32);

fn do_a_thing() {
    let x = ImportantData(0, 1); // here, x owns the ImportantData
    let y = x; // now, the ImportantData is owned by y - it's been moved!
    println!("{}", x.0); // hang on a second, x doesn't own the data anymore!
}

Yes, behind the facade of its single-letter name, y has big dreams. Eventually, it wants to take over ALL the world’s data structures. But for now, it’ll have to be content with taking x’s ImportantData. When we uttered the fateful words of y = x, our beloved ImportantData was wrested from the hands of the poor x and given exclusively to y - instead of copied, as one might expect from other languages.
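If copying really is what you want, you can opt into it. Here’s a small sketch, assuming it’s fine for ImportantData to be duplicated freely: deriving `Copy` (which also requires `Clone`) makes `let y = x;` copy the struct, so x stays usable. This only works because both fields are plain `u32`s; a struct holding something that owns heap memory, such as a `String`, can’t be `Copy`, but you can still make an explicit duplicate with `.clone()`. The names `copy_it`, `clone_it` and `CloneOnly` are made up for this example.

```rust
// Deriving Copy tells rustc an ImportantData can be duplicated
// bit-for-bit, so assignment copies it instead of moving it.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ImportantData(u32, u32);

fn copy_it() -> (ImportantData, ImportantData) {
    let x = ImportantData(0, 1);
    let y = x; // a copy: x is still usable afterwards
    println!("{} {}", x.0, y.1);
    (x, y)
}

// Without Copy, a duplicate has to be requested explicitly.
#[derive(Clone, Debug, PartialEq)]
struct CloneOnly(u32, u32);

fn clone_it() -> (CloneOnly, CloneOnly) {
    let x = CloneOnly(0, 1);
    let y = x.clone(); // explicit duplicate; x keeps its own data
    println!("{} {}", x.0, y.1);
    (x, y)
}

fn main() {
    copy_it();
    clone_it();
}
```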

rustc helpfully tells us what became of x:

fail.rs:5:9: 5:10 note: `x` moved here because it has type `ImportantData`, which is moved by default
fail.rs:5     let y = x;

So, to sum up: in Rust, only one variable binding (that is, a name like y or x) may own a value at any one time. While this may seem like a terrible, painful restriction at first, it rules out a whole class of bugs you might run into in other languages. For example, if nobody kept track of who owned the ImportantData, one bit of code might tell the computer that it’s no longer needed while another bit of code is still using it. Disaster!
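Rust’s answer to “who gets to say the data is no longer needed?” is: the owner, automatically, when it goes out of scope. The sketch below moves the data from x to y and shows that it is cleaned up exactly once, when y’s scope ends. The `DROPS` counter, the `Drop` impl that prints a message, and `move_then_drop` are illustrative additions that exist only to make the cleanup visible.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many ImportantData values have been cleaned up.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct ImportantData(u32, u32);

// Drop runs automatically when a value's owner goes out of scope.
impl Drop for ImportantData {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
        println!("dropping ImportantData({}, {})", self.0, self.1);
    }
}

fn move_then_drop() {
    let x = ImportantData(0, 1);
    let y = x; // ownership moves to y; x no longer owns anything
    println!("y holds {}", y.0);
} // only y is cleaned up here, so the data is dropped exactly once

fn main() {
    move_then_drop();
    println!("drops so far: {}", DROPS.load(Ordering::SeqCst));
}
```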

But dealing with ownership can become a right pain in the backside after a while. Consider having to write a function:

fn do_stuff_with_data(a: ImportantData, b: ImportantData, c: ImportantData) -> ImportantData {
    /* does magic; as a stand-in, this just adds up the fields */
    ImportantData(a.0 + b.0 + c.0, a.1 + b.1 + c.1)
}
fn main() {
    let (a, b, c) = (ImportantData(0,1), ImportantData(1,2), ImportantData(2,3));
    let result = do_stuff_with_data(a, b, c);
}

What if main wants to keep its data? Right now, the do_stuff_with_data function gobbles up its arguments, meaning that data that enters will never see the light of day again. We could transfer ownership back to main:

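fn do_stuff_with_data(a: ImportantData, b: ImportantData, c: ImportantData)
    -> (ImportantData, ImportantData, ImportantData, ImportantData) {
    /* does magic; as a stand-in, this just adds up the fields */
    let result = ImportantData(a.0 + b.0 + c.0, a.1 + b.1 + c.1);
    // hand the arguments back to the caller along with the result
    (result, a, b, c)
}
fn main() {
    let (a, b, c) = (ImportantData(0,1), ImportantData(1,2), ImportantData(2,3));
    let (result, d, e, f) = do_stuff_with_data(a, b, c);
}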

But that looks ugly, and forces us to shuffle these pieces of data back and forth just so main can hang on to them. What we really need is a way to lend the data out to do_stuff_with_data: to let it use the data for a while, but have main still own it.
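As a sneak preview, here’s roughly what that might look like, assuming the `&` reference syntax that borrowing uses; the details are for next time. The function body is a made-up stand-in for the “magic” (it just adds the fields up), not anything meaningful.

```rust
struct ImportantData(u32, u32);

// Taking `&ImportantData` borrows each value instead of taking ownership.
fn do_stuff_with_data(a: &ImportantData, b: &ImportantData, c: &ImportantData) -> ImportantData {
    // stand-in "magic": add the fields together
    ImportantData(a.0 + b.0 + c.0, a.1 + b.1 + c.1)
}

fn main() {
    let (a, b, c) = (ImportantData(0, 1), ImportantData(1, 2), ImportantData(2, 3));
    let result = do_stuff_with_data(&a, &b, &c);
    // main still owns a, b and c, so it can keep using them.
    println!("{} {} {} {}", result.0, a.0, b.0, c.0);
}
```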

To do that, we’d need to use borrows, which, if I continue this series, will be covered next time.

If you’ve got here and really want me to write more, feel free to drop me a line on Twitter (@eeeeeta9).