Learn You a Rust II - References and Borrowing

This is the second part of the Learn You a Rust for Great Good! tutorial explainer series. If you’re coming to the series for the first time, I recommend starting at the first post linked above.

Last time, we covered the topic of ownership, and how Rust uses it to make really cool things happen such as complete memory safety. However, there’s a small problem with ownership, and I’ll demonstrate it with the same code snippet as last time:

fn do_stuff_with_data(a: ImportantData, b: ImportantData, c: ImportantData) -> (ImportantData, ImportantData, ImportantData, ImportantData) {
    /* does magic - let's say, adds them up and returns the result */
}
fn main() {
    let (a, b, c) = (ImportantData(0,1), ImportantData(1,2), ImportantData(2,3));
    let (result, d, e, f) = do_stuff_with_data(a, b, c);
}

Here, we want to have a do_stuff_with_data function that takes three ImportantDatas and returns one. However, with Rust’s ownership system, we end up having to return four ImportantDatas: our three inputs, and one output - else the do_stuff_with_data function will consume our input values. Remember last time, where we moved a value, making it no longer accessible by its original name? The same thing happens here, except if we don’t return the ImportantDatas we put in and move them back, they just die when do_stuff_with_data finishes doing its thing.

We need a way to have a reference to data. To look at it, perhaps to edit it, but not to own it.

Introducing borrows

Thankfully, Rust’s got our back. Rust provides two sorts of borrows that let you take a reference to a piece of data, whilst still making sure that nothing bad happens to it. (How? Stay tuned, and we’ll tell you later on in the series in the lifetimes section.)

These two types are &T and &mut T. I’m using the letter T here to mean “any type” - an integer, string, ImportantData, you name it. (You’ll encounter this pattern later with generics a lot.) The & operator means “reference to”, and the mut part means “mutable” - that is, editable. A borrow is created by using one or both of these operators - for example, borrowing variable as &variable. Let’s have a look at how we can use these to improve our code.

fn do_stuff_with_data(a: &ImportantData, b: &ImportantData, c: &mut ImportantData) -> ImportantData {
    /* does magic - let's say, adds them up and returns the result */
}
fn main() {
    let (a, b) = (ImportantData(0,1), ImportantData(1,2));
    let mut c = ImportantData(2,3);
    let result = do_stuff_with_data(&a, &b, &mut c);
}

Ah, much nicer - now do_stuff_with_data only has one output. It borrows a and b, and mutably borrows c, does computations with them, and returns an owned value - the ImportantData - that is moved back into main(). (It might accomplish this by returning something like ImportantData(a.0 + b.0, c.0).)
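For the curious, here's a minimal sketch of what the borrowing version could look like once it's filled in. The two-field definition of ImportantData and the arithmetic in the body are my assumptions (the snippet above leaves both out); the only point is that the inputs are still usable after the call.

```rust
// Hypothetical definition: the snippets above never show the struct itself.
#[derive(Debug, PartialEq)]
struct ImportantData(i32, i32);

// Borrows `a` and `b` immutably and `c` mutably; returns a brand-new owned value.
fn do_stuff_with_data(a: &ImportantData, b: &ImportantData, c: &mut ImportantData) -> ImportantData {
    c.1 += 1; // allowed, because `c` is a mutable borrow
    ImportantData(a.0 + b.0, c.0)
}

fn main() {
    let (a, b) = (ImportantData(0, 1), ImportantData(1, 2));
    let mut c = ImportantData(2, 3);
    let result = do_stuff_with_data(&a, &b, &mut c);

    // `a`, `b` and `c` were only borrowed, so all three are still usable here.
    println!("{:?} {:?} {:?} -> {:?}", a, b, c, result);
}
```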

It’s important to note the difference between a &T and a &mut T. Consider the following:

fn add_4_to_vec(vec: &Vec<i32>) { /* `Vec` is an 'array' type, short for 'vector' */
    vec.push(4);
}
fn main() {
    let mut my_little_vector: Vec<i32> = vec![1, 2, 3];
    add_4_to_vec(&my_little_vector);
}

This code will not compile.

fail.rs:2:5: 2:8 error: cannot borrow immutable borrowed content `*vec` as mutable
fail.rs:2     vec.push(4);

Here, Rust is telling us that what we are trying to do is strictly verboten. We’ve only been handed an immutable borrow - a &T - yet we somehow are trying to edit the thing being borrowed. If you could do this, all sorts of Bad Things™ would happen.

(Also note that Rust has told us about one new piece of syntax, the magic dereferencing asterisk, used to mean “thing behind this borrow”.)
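Here's a small sketch of the asterisk in action, using two made-up helper functions (read_through and write_through are my names, not anything from the standard library).

```rust
// Reads the value behind an immutable borrow.
fn read_through(n: &i32) -> i32 {
    *n + 1 // `*n` is "the i32 behind this borrow"
}

// Overwrites the value behind a mutable borrow.
fn write_through(n: &mut i32) {
    *n = 10;
}

fn main() {
    let mut x = 5;
    println!("{}", read_through(&x));
    write_through(&mut x);
    println!("{}", x);
}
```

Method calls such as vec.push(4) dereference automatically, which is why the vector examples get away without writing the asterisk.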

Changing the code to use a &mut T - a mutable borrow - works.

fn add_4_to_vec(vec: &mut Vec<i32>) { /* `Vec` is an 'array' type, short for 'vector' */
    vec.push(4);
}
fn main() {
    let mut my_little_vector: Vec<i32> = vec![1, 2, 3];
    add_4_to_vec(&mut my_little_vector);
}

Sound good? You should also note that moving stuff out of a borrow is not allowed.

struct Data(i32);

fn main() {
    let mut x = Data(5);
    let y = &mut x;
    let z = *y;
}

fail.rs:6:13: 6:15 error: cannot move out of borrowed content [E0507]
fail.rs:6     let z = *y;
                      ^~
fail.rs:6:9: 6:10 note: attempting to move value to here
fail.rs:6     let z = *y;
                  ^

What, did you want to leave poor x with no data? You’ve only borrowed x - not taken full ownership of it. Thus, moving as we would with ownership is strictly verboten.
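If you really do need your own copy of what sits behind a borrow, you have to ask for it explicitly. Here's a sketch of two options, assuming we're allowed to add #[derive(Clone)] to Data (the struct above doesn't have it): clone the value, or swap a replacement in with std::mem::replace so that x is never left empty.

```rust
// Assumption: `Clone`, `Debug` and `PartialEq` are derived here for the demo.
#[derive(Clone, Debug, PartialEq)]
struct Data(i32);

fn main() {
    let mut x = Data(5);
    let y = &mut x;

    // Option 1: make a duplicate; `x` keeps its own value.
    let z = y.clone();

    // Option 2: take the old value out, leaving a replacement behind.
    let old = std::mem::replace(y, Data(0));

    println!("{:?} {:?} {:?}", z, old, x);
}
```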

A small note about `Copy`

There’s one small detail about some types in Rust that we didn’t cover last time. Ever try to do this…

fn add_two(a: i32, b: i32) -> i32 {
    a + b /* note: Rust's expression-based nature means you can leave
             an expression on the last line of a function (WITHOUT semicolon)
             to be a return value */
}
fn main() {
    let a = 3;
    let b = 4;
    let c = add_two(a, b);
    println!("{}, {}, {}", a, b, c); /* prints "3, 4, 7" */
}

…and wonder why add_two didn’t gobble up a and b? No? Well, it’s rather interesting, so let me tell you about it.

To prevent development with ownership being a massive pain, some types are labeled with a trait called Copy. (You don’t have to worry about traits for now, but here’s the relevant section of the Rust Book if you’re interested.) What does this mean? Well, let’s go to the excellent standard library documentation to find out:

Types that can be copied by simply copying bits (i.e. memcpy). By default, variable bindings have 'move semantics.' In other words:
#[derive(Debug)]
struct Foo;

let x = Foo;

let y = x;

// `x` has moved into `y`, and so cannot be used

// println!("{:?}", x); // error: use of moved value
However, if a type implements Copy, it instead has 'copy semantics':
// we can just derive a `Copy` implementation
#[derive(Debug, Copy, Clone)]
struct Foo;

let x = Foo;

let y = x;

// `y` is a copy of `x`

println!("{:?}", x); // A-OK!
It's important to note that in these two examples, the only difference is if you are allowed to access x after the assignment: a move is also a bitwise copy under the hood.

Basically, what this is saying is that Copy gives the types imbued with it special copy semantics - that is, instead of moving their values about everywhere and worrying about ownership, we simply just copy the 1s and 0s that make up the value to a different place, where another variable can use them. Sharing is caring!
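To tie this back to function calls, here's a small sketch with two made-up types (Point and Name are purely illustrative): one derives Copy, the other can't.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point(i32, i32); // only contains `Copy` fields, so it may derive `Copy`

#[derive(Debug)]
struct Name(String); // `String` is not `Copy`, so neither is `Name`

fn sum(p: Point) -> i32 {
    p.0 + p.1
}

fn length(n: Name) -> usize {
    n.0.len()
}

fn main() {
    let p = Point(3, 4);
    let total = sum(p); // `p` is copied into `sum`...
    println!("{:?} sums to {}", p, total); // ...so `p` is still usable

    let n = Name(String::from("Ferris"));
    let len = length(n); // `n` is moved into `length`
    println!("{}", len);
    // println!("{:?}", n); // error: use of moved value
}
```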

An even smaller note about mutability

We’ve now encountered two ways something can be mutable - editable - in Rust: a mut variable binding (variable binding is a fancy term for “name we use for a variable”), as in let mut x = 1, and a &mut borrow, as in &mut x. It’s important to distinguish between the two.

fn main() {
    let a = 1; /* immutable */
    let b = &mut a; /* impossible, as `a` is immutable */
    let b2 = &a; /* immutable variable binding to immutable borrow */
    let mut c = 2; /* mutable variable binding */
    let mut c2 = 4; /* two more mutable variable bindings */
    let mut c3 = 6;
    let d = &mut c; /* immutable variable binding to mutable borrow */
    let mut e = &mut c2; /* mutable variable binding to mutable borrow */

    *e = 3; /* works, as c2 - the thing borrowed by e - is mutable */
    e = &mut c3; /* works, as `e` is a mutable binding and can be re-pointed */
    d = &mut c2; /* doesn't work, as `d` is an immutable variable binding */
}

Your head probably hurts after reading that. Don’t worry, it’s not as hard as it looks here.
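If you'd like something that actually compiles to poke at, here's a sketch of just the working parts, with the failing line commented out. The function name and the extra variables are mine.

```rust
fn binding_vs_borrow() -> (i32, i32, i32) {
    let mut c = 2;
    let mut c2 = 20;
    let mut other = 30;
    {
        let d = &mut c; // immutable binding, mutable borrow
        *d += 1;        // fine: we may edit what `d` points at
        // d = &mut c2; // error: `d` itself can't be pointed elsewhere

        let mut e = &mut c2; // mutable binding, mutable borrow
        *e += 1;             // edits `c2`
        e = &mut other;      // fine: `e` itself may be pointed elsewhere
        *e += 1;             // edits `other`
    } // both borrows are gone by here
    (c, c2, other)
}

fn main() {
    println!("{:?}", binding_vs_borrow());
}
```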

The problem with borrowing

Okay. I’ll admit something to you: when I initially told you about mutable references (&mut), I made them seem simpler than they actually are.

With ownership, we were always sure that data was exclusively owned by one variable, and it could only be edited through use of that variable. To edit it, you’d either need to change it through the original variable or move the data elsewhere. However, with borrows, we can have many references to some data. What if one is reading whilst another is writing? We now have the possibility of a data race occurring. Here’s the definition:

There is a ‘data race’ when two or more pointers access the same memory location at the same time, where at least one of them is writing, and the operations are not synchronized.

The bad things that happen under the umbrella term data race include: two bits of code editing data at the same time, resulting in mangled data; something reading a piece of data being written, resulting in the reader getting garbage - and various other nefarious events. So how do we fix it?

Enter the borrowing rules

Rust is clever, and it has a set of rules to deal with this exact problem. Here they are, straight out of the Rust Book:

Rule one: any borrow must last for a scope no greater than that of the owner

Basically, “a borrow can’t outlive the thing it’s borrowing”. It’s hopefully obvious why: a borrow to nonexistent content doesn’t make any sense, and trying to read or write it will do something - it’ll try and read from where the thing it borrowed once was - but you have no idea what will happen.
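A quick sketch of rule one, with names I've made up for the occasion: the borrow lives in an inner scope and the owner in an outer one, which is the right way round. The wrong way round is left commented out, since it doesn't compile.

```rust
fn borrowed_len() -> usize {
    let owner = String::from("hello"); // the owner lives for the whole function
    let len;
    {
        let borrow = &owner; // the borrow lives only inside these braces...
        len = borrow.len();
    } // ...so it ends well before `owner` does
    len
}

fn main() {
    println!("{}", borrowed_len());

    // The other way round is rejected, because the borrow would outlive its owner:
    // let dangling;
    // {
    //     let short_lived = String::from("oops");
    //     dangling = &short_lived;
    // } // `short_lived` is destroyed here
    // println!("{}", dangling); // error: `short_lived` does not live long enough
}
```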

Rule two: you may have EITHER 1+ immutable borrows OR EXACTLY 1 mutable borrow

This one prevents data races. It dictates that, whilst you have any immutable (&T) borrows, you’re not allowed any mutable (&mut T) borrows - to prevent the problem of something reading while something else is writing, or code getting lost because a piece of data it depended on changed in an unexpected way.

It also says that you can’t have two or more mutable borrows, to prevent the problem of two things overwriting the same piece of data.
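As a quick sketch of rule two in code that does compile (the function and variable names are mine): any number of immutable borrows can coexist, and a mutable borrow is fine once they're out of the picture.

```rust
fn rule_two() -> Vec<i32> {
    let mut v = vec![1, 2, 3];

    {
        // Any number of immutable borrows at once is fine.
        let first = &v;
        let second = &v;
        // v.push(4); // error: `first` and `second` are still in use below
        println!("{} {}", first.len(), second[0]);
    } // both immutable borrows end here

    {
        // Exactly one mutable borrow, with no other borrows alive.
        let only = &mut v;
        only.push(4);
    }

    v
}

fn main() {
    println!("{:?}", rule_two());
}
```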

A few examples of terrible and shameful code

Stolen from the Rust Book:

fn main() {
    let mut x = 5;
    let y = &mut x;

    *y += 1; /* magic dereferencing asterisk modifying `x` */

    println!("{}", x);
}

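If we compile this, Rust yells at us. (At least, compilers that predate non-lexical lifetimes do. Newer ones track borrows more precisely, notice that y is never used after the increment, and accept this exact program. Use y again after the println!() and the error comes right back.)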

error: cannot borrow `x` as immutable because it is also borrowed as mutable
    println!("{}", x);
                   ^

Let’s walk through how this is bad:

When y was created, we gave it a &mut x - a mutable borrow of x.
println!(), a macro to print stuff out to the console, needs to read x. Since it needs to read x only, it takes an immutable borrow (as most things in Rust do).
Rust looks through this code, and sees println!() trying to immutably borrow x. Since y’s still around - a mutable borrow - it gets annoyed and blows up in our face.

You’ll actually notice that Rust will be helpful to you if you encounter such an issue, and tell you where the borrow ends. In this case, it’s right at the end of main(). Dang.

  
note: previous borrow ends here
fn main() {

}
^

The solution? Again, pilfered from the very hands of the Rust Book’s author:

fn main() {
    let mut x = 5;
    
    { /* ooh look, braces! */
         
        let y = &mut x; // -+ &mut borrow starts here
        *y += 1;        //  |
    }                   // -+ ... and ends here

    println!("{}", x);  // <- try to borrow x here
}

The braces create a new scope - a new block of code for stuff to occur in. These help us express the notion that we only want y to live long enough to add 1 to x through it, and after that we’re done.
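Another way to keep a mutable borrow short, sketched here with a hypothetical helper called add_one, is to hand it to a function. The borrow then only lasts for the duration of the call.

```rust
// Hypothetical helper: adds 1 to whatever the mutable borrow points at.
fn add_one(n: &mut i32) {
    *n += 1;
}

fn main() {
    let mut x = 5;
    add_one(&mut x);   // the &mut borrow starts and ends on this line
    println!("{}", x); // so an immutable borrow is fine again here
}
```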

Here’s a more sneaky example - one that I’ve encountered myself, and one that took me a while to figure out. I post it here, so that you may not be as stupid.

struct Data(i32); 

fn main() {
    let mut vec1: Vec<&Data> = Vec::new();
    let mut vec2: Vec<Data> = vec![Data(0), Data(1), Data(2)];
    
    for data in &vec2 {
        vec1.push(data);
    }
}

What’s wrong here? This code is totally fine! I wrote it, dammit, I know what’s going on!

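Nope. (At least on compilers that predate non-lexical lifetimes; newer ones should let this particular program through, since nothing reads through those borrows after vec2 dies, but the drop-order reasoning that follows still applies.)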

fail.rs:8:18: 8:22 error: `vec2` does not live long enough
fail.rs:8     for data in &vec2 {
                           ^~~~

What’s going wrong here is more subtle. To solve it, you have to know this one fact:
Rust gets rid of data in reverse order to when it was created. This is called last in, first out (LIFO).

Let’s break the problem down.

We have vec1, a vector of borrows to some piece of Data. (You might be asking yourself: where does the Data come from?)
We then have vec2, a vector of Data. So this is where it comes from.
We borrow every piece of Data in vec2 with a for loop, and store these borrows in vec1.
We’re now at the end of main(). Rust comes along and starts destroying stuff.
Its first target is vec2, because it was created last. Bonk RIP vec2.
Now, we have a problem. vec1 is full of borrows to stuff in vec2 - which just got bonked on the head. This breaks rule one, as a borrow is outliving something it borrowed.

Rust even told us that at the start. Sigh. The solution is simple: reverse the order!

struct Data(i32); 

fn main() {
    let mut vec2: Vec<Data> = vec![Data(0), Data(1), Data(2)];
    let mut vec1: Vec<&Data> = Vec::new();
    
    for data in &vec2 {
        vec1.push(data);
    }
    
    /* vec1 dies first */
}

Everything works, and all is well with the world.
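If you'd like to watch the reverse-order destruction happen, here's a sketch using a made-up Noisy type that writes its name into a shared log when it dies. It leans on Drop, RefCell and a lifetime annotation, none of which this series has covered yet, so treat those parts as scaffolding rather than something to understand right now.

```rust
use std::cell::RefCell;

// A made-up type that records its name in a shared log when it is destroyed.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl<'a> Drop for Noisy<'a> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _first = Noisy { name: "first", log: &log };
        let _second = Noisy { name: "second", log: &log };
    } // `_second` is destroyed here first, then `_first`
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order()); // prints ["second", "first"]
}
```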

Sidenote: iterating over vectors

This stymied me when working on this part.

struct Data(i32); 

fn main() {
    let mut vec2: Vec<Data> = vec![Data(0), Data(1), Data(2)];
    let mut vec1: Vec<&Data> = Vec::new();
    
    /*
     ??? what goes here ???
     */
}

What I would like to do, as above, is fill vec1 with a bunch of &Data. This was my first instinct:

for ref data in vec2 {
    vec1.push(data);
}

The ref keyword, along with its friend ref mut, is used to borrow a value when used as part of a pattern. It desugars to:

for orig_data in vec2 {
    let data = &orig_data;
    vec1.push(data);
}

This won’t work, as we move the data out of the vector. The reference we create is thus only usable for the scope that the data is alive in - which is the curly braces of our for loop. We get this error:

fail.rs:7:9: 7:17 error: borrowed value does not live long enough
fail.rs:7     for ref data in vec2 {
                  ^~~~~~~~

Note how this is different from the error we got in the last section, and indicates the problem we just talked about instead of the last section’s problem.

…It’s all very subtle.
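Since ref may be new to you, here's a tiny sketch of it outside a loop (the function and variable names are mine). In a plain let, ref and ref mut do the same job as writing & and &mut on the right-hand side.

```rust
fn ref_demo() -> (i32, i32) {
    let value = 5;
    let ref r = value; // same as `let r = &value;`
    let doubled = *r * 2;

    let mut counter = 0;
    {
        let ref mut m = counter; // same as `let m = &mut counter;`
        *m += 1;
    }

    (doubled, counter)
}

fn main() {
    println!("{:?}", ref_demo());
}
```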

The proper way to do it is written in the last section:

for data in &vec2 {
    vec1.push(data);
}

Here, we immutably borrow vec2. Therefore, the most it can do is give us &Ts - immutable borrows - because it itself is immutably borrowed; moving stuff out of it would be impossible, because it’s a borrow.
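For what it's worth, the same thing can be written without an explicit loop. A sketch, assuming you're happy to build vec1 in one go rather than push into an existing vector: iter() hands out &Data borrows, and collect() gathers them up.

```rust
// Assumption: `Debug` and `PartialEq` are derived here so we can print and compare.
#[derive(Debug, PartialEq)]
struct Data(i32);

fn main() {
    let vec2: Vec<Data> = vec![Data(0), Data(1), Data(2)];

    // `iter()` borrows `vec2` and yields `&Data`, just like `for data in &vec2`.
    let vec1: Vec<&Data> = vec2.iter().collect();

    println!("{:?}", vec1);
    println!("{:?}", vec2); // `vec2` still owns its contents
}
```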

Thanks for reading this second post in the series! I initially wanted to get to explaining lifetimes, but since explaining all the nuances of borrowing in the detail I wanted to give took so much time, I’ll have to get to it next time.

You may have noticed the excessive references to the Rust Book - I highly recommend you also read that, if you haven’t already.