Learn You a Rust II - References and Borrowing
This is the second part of the Learn You a Rust for Great Good! tutorial explainer series. If you’re coming to the series for the first time, I recommend starting at the first post linked above.
Last time, we covered the topic of ownership, and how Rust uses it to make really cool things happen such as complete memory safety. However, there’s a small problem with ownership, and I’ll demonstrate it with the same code snippet as last time:
Here, we want to have a do_stuff_with_data
function that takes three ImportantData
s and returns one. However, with Rust’s
ownership system, we end up having to return four ImportantData
s: our three inputs, and one output - else the do_stuff_with_data
function will consume our input values. Remember last time, where we moved a value, making it no longer accessible by its original
name? The same thing happens here, except if we don’t return the ImportantData
s we put in and move them back, they just die
when do_stuff_with_data
finishes doing its thing.
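To make the problem concrete, here's a minimal sketch of what the ownership-only version might look like. The definition of ImportantData as a two-integer tuple struct and the body of do_stuff_with_data are assumptions for illustration, not the exact code from the previous post.

```rust
// Assumed stand-in for the series' ImportantData: a tuple struct of two integers.
#[derive(Debug, PartialEq)]
struct ImportantData(i32, i32);

// Takes ownership of all three inputs, so it has to hand them back
// alongside the one value we actually wanted.
fn do_stuff_with_data(
    a: ImportantData,
    b: ImportantData,
    c: ImportantData,
) -> (ImportantData, ImportantData, ImportantData, ImportantData) {
    let result = ImportantData(a.0 + b.0, c.0);
    (a, b, c, result)
}

fn main() {
    let a = ImportantData(1, 2);
    let b = ImportantData(3, 4);
    let c = ImportantData(5, 6);
    // The inputs are moved in, then moved back out under the same names.
    let (a, b, c, d) = do_stuff_with_data(a, b, c);
    println!("{:?} {:?} {:?} {:?}", a, b, c, d);
}
```

Four return values to get one result: that's the annoyance borrows are meant to remove.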
We need a way to have a reference to data. To look at it, perhaps to edit it, but not to own it.
Introducing borrows
Thankfully, Rust’s got our back. Rust provides two sorts of borrows that let you take a reference to a piece of data, whilst still making sure that nothing bad happens to it. (How? Stay tuned, and we’ll tell you later on in the series in the lifetimes section.)
These two types are &T
and &mut T
. I’m using the letter T
here to mean “any type” - an integer, string, ImportantData
, you name it.
(You’ll encounter this pattern later with generics a lot.) The &
operator means “reference to”, and the mut
part means “mutable” -
that is, editable. A borrow is created by using one or both of these operators - for example, borrowing variable
as &variable
.
Let’s have a look at how we can use these to improve our code.
Ah, much nicer - now do_stuff_with_data
only has one output. It borrows a
and b
, and mutably borrows c
, does computations
with them, and returns an owned value - the ImportantData
- that is moved back into main()
. (It might accomplish this
by returning something like ImportantData(a.0 + b.0, c.0)
.)
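Here's a rough sketch of the borrowing version described above. ImportantData is again assumed to be a two-integer tuple struct, and the tweak to c is something I've added purely to show why c needs to be a &mut borrow.

```rust
// Assumed stand-in for the series' ImportantData.
#[derive(Debug, PartialEq)]
struct ImportantData(i32, i32);

// a and b are only read, so immutable borrows are enough.
// c is edited, so it has to be a mutable borrow.
fn do_stuff_with_data(
    a: &ImportantData,
    b: &ImportantData,
    c: &mut ImportantData,
) -> ImportantData {
    c.1 += 1; // hypothetical edit, only allowed because c is &mut
    ImportantData(a.0 + b.0, c.0)
}

fn main() {
    let a = ImportantData(1, 2);
    let b = ImportantData(3, 4);
    let mut c = ImportantData(5, 6);
    let d = do_stuff_with_data(&a, &b, &mut c);
    // a, b and c are all still usable here: they were lent out, not moved.
    println!("{:?} {:?} {:?} {:?}", a, b, c, d);
}
```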
It’s important to note the difference between a &T
and a &mut T
. Consider the following:
This code will not compile.
fail.rs:2:5: 2:8 error: cannot borrow immutable borrowed content `*vec` as mutable
fail.rs:2     vec.push(4);
Here, Rust is telling us that what we are trying to do is strictly verboten. We’ve only been handed an immutable borrow -
a &T
- yet we somehow are trying to edit the thing being borrowed. If you could do this, all sorts of Bad Things™ would happen.
(Also note that Rust has told us about one new piece of syntax, the magic dereferencing asterisk, used to mean “thing behind this borrow”.)
Changing the code to use a &mut T
- a mutable borrow - works.
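As a sketch of both versions side by side (the function name add_four is made up here, and the failing version is left in a comment so the snippet compiles):

```rust
// With an immutable borrow, pushing is rejected by the compiler:
//
// fn add_four(vec: &Vec<i32>) {
//     vec.push(4); // error: cannot borrow as mutable
// }

// With a mutable borrow, the same body is fine.
fn add_four(vec: &mut Vec<i32>) {
    vec.push(4);
}

fn main() {
    // The binding must be `mut` too, or we couldn't take `&mut numbers`.
    let mut numbers = vec![1, 2, 3];
    add_four(&mut numbers);
    println!("{:?}", numbers);
}
```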
Sound good? You should also note that moving stuff out of a borrow is not allowed.
fail.rs:6:13: 6:15 error: cannot move out of borrowed content [E0507]
fail.rs:6     let z = *y;
                      ^~
fail.rs:6:9: 6:10 note: attempting to move value to here
fail.rs:6     let z = *y;
                  ^
What, did you want to leave poor x
with no data? You’ve only borrowed x - not
taken full ownership of it. Thus, moving as we would with ownership is strictly
verboten.
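A small sketch of the situation, assuming x is a String (the original type isn't shown here) and using a made-up helper called copy_out. The forbidden move is commented out; if you really need your own copy, cloning through the borrow is one way to get it.

```rust
// Hypothetical helper: we only have a borrow, so we can't move the String out.
fn copy_out(y: &String) -> String {
    // let z = *y; // error: cannot move out of borrowed content
    y.clone() // make a fresh, owned copy instead
}

fn main() {
    let x = String::from("hello");
    let y = &x;
    let z = copy_out(y);
    // x still has its data; z owns a separate copy.
    println!("{} {}", x, z);
}
```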
A small note about Copy
There’s one small detail about some types in Rust that we didn’t cover last time. Ever try to do this…
…and wonder why add_two
didn’t gobble up a
and b
? No? Well, it’s rather interesting, so let me tell you about it.
To prevent development with ownership being a massive pain, some types are labeled with a trait called Copy
. (You don’t have
to worry about traits for now, but here’s the relevant section of the Rust Book if you’re interested.) What does this mean? Well, let’s go to the excellent standard library documentation to find out:
Types that can be copied by simply copying bits (i.e. memcpy). By default, variable bindings have 'move semantics.' In other words: However, if a type implements Copy, it instead has 'copy semantics': It's important to note that in these two examples, the only difference is if you are allowed to access x after the assignment: a move is also a bitwise copy under the hood.
Basically, what this is saying is that types marked Copy have special copy semantics - that is, instead of
moving their values about everywhere and worrying about ownership, we simply copy the 1s and 0s that make up the value
to a different place, where another variable can use them. Sharing is caring!
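Something along these lines shows the effect. I'm assuming add_two simply adds two integers; the point is only that a and b survive the call because i32 is Copy.

```rust
// i32 implements Copy, so the arguments are copied in rather than moved.
fn add_two(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    let a = 1;
    let b = 2;
    let c = add_two(a, b);
    // a and b were not gobbled up: we can still use them.
    println!("{} + {} = {}", a, b, c);
}
```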
An even smaller note about mutability
We’ve now encountered two ways something can be mutable - editable - in Rust: a mut
variable binding (variable binding is a fancy term for “name we use for a variable”), as in let mut x = 1
,
and a &mut
borrow, as in &mut x
. It’s important to distinguish between the two.
Your head probably hurts after reading that. Don’t worry, it’s not as hard as it looks here.
The problem with borrowing
Okay. I’ll admit something to you: when I initially told you about mutable references (&mut), I made them seem simpler than they actually are.
With ownership, we were always sure that data was exclusively owned by one variable, and it could only be edited through use of that variable. To edit it, you’d either need to change it through the original variable or move the data elsewhere. However, with borrows, we can have many references to some data. What if one is reading whilst another is writing? We now have the possibility of a data race occurring. Here’s the definition:
There is a ‘data race’ when two or more pointers access the same memory location at the same time, where at least one of them is writing, and the operations are not synchronized.
The bad things that happen under the umbrella term data race include: two bits of code editing data at the same time, resulting in mangled data; something reading a piece of data being written, resulting in the reader getting garbage - and various other nefarious events. So how do we fix it?
Enter the borrowing rules
Rust is clever, and it has a set of rules to deal with this exact problem. Here they are, straight out of the Rust Book:
Rule one: any borrow must last for a scope no greater than that of the owner
Basically, “a borrow can’t outlive the thing it’s borrowing”. It’s hopefully obvious why: a borrow to nonexistent content doesn’t make any sense, and trying to read or write through it will do something - it’ll try to read from where the thing it borrowed once was - but you have no idea what will happen.
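As a sketch of rule one (names are made up): the function below is fine because the borrow lives entirely inside the owner's scope. The commented-out part is the shape the compiler rejects, where the borrow would still be around after its owner is gone.

```rust
fn read_through_borrow() -> i32 {
    let owner = 5;
    let borrow = &owner; // borrow's scope sits inside owner's scope: fine
    *borrow
}

fn main() {
    // Rejected by the compiler, because `borrow` would outlive `owner`:
    //
    // let borrow;
    // {
    //     let owner = 5;
    //     borrow = &owner; // error: `owner` does not live long enough
    // }
    // println!("{}", borrow);

    println!("{}", read_through_borrow());
}
```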
Rule two: you may have EITHER 1+ immutable borrows OR EXACTLY 1 mutable borrow
This one prevents data races. It dictates that, whilst you have any immutable (&T
)
borrows, you’re not allowed any mutable (&mut T
) borrows - to prevent the problem
of something reading while something else is writing, or code getting lost
because a piece of data it depended on changed in an unexpected way.
It also says that you can’t have two or more mutable borrows, to prevent the problem of two things overwriting the same piece of data.
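A rough sketch of rule two, with names of my own choosing: lots of readers at once is fine, one writer on its own is fine, and the mixes that the compiler would refuse are left as comments.

```rust
fn rule_two() -> (i32, i32) {
    let mut x = 5;

    // Any number of immutable borrows may coexist.
    let sum = {
        let r1 = &x;
        let r2 = &x;
        // let w = &mut x; // error if r1 or r2 is still used after this line
        *r1 + *r2
    };

    // Exactly one mutable borrow at a time.
    {
        let w = &mut x;
        // let w2 = &mut x; // error if w is still used after this line
        *w += 1;
    }

    (sum, x)
}

fn main() {
    println!("{:?}", rule_two());
}
```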
A few examples of terrible and shameful code
Stolen from the Rust Book:
If we compile this, Rust yells at us.
error: cannot borrow `x` as immutable because it is also borrowed as mutable
    println!("{}", x);
                   ^
Let’s walk through how this is bad:
- When y was created, we gave it a &mut x - a mutable borrow of x.
- println!(), a macro to print stuff out to the console, needs to read x. Since it needs to read x only, it takes an immutable borrow (as most things in Rust do).
- Rust looks through this code, and sees println!() trying to immutably borrow x. Since y’s still around - a mutable borrow - it gets annoyed and blows up in our face.
You’ll actually notice that Rust will be helpful to you if you encounter such an issue, and tell you where the borrow ends. In this case, it’s right at the end of main()
. Dang.
note: previous borrow ends here
fn main() {

}
^
The solution? Again, pilfered from the very hands of the Rust Book’s author:
The braces create a new scope - a new block of code for stuff to occur in.
These help us express the notion that we only want y
to live so we can add 1 to it, and after that we’re done.
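Here's a sketch of the scoped version, wrapped in a function with a made-up name so the result can be checked. One aside that isn't from the original post: compilers with non-lexical lifetimes (the 2018 edition onward) end a borrow at its last use, so they also accept the version without the braces. The braces still make the intent explicit.

```rust
fn add_one_then_read() -> i32 {
    let mut x = 5;
    {
        let y = &mut x; // the mutable borrow starts here...
        *y += 1;
    } // ...and ends here, when y goes out of scope

    // No mutable borrow is alive any more, so reading x is fine.
    println!("{}", x);
    x
}

fn main() {
    add_one_then_read();
}
```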
Here’s a more sneaky example - one that I’ve encountered myself, and one that took me a while to figure out. I post it here, so that you may not be as stupid.
What’s wrong here? This code is totally fine! I wrote it, dammit, I know what’s going on!
Nope.
fail.rs:8:18: 8:22 error: `vec2` does not live long enough
fail.rs:8     for data in &vec2 {
                           ^~~~
What’s going wrong here is more subtle. To solve it, you have to know this one fact:
Rust gets rid of data in reverse order to when it was created. This is called
last in, first out (LIFO).
Let’s break the problem down.
- We have vec1, a vector of borrows to some piece of Data. (You might be asking yourself: where does the Data come from?)
- We then have vec2, a vector of Data. So this is where it comes from.
- We borrow every piece of Data in vec2 with a for loop, and store these borrows in vec1.
- We’re now at the end of main(). Rust comes along and starts destroying stuff.
- Its first target is vec2, because it was created last. Bonk. RIP vec2.
- Now, we have a problem. vec1 is full of borrows to stuff in vec2 - which just got bonked on the head. This breaks rule one, as a borrow is outliving something it borrowed.
Rust even told us that at the start. Sigh. The solution is simple: reverse the order!
Everything works, and all is well with the world.
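For reference, a sketch of the reordered version. The definition of Data as a one-field tuple struct and the wrapper function are assumptions of mine; what matters is that vec2 is declared before vec1, so it is destroyed after it.

```rust
// Assumed definition: the original struct isn't shown here.
struct Data(u8);

fn borrowed_values() -> Vec<u8> {
    // vec2 is created first, so it is destroyed last.
    let vec2 = vec![Data(1), Data(2)];
    // vec1 is created second, so it is destroyed first,
    // while everything it borrows from is still alive.
    let mut vec1: Vec<&Data> = vec![];
    for data in &vec2 {
        vec1.push(data);
    }
    // Read the numbers back out through the stored borrows.
    let values: Vec<u8> = vec1.iter().map(|d| d.0).collect();
    values
}

fn main() {
    println!("{:?}", borrowed_values());
}
```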
Sidenote: iterating over vectors
This stymied me when working on this part.
What I would like to do, as above, is fill vec1
with a bunch of &Data
. This
was my first instinct:
The ref
keyword, along with its friend ref mut
, is used to borrow a value when
used as part of a pattern. It
desugars to:
This won’t work, as we move the data out of the vector. The reference we create
is thus only usable for the scope that the data is alive in - which is the curly
braces of our for
loop. We get this error:
fail.rs:7:9: 7:17 error: borrowed value does not live long enough
fail.rs:7     for ref data in vec2 {
                  ^~~~~~~~
Note how this is different from the error we got in the last section, and indicates the problem we just talked about instead of the last section’s problem.
…It’s all very subtle.
The proper way to do it is written in the last section:
Here, we immutably borrow vec2. Therefore, the most the loop can do is give us &Ts - immutable
borrows - because the vector itself is immutably borrowed; moving stuff out of it would be
impossible, because it’s a borrow.
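To tie the two ideas together, here's a small sketch (Data's definition and the function name are assumptions). It shows that a ref pattern and the & operator produce the same kind of borrow, and that looping over &vec2 hands out &Data items, just like calling vec2.iter() does.

```rust
// Assumed definition: the original struct isn't shown here.
struct Data(u8);

fn ref_and_iteration() -> (u8, usize) {
    let single = Data(7);
    let ref by_pattern = single; // `ref` in a pattern borrows...
    let by_operator = &single;   // ...just like `&` on the right-hand side
    let both = by_pattern.0 + by_operator.0;

    let vec2 = vec![Data(1), Data(2)];
    let mut vec1: Vec<&Data> = Vec::new();
    // Looping over a borrow of the vector yields borrows of its items.
    for data in &vec2 {
        vec1.push(data);
    }
    // The explicit form of the same thing.
    for data in vec2.iter() {
        vec1.push(data);
    }
    (both, vec1.len())
}

fn main() {
    println!("{:?}", ref_and_iteration());
}
```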
Thanks for reading this second post in the series! I initially wanted to get to explaining lifetimes, but since explaining all the nuances of borrowing in the detail I wanted to give took so much time, I’ll have to get to it next time.
You may have noticed the excessive references to the Rust Book - I highly recommend you also read that, if you haven’t already.