Rust Future (Pin-3)

Rust AsyncRuntime

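In this chapter we look at why Pin exists and how to use it. TL;DR: Pin is a type-system tool for keeping !Unpin values from being moved; it does not itself allocate memory or change where a value lives.

The Pin proposal consists of the Pin type and the Unpin marker trait. Its purpose is to govern the rules that apply to types that are !Unpin. Yes, that is a double negation: !Unpin means "not-un-pin".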

Pin was suggested in RFC#2349

This naming scheme is one of Rust's safety features, where it deliberately tests if you're too tired to safely implement a type with this marker. If you're starting to get confused, or even angry, by !Unpin, it's a good sign that it's time to lay down the work and start over tomorrow with a fresh mind.

If you are curious about why Pin is named the way it is, have a look at these discussions.

Self-referential structs

We start with an unsafe self-referential struct. This example is much simpler than the Generator from the previous section.

// Naive self-referential struct: `b` is a raw pointer to the sibling field `a`.

#[derive(Debug)]
struct Test {
    a: String,
    b: *const String,
}

impl Test {
    fn new(txt: &str) -> Self {
        let a = String::from(txt);
        Test {
            a,
            b: std::ptr::null(),
        }
    }

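    // Call this only once the value is at its final location:
    // `b` stores the current address of `a`.
    fn init(&mut self) {
        let self_ref: *const String = &self.a;
        self.b = self_ref;
    }

    fn a(&self) -> &str {
        &self.a
    }

    // Only valid after `init`, and only while the struct has not moved since.
    fn b(&self) -> &String {
        unsafe { &*(self.b) }
    }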
}

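To keep the example short, we use an init method to set up the self-reference. Because field b has to point at field a, it must be a raw pointer rather than a reference: there is no lifetime we could write for such a reference. Used as follows, the struct works as expected.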

fn main() {
    let mut test1 = Test::new("test1");
    test1.init();
    let mut test2 = Test::new("test2");
    test2.init();

    println!("a: {}, b: {}", test1.a(), test1.b());
    println!("a: {}, b: {}", test2.a(), test2.b());
}

Output:

a: test1, b: test1
a: test2, b: test2

As before, let's add a swap that exchanges the two variables:

fn main() {
    let mut test1 = Test::new("test1");
    test1.init();
    let mut test2 = Test::new("test2");
    test2.init();

    println!("a: {}, b: {}", test1.a(), test1.b());
    std::mem::swap(&mut test1, &mut test2);
    println!("a: {}, b: {}", test2.a(), test2.b());
}

The output:

a: test1, b: test1
a: test1, b: test2

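test2.b still points to the old location, which now belongs to test1, so the struct is no longer self-referential. The validity of test2.b now depends on a different object and is no longer tied to the lifetime of test2, which can easily lead to segfaults, undefined behavior, and similar failures.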
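
The stale pointer can also be checked directly by comparing addresses instead of reading the output. The following is a minimal sketch; the helper name `pointer_targets_after_swap` is made up for illustration and is not part of the original example.

```rust
struct Test {
    a: String,
    b: *const String,
}

impl Test {
    fn new(txt: &str) -> Self {
        Test { a: String::from(txt), b: std::ptr::null() }
    }

    fn init(&mut self) {
        let self_ref: *const String = &self.a;
        self.b = self_ref;
    }
}

// Hypothetical helper: after swapping, report whether `test2.b` points at
// `test2.a` (its own field) and whether it points at `test1.a`.
fn pointer_targets_after_swap() -> (bool, bool) {
    let mut test1 = Test::new("test1");
    test1.init();
    let mut test2 = Test::new("test2");
    test2.init();

    // Moves the bytes of both structs, including the raw pointers, verbatim.
    std::mem::swap(&mut test1, &mut test2);

    (
        std::ptr::eq(test2.b, &test2.a),
        std::ptr::eq(test2.b, &test1.a),
    )
}

fn main() {
    let (own, other) = pointer_targets_after_swap();
    println!("test2.b -> test2.a: {}, test2.b -> test1.a: {}", own, other);
}
```

Only addresses are compared here, so the sketch needs no unsafe code and never dereferences the stale pointer.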

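Pinning to the stack

Now let's try to solve this problem with Pin. The first step is to bring Pin into our example. The _marker field makes our struct !Unpin, which means that once a value has been pinned (here, in place on the stack), safe code can no longer move it.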

use std::pin::Pin;
use std::marker::PhantomPinned;

#[derive(Debug)]
struct Test {
    a: String,
    b: *const String,
    _marker: PhantomPinned,
}


impl Test {
    fn new(txt: &str) -> Self {
        let a = String::from(txt);
        Test {
            a,
            b: std::ptr::null(),
            // This makes our type `!Unpin`
            _marker: PhantomPinned,
        }
    }
    fn init<'a>(self: Pin<&'a mut Self>) {
        let self_ptr: *const String = &self.a;
        // SAFETY: we only write to a field; the value is never moved out of the pin.
        let this = unsafe { self.get_unchecked_mut() };
        this.b = self_ptr;
    }

    fn a<'a>(self: Pin<&'a Self>) -> &'a str {
        &self.get_ref().a
    }

    fn b<'a>(self: Pin<&'a Self>) -> &'a String {
        unsafe { &*(self.b) }
    }
}

Then run the example:

pub fn main() {
    // test1 is safe to move before we initialize it
    let mut test1 = Test::new("test1");
    // Notice how we shadow `test1` to prevent it from being accessed again
    let mut test1 = unsafe { Pin::new_unchecked(&mut test1) };
    Test::init(test1.as_mut());
     
    let mut test2 = Test::new("test2");
    let mut test2 = unsafe { Pin::new_unchecked(&mut test2) };
    Test::init(test2.as_mut());

    println!("a: {}, b: {}", Test::a(test1.as_ref()), Test::b(test1.as_ref()));
    println!("a: {}, b: {}", Test::a(test2.as_ref()), Test::b(test2.as_ref()));
}

Everything looks fine. If we try the earlier swap trick, though, this time we get a compile error:

pub fn main() {
    let mut test1 = Test::new("test1");
    let mut test1 = unsafe { Pin::new_unchecked(&mut test1) };
    Test::init(test1.as_mut());
     
    let mut test2 = Test::new("test2");
    let mut test2 = unsafe { Pin::new_unchecked(&mut test2) };
    Test::init(test2.as_mut());

    println!("a: {}, b: {}", Test::a(test1.as_ref()), Test::b(test1.as_ref()));
    // error[E0308]: arguments to this function are incorrect
    // note: expected `&mut _`, found `Pin<&mut Test>`
    std::mem::swap(test1.as_mut(), test2.as_mut());
    println!("a: {}, b: {}", Test::a(test2.as_ref()), Test::b(test2.as_ref()));
}

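As we can see, once the value is pinned and its type is !Unpin, the compiler rejects the swap: safe code has no way to get a &mut Test out of a Pin<&mut Test>.

It is worth noting that pinning to the stack ties the data to a stack frame. We therefore cannot create a self-referential object inside a function and return it: returning moves the value, and the frame its pointer referred to has been reclaimed. This puts extra responsibility on the developer, because once the pinned stack memory is gone, any pointer still referring to it is dangling.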
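
The standard library's `std::pin::pin!` macro removes part of that burden. It pins a value to the current stack frame and returns only a `Pin<&mut T>`, so there is no original variable left to shadow and no unsafe at the call site. The sketch below assumes Rust 1.68 or later, where the macro is stable. The helper name `pinned_on_stack` is made up for illustration.

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

struct Test {
    a: String,
    b: *const String,
    _marker: PhantomPinned,
}

impl Test {
    fn new(txt: &str) -> Self {
        Test { a: String::from(txt), b: std::ptr::null(), _marker: PhantomPinned }
    }

    fn init<'a>(self: Pin<&'a mut Self>) {
        let self_ptr: *const String = &self.a;
        // SAFETY: we only write to a field; the value is never moved.
        let this = unsafe { self.get_unchecked_mut() };
        this.b = self_ptr;
    }

    fn a<'a>(self: Pin<&'a Self>) -> &'a str {
        &self.get_ref().a
    }

    fn b<'a>(self: Pin<&'a Self>) -> &'a String {
        // SAFETY: `init` has run and the value is pinned, so `b` still points at `a`.
        unsafe { &*(self.b) }
    }
}

// Hypothetical helper: pins a `Test` to this function's stack frame and
// returns owned copies of what `a` and `b` see.
fn pinned_on_stack() -> (String, String) {
    // `pin!` yields a `Pin<&mut Test>`; the value itself has no other name.
    let mut test1 = pin!(Test::new("test1"));
    test1.as_mut().init();
    (test1.as_ref().a().to_owned(), test1.as_ref().b().clone())
}

fn main() {
    let (a, b) = pinned_on_stack();
    println!("a: {}, b: {}", a, b);
}
```

The helper returns owned strings rather than the pinned value, for the reason given above: the pin cannot outlive the frame it was created in.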

Pinning to the heap

Pinning a heap allocation means callers need no unsafe at all, and no separate init method is required. Pinning to the heap with Box::pin is safe.

use std::pin::Pin;
use std::marker::PhantomPinned;

#[derive(Debug)]
struct Test {
    a: String,
    b: *const String,
    _marker: PhantomPinned,
}

impl Test {
    fn new(txt: &str) -> Pin<Box<Self>> {
        let a = String::from(txt);
        let t = Test {
            a,
            b: std::ptr::null(),
            _marker: PhantomPinned,
        };
        let mut boxed = Box::pin(t);
        let self_ptr: *const String = &boxed.as_ref().a;
        unsafe { boxed.as_mut().get_unchecked_mut().b = self_ptr };

        boxed
    }

    fn a<'a>(self: Pin<&'a Self>) -> &'a str {
        &self.get_ref().a
    }

    fn b<'a>(self: Pin<&'a Self>) -> &'a String {
        unsafe { &*(self.b) }
    }
}

pub fn main() {
    // No `mut` needed: `as_ref` only borrows the pinned boxes.
    let test1 = Test::new("test1");
    let test2 = Test::new("test2");

    println!("a: {}, b: {}", test1.as_ref().a(), test1.as_ref().b());
    println!("a: {}, b: {}", test2.as_ref().a(), test2.as_ref().b());
}

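A self-referential struct used this way has a stable address for as long as the Box is alive, so we do not have to keep checking whether the pointer is still valid.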
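
To see why the address is stable, note that moving or swapping a `Pin<Box<Test>>` only moves the box pointer, while the heap data stays where it is. The following is a minimal sketch of that check; `is_consistent` and `heap_pins_survive_swap` are hypothetical helpers added for illustration.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Test {
    a: String,
    b: *const String,
    _marker: PhantomPinned,
}

impl Test {
    fn new(txt: &str) -> Pin<Box<Self>> {
        let mut boxed = Box::pin(Test {
            a: String::from(txt),
            b: std::ptr::null(),
            _marker: PhantomPinned,
        });
        let self_ptr: *const String = &boxed.as_ref().a;
        // SAFETY: we only write to a field; the boxed value is never moved.
        unsafe { boxed.as_mut().get_unchecked_mut().b = self_ptr };
        boxed
    }

    // Hypothetical helper: true while `b` still points at this value's own `a`.
    fn is_consistent(self: Pin<&Self>) -> bool {
        std::ptr::eq(self.b, &self.a)
    }
}

// Hypothetical helper: swaps two pinned boxes and checks both self-references.
fn heap_pins_survive_swap() -> bool {
    let mut test1 = Test::new("test1");
    let mut test2 = Test::new("test2");

    // Swaps the two box pointers; the `Test` values on the heap do not move.
    std::mem::swap(&mut test1, &mut test2);

    test1.as_ref().is_consistent()
        && test2.as_ref().is_consistent()
        && test1.a == "test2"
}

fn main() {
    println!("still self-referential after swap: {}", heap_pins_survive_swap());
}
```

This is the same swap that broke the first version of the struct. Here the variable names are exchanged, but each heap value still points at its own field.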

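Practical rules for Pin

If T: Unpin, Pin<&'a mut T> is entirely equivalent to &'a mut T. In other words, Pin has no effect on Unpin types.
If T: !Unpin, getting a &mut T out of a pinned T requires unsafe. In other words, the compiler stops safe code from moving a pinned !Unpin value.
Most standard library types implement Unpin. Compiler-generated futures (from async fn and async blocks) and coroutines are the notable exceptions.
Pin does nothing special with memory, such as placing the value in "read-only" memory or anything fancy. It only uses the type system to prevent certain operations on the value.
The main use case for Pin is to allow self-referential types.
Implementing a self-referential type still requires unsafe code.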
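
The first rule can be illustrated with a small sketch. For an Unpin type such as String, `Pin::new` and `Pin::get_mut` are both safe, so a swap through the pin compiles and behaves like a swap through plain `&mut`. The names `assert_unpin` and `swap_through_pin` are made up for illustration.

```rust
use std::pin::Pin;

// Hypothetical compile-time check: this only type-checks for `Unpin` types.
fn assert_unpin<T: Unpin>() {}

// Hypothetical helper: swaps two `String`s through `Pin<&mut String>`.
fn swap_through_pin() -> (String, String) {
    assert_unpin::<String>();

    let mut x = String::from("x");
    let mut y = String::from("y");

    // `Pin::new` is safe because `String: Unpin`.
    let px = Pin::new(&mut x);
    let py = Pin::new(&mut y);

    // `get_mut` is also safe for `Unpin` targets and gives back `&mut String`.
    std::mem::swap(px.get_mut(), py.get_mut());

    (x, y)
}

fn main() {
    println!("{:?}", swap_through_pin());
}
```

Replacing String with a type that contains PhantomPinned makes both `Pin::new` and `get_mut` fail to compile, which is the second rule in action.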

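Summary

At this point we can sum up what Pin is and why it exists. We want stackless coroutines, which restructure code using yield syntax and an enum, and that raises the question of how to borrow across yield points. State that is reused across yield points is stored in the coroutine struct, so both the variables and the borrows of them are fields of that struct. The problem therefore becomes how to implement a self-referential struct.

Self-referential structs conflict with Rust's ownership model, in which any value may be moved. Pin was introduced to help: it blocks the moves that would cause trouble and keeps the amount of unsafe code to a minimum.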
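
As a rough illustration of that chain of reasoning, the sketch below holds a borrow across an await point inside an async block, pins the resulting future on the heap, and polls it once by hand. It assumes the Rust 2018 edition or later. `NoopWaker` and `poll_borrowing_future` are hypothetical names, and the do-nothing waker stands in for a real runtime.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical waker that does nothing; enough for a future that never waits.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical helper: builds a future that borrows its own state across an
// await point, pins it, and polls it once.
fn poll_borrowing_future() -> Poll<usize> {
    let fut = async {
        let data = String::from("hello");
        let r = &data; // borrow of a local that lives inside the future
        std::future::ready(()).await; // `r` is held across this await point
        r.len()
    };

    // `poll` takes `Pin<&mut Self>`, so the future must be pinned first.
    let mut fut = Box::pin(fut);

    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    fut.as_mut().poll(&mut cx)
}

fn main() {
    println!("{:?}", poll_borrowing_future());
}
```

Because `poll` only accepts `Pin<&mut Self>`, the type system makes sure the future's state, including `data` and the reference to it, is not moved once polling has started.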

Next: Implementation