Demystifying Rust Async: Building Your Own Future and Executor from Scratch

If you have been working with Rust for a while, you probably rely heavily on tokio or async-std. By 2025, these runtimes have become incredibly mature, handling everything from networking to file I/O with impressive efficiency. However, for a senior Rust developer, treating the async runtime as a “black box” is a liability.

When you hit a deadlock, a mysterious Send trait violation, or a performance cliff where tasks aren’t waking up, understanding the mechanics under the hood is the difference between a quick fix and days of debugging.

In this deep-dive tutorial, we aren’t just going to use async/await. We are going to rebuild the engine. We will implement a custom Future, a task notification system (Waker), and a rudimentary Executor from scratch.

By the end of this guide, you will understand exactly what happens when you type .await.

Prerequisites

To get the most out of this article, you should have:

Rust Knowledge: Mid-to-Senior level (comfortable with Traits, Arc, Mutex, and lifetimes).
Environment: Rust 1.85+ (Stable), the first release that supports the 2024 edition used in the Cargo.toml below.
Tools: An IDE like VS Code with rust-analyzer or JetBrains RustRover.

We will stick to the standard library (std) primarily, but we will use futures crate utilities to simplify the Waker construction boilerplate, allowing us to focus on the logic.

Setup

Create a new binary project:

cargo new custom_async_runtime
cd custom_async_runtime

Update your Cargo.toml:

[package]
name = "custom_async_runtime"
version = "0.1.0"
edition = "2024" # Requires Rust 1.85+; use "2021" on older toolchains

[dependencies]
# We use this primarily for the ArcWake trait to simplify waker creation
futures = "0.3"

1. The Architecture of Async

Before writing code, we must visualize the flow. Rust’s async model is poll-based (pull-based). Unlike JavaScript’s Promise system (which pushes callbacks), Rust futures do nothing unless an executor polls them.
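The laziness described above can be observed directly. The following is a minimal sketch using only the standard library; `NoopWake` and `laziness_demo` are hypothetical names introduced for illustration, and the hand-written poll stands in for what an executor would normally do.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

/// A waker that does nothing; enough for polling a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Returns whether the async block's body had run (before_poll, after_poll).
fn laziness_demo() -> (bool, bool) {
    let ran = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&ran);

    // Creating the future runs none of its body.
    let fut = async move {
        flag.store(true, Ordering::SeqCst);
    };
    let before_poll = ran.load(Ordering::SeqCst);

    // Poll once by hand, playing the role of the executor.
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    let _ = fut.as_mut().poll(&mut cx);

    (before_poll, ran.load(Ordering::SeqCst))
}
```

Calling `laziness_demo()` should show that the flag is still unset after the async block is created and only flips once `poll` is called.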

The Core Components

Component	Role	Analogy
The Future	A state machine representing a value not yet ready.	A package tracking number.
The Executor	The runtime that calls `poll` on futures to drive them to completion.	The postal worker checking if the package arrived.
The Reactor	The system (usually OS-backed like `epoll`/`kqueue`) that notifies when IO is ready.	The sorting facility that scans the package.
The Waker	A callback mechanism to tell the Executor “I’m ready to be polled again.”	A notification text sent to the postal worker.

The Polling Loop

Here is how these components interact. This interaction is the heartbeat of every Rust async application.

sequenceDiagram
    participant E as Executor
    participant F as Future
    participant R as Reactor (Timer/IO)
    Note over E, R: The Async Cycle
    E->>F: poll(Context)
    F->>R: Register Waker (if not ready)
    R-->>F: Acknowledge
    F-->>E: return Poll::Pending
    Note over E: Executor sleeps or runs other tasks
    R->>R: Event Occurs (Time passes / Data arrives)
    R->>E: Waker.wake() called
    Note over E: Executor puts Task back in Queue
    E->>F: poll(Context) again
    F-->>E: return Poll::Ready(Value)

2. Defining a Custom Future

Let’s start with the leaf node: the Future. We will create a TimerFuture that completes after a specific duration. This simulates an I/O operation.

To create a Future, we must implement std::future::Future.

The Shared State

We need a way for the “reactor” (a background thread in our simulation) to communicate with the Future struct. We’ll use a SharedState protected by a Mutex.

use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Waker};
use std::thread;
use std::time::Duration;

/// Shared state between the Future and the background thread (reactor).
struct SharedState {
    /// Whether the time has elapsed.
    completed: bool,
    /// The waker to notify the executor when ready.
    waker: Option<Waker>,
}

/// A Future that resolves after a specific duration.
pub struct TimerFuture {
    shared_state: Arc<Mutex<SharedState>>,
}

Implementing the Future Trait

The magic happens in poll.

Check Status: Is the timer done? If yes, return Poll::Ready.
Register Waker: If not done, we must store the Waker from the Context. The background thread will call this Waker later.
Return Pending: Tell the executor “not yet.”

impl Future for TimerFuture {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Look at the shared state to see if the timer has already completed.
        let mut shared_state = self.shared_state.lock().unwrap();

        if shared_state.completed {
            Poll::Ready(())
        } else {
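            // Store the waker so the timer thread can wake the current task
            // when the timer completes, ensuring the future is polled again.
            // Cloning a waker is usually cheap (for Arc-backed wakers it is a
            // reference-count increment). The executor may pass a different
            // waker on each poll, so always store the most recent one.
            shared_state.waker = Some(cx.waker().clone());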
            Poll::Pending
        }
    }
}
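If the clone on every poll ever matters, `Waker::will_wake` can be used to skip it when the stored waker already targets the same task. The helper below is a sketch, not part of the tutorial's runtime; `register_waker` and `NoopWake` are hypothetical names. The check is best-effort, so a false negative only costs an extra clone.

```rust
use std::sync::Arc;
use std::task::{Wake, Waker};

/// A waker that does nothing; used only to obtain `Waker` values.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Stores `current` in `slot` unless the stored waker already wakes the
/// same task. Returns true if a new clone was stored.
fn register_waker(slot: &mut Option<Waker>, current: &Waker) -> bool {
    // `will_wake` is best-effort: it may report false for equivalent wakers.
    let already_registered = slot.as_ref().is_some_and(|w| w.will_wake(current));
    if !already_registered {
        *slot = Some(current.clone());
    }
    !already_registered
}
```

Inside `poll`, this would replace the unconditional assignment: `register_waker(&mut shared_state.waker, cx.waker());`.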

The Constructor: Spawning the “Reactor”

When we create the TimerFuture, we start the thread that acts as our hardware interrupt or OS event loop.

impl TimerFuture {
    /// Create a new TimerFuture which will complete after the provided duration.
    pub fn new(duration: Duration) -> Self {
        let shared_state = Arc::new(Mutex::new(SharedState {
            completed: false,
            waker: None,
        }));

        // Spawn the new thread (simulating the Reactor/OS)
        let thread_shared_state = shared_state.clone();
        thread::spawn(move || {
            thread::sleep(duration);
            
            let mut shared_state = thread_shared_state.lock().unwrap();
            
            // Signal that the timer has completed and wake up the last task
            // that polled the future, if any.
            shared_state.completed = true;
            if let Some(waker) = shared_state.waker.take() {
                waker.wake();
            }
        });

        TimerFuture { shared_state }
    }
}

Key Takeaway: Notice that TimerFuture::new does not block. It spins up a thread and returns immediately. The waiting happens on that background thread; the task that awaits the timer simply returns Poll::Pending and is polled again only after the waker fires.

3. Building the Executor

A Future is useless without something to run it. The Executor’s job is to manage a queue of tasks. When a task is woken up, the executor pulls it from the queue and polls it.

We need three main parts:

Task: Wraps the top-level future.
Executor: Runs the tasks.
Spawner: Allows us to inject new tasks into the executor.

The Task Structure

The Task essentially holds the Future. It needs to be able to reschedule itself onto the Executor’s queue. We will use futures::task::ArcWake to make this easy. When wake() is called on our task, it sends itself back to the channel.

use futures::task::{waker_ref, ArcWake};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// A Future that can reschedule itself to be polled by an `Executor`.
struct Task {
    /// The actual future being executed.
    /// The Mutex provides interior mutability: `poll` needs `Pin<&mut ...>`, but the
    /// task is only reachable through a shared `Arc<Task>`, and `ArcWake` requires `Send + Sync`.
    future: Mutex<Option<Pin<Box<dyn Future<Output = ()> + Send + 'static>>>>,
    
    /// Handle to place the task back into the task queue.
    task_sender: SyncSender<Arc<Task>>,
}

impl ArcWake for Task {
    fn wake_by_ref(arc_self: &Arc<Self>) {
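        // When `wake` is called, we send a clone of the Arc<Task> back to the channel.
        // This puts the task back into the executor's "ready to run" queue.
        // `try_send` returns an error instead of blocking when the bounded queue
        // is full; it also fails if the executor has been dropped.
        let cloned = arc_self.clone();
        arc_self
            .task_sender
            .try_send(cloned)
            .expect("Task queue full or executor dropped");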
    }
}

Note: In high-performance runtimes like Tokio, this channel is replaced by specialized run queues (largely lock-free), and the Mutex around the future is replaced with carefully audited unsafe code built on UnsafeCell. For this tutorial, Mutex and mpsc are safer and clearer.
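The standard library offers the same "wake means re-queue" pattern through `std::task::Wake`, without any external crate. The sketch below uses a hypothetical `QueueOnWake` type that pushes an id onto a channel instead of a full `Task`; it is meant only to show where the waker's behavior comes from.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::sync::Arc;
use std::task::{Wake, Waker};

/// Hypothetical stand-in for `Task`: waking it pushes its id onto a queue.
struct QueueOnWake {
    id: u32,
    ready_queue: SyncSender<u32>,
}

impl Wake for QueueOnWake {
    fn wake(self: Arc<Self>) {
        self.wake_by_ref();
    }

    fn wake_by_ref(self: &Arc<Self>) {
        // Ignore the error if the queue is full or the receiver is gone.
        let _ = self.ready_queue.try_send(self.id);
    }
}

/// Wakes the same "task" twice and returns what landed in the queue.
fn wake_demo() -> Vec<u32> {
    let (tx, rx) = sync_channel(8);
    let waker = Waker::from(Arc::new(QueueOnWake { id: 7, ready_queue: tx }));

    waker.wake_by_ref(); // wake without giving up the waker
    waker.wake(); // wake and consume the waker, dropping the last sender

    // The iterator ends once every sender has been dropped.
    rx.iter().collect()
}
```

Each wake call enqueues the id once, so `wake_demo()` should yield the id twice. A real executor would enqueue an `Arc<Task>` rather than a number.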

The Executor and Spawner

The Spawner creates the task and sends it to the queue initially. The Executor pulls from the queue and runs poll.

pub struct Executor {
    ready_queue: Receiver<Arc<Task>>,
}

#[derive(Clone)]
pub struct Spawner {
    task_sender: SyncSender<Arc<Task>>,
}

impl Spawner {
    pub fn spawn(&self, future: impl Future<Output = ()> + Send + 'static) {
        let future = Box::pin(future);
        let task = Arc::new(Task {
            future: Mutex::new(Some(future)),
            task_sender: self.task_sender.clone(),
        });
        
        // Initial load: Put the task into the queue so the executor sees it immediately.
        // `try_send` reports a full queue instead of blocking the caller.
        self.task_sender.try_send(task).expect("Queue full");
    }
}

/// Helper to create the pair
pub fn new_executor_and_spawner() -> (Executor, Spawner) {
    // Maximum 10,000 tasks waiting in queue
    let (task_sender, ready_queue) = sync_channel(10_000);
    (Executor { ready_queue }, Spawner { task_sender })
}
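The behavior of the bounded channel is worth knowing when choosing between `send` and `try_send`. On a `sync_channel`, `send` blocks while the queue is full and only returns an error once the receiver is gone, whereas `try_send` reports a full queue immediately. A small sketch (the function name `bounded_queue_demo` is made up for illustration):

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Returns (second_try_send_reported_full, send_after_receiver_dropped_failed).
fn bounded_queue_demo() -> (bool, bool) {
    // Capacity of one so the queue fills up after a single message.
    let (tx, rx) = sync_channel::<u32>(1);
    tx.try_send(1).expect("first slot is free");

    // The queue is now full: `try_send` returns the value back in `Full`.
    let reported_full = matches!(tx.try_send(2), Err(TrySendError::Full(2)));

    // With the receiver gone, `send` fails instead of blocking.
    drop(rx);
    let send_failed = tx.send(3).is_err();

    (reported_full, send_failed)
}
```

This matters for a single-threaded executor: a blocking `send` on a full queue, issued from the executor thread itself, would never return because nothing else drains the queue.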

The Run Loop

This is the engine room. This code runs on the main thread (or worker threads in a thread pool).

impl Executor {
    pub fn run(&self) {
        // Pull tasks off the channel continuously
        while let Ok(task) = self.ready_queue.recv() {
            // Take the future, and if it has not yet completed (is still Some),
            // poll it in an attempt to complete it.
            let mut future_slot = task.future.lock().unwrap();
            
            if let Some(mut future) = future_slot.take() {
                // Create a Waker from the Task instance itself
                let waker = waker_ref(&task);
                let context = &mut Context::from_waker(&*waker); // Create the context

                // POLL THE FUTURE
                if future.as_mut().poll(context).is_pending() {
                    // If pending, put it back in the slot.
                    // When the reactor wakes it up, the Task will be sent back 
                    // to the channel, and we loop again.
                    *future_slot = Some(future);
                } else {
                    // If Ready, we do nothing. The future is dropped/consumed.
                }
            }
        }
    }
}
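For comparison, the smallest useful executor drives a single future and parks the thread while it is pending. The sketch below uses only the standard library; `ThreadWaker` and `block_on` are illustrative names, and it assumes nothing about the queue-based executor above.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// A waker that unparks the thread which is blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Runs one future to completion on the current thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    // Pin the future on the stack so it can be polled.
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);

    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Sleep until a waker unparks us; spurious wake-ups just re-poll.
            Poll::Pending => thread::park(),
        }
    }
}
```

For example, `block_on(async { 40 + 2 })` returns 42. The queue-based executor generalizes this loop from one future to many tasks.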

4. Putting It All Together

Now we have a complete, albeit simple, async runtime. Let’s write the main function to verify it works.

We will create a main flow that:

Initializes the executor and spawner.
Spawns a task that waits for a timer.
Prints “howdy!”, then “done!” after the delay.

fn main() {
    let (executor, spawner) = new_executor_and_spawner();

    // Spawn a task using our custom system
    spawner.spawn(async {
        println!("howdy!");
        
        // Wait for our TimerFuture to complete (2 seconds)
        TimerFuture::new(Duration::new(2, 0)).await;
        
        println!("done!");
    });

    // Drop the spawner so that our executor knows it is finished and won't
    // receive more incoming tasks (eventually closing the loop).
    drop(spawner);

    // Run the executor
    println!("Executor starting...");
    executor.run();
    println!("Executor finished.");
}

Expected Output

When you run cargo run, you should see:

Executor starting...
howdy!
... (2 second pause) ...
done!
Executor finished.

If you see this, congratulations! You have built a miniature version of the poll/wake loop that sits at the heart of runtimes like tokio and async-std.

5. Performance and Pitfalls in 2025

While our implementation works, it is designed for education. In a production environment in 2025/2026, there are several nuances you must be aware of when dealing with custom futures.

Comparison: Our Runtime vs. Production Runtimes

Feature	Our Educational Runtime	Production (e.g. Tokio)
Task Queue	`std::sync::mpsc` bounded channel	Specialized per-worker run queues (largely lock-free)
Reactor	Thread per Timer	`epoll`/`kqueue`/`IOCP` (OS readiness APIs)
Scheduling	FIFO (First In, First Out), single-threaded	Work-stealing (Multi-threaded)
Waker	`Arc<Task>` via `ArcWake`	Hand-written `RawWakerVTable` with custom reference counting

Common Pitfalls

1. Blocking the Executor

The cardinal sin of async Rust is blocking the executor thread.

// BAD PRACTICE
spawner.spawn(async {
    // This sleeps the entire EXECUTOR thread, preventing other tasks from running.
    std::thread::sleep(Duration::from_secs(5)); 
});

// GOOD PRACTICE
spawner.spawn(async {
    // This yields control back to the executor, letting others run.
    TimerFuture::new(Duration::from_secs(5)).await;
});

In our implementation, since the executor is single-threaded, std::thread::sleep inside a task would freeze everything.

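2. Over-waking (Busy Polling)

If you implement a Future that wakes its own task on every poll (e.g., calling cx.waker().wake_by_ref() and then returning Pending while nothing has changed), the executor re-polls it immediately and you get a busy loop that pins a CPU core at 100%. The opposite mistake is just as bad: returning Pending without registering the Waker anywhere means the task is never polled again. Always ensure that when you return Pending, a legitimate external event will trigger the Waker later.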
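A self-wake is not always a bug: waking once and returning Pending is how a cooperative "yield" is usually written. The sketch below counts polls and wakes for such a future; `YieldNow`, `CountingWaker`, and `yield_demo` are hypothetical names, and the hand-written loop stands in for an executor. Dropping the `yielded` check would turn the same code into the busy loop described above.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Counts how many times it has been woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Yields to the executor exactly once, then completes.
struct YieldNow {
    yielded: bool,
}

impl Future for YieldNow {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            return Poll::Ready(());
        }
        self.yielded = true;
        // Wake immediately, but only this once.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Polls `YieldNow` to completion and returns (polls, wakes).
fn yield_demo() -> (usize, usize) {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(Arc::clone(&counter));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(YieldNow { yielded: false });

    let mut polls = 0;
    loop {
        polls += 1;
        if fut.as_mut().poll(&mut cx).is_ready() {
            break;
        }
    }
    (polls, counter.0.load(Ordering::SeqCst))
}
```

The future completes on its second poll after a single wake, so the cost of yielding is bounded.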

3. Pinning Complexity

You may have noticed Pin<Box<...>> and self: Pin<&mut Self>. Pinning guarantees that the future's memory location does not change for as long as it is pinned. This is crucial for async blocks because they compile down to state machines that can be self-referential (holding references to their own local variables across .await points).

Moving such a future after it has been polled would invalidate those internal pointers, which is undefined behavior. Requiring Pin<&mut Self> in poll is how the type system prevents that in safe code.
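Boxing is what makes it safe to pass a started future around: moving a `Pin<Box<...>>` moves only the pointer, while the state machine stays at the same heap address. The following sketch illustrates this with a borrow held across an `.await`; `PendingOnce`, `NoopWake`, and `move_pinned_box` are names invented for the example.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; the example polls by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Pending on the first poll, Ready on the second.
struct PendingOnce(bool);

impl Future for PendingOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Polls a boxed future, moves the box, then polls it to completion.
fn move_pinned_box() -> u32 {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    let mut boxed: Pin<Box<dyn Future<Output = u32>>> = Box::pin(async {
        let value: u32 = 41;
        // This borrow lives across the await, so the state machine
        // holds a reference into its own storage.
        let value_ref = &value;
        PendingOnce(false).await;
        *value_ref + 1
    });

    assert!(boxed.as_mut().poll(&mut cx).is_pending());

    // Moving the Box moves the pointer only; the heap state stays put.
    let mut moved = vec![boxed];
    match moved[0].as_mut().poll(&mut cx) {
        Poll::Ready(value) => value,
        Poll::Pending => unreachable!("PendingOnce is ready on its second poll"),
    }
}
```

This is the same reason the Task above stores its future as `Pin<Box<dyn Future<...>>>`: the box can travel through the channel while the future itself never moves.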

Conclusion

We have peeled back the layers of Rust’s async story. By implementing TimerFuture, Task, and Executor, we’ve shown that async/await isn’t magic: it’s just state machines and callbacks orchestrated by a clever compiler.

Key Takeaways:

Futures are lazy: They do nothing until polled.
Wakers are vital: They are the glue between the OS events and the Executor.
Executors are loopers: They simply cycle through ready tasks.

Where to go from here?

If you want to extend this runtime, try these challenges:

Add a Network Reactor: Instead of thread::sleep, use mio to listen for TCP events.
Make it Multithreaded: Use a thread pool for the Executor and implement work stealing.
Implement JoinHandle: Allow spawner.spawn to return a handle to get the result of the future.

Keep coding, stay safe, and happy Rusting!
