Error Handling in Async Rust: From a Simple Function to a Real Health Monitor

By leen β€’ 15 Nov 2025 Β· 06:44

When we step into Rust’s async world, everything looks clean and exciting at first: async fn, await, tokio, better performance, non-blocking operations. But very quickly, something far less glamorous becomes the real challenge: error handling.

In synchronous code, when something goes wrong, we usually have:

  • a reasonable stack trace,
  • a predictable call stack,
  • a clear sense of where the problem started.

In async, things are different. Futures are lazy, they hop between threads, and the actual execution flow might not match the mental model we have. If we don’t attach proper context to our errors, we often end up with messages like:

Error: request failed

What request? Which URL? In which function? Why? Nothing is clear.

This post aims to build a structured path from basic concepts to a full working example. We’ll start with a small but well-designed function called fetch_status, then introduce some common async error-handling patterns, and finally combine everything into a real, usable tool called leen-health β€” a small CLI that checks multiple endpoints and produces both human-friendly and machine-friendly output.


Why Error Handling in Async Rust Is Harder

There are three main reasons error handling feels more fragile in async Rust compared to sync Rust.

1. Stack traces are no longer reliable

In sync code, stack traces give us a big chunk of debugging context. In async, a Future is polled in pieces, and each poll may land on a different worker thread. The logical flow you think the program follows isn’t always the actual execution flow.

That means relying solely on stack traces usually leads to vague and unhelpful debugging.

The fix? Attach context to every meaningful error. Add the URL, the endpoint name, the operation being performed, and anything else that clarifies what went wrong.

2. Orphan tasks lead to lost errors

In async, it’s easy to spawn a task and never await its result. If that task fails or, worse, panics, the error may get printed somewhere unhelpful, or vanish completely.

A simple rule helps:

  • Either collect/join your tasks,
  • Or explicitly treat them as fire-and-forget.

Anything in between usually means losing errors silently, which is exactly what we don’t want in production.

3. Panic in async is almost always a bug, not a normal error

When working with shared state (Arc, Mutex, channels, etc.), a panic can leave the runtime in a very inconsistent state.

For network or IO-level issues β€” timeouts, DNS errors, status code mismatches β€” we should use Result. Panic should remain reserved for broken invariants or genuine internal logic errors.
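That dividing line can be shown with a tiny dependency-free sketch. The names `parse_port` and `register_port` are made up for illustration: bad input is an expected condition and gets a `Result`, while a broken internal invariant triggers an `assert!` (a panic).

```rust
use std::num::ParseIntError;

/// Input-level failure: an expected, recoverable condition, so return a Result.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    s.trim().parse::<u16>()
}

/// Internal invariant: if this fires, the program itself is wrong,
/// so a panic via assert! is the right tool.
fn register_port(port: u16) -> u16 {
    assert!(port != 0, "invariant broken: port 0 should have been rejected earlier");
    port
}

fn main() {
    // Bad input does not panic; the caller decides what to do with the error.
    assert!(parse_port("not-a-port").is_err());

    // Good input flows through normally.
    let port = parse_port("8080").unwrap();
    assert_eq!(register_port(port), 8080);
}
```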


A Short Checklist Before Writing Any async fn

Before defining a new async function, it helps to think through a few simple questions:

  • What kind of errors may occur?

    • Temporary ones like timeouts or connection issues β†’ retryable
    • Logical errors like invalid schema or serialization issues β†’ usually not retryable
  • Who will see this error? Developers? SREs? A bot in CI? The error message should fit the audience.

  • What should the function signature look like? Result<T, anyhow::Error> is a good general-purpose choice. For larger systems, a dedicated error type using thiserror is ideal.

  • Will we attach context to errors? Without .context() or .with_context(), async error messages are usually too vague.
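To see what that last point buys us before anyhow enters the picture, here is a dependency-free sketch. `SendError` is a stand-in for a real transport error, and the `map_err` call plays the role that `.with_context()` plays in the examples later in the post:

```rust
use std::fmt;

// A hypothetical low-level error, standing in for something like reqwest::Error.
#[derive(Debug)]
struct SendError;

impl fmt::Display for SendError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "connection reset")
    }
}

// Without context: the caller only ever sees "connection reset".
fn send_raw(_url: &str) -> Result<(), SendError> {
    Err(SendError)
}

// With context: the URL travels with the error, so the message
// answers "which request?" instead of just "a request failed".
fn send(url: &str) -> Result<(), String> {
    send_raw(url).map_err(|e| format!("request to {url} failed: {e}"))
}

fn main() {
    let err = send("https://example.com/health").unwrap_err();
    assert!(err.contains("https://example.com/health"));
    println!("{err}");
}
```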


Building the First Foundation Block: fetch_status

To move from theory to practice, we start with a simple, well-designed function that returns only a status code while giving detailed error context when things go wrong:

use anyhow::{Context, Result};
use reqwest::Client;

pub async fn fetch_status(url: &str) -> Result<u16> {
    let client = Client::builder()
        .user_agent("drunkleen-async/0.1")
        .build()
        .context("failed to build HTTP client")?;

    let response = client
        .get(url)
        .send()
        .await
        .with_context(|| format!("request to {url} failed"))?;

    Ok(response.status().as_u16())
}

#[tokio::main]
async fn main() -> Result<()> {
    let url = "https://drunkleen.com";
    let status = fetch_status(url).await?;
    println!("status for {url}: {}", status);
    Ok(())
}

Why is this small function important?

  • If the client can’t be built, the error shows exactly why.
  • If sending or receiving fails, the URL is attached to the error message.
  • Only the status code is returned, letting upper layers decide what counts as β€œsuccess.”

When You Need Meaningful Error Types: thiserror

anyhow is excellent for general-purpose error aggregation. But for systems that need metrics, dashboards, API responses, or retry logic based on error category, we need structured error types.

thiserror makes this painless:

use thiserror::Error;

#[derive(Debug, Error)]
pub enum MonitorError {
    #[error("network issue: {0}")]
    Network(#[from] reqwest::Error),
    #[error("timed out after {0:?}")]
    Timeout(std::time::Duration),
    #[error("unexpected status {status} for {url}")]
    UnexpectedStatus { status: u16, url: String },
}

This lets you:

  • label metrics: kind="network" / kind="timeout"
  • define retry logic based on exact variant
  • produce clean, human-readable logs without losing structure
  • match on error types cleanly
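As a concrete illustration of variant-based retry logic, here is a dependency-free stand-in for the MonitorError above (the Network variant carries a plain String instead of a reqwest::Error, and the is_retryable helper is an assumed name, not an established API):

```rust
use std::time::Duration;

// Simplified mirror of MonitorError, with no external crates involved.
#[derive(Debug)]
enum MonitorError {
    Network(String),
    Timeout(Duration),
    UnexpectedStatus { status: u16, url: String },
}

impl MonitorError {
    // Transient conditions are worth retrying; a wrong status code
    // will not fix itself, so we fail fast on it.
    fn is_retryable(&self) -> bool {
        matches!(self, MonitorError::Network(_) | MonitorError::Timeout(_))
    }
}

fn main() {
    assert!(MonitorError::Timeout(Duration::from_secs(3)).is_retryable());
    assert!(MonitorError::Network("dns lookup failed".into()).is_retryable());
    assert!(!MonitorError::UnexpectedStatus { status: 503, url: "https://example.com".into() }
        .is_retryable());
}
```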

Base Dependencies for Async Error Handling

To run the examples in this post, your Cargo.toml needs at least:

[package]
name = "async-errors"
edition = "2024"

[dependencies]
anyhow = "1"
thiserror = "1"
reqwest = { version = "0.11", default-features = false, features = ["json", "rustls-tls"] }
tokio = { version = "1", features = ["macros", "rt-multi-thread", "time"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }

  • rustls-tls avoids OpenSSL complexity
  • tokio features enable async runtime + timers + macros
  • tracing and tracing-subscriber (with env-filter) back the observability examples later in the post

Practical Async Error Handling Patterns

Here are the core patterns that appear again and again in real-world Rust async systems.

Parallel Requests With JoinSet

Great for checking multiple endpoints concurrently while keeping task management structured:

use anyhow::{Context, Result};
use tokio::{
    task::JoinSet,
    time::{timeout, Duration},
};

pub async fn gather(urls: &[String]) -> Result<Vec<u16>> {
    let mut set = JoinSet::new();

    for url in urls.iter().cloned() {
        set.spawn(async move {
            let status = timeout(Duration::from_secs(3), crate::fetch_status(&url))
                .await
                .with_context(|| format!("request to {url} timed out after 3s"))??;
            Ok::<u16, anyhow::Error>(status)
        });
    }

    let mut statuses = Vec::with_capacity(urls.len());
    while let Some(res) = set.join_next().await {
        match res {
            Ok(Ok(code)) => statuses.push(code),
            Ok(Err(err)) => eprintln!("worker failed: {err:#}"),
            Err(join_err) => eprintln!("task panicked: {join_err}"),
        }
    }

    Ok(statuses)
}

This pattern:

  • prevents orphan tasks
  • cleanly separates join errors (panic) from logical errors
  • handles per-task timeouts effectively

Reacting to External Alerts with select!

Sometimes a long-running operation must be interrupted by an external signal (e.g., to stop processing):

use anyhow::Result;
use tokio::{select, sync::mpsc};

pub async fn watch_with_alerts(url: String) -> Result<()> {
    // In real code, the sender half would be handed to whatever produces alerts.
    let (_tx, mut rx) = mpsc::channel::<String>(4);
    let mut worker = tokio::spawn(async move {
        crate::fetch_status(&url).await?;
        Ok::<_, anyhow::Error>(())
    });

    select! {
        biased;
        res = &mut worker => {
            res??;
        }
        msg = rx.recv() => {
            // Stop the worker so it doesn't linger as an orphan task.
            worker.abort();
            if let Some(alert) = msg {
                anyhow::bail!("alert channel said: {alert}");
            }
        }
    }
    Ok(())
}

Exponential Backoff Retry Logic

Retry is essential, but must be done responsibly:

use tokio::time::{sleep, Duration};
use anyhow::Result;

pub async fn retry_fetch_status(url: &str) -> Result<u16> {
    let mut attempts = 0;
    let mut delay = Duration::from_millis(200);

    loop {
        match fetch_status(url).await {
            Ok(code) => return Ok(code),
            Err(err) if attempts < 3 => {
                eprintln!(
                    "attempt={} url={} failed: {err:#}; retrying in {:?}",
                    attempts + 1,
                    url,
                    delay
                );
                sleep(delay).await;
                delay *= 2;
                attempts += 1;
            }
            Err(err) => return Err(err),
        }
    }
}
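The schedule this loop produces can be factored into a small pure helper. The version below adds a cap so delays can never grow without bound; the cap is an extra safeguard of mine, not part of the loop above:

```rust
use std::time::Duration;

// Computes the backoff schedule used by retry_fetch_status:
// base * 2^attempt, clamped to `cap`.
fn backoff_schedule(base: Duration, attempts: u32, cap: Duration) -> Vec<Duration> {
    (0..attempts)
        .map(|i| (base * 2u32.pow(i)).min(cap))
        .collect()
}

fn main() {
    // The same 200ms / 400ms / 800ms progression as the retry loop.
    let delays = backoff_schedule(Duration::from_millis(200), 3, Duration::from_secs(5));
    assert_eq!(
        delays,
        vec![
            Duration::from_millis(200),
            Duration::from_millis(400),
            Duration::from_millis(800),
        ]
    );

    // With a larger base, the cap kicks in instead of doubling forever.
    let capped = backoff_schedule(Duration::from_secs(4), 2, Duration::from_secs(5));
    assert_eq!(capped[1], Duration::from_secs(5));
}
```

Many production systems also add random jitter on top of this schedule so that many clients retrying at once don’t hammer a recovering service in lockstep.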

Structured Observability With tracing

tracing gives async Rust the kind of structured logging it really needs:

use tracing_subscriber::{fmt, EnvFilter};

pub fn init_tracing() {
    let filter = EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info"));
    fmt().with_env_filter(filter).init();
}

With usage like:

use anyhow::Result;
use tracing::{error, info};

pub async fn monitored_fetch(url: &str) -> Result<u16> {
    info!(target = "monitor", %url, "starting fetch");
    match fetch_status(url).await {
        Ok(code) => {
            info!(target = "monitor", %url, code, "fetch completed");
            Ok(code)
        }
        Err(err) => {
            error!(target = "monitor", %url, error = %err, "fetch failed");
            Err(err)
        }
    }
}

Building a Real Health Monitor: leen-health

We now put everything together in one small, usable tool.

What it does

  • Reads a JSON file describing endpoints

  • For each endpoint:

    • applies timeouts
    • retries with backoff
    • logs using tracing
    • sends reports via an async channel
  • Prints a clean console report

  • Produces a structured JSON summary

This makes it perfect for:

  • CI/CD checks
  • Deployment gates
  • Ops dashboards
  • Quick health monitoring scripts
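One way the per-endpoint results could be modeled is a small classification step. CheckOutcome and classify are illustrative names of mine, not the tool’s actual API, but they produce exactly the ok / timeout / mismatch buckets that appear in the sample output later in the post:

```rust
// Dependency-free sketch of how leen-health could classify each check.
#[derive(Debug, PartialEq)]
enum CheckOutcome {
    Ok(u16),
    Timeout,
    Mismatch { expected: u16, got: u16 },
}

// `status` is None when the request timed out before producing a response.
fn classify(expected: u16, status: Option<u16>) -> CheckOutcome {
    match status {
        None => CheckOutcome::Timeout,
        Some(code) if code == expected => CheckOutcome::Ok(code),
        Some(code) => CheckOutcome::Mismatch { expected, got: code },
    }
}

fn main() {
    // The three endpoints from the example config, classified.
    assert_eq!(classify(200, Some(200)), CheckOutcome::Ok(200));       // fast
    assert_eq!(classify(200, None), CheckOutcome::Timeout);            // slow
    assert_eq!(
        classify(200, Some(503)),
        CheckOutcome::Mismatch { expected: 200, got: 503 }             // broken
    );
}
```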

Cargo.toml, config.rs, main.rs, monitor.rs

(The full implementation is simply the patterns above wired together: Cargo.toml gains serde for config parsing and argument handling for the CLI flags, config.rs deserializes endpoints.json, and monitor.rs combines fetch_status, per-endpoint timeouts, retries with backoff, tracing, and a channel that feeds the final report.)


Example endpoints.json

{
  "endpoints": [
    { "name": "fast", "url": "https://httpbin.org/status/200", "expected_status": 200 },
    { "name": "slow", "url": "https://httpbin.org/delay/5", "expected_status": 200 },
    { "name": "broken", "url": "https://httpbin.org/status/503", "expected_status": 200 }
  ]
}

Run it:

cargo run -- --config endpoints.json --retries 2

Example output:

fast => ok
slow => timeout
broken => mismatch

JSON summary:
[
  {
    "name": "fast",
    "status": "ok",
    "code": 200,
    "message": null
  },
  {
    "name": "slow",
    "status": "timeout",
    "code": null,
    "message": "request exceeded 4s"
  },
  {
    "name": "broken",
    "status": "mismatch",
    "code": 503,
    "message": "expected 200, got 503"
  }
]

Final Thoughts

Async Rust is powerful, but it requires a different mindset for error handling. In this post, we walked through:

  • why async errors behave differently than sync errors
  • how to use anyhow and thiserror effectively
  • patterns like JoinSet, select!, retries, timeouts, and tracing
  • and how to combine all of that into a concrete, real tool

With these patterns in place, async Rust becomes far more predictable, debuggable, and production-ready.
