Error Handling in Async Rust: From a Simple Function to a Real Health Monitor
When we step into Rust's async world, everything looks clean and exciting at first: async fn, await, tokio, better performance, non-blocking operations.
But very quickly, something far less glamorous becomes the real challenge: error handling.
In synchronous code, when something goes wrong, we usually have:
- a reasonable stack trace,
- a predictable call stack,
- a clear sense of where the problem started.
In async, things are different. Futures are lazy, they hop between threads, and the actual execution flow might not match the mental model we have. If we don't attach proper context to our errors, we often end up with messages like:
Error: request failed
What request? Which URL? In which function? Why? Nothing is clear.
This post aims to build a structured path from basic concepts to a full working example.
We'll start with a small but well-designed function called fetch_status, then introduce some common async error-handling patterns, and finally combine everything into a real, usable tool called leen-health: a small CLI that checks multiple endpoints and produces both human-friendly and machine-friendly output.
Why Error Handling in Async Rust Is Harder
There are three main reasons error handling feels more fragile in async Rust compared to sync Rust.
1. Stack traces are no longer reliable
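In sync code, stack traces give us a big chunk of debugging context. In async, a Future is polled repeatedly, possibly on different worker threads, so a backtrace mostly shows the executor polling a task rather than the chain of callers we have in mind. The logical flow you think the program follows isn't always the actual execution flow.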
That means relying solely on stack traces usually leads to vague and unhelpful debugging.
The fix? Attach context to every meaningful error. Add the URL, the endpoint name, the operation being performed, and anything else that clarifies what went wrong.
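To make the difference concrete, here is a minimal, std-only sketch of what attaching context does. The `WithContext` type and the `send_request`, `fetch`, and `render_chain` functions are hypothetical stand-ins invented for this illustration (in real code, `anyhow`'s `.with_context()` plays this role). The point is only that the outer layer records what was being attempted while the original cause stays reachable through `source()`.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical wrapper: a human-readable context plus the original cause.
#[derive(Debug)]
struct WithContext {
    context: String,
    source: Box<dyn Error + Send + Sync>,
}

impl fmt::Display for WithContext {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.context)
    }
}

impl Error for WithContext {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        let cause: &(dyn Error + 'static) = &*self.source;
        Some(cause)
    }
}

/// Stand-in for a failing network call; it only knows the low-level cause.
fn send_request(_url: &str) -> Result<u16, std::io::Error> {
    Err(std::io::Error::new(std::io::ErrorKind::TimedOut, "request failed"))
}

/// Adds the operation and the URL to the error before passing it up.
fn fetch(url: &str) -> Result<u16, WithContext> {
    send_request(url).map_err(|e| WithContext {
        context: format!("health check: GET {url}"),
        source: Box::new(e),
    })
}

/// Renders an error and all of its causes as "outer: inner: ...".
fn render_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut cause = err.source();
    while let Some(inner) = cause {
        out.push_str(": ");
        out.push_str(&inner.to_string());
        cause = inner.source();
    }
    out
}
```

Rendering the chain for `fetch("https://example.com/health")` yields "health check: GET https://example.com/health: request failed", which answers the "what request, which URL" questions that the bare message leaves open.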
2. Orphan tasks lead to lost errors
In async, it's easy to spawn a task and never await its result.
If that task fails or, worse, panics, the error may get printed somewhere vague, or vanish completely.
A simple rule helps:
- Either collect/join your tasks,
- Or explicitly treat them as fire-and-forget.
Anything in between usually means losing errors silently, which is exactly what we don't want in production.
3. Panic in async is almost always a bug, not a normal error
When working with shared state (Arc, Mutex, channels, etc.), a panic can leave that state inconsistent: a std Mutex held during the panic is poisoned, and whatever the task was updating is left half-done.
For network or IO-level issues (timeouts, DNS errors, status code mismatches) we should use Result. Panic should remain reserved for broken invariants or genuine internal logic errors.
A Short Checklist Before Writing Any async fn
Before defining a new async function, it helps to think through a few simple questions:
- What kind of errors may occur?
  - Temporary ones like timeouts or connection issues: retryable
  - Logical errors like invalid schema or serialization issues: usually not retryable
- Who will see this error? Developers? SREs? A bot in CI? The error message should fit the audience.
- What should the function signature look like? Result<T, anyhow::Error> is a good general-purpose choice. For larger systems, a dedicated error type using thiserror is ideal.
- Will we attach context to errors? Without .context() or .with_context(), async error messages are usually too vague.
Building the First Foundation Block: fetch_status
To move from theory to practice, we start with a simple, well-designed function that returns only a status code while giving detailed error context when things go wrong:
use anyhow::{Context, Result};
use reqwest::Client;

/// Sends a GET request and returns only the HTTP status code.
pub async fn fetch_status(url: &str) -> Result<u16> {
    // A client is built per call to keep the example short; long-running
    // programs usually build one Client and reuse it.
    let client = Client::builder()
        .user_agent("drunkleen-async/0.1")
        .build()
        .context("failed to build HTTP client")?;

    // with_context takes a closure, so the message is only formatted on failure.
    let response = client
        .get(url)
        .send()
        .await
        .with_context(|| format!("request to {url} failed"))?;

    Ok(response.status().as_u16())
}

#[tokio::main]
async fn main() -> Result<()> {
    let url = "https://drunkleen.com";
    let status = fetch_status(url).await?;
    println!("status for {url}: {status}");
    Ok(())
}
Why is this small function important?
- If the client can't be built, the error shows exactly why.
- If sending or receiving fails, the URL is attached to the error message.
- Only the status code is returned, letting upper layers decide what counts as "success."
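As an illustration of that last point, the sketch below shows one way an upper layer might decide whether a returned status code counts as healthy. The `Expectation` enum and the `evaluate` function are hypothetical names introduced here, not part of the code in this post. They only show that the policy can live outside `fetch_status`.

```rust
/// Hypothetical policy type: what the caller considers a healthy response.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Expectation {
    /// Exactly this status code.
    Exact(u16),
    /// Any 2xx status.
    AnySuccess,
}

/// Compares a status code (as returned by `fetch_status`) against the policy.
fn evaluate(url: &str, status: u16, expected: Expectation) -> Result<(), String> {
    let ok = match expected {
        Expectation::Exact(code) => status == code,
        Expectation::AnySuccess => (200..300).contains(&status),
    };
    if ok {
        Ok(())
    } else {
        Err(format!(
            "unexpected status {status} for {url} (expected {expected:?})"
        ))
    }
}
```

With this split, a 204 passes under `Expectation::AnySuccess` but fails under `Expectation::Exact(200)`, and `fetch_status` itself never has to change.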
When You Need Meaningful Error Types: thiserror
anyhow is excellent for general-purpose error aggregation.
But for systems that need metrics, dashboards, API responses, or retry logic based on error category, we need structured error types.
thiserror makes this painless:
use thiserror::Error;

#[derive(Debug, Error)]
pub enum MonitorError {
    // #[from] lets `?` convert a reqwest::Error into this variant.
    #[error("network issue: {0}")]
    Network(#[from] reqwest::Error),

    #[error("timed out after {0:?}")]
    Timeout(std::time::Duration),

    #[error("unexpected status {status} for {url}")]
    UnexpectedStatus { status: u16, url: String },
}
This lets you:
- label metrics: kind="network" / kind="timeout"
- define retry logic based on exact variant
- produce clean, human-readable logs without losing structure
- match on error types cleanly
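The first two items in the list above can be sketched as two small methods. To keep the snippet compilable without `reqwest`, it uses a simplified stand-in for `MonitorError` whose network variant holds a `String`. The `kind` and `is_retryable` methods, the label strings, and the rule that 5xx responses are retried are all assumptions made for illustration.

```rust
use std::time::Duration;

/// Simplified stand-in for `MonitorError`: the network variant holds a String
/// here so the sketch compiles without `reqwest`.
#[allow(dead_code)]
#[derive(Debug)]
enum MonitorError {
    Network(String),
    Timeout(Duration),
    UnexpectedStatus { status: u16, url: String },
}

impl MonitorError {
    /// Stable label suitable for a metric such as `kind="timeout"`.
    fn kind(&self) -> &'static str {
        match self {
            MonitorError::Network(_) => "network",
            MonitorError::Timeout(_) => "timeout",
            MonitorError::UnexpectedStatus { .. } => "unexpected_status",
        }
    }

    /// Assumed policy: transient failures and 5xx responses are retried,
    /// everything else is reported immediately.
    fn is_retryable(&self) -> bool {
        match self {
            MonitorError::Network(_) | MonitorError::Timeout(_) => true,
            MonitorError::UnexpectedStatus { status, .. } => *status >= 500,
        }
    }
}
```

Because the decision is a `match` on variants, adding a new variant later makes the compiler point at every place that needs a retry or labelling decision.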
Base Dependencies for Async Error Handling
To run the examples in this post, your Cargo.toml needs at least:
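[package]
name = "async-errors"
edition = "2024"

[dependencies]
anyhow = "1"
thiserror = "1"
reqwest = { version = "0.11", default-features = false, features = ["json", "rustls-tls"] }
# "sync" is required for the tokio::sync::mpsc channel used in the select! example.
tokio = { version = "1", features = ["macros", "rt-multi-thread", "time", "sync"] }
# Needed only for the tracing examples later in this post.
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }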
- rustls-tls avoids OpenSSL complexity
- tokio features enable async runtime + timers + macros
Practical Async Error Handling Patterns
Here are the core patterns that appear again and again in real-world Rust async systems.
Parallel Requests With JoinSet
Great for checking multiple endpoints concurrently while keeping task management structured:
use anyhow::{Context, Result};
use tokio::{
    task::JoinSet,
    time::{timeout, Duration},
};

pub async fn gather(urls: &[String]) -> Result<Vec<u16>> {
    let mut set = JoinSet::new();
    for url in urls.iter().cloned() {
        set.spawn(async move {
            // The first `?` handles the timeout, the second the fetch error.
            let status = timeout(Duration::from_secs(3), crate::fetch_status(&url))
                .await
                .with_context(|| format!("request to {url} timed out after 3s"))??;
            Ok::<u16, anyhow::Error>(status)
        });
    }

    // Results arrive in completion order, not in the order of `urls`.
    let mut statuses = Vec::with_capacity(urls.len());
    while let Some(res) = set.join_next().await {
        match res {
            Ok(Ok(code)) => statuses.push(code),
            Ok(Err(err)) => eprintln!("worker failed: {err:#}"),
            // A JoinError means the task panicked or was cancelled.
            Err(join_err) => eprintln!("task panicked or was cancelled: {join_err}"),
        }
    }
    Ok(statuses)
}
This pattern:
- prevents orphan tasks
- cleanly separates join errors (panic) from logical errors
- handles per-task timeouts effectively
Reacting to External Alerts with select!
Sometimes a long-running operation must be interrupted by an external signal (e.g., to stop processing):
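use anyhow::Result;
use tokio::{select, sync::mpsc};

// The receiver is passed in so that whoever owns the sender can raise alerts.
pub async fn watch_with_alerts(url: String, mut alerts: mpsc::Receiver<String>) -> Result<()> {
    let mut worker = tokio::spawn(async move {
        fetch_status(&url).await?;
        Ok::<_, anyhow::Error>(())
    });

    select! {
        // `biased` polls the branches top to bottom instead of in random order.
        biased;
        res = &mut worker => {
            // First `?`: join error (panic/cancel). Second `?`: the fetch error.
            res??;
        }
        // If the channel is closed, recv() yields None and this branch is disabled.
        Some(alert) = alerts.recv() => {
            // Stop the worker so it does not keep running as an orphan task.
            worker.abort();
            anyhow::bail!("alert channel said: {alert}");
        }
    }
    Ok(())
}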
Exponential Backoff Retry Logic
Retry is essential, but must be done responsibly:
use anyhow::Result;
use tokio::time::{sleep, Duration};

pub async fn retry_fetch_status(url: &str) -> Result<u16> {
    let mut attempts = 0;
    let mut delay = Duration::from_millis(200);
    loop {
        match fetch_status(url).await {
            Ok(code) => return Ok(code),
            // Up to 3 retries (4 attempts in total), doubling the wait each time.
            Err(err) if attempts < 3 => {
                eprintln!(
                    "attempt={} url={} failed: {err:#}; retrying in {:?}",
                    attempts + 1,
                    url,
                    delay
                );
                sleep(delay).await;
                delay *= 2;
                attempts += 1;
            }
            // Out of retries: hand the last error back to the caller.
            Err(err) => return Err(err),
        }
    }
}
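The loop above waits 200 ms, 400 ms, and 800 ms before giving up, because the delay doubles after every failed attempt. The helper below is a small std-only sketch of that schedule as a pure function, which makes it easy to inspect or unit-test. The `backoff_delay` and `schedule` names and the `max` cap are assumptions added for illustration, since the loop above has no cap.

```rust
use std::time::Duration;

/// Delay before retry number `retry` (0-based): base * 2^retry, capped at `max`.
fn backoff_delay(base: Duration, retry: u32, max: Duration) -> Duration {
    // saturating_pow and checked_mul keep very large retry counts from overflowing.
    let factor = 2u32.saturating_pow(retry);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// The full list of waits for a given number of retries.
fn schedule(base: Duration, retries: u32, max: Duration) -> Vec<Duration> {
    (0..retries).map(|n| backoff_delay(base, n, max)).collect()
}
```

A cap matters once the retry count grows: without one, ten doublings of 200 ms already means waiting more than three minutes before a single retry.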
Structured Observability With tracing
tracing gives async Rust the kind of structured logging it really needs:
use tracing::{error, info};
use tracing_subscriber::{fmt, EnvFilter};

pub fn init_tracing() {
    // Honors RUST_LOG when it is set; otherwise falls back to the "info" level.
    let filter = EnvFilter::try_from_default_env().unwrap_or_else(|_| "info".into());
    fmt().with_env_filter(filter).init();
}
With usage like:
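pub async fn monitored_fetch(url: &str) -> Result<u16> {
    // `target:` (with a colon) sets the event target; `target = ...` would
    // only record an ordinary field named "target".
    info!(target: "monitor", %url, "starting fetch");
    match fetch_status(url).await {
        Ok(code) => {
            info!(target: "monitor", %url, code, "fetch completed");
            Ok(code)
        }
        Err(err) => {
            // {err:#} prints the whole anyhow context chain, not just the outer message.
            error!(target: "monitor", %url, error = %format!("{err:#}"), "fetch failed");
            Err(err)
        }
    }
}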
Building a Real Health Monitor: leen-health
We now put everything together in one small, usable tool.
What it does
- Reads a JSON file describing endpoints
- For each endpoint:
  - applies timeouts
  - retries with backoff
  - logs using tracing
  - sends reports via an async channel
- Prints a clean console report
- Produces a structured JSON summary
This makes it perfect for:
- CI/CD checks
- Deployment gates
- Ops dashboards
- Quick health monitoring scripts
Cargo.toml, config.rs, main.rs, monitor.rs
(These files are kept exactly as in the Persian version of this post: unchanged code, identical structure.)
Example endpoints.json
{
  "endpoints": [
    { "name": "fast", "url": "https://httpbin.org/status/200", "expected_status": 200 },
    { "name": "slow", "url": "https://httpbin.org/delay/5", "expected_status": 200 },
    { "name": "broken", "url": "https://httpbin.org/status/503", "expected_status": 200 }
  ]
}
Run it:
cargo run -- --config endpoints.json --retries 2
Example output:
fast => ok
slow => timeout
broken => mismatch
JSON summary:
[
  {
    "name": "fast",
    "status": "ok",
    "code": 200,
    "message": null
  },
  {
    "name": "slow",
    "status": "timeout",
    "code": null,
    "message": "request exceeded 4s"
  },
  {
    "name": "broken",
    "status": "mismatch",
    "code": 503,
    "message": "expected 200, got 503"
  }
]
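The mapping from a check result to the console line and the "status" and "message" fields shown above could look roughly like this. This is a hypothetical, std-only sketch: the `Outcome` enum, its variant names, and the `console_line` and `exit_code` helpers are assumptions made for illustration, not leen-health's actual source, and a real implementation would more likely produce the JSON with serde.

```rust
/// Hypothetical result of checking one endpoint; the strings mirror the
/// sample output above, not the tool's real source.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Outcome {
    Healthy { code: u16 },
    Timeout { limit_secs: u64 },
    Mismatch { expected: u16, got: u16 },
}

impl Outcome {
    /// Short machine-friendly label, as in the "status" field.
    fn status(&self) -> &'static str {
        match self {
            Outcome::Healthy { .. } => "ok",
            Outcome::Timeout { .. } => "timeout",
            Outcome::Mismatch { .. } => "mismatch",
        }
    }

    /// Optional human-readable detail, as in the "message" field.
    fn message(&self) -> Option<String> {
        match self {
            Outcome::Healthy { .. } => None,
            Outcome::Timeout { limit_secs } => {
                Some(format!("request exceeded {limit_secs}s"))
            }
            Outcome::Mismatch { expected, got } => {
                Some(format!("expected {expected}, got {got}"))
            }
        }
    }
}

/// One line of the console report, e.g. "slow => timeout".
fn console_line(name: &str, outcome: &Outcome) -> String {
    format!("{name} => {}", outcome.status())
}

/// Process exit code for CI gates: 0 only if every endpoint is healthy.
fn exit_code(outcomes: &[Outcome]) -> i32 {
    let all_ok = outcomes.iter().all(|o| matches!(o, Outcome::Healthy { .. }));
    if all_ok { 0 } else { 1 }
}
```

Returning a non-zero exit code when any endpoint is unhealthy is what would let a CI job or deployment gate fail on the report without parsing the text.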
Final Thoughts
Async Rust is powerful, but it requires a different mindset for error handling. In this post, we walked through:
- why async errors behave differently than sync errors
- how to use anyhow and thiserror effectively
- patterns like JoinSet, select!, retries, timeouts, and tracing
- and how to combine all of that into a concrete, real tool
With these patterns in place, async Rust becomes far more predictable, debuggable, and production-ready.