Build AI Agents That Resume from Failure with Pydantic AI
Durable execution for Pydantic AI with Prefect

TL;DR
- AI agents fail mid-workflow due to API timeouts and network errors, forcing you to re-run everything from scratch and waste API credits
- Prefect adds durable execution to Pydantic AI agents. Workflows resume from the point of failure instead of starting over
- This works through smart caching: successful steps (LLM calls, tool invocations) get cached, so retries skip completed work and save money
- Pydantic's structured outputs let you compose agents into larger workflows with type-safe handoffs between steps
AI agents are flexible and capable tools for automating complex reasoning tasks, but they have a fundamental weakness: fragility. LLM API failures, tool execution errors, network timeouts, and inconsistent data structures can break agent workflows at any step. When an agent fails halfway through a multi-step reasoning process, developers can do nothing but grumble and burn tokens as they re-run the entire workflow.
Consider a data analysis agent that makes three LLM calls and invokes two tools to generate a report. If the final tool call fails due to a transient network error, you lose all progress. The entire workflow restarts from scratch, re-executing five potentially expensive operations. This brittleness becomes especially problematic when you want to compose agents as components in larger systems, where a single failure can cascade through multiple downstream processes.
To address these challenges, we teamed up with Pydantic to build an integration that makes agents production-ready. The integration combines two tools: Pydantic AI, a Python framework for building type-safe AI agents with structured outputs, and Prefect, a workflow orchestration platform that adds reliability, observability, and scheduling to Python code. Together, they enable automatic failure recovery and agentic workflow composition. In this post, I’ll walk through how it works and show you practical patterns for building resilient agent workflows.
Making Agents Production-Ready
Let’s say you’re building an AI agent with Pydantic AI. The agent works great in development, but now you need it to run reliably in production, handling API rate limits, recovering from network hiccups, and providing visibility into what’s happening during execution.
The PrefectAgent wrapper adds these production capabilities without changing your agent’s logic. Before we see it in action, here’s what you need to know: a Pydantic AI Agent is a Python object that calls LLMs with typed inputs and outputs, and Prefect flows are workflows (sequences of operations) that Prefect tracks and orchestrates. Each operation within a flow is called a task.
from pydantic_ai import Agent
from pydantic_ai.providers.prefect import PrefectAgent, TaskConfig


# Standard agent definition
agent = Agent(
    'openai:gpt-4o',
    name='data_analyst',
    system_prompt='You are an expert data analyst.',
)

# Wrap with Prefect instrumentation
# TaskConfig controls retry behavior, timeouts, and other execution policies
prefect_agent = PrefectAgent(
    agent,
    model_task_config=TaskConfig(  # Separate config for LLM calls
        retries=3,
        retry_delay_seconds=[1.0, 2.0, 4.0],
        timeout_seconds=60.0,
    ),
    tool_task_config=TaskConfig(  # Separate config for tool invocations
        retries=2,
        retry_delay_seconds=[0.5, 1.0],
    ),
)

# This .run() call now executes as a Prefect flow
result = await prefect_agent.run('Analyze the sales data for Q4')

What This Gives You
When you call .run(), Prefect:
- Creates a flow to track the entire agent execution - The full workflow becomes a Prefect flow with complete execution history
- Runs each LLM call as a task - Model calls get their own retry policy (3 retries, exponential backoff in the example above)
- Runs each tool invocation as a task - Tools get faster retries (2 retries, shorter delays in the example above) since they typically have different failure characteristics
- Records call type and duration - Each task is named for the type of call it wraps, and its execution time is tracked
- Caches successful task results - Full workflow retries don’t repeat completed work, saving time and API costs
You can view all of this in Prefect’s web dashboard, which shows which steps succeeded, which failed, and how long everything took.
Failure Recovery: Durable Execution and Smart Retries
We mentioned that Prefect caches successful task results, but what does that actually mean for your agent workflows? It means failures don’t force you to start over from scratch. This is called durable execution: when a workflow fails, you can retry it and Prefect skips the steps that already succeeded, resuming from the point of failure.
Agent workflows fail in predictable ways: API rate limits, network timeouts, tool execution errors, and validation failures. While you can’t eliminate all errors, recovering from them efficiently improves your chances of success.
Where Agent Workflows Break
Consider a data analysis agent that:
1. Calls the LLM to plan the analysis
2. Invokes a calculate_statistics tool
3. Calls the LLM again to interpret results
4. Invokes a detect_anomalies tool
5. Makes a final LLM call to generate recommendations
If step 4 fails due to a transient network error, a naive retry means re-executing all five steps—including three potentially expensive LLM API calls. This wastes money and time.
Granular Retry Configuration
The first thing you can do to add resilience to your agent workflows is to configure granular retry policies for different components. Different components have different failure characteristics, so the integration allows separate retry policies:
prefect_agent = PrefectAgent(
    agent,
    model_task_config=TaskConfig(
        retries=3,  # LLM calls: retry with exponential backoff
        retry_delay_seconds=[1.0, 2.0, 4.0],
        timeout_seconds=60.0,
    ),
    tool_task_config=TaskConfig(
        retries=2,  # Tools: fewer retries, faster backoff
        retry_delay_seconds=[0.5, 1.0],
    ),
)

Configuring retries ensures that your agent workflows can recover from transient failures and continue to execute successfully. This mitigates errors in a lot of cases, but what happens when you exhaust every retry and the agent still fails? That’s where Prefect’s durable execution model comes in.
Durable Execution
Prefect adds additional durability to your agent workflows with transactional task semantics. When a task completes successfully, Prefect generates a cache key and persists its result. If the flow fails, subsequent runs load these cached results via their cache key and skip the completed tasks. Execution resumes from the failure point.
One of the things that makes Prefect’s caching so useful is its customizability. Prefect ships with a sane default cache policy that works for most use cases: the task’s inputs, its source code, and the ID of its parent run are combined to generate a cache key.
As it turns out, wrapping Pydantic AI functionality falls outside of most cases. Because each model call and tool invocation includes a timestamp of when it was invoked, the default cache policy would generate a unique cache key for every invocation, leading to redundant executions on retries.
Here’s a concrete example. Consider an LLM call with these inputs:
{
    "prompt": "Analyze Q4 sales data",
    "model": "gpt-4o",
    "timestamp": "2025-01-15T10:30:45Z",  # Changes on every retry!
    "run_id": "abc123"
}

With the default policy, the timestamp and run_id would be included in the cache key, making it unique across retries. The custom cache policy in PrefectAgent generates keys based only on stable factors like the prompt content, model name, and tool parameters, effectively ignoring transient metadata.
With the custom cache policy in place, when you retry a failed agent run, Prefect:
- Loads the flow’s execution history
- Identifies which tasks completed successfully
- Returns cached results for those tasks without re-execution
- Resumes from the first failed or pending task
In the same way that we created a custom cache policy for the PrefectAgent, you can create your own custom cache policies to suit your needs. Want to cache based on prompt content across runs, or always skip caching for certain tool types? Custom cache policies give you that control.
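For a sense of what that looks like at the Prefect level, here’s a minimal sketch using Prefect’s composable cache policies on a plain task. The task name and arguments are hypothetical, and the exact hook for passing a custom policy into PrefectAgent may differ; see the integration docs for details.

from prefect import task
from prefect.cache_policies import INPUTS


# Hypothetical tool-style task: cache on the stable query text only,
# subtracting transient arguments so retries hit the cache.
@task(cache_policy=INPUTS - "timestamp" - "run_id", persist_result=True)
def fetch_market_data(query: str, timestamp: str, run_id: str) -> dict:
    """Expensive lookup whose result is safe to reuse across retries."""
    return {"query": query, "rows": []}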
The Recovery Flow: A Step-by-Step Walkthrough
Let’s walk through what happens when our data analysis agent fails at step 4:
Initial Run (Fails at Step 4):
- ✓ Model call: Plan analysis (result cached)
- ✓ Tool call: calculate_statistics (result cached)
- ✓ Model call: Interpret statistics (result cached)
- ✗ Tool call: detect_anomalies (fails multiple times with a timeout)
- ⊘ Model call: Generate recommendations (not reached)
Retry Attempt:
- ⚡ Model call: Plan analysis (skipped—loaded from cache)
- ⚡ Tool call: calculate_statistics (skipped—loaded from cache)
- ⚡ Model call: Interpret statistics (skipped—loaded from cache)
- ↻ Tool call: detect_anomalies (retried after 0.5s delay)
- ✓ Model call: Generate recommendations (executes normally)
The agent is able to replay and reload its cached state so that it resumes exactly where it left off. This avoids redundant LLM calls and wasted API credits without requiring developers to write their own checkpointing logic.
Your pocketbook will be a big fan of durable execution. A single GPT-5 call with around 1,000 total tokens (prompt + response) costs roughly $0.006. If your agent makes 10 LLM calls and fails at the end, re-running without caching wastes about $0.06. Scale that to hundreds of agent runs per day, and durable execution can easily save hundreds of dollars each month by avoiding repeated LLM calls.
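As a rough back-of-the-envelope check using the figures above (the run volume is an assumption for illustration):

cost_per_call = 0.006      # ~1,000 total tokens per call, per the estimate above
calls_per_run = 10
runs_per_day = 300         # an assumed "hundreds of agent runs per day"

waste_per_uncached_rerun = cost_per_call * calls_per_run       # ~$0.06
monthly_waste = waste_per_uncached_rerun * runs_per_day * 30   # ~$540 if every run needs one full re-run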
Composing Workflows with Agents
Using Prefect to orchestrate your agents seems like a good idea, but how do you get started? Is there an incremental way to start using agents in your existing workflows?
Pydantic AI’s structured outputs allow you to treat agents as composable workflow components. Structured output lets you mix agent execution with deterministic downstream tasks and other agents.
Why Structured Outputs Matter
Traditional agents return unstructured text. You can’t reliably pass that text to downstream systems without fragile parsing logic. For example, an agent might return:
"The data shows three key findings: sales increased 15%, customer retention improved, and costs decreased.
I found two anomalies at timestamps 2024-03-15 and 2024-04-22.
My recommendation is to increase marketing spend."
Parsing this text reliably is brittle because the format might change between runs. With Pydantic AI, you can enforce a schema on the agent’s output:
from pydantic import BaseModel


class DataAnalysis(BaseModel):
    summary: str
    key_findings: list[str]
    anomalies: list[dict[str, float]]
    recommendations: list[str]


agent = Agent('openai:gpt-5', result_type=DataAnalysis)

The DataAnalysis class inherits from Pydantic’s BaseModel, which provides automatic validation and serialization. Now every agent run returns a validated DataAnalysis object, and you can rest easy knowing that your workflow is type safe even though you’ve sprinkled some non-deterministic magic into your code.
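To make that validation concrete, here’s a small sketch using the DataAnalysis model above (the field values are invented for illustration); constructing the model directly behaves the same way as receiving it from the agent:

from pydantic import ValidationError

# A well-formed result validates into a typed object
analysis = DataAnalysis(
    summary="Q4 sales grew 15%",
    key_findings=["Sales up 15%", "Retention improved", "Costs down"],
    anomalies=[{"timestamp": 1710460800.0, "score": 0.97}],
    recommendations=["Increase marketing spend"],
)
print(analysis.key_findings[0])  # "Sales up 15%"

# A malformed result (missing fields) fails loudly instead of flowing downstream
try:
    DataAnalysis(summary="Q4 sales grew 15%")
except ValidationError as exc:
    print(exc.error_count(), "validation errors")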
Here are a couple of patterns that you can use to compose workflows with agents.
Pattern 1: Agent Output Feeds Downstream Tasks
The simplest composition pattern: use an agent’s structured output as input to subsequent Prefect tasks.
import pandas as pd
from prefect import flow, task
from pydantic_ai import Agent
from pydantic_ai.providers.prefect import PrefectAgent


@task
def generate_report(analysis: DataAnalysis) -> str:
    """Convert analysis into a formatted PDF report."""
    return f"# Data Analysis Report\n\n{analysis.summary}\n\n..."


@task
def send_notification(report: str) -> None:
    """Email the report to stakeholders."""
    send_email(to="team@company.com", body=report)


@flow
async def analyze_and_report(dataset: pd.DataFrame):
    # Agent analyzes the data
    agent = Agent('openai:gpt-5', result_type=DataAnalysis, deps_type=pd.DataFrame)
    prefect_agent = PrefectAgent(agent)
    analysis = await prefect_agent.run("Analyze this dataset", deps=dataset)

    # Downstream tasks consume the structured output
    report = generate_report(analysis)
    send_notification(report)

The agent is one step in a larger workflow. Its structured output flows into Python functions decorated as Prefect tasks.
Pattern 2: Multiple Agents in Sequence
Complex workflows might require multiple agents with validated hand-offs:
class ResearchFindings(BaseModel):
    summary: str
    key_points: list[str]
    sources: list[str]


class AudienceSummaries(BaseModel):
    technical: str
    executive: str
    general: str


@flow
async def research_and_summarize(topic: str):
    # Agent 1: Research the topic
    research_agent = PrefectAgent(
        Agent('openai:gpt-5', result_type=ResearchFindings)
    )
    findings = await research_agent.run(f"Research {topic}")

    # Agent 2: Summarize for different audiences
    # Pass the structured findings object directly via dependencies
    summary_agent = PrefectAgent(
        Agent(
            'openai:gpt-5',
            result_type=AudienceSummaries,
            deps_type=ResearchFindings,  # Type-safe dependency injection
        )
    )
    summaries = await summary_agent.run(
        "Create summaries for technical, executive, and general audiences",
        deps=findings,  # Pass typed object directly
    )
    return summaries

Each agent validates its output against a Pydantic schema. If the first agent produces malformed results, you don’t waste API credits on the second agent. If the second agent fails, retrying the flow skips the first agent entirely since Prefect cached the results of its intermediate steps.
Pattern 3: Conditional Workflows
Agent outputs can determine execution paths:
class SupportTicket(BaseModel):
    id: str
    text: str
    customer_id: str


class TicketTriage(BaseModel):
    severity: str  # "critical", "high", "medium", "low"
    category: str  # "billing", "technical", "general", etc.
    priority: int


@flow
async def triage_support_ticket(ticket: SupportTicket):
    triage_agent = PrefectAgent(
        Agent('openai:gpt-5', result_type=TicketTriage)
    )
    triage = await triage_agent.run(f"Triage this ticket: {ticket.text}")

    if triage.severity == "critical":
        escalate_to_oncall(ticket)
    elif triage.category == "billing":
        route_to_billing_team(ticket)
    else:
        assign_to_support_queue(ticket)

The agent’s structured output controls workflow logic. Prefect tracks which path executed, enabling non-deterministic agentic workflows with full observability.
Deployment as Reusable Services
So far, everything we’ve shown has been a standalone script that you run directly. This works great for ad-hoc tasks and local development, but production use cases often require running agents as persistent services.
Prefect’s .serve() method turns flows into long-running services:
if __name__ == "__main__":
    analyze_and_report.serve(
        name="data-analysis-service",
        parameters={"dataset": load_default_dataset()},
    )

Once this is running, your agent workflow becomes a service that other systems can invoke. Common patterns include:
- Scheduled execution: Run the agent on a cron schedule (e.g., daily data analysis at 9 AM)
- Event-driven triggers: Invoke the agent via webhooks when new data arrives or tickets are created
- API integration: Call the agent from other workflows or services via Prefect’s REST API
- Manual runs: Trigger ad-hoc executions from the Prefect UI with custom parameters
This deployment model means your agent workflows fit into existing data platforms. They can consume data from upstream ETL pipelines, respond to events from monitoring systems, and feed results into downstream analytics dashboards using the same infrastructure that orchestrates your other data workflows.
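For example, the scheduled-execution pattern above is a one-line change to the serve() call. A sketch, reusing the flow and helper from the earlier example (the service name is arbitrary):

if __name__ == "__main__":
    analyze_and_report.serve(
        name="daily-data-analysis",
        cron="0 9 * * *",  # run the analysis every day at 9 AM
        parameters={"dataset": load_default_dataset()},
    )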
Conclusion
The Prefect integration for Pydantic AI addresses two fundamental challenges in building with AI agents: fragility and isolation. By wrapping agents as Prefect flows and their operations as tasks, the integration enables automatic failure recovery through durable execution. With Pydantic’s structured outputs, agents become reusable building blocks rather than potentially rogue actors in your workflow.
This represents a shift from “scripts with AI” to “AI as workflow primitives.” Agents aren’t special cases that need bespoke orchestration; they’re Prefect flows like any other, and they participate in the same deployment, scheduling, and observability infrastructure as the rest of your data platform.
Try It Yourself
Build a fault-tolerant agent: The AI Data Analyst example in the Prefect documentation provides a complete working implementation you can run locally. Try triggering failures (kill the process, simulate timeouts) and observe how the retry logic recovers gracefully.
Read the integration docs: The Pydantic AI integration documentation covers installation, configuration options, and advanced patterns for building durable agents with Prefect.
Share your patterns: We’d love to see how developers compose agents in workflows! Are you chaining multiple agents? Using agent output to trigger conditional logic? Building multi-tenant agent services? Share your patterns and challenges in the Prefect Slack Community.
AI agents are highly capable, but only when they’re reliable enough for production use. With durable execution and structured composition, the Prefect and Pydantic AI integration makes that reliability achievable.