
TL;DR
AI agents are flexible and capable tools for automating complex reasoning tasks, but they have a fundamental weakness: fragility. LLM API failures, tool execution errors, network timeouts, and inconsistent data structures can break agent workflows at any step. When an agent fails halfway through a multi-step reasoning process, developers can do nothing but grumble and burn tokens as they re-run the entire workflow.
Consider a data analysis agent that makes three LLM calls and invokes two tools to generate a report. If the final tool call fails due to a transient network error, you lose all progress. The entire workflow restarts from scratch, re-executing five potentially expensive operations. This brittleness becomes especially problematic when you want to compose agents as components in larger systems, where a single failure can cascade through multiple downstream processes.
To address these challenges, we teamed up with Pydantic to build an integration that makes agents production-ready. The integration combines two tools: Pydantic AI, a Python framework for building type-safe AI agents with structured outputs, and Prefect, a workflow orchestration platform that adds reliability, observability, and scheduling to Python code. Together, they enable automatic failure recovery and agentic workflow composition. In this post, I’ll walk through how it works and show you practical patterns for building resilient agent workflows.
Let’s say you’re building an AI agent with Pydantic AI. The agent works great in development, but now you need it to run reliably in production, handling API rate limits, recovering from network hiccups, and providing visibility into what’s happening during execution.
The PrefectAgent wrapper adds these production capabilities without changing your agent’s logic. Before we see it in action, here’s what you need to know: a Pydantic AI Agent is a Python object that calls LLMs with typed inputs and outputs, and Prefect flows are workflows (sequences of operations) that Prefect tracks and orchestrates. Each operation within a flow is called a task.
from pydantic_ai import Agent
from pydantic_ai.providers.prefect import PrefectAgent, TaskConfig


# Standard agent definition
agent = Agent(
    'openai:gpt-4o',
    name='data_analyst',
    system_prompt='You are an expert data analyst.',
)

# Wrap with Prefect instrumentation.
# TaskConfig controls retry behavior, timeouts, and other execution policies.
prefect_agent = PrefectAgent(
    agent,
    model_task_config=TaskConfig(  # Separate config for LLM calls
        retries=3,
        retry_delay_seconds=[1.0, 2.0, 4.0],
        timeout_seconds=60.0,
    ),
    tool_task_config=TaskConfig(  # Separate config for tool invocations
        retries=2,
        retry_delay_seconds=[0.5, 1.0],
    ),
)

# This .run() call now executes as a Prefect flow
result = await prefect_agent.run('Analyze the sales data for Q4')

When you call .run(), Prefect:
- Wraps the run in a flow, so the whole agent execution is tracked as a single workflow
- Executes each model call and tool invocation as a task, applying the retry and timeout policies you configured
- Persists each successful task result so it can be reused on retries
- Records timing, inputs, and outputs for every step
You can view all of this in Prefect’s web dashboard, which shows which steps succeeded, which failed, and how long everything took.
We mentioned that Prefect caches successful task results, but what does that actually mean for your agent workflows? It means failures don’t force you to start over from scratch. This is called durable execution: when a workflow fails, you can retry it and Prefect skips the steps that already succeeded, resuming from the point of failure.
Agent workflows fail in predictable ways: API rate limits, network timeouts, tool execution errors, and validation failures. While you can’t eliminate all errors, recovering from them efficiently improves your chances of success.
Consider a data analysis agent that:
1. Calls the LLM to plan the analysis
2. Invokes a tool to load the dataset
3. Calls the LLM to interpret the results
4. Invokes a tool to generate charts
5. Calls the LLM to write the final report
If step 4 fails due to a transient network error, a naive retry means re-executing all five steps—including three potentially expensive LLM API calls. This wastes money and time.
The first thing you can do to add resilience to your agent workflows is to configure granular retry policies for different components. Different components have different failure characteristics, so the integration allows separate retry policies:
prefect_agent = PrefectAgent(
    agent,
    model_task_config=TaskConfig(
        retries=3,  # LLM calls: retry with exponential backoff
        retry_delay_seconds=[1.0, 2.0, 4.0],
        timeout_seconds=60.0,
    ),
    tool_task_config=TaskConfig(
        retries=2,  # Tools: fewer retries, faster backoff
        retry_delay_seconds=[0.5, 1.0],
    ),
)

Configuring retries lets your agent workflows recover from transient failures and keep executing. This mitigates errors in a lot of cases, but what happens when you exhaust all the retries and the agent still fails? That's where Prefect's durable execution model comes in.
Prefect adds additional durability to your agent workflows with transactional task semantics. When a task completes successfully, Prefect generates a cache key and persists its result. If the flow fails, subsequent runs load these cached results via their cache key and skip the completed tasks. Execution resumes from the failure point.
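To make this concrete, here's a minimal sketch of the same mechanics using plain Prefect primitives. The task names are hypothetical; PrefectAgent configures its model and tool tasks along these lines for you:

from prefect import flow, task
from prefect.cache_policies import INPUTS


@task(cache_policy=INPUTS, persist_result=True)
def expensive_llm_call(prompt: str) -> str:
    # Stand-in for a costly API call; the result is persisted under a cache key
    return f"analysis of {prompt!r}"


@task
def flaky_tool(analysis: str) -> str:
    # Stand-in for a tool that sometimes hits transient errors
    return analysis.upper()


@flow(retries=1)
def pipeline(prompt: str) -> str:
    analysis = expensive_llm_call(prompt)  # loaded from cache on a flow retry
    return flaky_tool(analysis)  # a failure here won't re-run the step above

If flaky_tool raises, the flow-level retry re-runs the pipeline, but expensive_llm_call hits its cache key and loads the persisted result instead of executing again.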
One of the things that makes Prefect's caching so useful is its customizability. Prefect ships with a sane default cache policy that works for most use cases: the task's inputs, its source code, and the ID of its parent run are combined to generate a cache key.
It turns out that wrapping Pydantic AI falls outside those cases. Because each model call and tool invocation includes a timestamp of when it was invoked, the default cache policy would generate unique cache keys for every invocation and lead to redundant executions on retries.
Here’s a concrete example. Consider an LLM call with these inputs:
{
    "prompt": "Analyze Q4 sales data",
    "model": "gpt-4o",
    "timestamp": "2025-01-15T10:30:45Z",  # Changes on every retry!
    "run_id": "abc123"
}

With the default policy, the timestamp and run_id would be included in the cache key, making it unique across retries. The custom cache policy in PrefectAgent generates keys based only on stable factors like the prompt content, model name, and tool parameters, effectively ignoring transient metadata.
With the custom cache policy in place, when you retry a failed agent run, Prefect:
- Recomputes each step's cache key from its stable inputs (prompt content, model name, tool parameters)
- Finds persisted results for every step that already succeeded
- Loads those results instead of re-executing the steps
- Resumes execution from the first step without a cached result
In the same way that we created a custom cache policy for the PrefectAgent, you can create your own custom cache policies to suit your needs. Want to cache based on prompt content across runs, or always skip caching for certain tool types? Custom cache policies give you that control.
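As a sketch of what that looks like: Prefect's built-in policies compose with + and individual inputs can be excluded with -. The task and its arguments below are hypothetical:

from prefect import task
from prefect.cache_policies import INPUTS, TASK_SOURCE

# Cache on the task's source code plus its inputs, minus the transient fields
stable_policy = INPUTS - "timestamp" - "run_id" + TASK_SOURCE


@task(cache_policy=stable_policy)
def call_model(prompt: str, model: str, timestamp: str, run_id: str) -> str:
    # Retries that pass a fresh timestamp/run_id still hit the same cache key
    ...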
Let’s walk through what happens when our data analysis agent fails at step 4:
Initial Run (Fails at Step 4):
1. Steps 1-3 execute successfully, and Prefect persists each result under its cache key
2. Step 4 fails after exhausting its configured retries
3. The flow run is marked failed, but the results of steps 1-3 remain cached
Retry Attempt:
1. Steps 1-3 hit their cache keys and load the persisted results instead of re-executing
2. Step 4 executes again and, with the transient error gone, succeeds
3. Step 5 executes and the flow completes
The agent is able to replay and reload its cached state so that it resumes exactly where it left off. This avoids redundant LLM calls and wasted API credits without requiring developers to write their own checkpointing logic.
Your pocketbook will be a big fan of durable execution. A single GPT-5 call with around 1,000 total tokens (prompt + response) costs roughly $0.006. If your agent makes 10 LLM calls and fails at the end, re-running without caching wastes about $0.06. Scale that to hundreds of agent runs per day, and durable execution can easily save hundreds of dollars each month by avoiding repeated LLM calls.
Using Prefect to orchestrate your agents seems like a good idea, but how do you get started? Is there an incremental way to start using agents in your existing workflows?
Pydantic AI’s structured outputs let you treat agents as composable workflow components: you can mix agent execution with deterministic downstream tasks and other agents.
Traditional agents return unstructured text. You can’t reliably pass that text to downstream systems without fragile parsing logic. For example, an agent might return:
"The data shows three key findings: sales increased 15%, customer retention improved, and costs decreased.
I found two anomalies at timestamps 2024-03-15 and 2024-04-22.
My recommendation is to increase marketing spend."
Parsing this text reliably is brittle because the format might change between runs. With Pydantic AI, you can enforce a schema on the agent’s output:
from pydantic import BaseModel


class DataAnalysis(BaseModel):
    summary: str
    key_findings: list[str]
    anomalies: list[dict[str, float]]
    recommendations: list[str]


agent = Agent('openai:gpt-5', result_type=DataAnalysis)

The DataAnalysis class inherits from Pydantic’s BaseModel, which provides automatic validation and serialization. Now every agent run returns a validated DataAnalysis object, and you can rest easy knowing that your workflow is type safe even though you’ve sprinkled some non-deterministic magic into your code.
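Since DataAnalysis is an ordinary Pydantic model, you can see the contract it enforces without involving an agent at all; the payload below is made up for illustration:

from pydantic import ValidationError

# A well-formed payload validates into a typed object
analysis = DataAnalysis.model_validate({
    "summary": "Q4 sales grew 15%",
    "key_findings": ["sales up 15%", "retention improved", "costs down"],
    "anomalies": [{"2024-03-15": 2.7}, {"2024-04-22": 3.1}],
    "recommendations": ["increase marketing spend"],
})
print(analysis.key_findings[0])  # sales up 15%

# A malformed payload raises instead of passing bad data downstream
try:
    DataAnalysis.model_validate({"summary": "missing everything else"})
except ValidationError as exc:
    print(exc)  # reports each missing field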
Here are a couple of patterns that you can use to compose workflows with agents.
The simplest composition pattern: use an agent’s structured output as input to subsequent Prefect tasks.
import pandas as pd
from prefect import flow, task
from pydantic_ai import Agent
from pydantic_ai.providers.prefect import PrefectAgent


@task
def generate_report(analysis: DataAnalysis) -> str:
    """Convert analysis into a formatted PDF report."""
    return f"# Data Analysis Report\n\n{analysis.summary}\n\n..."


@task
def send_notification(report: str) -> None:
    """Email the report to stakeholders."""
    send_email(to="team@company.com", body=report)  # assumes an email helper defined elsewhere


@flow
async def analyze_and_report(dataset: pd.DataFrame):
    # Agent analyzes the data
    agent = Agent('openai:gpt-5', result_type=DataAnalysis, deps_type=pd.DataFrame)
    prefect_agent = PrefectAgent(agent)
    analysis = await prefect_agent.run("Analyze this dataset", deps=dataset)

    # Downstream tasks consume the structured output
    report = generate_report(analysis)
    send_notification(report)

The agent is one step in a larger workflow. Its structured output flows into Python functions decorated as Prefect tasks.
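Running the flow is just calling the function. A hypothetical invocation with a toy dataset might look like:

import asyncio

import pandas as pd

# Toy dataset for illustration; a real run would load actual sales data
df = pd.DataFrame({"region": ["NA", "EU"], "q4_sales": [1_200_000, 950_000]})
asyncio.run(analyze_and_report(df))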
Complex workflows might require multiple agents with validated hand-offs:
class ResearchFindings(BaseModel):
    summary: str
    key_points: list[str]
    sources: list[str]


class AudienceSummaries(BaseModel):
    technical: str
    executive: str
    general: str


@flow
async def research_and_summarize(topic: str):
    # Agent 1: Research the topic
    research_agent = PrefectAgent(
        Agent('openai:gpt-5', result_type=ResearchFindings)
    )
    findings = await research_agent.run(f"Research {topic}")

    # Agent 2: Summarize for different audiences.
    # Pass the structured findings object directly via dependencies.
    summary_agent = PrefectAgent(
        Agent(
            'openai:gpt-5',
            result_type=AudienceSummaries,
            deps_type=ResearchFindings,  # Type-safe dependency injection
        )
    )
    summaries = await summary_agent.run(
        "Create summaries for technical, executive, and general audiences",
        deps=findings,  # Pass typed object directly
    )
    return summaries

Each agent validates its output against a Pydantic schema. If the first agent produces malformed results, you don’t waste API credits on the second agent. If the second agent fails, retrying the flow skips the first agent entirely since Prefect cached the results of its intermediate steps.
Agent outputs can determine execution paths:
class SupportTicket(BaseModel):
    id: str
    text: str
    customer_id: str


class TicketTriage(BaseModel):
    severity: str  # "critical", "high", "medium", "low"
    category: str  # "billing", "technical", "general", etc.
    priority: int


@flow
async def triage_support_ticket(ticket: SupportTicket):
    triage_agent = PrefectAgent(
        Agent('openai:gpt-5', result_type=TicketTriage)
    )
    triage = await triage_agent.run(f"Triage this ticket: {ticket.text}")

    if triage.severity == "critical":
        escalate_to_oncall(ticket)
    elif triage.category == "billing":
        route_to_billing_team(ticket)
    else:
        assign_to_support_queue(ticket)

The agent’s structured output controls workflow logic. Prefect tracks which path executed, enabling non-deterministic agentic workflows with full observability.
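A hypothetical invocation, assuming the routing helpers (escalate_to_oncall and friends) are defined elsewhere:

import asyncio

ticket = SupportTicket(
    id="T-1042",
    text="I was charged twice for my subscription this month",
    customer_id="C-889",
)
asyncio.run(triage_support_ticket(ticket))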
So far, everything we’ve shown has been a standalone script that you run directly. This works great for ad-hoc tasks and local development, but production use cases often require running agents as persistent services.
Prefect’s .serve() method turns flows into long-running services:
if __name__ == "__main__":
    analyze_and_report.serve(
        name="data-analysis-service",
        parameters={"dataset": load_default_dataset()},
    )

Once started, your agent workflow becomes a service that other systems can invoke. Common patterns include:
- Running the workflow on a schedule
- Triggering runs from events, such as new data landing or a monitoring alert
- Invoking the service programmatically from other systems, as sketched below
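For instance, another process could trigger a run of this service with Prefect's run_deployment helper. This sketch assumes the deployment from the example above and uses the default parameters configured in .serve():

from prefect.deployments import run_deployment

# Trigger a run of the served flow from another process;
# the name format is "<flow-name>/<deployment-name>"
run_deployment(name="analyze-and-report/data-analysis-service")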
This deployment model means your agent workflows fit into existing data platforms. They can consume data from upstream ETL pipelines, respond to events from monitoring systems, and feed results into downstream analytics dashboards using the same infrastructure that orchestrates your other data workflows.
The Prefect integration for Pydantic AI addresses two fundamental challenges in building with AI agents: fragility and isolation. By wrapping agents as Prefect flows and their operations as tasks, the integration enables automatic failure recovery through durable execution. With Pydantic’s structured outputs, agents become reusable building blocks rather than potentially rogue actors in your workflow.
This represents a shift from “scripts with AI” to “AI as workflow primitives.” Agents aren’t special cases that need bespoke orchestration; they’re Prefect flows like any other script, and they participate in the same deployment, scheduling, and observability infrastructure as the rest of your data platform.
Build a fault-tolerant agent: The AI Data Analyst example in the Prefect documentation provides a complete working implementation you can run locally. Try triggering failures (kill the process, simulate timeouts) and observe how the retry logic recovers gracefully.
Read the integration docs: The Pydantic AI integration documentation covers installation, configuration options, and advanced patterns for building durable agents with Prefect.
Share your patterns: We’d love to see how developers compose agents in workflows! Are you chaining multiple agents? Using agent output to trigger conditional logic? Building multi-tenant agent services? Share your patterns and challenges in the Prefect Slack Community.
AI agents are highly capable, but only when they’re reliable enough for production use. With durable execution and structured composition, the Prefect and Pydantic AI integration makes that reliability achievable.
