How Climate Policy Radar Processes 25,000 Policy Documents with Prefect

Meet Climate Policy Radar

Climate Policy Radar's mission is ambitious: make the world's climate laws and policies accessible to everyone who needs them. The NGO tackles one of the most challenging data problems in climate research: extracting actionable insights from tens of thousands of dense, technical government documents that are often buried in the depths of official websites or, in some cases, only available as physical copies that require someone to "go into a basement to find the physical copy and scan it in for you."

"Good data doesn't guarantee good decision-making when it comes to the climate crisis, but bad data guarantees bad decisions and no data guarantees bad decisions," explains Henry Franks, Chief Architect of Climate Policy Radar. "That's our core informal mantra."

The organization processes around 25,000 climate policy documents averaging 80 pages each, with some exceeding 3,400 pages. They transform this massive corpus of PDFs into searchable, structured data that serves around 350,000 annual users worldwide, from academics and policy makers to journalists and insurance risk officers.

Climate Policy Radar works with partners including the Grantham Institute at LSE, the UNFCCC (the global climate governance body), and multilateral climate funds responsible for hundreds of billions of dollars in climate adaptation funding to curate a gold-standard dataset for understanding global climate policy responses.

Mark Cottam, the first data engineer at Climate Policy Radar who led the Prefect migration, works alongside Henry to build the data generation platform that makes this global climate intelligence possible.

The Challenge: Manual Orchestration at Scale

Before adopting Prefect, Climate Policy Radar relied heavily on AWS Step Functions for orchestration, but the limitations quickly became apparent as they scaled their operations.

"With Prefect you can declare DAG flows and pipelines in Python with complex conditionals, logic and parameters as opposed to JSON style ASL (Amazon States Language) pipeline definitions in AWS Step Functions which are much harder for developers to pick up and work with. Prefect also provides the ability to orchestrate tasks outside of the AWS domain as well as offering many advanced features in v3 like transactional workflows and run artifacts.”," Cottam explained. "Data scientists would have to come to data engineers just to orchestrate tasks, and you'd see random things like lambda functions on crons being spun up to trigger processes across the platform.. Ad hoc tasks were particularly challenging, whereas with Prefect, it's as simple as adding a decorator."

They faced several problems:

Engineering bottlenecks: Data scientists couldn't independently run their scripts without engineering support
Manual parallelization: Teams had to manually define how parallelism worked and how processes fanned out
Limited flexibility: Step Functions required complex configuration for simple tasks
Scattered orchestration: Lambda functions and cron jobs created an unmaintainable patchwork of automation
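
Cottam's contrast between Python-declared flows and ASL definitions can be sketched as follows. This is a minimal illustration, not Climate Policy Radar's code: the names `page_ranges`, `extract_text` and `process_document`, and the 200-page chunk size, are hypothetical, and the extraction task is a placeholder. The `try`/`except` is there only so the sketch still runs where Prefect is not installed.

```python
try:
    from prefect import flow, task
except ImportError:
    # Fallback so the sketch runs without Prefect: decorators become no-ops.
    def flow(fn):
        return fn

    task = flow


def page_ranges(page_count: int, max_pages: int = 200) -> list[tuple[int, int]]:
    """Split a document into 1-indexed, inclusive page ranges."""
    return [
        (start, min(start + max_pages - 1, page_count))
        for start in range(1, page_count + 1, max_pages)
    ]


@task
def extract_text(doc_id: str, first_page: int, last_page: int) -> str:
    # Placeholder (assumption): a real task would call a PDF text extractor.
    return f"{doc_id}: pages {first_page}-{last_page}"


@flow
def process_document(doc_id: str, page_count: int, max_pages: int = 200) -> list[str]:
    # Ordinary Python branching and parameters decide how the work fans out.
    if page_count <= max_pages:
        return [extract_text(doc_id, 1, page_count)]
    return [
        extract_text(doc_id, first, last)
        for first, last in page_ranges(page_count, max_pages)
    ]
```

Under these assumptions, an 80-page document results in a single extraction task run, while a 450-page document is split into three page ranges and therefore three task runs.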

"Even the processes that we orchestrate ourselves have benefited from Prefect," Cottam noted.

Highlighting how the orchestration challenges affected not just deployment but the fundamental design of their data pipelines.

For a startup that had grown to over 35 employees since Henry joined as employee number 12, these inefficiencies were particularly painful. Franks described the startup reality:

"Early days of startup is when you wake up, there's a hundred things on fire, you work out which fire you have capacity to actually get to, and you have to accept there's 99 fires that are going to keep burning."

This reality made it difficult to carve out time for addressing their orchestration needs.

The Solution: Choosing Prefect over Airflow

The decision to adopt Prefect was straightforward: Henry had prior experience with the platform from his previous role, where he had evaluated multiple orchestration tools including Airflow, Prefect, and Dagster.

That previous evaluation had led them to reject Airflow:

"There's a happy path on Airflow which is, you know, not very happy, but also all of my team at the time had some Airflow scars."

The impact was clear and tangible. "With Prefect, we quickly switched from having to do this very manual reinventing of wheels. Now we can actually spend time on what the core of these flows are: the actual business logic and data processing logic. It just bought us time in that sense," Henry explained.

Implementation: Building for Scale and Flexibility

Climate Policy Radar started investigating Prefect in January 2024 and has just renewed their Professional contract.

Their Prefect implementation includes their document processing workflows, a new AI-powered knowledge graph system, and various operational automations:

Document Processing Pipeline

The team processes their massive document corpus through workflows that handle the complex process of transforming raw government documents into searchable, structured data. The workflows include text extraction, translation, and embedding documents into search indices.
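
The three stages named above (text extraction, translation, embedding) could be chained in a flow roughly like this. It is a sketch under loud assumptions: every function name is hypothetical, the extraction and translation bodies are placeholders, and the "embedding" is a toy word-hashing vector standing in for whatever model and search index the team actually uses. The `try`/`except` only keeps the example runnable without Prefect installed.

```python
import zlib

try:
    from prefect import flow, task
except ImportError:
    # Fallback so the sketch runs without Prefect: decorators become no-ops.
    def flow(fn):
        return fn

    task = flow


def extract_text(raw: bytes) -> str:
    """Stand-in for PDF text extraction: decode bytes and trim whitespace."""
    return raw.decode("utf-8", errors="replace").strip()


def translate_to_english(text: str, language: str) -> str:
    """Stand-in for a translation service; English passes through unchanged."""
    return text if language == "en" else f"[{language}->en] {text}"


def embed(text: str, dims: int = 8) -> list[float]:
    """Toy bag-of-words hashing vector, not a real embedding model."""
    vector = [0.0] * dims
    for word in text.lower().split():
        vector[zlib.crc32(word.encode("utf-8")) % dims] += 1.0
    return vector


# Wrapping the plain functions as tasks lets Prefect track each step.
extract_task = task(extract_text)
translate_task = task(translate_to_english)
embed_task = task(embed)


@flow
def document_pipeline(raw: bytes, language: str = "en") -> dict:
    text = extract_task(raw)
    english = translate_task(text, language)
    return {"text": english, "vector": embed_task(english)}
```

Keeping each stage as a plain function and wrapping it separately means the same code can be unit-tested locally and orchestrated as a task.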

Knowledge Graph Pipeline

Their AI-powered pipeline, built greenfield on Prefect, runs classifiers on the processed text to identify critical policy concepts.

"We have what we call our knowledge graph pipeline where we are essentially running classifiers over our text data set and that is fully orchestrated by Prefect," Cottam explained.

"Three main steps: we run inference, we aggregate and bring together all the outputs, and then we index the results into our document database. That's all in Prefect, utilizing most of the features that you guys provide like Prefect blocks." The system currently runs around 70 classifiers across the text dataset, with hundreds more planned.

Flexible Multi-Team Architecture

Climate Policy Radar built a distributed deployment model where any team can contribute workflows without complex setup.

"We've got this pattern where flows can be declared in any other child repos and then they are deployed to Prefect Cloud from those child repos," Cottam detailed.

"We can literally package up any code from anywhere in our estate and push it and then orchestrate it."

The architecture separates infrastructure from business logic:

Platform team: Manages the base infrastructure and core data pipelines
Data science team: Runs ML workflows and deploys nightly analysis scripts that help policy experts evaluate their classification criteria
Application team: Built workflows for the litigation data mapper

They're also integrating specialized compute to address their growing AI processing needs: "We are also integrating Coiled, which is a recommended integration with Prefect for running flows on GPUs."

Cottam also offers a tip for adoption: "Prefect isn't that opinionated on how you should do things," he explained, noting that flow code can be stored in GitHub, containers, or even S3.

The organization now executes around 3,300 flow runs per month across 102 deployed flows.

Key Results

Enabling Data Science Independence

A key impact has been empowering data scientists to work more independently.

"Our data science team can write a script that's just like a local script and then have that run on Prefect," Franks explained. "

They love running things on their laptops, but that creates key person dependencies. Prefect makes it reusable and reproducible."

"Our data scientists are able to just quickly write a script that creates a static site based on the current definitions from what we call the concept store where the policy team work and run that on Prefect every night," he continued. This enables policy experts to wake up each morning and immediately see how their latest classification criteria performed.

Enabling Large-Scale Research

When Climate Policy Radar undertook an exploratory research collaboration with Google, focused on responsible AI usage, Prefect was essential to the project's success.

"We absolutely could not have done [the Google research project] without Prefect. That was months saved," Franks stated definitively.

The scale was unprecedented for their team: "We tortured the Prefect infrastructure a little bit. The project involved 173,000 synthetic question-answer pairs, entailing around a million Prefect-orchestrated workflow runs and LLM calls."

Scaling for the Litigation Project

Climate Policy Radar is taking on their biggest challenge yet. "Litigation data is like a 10x scale project for us," Franks explained. Through their partnership with Columbia University's Sabin Center, they're adding approximately 50,000 climate litigation documents to their platform.

This scale required significant preparation. "We've had to do upgrades across the infrastructure to deal with it," Franks noted. The upgrades are already paying off. Discussing how Prefect helped, Franks explained: "It made it much easier for the application team to just write a Prefect flow for talking to the stakeholder systems and ultimately just automating that in a much more scalable fashion."

The application team has already put this to use, creating a Prefect workflow for the litigation data mapper. This demonstrates how teams beyond data engineering can now contribute directly to the data pipeline.

The Developer Experience Advantage

What makes Prefect particularly powerful at Climate Policy Radar is how it changed their relationship with orchestration code.

"One of the biggest changes with Prefect is writing pipeline orchestration code very close to the actual functionality rather than just having some container you orchestrate," Cottam explained. "That gives you a lot of flexibility."

This approach allows teams to focus on their domain logic rather than orchestration infrastructure.

The Bottom Line

For Climate Policy Radar, Prefect has delivered tangible improvements in how they build and operate their data platform. The transition bought them time to focus on core business logic rather than orchestration mechanics, while enabling data scientists to run nightly workflows without engineering bottlenecks.

"With Prefect, we can abstract away infra, we can abstract away laptops," Franks explained.

"We create a set of abstractions so teams that aren't tasked with caring about infrastructure can devote more effort and capacity to their areas of business value."

With plans to add climate litigation documents through their partnership with Columbia University's Sabin Center and to keep expanding their AI capabilities, Climate Policy Radar shows how the right orchestration platform can help organizations tackle complex global challenges.