How Flatiron Health Orchestrates Cancer Research Data Pipelines
Industry: Health Technology
Use Case: Orchestration of data pipelines building real-world evidence datasets for cancer research and new drug development
Key Outcomes:
- Reduced incident resolution time from days to minutes
- Enabled non-technical team members to independently manage pipeline issues
- Simplified pipeline development to "just Python"
Meet Flatiron Health
Flatiron Health, Inc. is reimagining the infrastructure of cancer care and is on a mission to learn from the experience of every person with cancer. The company operates on two fronts. It provides cloud-based electronic health record (EHR) software that community oncology practices use to document patient care at the point of treatment. It also transforms that real-world data into de-identified datasets, which life science and pharma companies and academic medical centers use to answer critical research questions.
Jai Pidugu, an engineering manager on Flatiron's Real World Evidence team, oversees the data generation platform that processes clinical data points from millions of cancer patients. His team builds the systems that pull data from raw sources, perform complex transformations, and prepare de-identified datasets for delivery to clients such as major pharmaceutical companies and academic partners.
The Challenge: Too Many Custom Pipelines
Before adopting Prefect, Flatiron faced a fundamental scaling problem. Each new client's research question required dedicated engineering resources to build custom pipelines using Flatiron's internal ETL framework, called "Blocks."
"Every time a project came in, we were ultimately limited by the number of engineers that we have," Pidugu explained. "Every single engineer was building a pipeline specifically tailored to that project. Next thing you know, you have an explosion of pipelines all over the place with no centralized place to manage them."
The problems compounded quickly:
- Each project required dedicated engineers to build custom pipelines
- Non-technical team members had no insight into pipeline status or issues
- Every pipeline was "custom tailored to that specific project," which made it hard to reuse business logic
- The Blocks framework lacked adequate maintenance and investment
- The system couldn't orchestrate human-in-the-loop processes beyond basic ETL
"If you had a question as a non-technical person—clinical data analysts, product managers—none of those folks had visibility into what was happening," Pidugu said.
Evaluating the Options
Flatiron considered several alternatives. Workflow management tools that were already deployed at the company seemed like an obvious choice. But the engineering team pushed back.
"We found one of our in-house tools was unpleasant to work with locally. It just wasn't up to par. More teams are starting to take a closer look at Prefect because of that same pain point," Pidugu explained.
Lower-level in-house tools were also considered for their distributed computing capabilities, but Prefect's higher-level abstractions and built-in integrations made it the clear winner. "While we could have used those tools, going one level up with Prefect granted us the entire UI and visibility functionality we weren't getting with those other tools."
Once the team went hands-on with Prefect through a lightweight proof of concept, the decision came quickly. "One of the engineers at the time just wrote up a very simple example, added the annotations, and then ran it and said, okay, cool, like, we have to look more seriously at Prefect," Pidugu recalled.
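A proof of concept like the one Pidugu describes can be very small. The sketch below is a hypothetical illustration rather than Flatiron's code: it assumes Prefect 2.x or later, where `flow` and `task` are decorators imported from the `prefect` package, and the function names and sample record are invented. When Prefect is not installed, pass-through stand-ins replace the two decorators, so the same functions still run as ordinary Python; that is a fair measure of how little the annotations change the code.

```python
try:
    from prefect import flow, task
except ImportError:  # Prefect not installed: pass-through decorators keep the sketch runnable
    def flow(fn=None, **_kwargs):
        return fn if fn is not None else (lambda f: f)
    task = flow


@task
def extract(records):
    """Hypothetical stand-in for pulling rows from a raw source: keep rows with a diagnosis."""
    return [r for r in records if r.get("diagnosis")]


@task
def transform(rows):
    """Normalize one field; real transformations would be far more involved."""
    return [{**r, "diagnosis": r["diagnosis"].strip().lower()} for r in rows]


@flow
def tiny_pipeline(records):
    # Inside a flow, tasks are called exactly like ordinary Python functions.
    return transform(extract(records))


if __name__ == "__main__":
    sample = [{"id": 1, "diagnosis": " NSCLC "}, {"id": 2, "diagnosis": ""}]
    print(tiny_pipeline(sample))  # prints [{'id': 1, 'diagnosis': 'nsclc'}]
```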
Building with Prefect
Flatiron implemented Prefect as the engine for its data generation platform, running on an Amazon EKS cluster. The Flatiron team built a service called "Maestro" that handles dataset requests and communicates with Prefect Cloud to schedule runs.
"If we have to run processes in Europe and they're using ECS, the only thing I have to think about is deploying an ECS Prefect worker in Europe; I know everything else will work," Pidugu explained. Compare that with the alternative: "With other tools I'd have to set up a new cluster in Europe and then I'd have to mirror the entire setup."
Flatiron's Prefect setup handles massive scale, with parent flow runs triggering hundreds or thousands of child flow runs to process patient data. "We use child flow runs to fan out the work and use our cluster to the maximum extent," he said.
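One way to sketch this fan-out, assuming Prefect 2.x or later: a flow called from inside another flow runs as a child flow run, and async child flows can be started concurrently with `asyncio.gather`. The names, batch size, and concurrency cap below are assumptions for illustration. Child flows started this way execute in the parent's process; putting each child run on separate cluster infrastructure would instead mean triggering a deployment per batch (for example with Prefect's `run_deployment`), which this sketch leaves out.

```python
import asyncio

try:
    from prefect import flow
except ImportError:  # Prefect not installed: a pass-through decorator keeps the sketch runnable
    def flow(fn=None, **_kwargs):
        return fn if fn is not None else (lambda f: f)


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


@flow
async def process_batch(patient_ids):
    # Child flow: each call from the parent becomes its own child flow run.
    await asyncio.sleep(0)  # placeholder for the real per-batch transformation
    return len(patient_ids)


@flow
async def generate_dataset(patient_ids, batch_size=1000, max_concurrency=50):
    # Bound how many child runs are in flight at once (an assumed limit to tune).
    limit = asyncio.Semaphore(max_concurrency)

    async def run_one(batch):
        async with limit:
            return await process_batch(batch)

    counts = await asyncio.gather(*(run_one(b) for b in chunk(patient_ids, batch_size)))
    return sum(counts)


if __name__ == "__main__":
    # 10,500 hypothetical patient IDs split into 11 batches, i.e. 11 child flow runs.
    print(asyncio.run(generate_dataset(list(range(10_500)))))  # prints 10500
```

The semaphore is the knob behind "how many things I'm running in parallel": raising `max_concurrency` lets more child runs proceed at once, while lowering it protects downstream systems.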
Transforming Operations
The biggest impact was on delivery: Prefect eliminated the engineering bottleneck that had previously constrained how many datasets the team could deliver.
"For a lot of these packaged deliverables, you don't need any engineering resources," Pidugu said. "You can go into a system, click a button, and it'll trigger something in Prefect. In that sense, you don't need an engineer to deliver these datasets, which is the biggest cost saving we could probably do."
"Looking at a recent project, we saw almost 2.5 full-time equivalent (FTE) weeks saved for engineering and more than 1.5 FTE weeks saved for the data team with Prefect," Pidugu said. "These savings will continue to compound and scale with additional projects."
Incidents and Debugging
The team hasn't had any major incidents with Prefect since implementing it. More importantly, when issues do arise, resolution time has improved dramatically. "We can find out what happened in a matter of minutes rather than days with Prefect," Pidugu said.
This improvement is particularly important because Flatiron's work involves de-identified patient data that goes into regulatory filings. The added transparency also means "non-technical folks have way more visibility into what is going on and what happened" than they had under the old system.
Independence for Product Managers
Product managers can now handle first-line pipeline operations on their own. Pidugu's product manager has benefited in particular: "With Prefect, product managers can independently identify that something went wrong, look at the logs, understand what's going on, and just click retry. There's a first level of triage that they can take care of now that just wasn't possible before. It was always engineer-driven in the past."
The Developer Experience Advantage
What makes Prefect particularly powerful at Flatiron is its Python-native approach. "I don't have to think that much about the inner workings of Prefect. I just have to understand Python code," Pidugu explained. "I have to be okay with how many things I'm running in parallel, but past that point, I'm just thinking about code as Python."
This simplicity was crucial for adoption and long-term maintainability. "There are a couple of flow and task annotations and a little bit of state management code, but it's quite well isolated. Other than that, it's just Python, and that offers us a lot of flexibility."
Security and Compliance
For a healthcare company handling protected health information (PHI), Prefect's architecture provided an essential security benefit. "We can manage all PHI ourselves. No information gets sent to Prefect that we're not in control of," Pidugu said. "We keep all PHI within our specific network and only send metadata."
Because only metadata leaves its network, Flatiron can maintain strict data governance while still using a cloud-hosted orchestration service.
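A minimal sketch of the metadata-only pattern, assuming Prefect 2.x or later. Prefect's API records orchestration data such as flow run parameters, states, and logs, so the sketch passes file locations rather than patient records as parameters, and the task returns only a row count, an output path, and a checksum. The CSV layout, the `eligible` column, and the function names are hypothetical. A real pipeline would also keep patient details out of log messages and exception text, since both can reach the API.

```python
import csv
import hashlib
import os
import tempfile

try:
    from prefect import flow, task
except ImportError:  # Prefect not installed: pass-through decorators keep the sketch runnable
    def flow(fn=None, **_kwargs):
        return fn if fn is not None else (lambda f: f)
    task = flow


@task
def build_cohort_file(source_path: str, output_dir: str) -> dict:
    # Patient-level rows are read and written only on infrastructure inside the private network.
    with open(source_path, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        rows = [row for row in reader if row["eligible"] == "yes"]
    out_path = os.path.join(output_dir, "cohort.csv")
    with open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    with open(out_path, "rb") as f:
        checksum = hashlib.sha256(f.read()).hexdigest()
    # Hand back metadata only: a count, a location, and a checksum; no patient fields.
    return {"row_count": len(rows), "output_path": out_path, "sha256": checksum}


@flow
def deliver_cohort(source_path: str, output_dir: str) -> dict:
    # Flow parameters are storage locations, never patient identifiers.
    return build_cohort_file(source_path, output_dir)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    raw = os.path.join(workdir, "raw.csv")
    with open(raw, "w", newline="") as f:
        f.write("patient_id,eligible\nP001,yes\nP002,no\nP003,yes\n")
    print(deliver_cohort(raw, workdir)["row_count"])  # prints 2
```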
A Real-World Example
A typical use case involves determining why patients discontinued specific cancer treatments: whether they completed their course of therapy or stopped due to adverse effects. Through the Maestro service, users specify patient cohorts, relevant variables, and generation parameters. The system processes data for millions of patients, orchestrating both ETL operations and human-in-the-loop abstraction processes in which unstructured documents are converted into structured data.
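On the service side, a request like this might be turned into a Prefect flow run roughly as follows. The request shape (`GenerationRequest` and its fields) and the deployment name `generate-dataset/default` are hypothetical, not Maestro's actual interface. The one Prefect call is `run_deployment` from `prefect.deployments` (Prefect 2.x or later): `timeout=0` returns as soon as the run is created, while `timeout=None` polls until the run reaches a final state. Prefect is imported inside `submit` so requests can be validated without a Prefect API available.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class GenerationRequest:
    """Hypothetical request shape: a cohort, the variables to generate, and run settings."""
    cohort_id: str
    variables: List[str]
    parameters: Dict[str, Any] = field(default_factory=dict)

    def to_flow_parameters(self) -> Dict[str, Any]:
        # Reject obviously incomplete requests before anything is scheduled.
        if not self.cohort_id or not self.variables:
            raise ValueError("a cohort_id and at least one variable are required")
        return asdict(self)


def submit(request: GenerationRequest, deployment: str = "generate-dataset/default", wait: bool = False):
    """Create a flow run for `request` on a Prefect deployment (the name is hypothetical).

    timeout=0 returns right after the run is created; timeout=None polls until
    the run reaches a final state.
    """
    from prefect.deployments import run_deployment  # lazy import; needs a reachable Prefect API

    return run_deployment(
        name=deployment,
        parameters=request.to_flow_parameters(),
        timeout=None if wait else 0,
    )


if __name__ == "__main__":
    request = GenerationRequest(
        cohort_id="cohort-123",
        variables=["discontinuation_reason"],
        parameters={"as_of": "2024-01-01"},
    )
    print(request.to_flow_parameters())
```

A web handler could call `to_flow_parameters()` to reject bad payloads early, then return the `id` of the flow run created by `submit(request)` so the caller can check its status later instead of holding a connection open.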
"You make an API request, find out when it's done, and get the result," Pidugu explained. "Users don't see any of the internal workings—they just get their answers about patient discontinuation reasons, treatment outcomes, and other critical research data."
Spreading Success
The positive results haven't gone unnoticed across the organization. "Some of our counterparts on the ML team are starting to look at Prefect," Pidugu said. Those teams had been orchestrating several workflows with a different tool and "did not enjoy that experience."
"This is probably going to expand beyond our teams," Pidugu said. "We're showing how effective Prefect is for basic orchestration between multiple jobs. It's opening doors for other teams to maybe jump on the Prefect bus."
Looking Forward
Flatiron continues to expand its Prefect usage, with plans to adopt more native features and explore other integrations for greater parallelization. The Flatiron team has also built its own integrations. "Something that we've done on our side is we've figured out how to annotate Prefect flow runs," Pidugu said, describing how the team connects Prefect to its broader observability tools.
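How Flatiron annotates its flow runs isn't described here. One plausible approach, assuming Prefect 2.x or later, combines two Prefect features: a templated `flow_run_name` and the `tags` context manager, which attaches tags to runs created inside it. The tag scheme and function names are hypothetical; the idea is that an observability tool can query runs through the Prefect API and join them to other telemetry by these tags.

```python
from contextlib import nullcontext

try:
    from prefect import flow, tags
except ImportError:  # Prefect not installed: no-op stand-ins keep the sketch runnable
    def flow(fn=None, **_kwargs):
        return fn if fn is not None else (lambda f: f)

    def tags(*_names):
        return nullcontext()


def annotation_tags(project_id: str, sponsor: str) -> list:
    """Hypothetical tag convention that downstream tools could filter or join on."""
    return [f"project:{project_id}", f"sponsor:{sponsor}"]


@flow(flow_run_name="generate-{project_id}")
def generate(project_id: str) -> str:
    # Placeholder body; the run name is rendered from the project_id parameter.
    return project_id


def run_annotated(project_id: str, sponsor: str) -> str:
    # Runs created inside this context carry the tags, so they can be matched back to the project.
    with tags(*annotation_tags(project_id, sponsor)):
        return generate(project_id)


if __name__ == "__main__":
    print(run_annotated("proj-42", "sponsor-a"))  # prints proj-42
```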
Lessons Learned
From a system where every data delivery required dedicated engineering time to one where non-technical team members can independently manage pipelines, Prefect fundamentally changed how Flatiron operates. For Pidugu, the appeal remains simplicity: "Prefect is a lightweight wrapper on top of Python."