Snorkel AI replaced their homegrown orchestration system with Prefect Open Source, eliminating custom infrastructure for caching, telemetry, orchestration, and queueing while achieving significant performance improvements. Now running thousands of workflows daily, they chose Prefect for its incremental adoption approach, developer-friendly interface, and ability to provide immediate value without operational complexity.
Snorkel AI’s mission is to build the data layer for specialized AI. The company has two core offerings:
Snorkel Expert Data-as-a-Service (DaaS) – an end-to-end data delivery service used by many of the world’s leading frontier model providers. It combines a global network of highly trained subject matter experts with Snorkel’s proprietary programmatic data labeling and quality-control platform to deliver high-quality, specialized datasets and simulation environments. Customers can blend their own in-house expertise and data with proprietary datasets developed through Snorkel’s expert network.
Snorkel Enterprise AI – a combination of Snorkel’s Enterprise Platform and its AI Services business that helps large enterprises evaluate, tune, and operationalize AI for specialized, mission-critical use cases. Snorkel partners with enterprise data science and machine learning teams to build custom models, create tailored evaluation frameworks, and accelerate AI deployment for high-value applications.
Together, these offerings accelerate the evaluation and tuning of specialized AI systems with expert data—helping teams move from prototype to production at scale.
Snorkel started like most fast-growing startups, with an easy solution that could work. "We started off very simple with basic jobs and low scale requirements like ML model scoring and bulk programmatic labeling. You don't need anything too fancy for that," explains Smit Shah, Principal Engineer on Snorkel's AI Platform team.
The company initially used Redis Queue for async processing, which worked fine early on. "We were using RQ which uses Redis as a substrate for queueing and asynchronous processing," Shah explains.
But as Snorkel's ML workloads got more complex, the system showed its limitations. "The number of workers was static and there was a lack of resource granularity. We had four workers per pod and if there was contention we would scale up to another four, then eight and so on. The problem with this is it's not elastic enough and it's bulky so we were wasting a lot of resources. The system also lacked isolation between parallel workloads, so different types of jobs could interfere with each other's performance."
"The first bottleneck we hit after resource isolation using k8s was the job taking too long. So how do you do parallelization?" Shah recalls. As their compute platform grew, including adding Ray for distributed data parallel computation, they kept building more and more orchestrations workarounds on top of their simple queuing system.
Things got more complex as Snorkel evolved beyond simple workflows, including multi-step workflows with dependencies, caching, and retries. The problems went beyond just performance. The system lacked proper observability and debugging capabilities. "When something failed we were not able to namespace and scope the logs and easily figure out exactly where it went wrong," Shah explains. Without native dashboards or comprehensive monitoring, the team had to build extensive custom tooling around their queuing and processing frameworks.
"We had to build the caching stack for incremental processing, we had to build the telemetry stack for better debuggability, we had to build our own DAG planning and orchestration layer for branching and efficient execution" Shah explains. "There was no concept of a flow like what Prefect has, where you could imperatively describe the dependencies across jobs and then nest the jobs. Our jobs were pretty monolithic."
When looking at orchestration platforms, Snorkel evaluated the usual suspects: Airflow and Dagster. But Prefect's approach to incremental adoption stood out.
"We didn’t want to do a wholesale revamp of our architecture, but wanted to do it incrementally. Prefect was providing in-process execution libraries where even if you have the work in an existing worker process, you could still run Prefect inside the worker."
This meant Snorkel could migrate workflows gradually instead of ripping everything out and starting over. The iterative approach was crucial to their success. "We wanted to start by doing it in-process for network-bound workloads (like ones with calls to LLM APIs). Prefect provided us with the ability to use decorators for subtasks within the workload that were network bound. Prefect also provided inbuilt rate limiting primitives for network bound operations. This helped solve network bottlenecks while also keeping the Prefect integration lightweight."
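As a rough sketch of that pattern (assuming a global concurrency limit named `llm-api` has been created ahead of time via the Prefect CLI or UI), an async task can throttle itself with Prefect's `rate_limit` primitive before each network call:

```python
import asyncio

from prefect import flow, task
from prefect.concurrency.asyncio import rate_limit

@task(retries=3, retry_delay_seconds=5)
async def prompt_llm(datapoint: dict) -> dict:
    # Wait for a slot on the (assumed pre-created) "llm-api" limit,
    # keeping request volume under the provider's rate cap.
    await rate_limit("llm-api")
    response = ...  # the actual network-bound LLM API call goes here
    return {"id": datapoint["id"], "response": response}

@flow
async def label_dataset(datapoints: list[dict]) -> list[dict]:
    # Fan out the network-bound calls on one asyncio event loop,
    # all inside the existing worker process.
    return await asyncio.gather(*(prompt_llm(dp) for dp in datapoints))
```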
Prefect's plugin capabilities also played a key role in their architecture. When they needed compute isolation and parallelization for certain workloads, they were able to use their existing stack which was orchestrated by Prefect. "Prefect has good task runners which automatically call out to Ray. It has good integration with Ray and we use Ray for distributed computation for cpu-bound jobs. Prefect was expressive enough for both high concurrency network bound as well as cpu bound workloads."
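With the prefect-ray collection, switching a flow's execution onto a Ray cluster is roughly a one-line change on the decorator; in this sketch the Ray address and the `embed_chunk` task are placeholders:

```python
from prefect import flow, task
from prefect_ray.task_runners import RayTaskRunner

@task
def embed_chunk(texts: list[str]) -> list[list[float]]:
    # Stand-in for CPU-bound work (e.g. computing embeddings) that
    # executes on Ray workers rather than in the flow process.
    return [[0.0] * 768 for _ in texts]

@flow(task_runner=RayTaskRunner(address="ray://ray-head:10001"))
def embed_corpus(chunks: list[list[str]]):
    # .submit() hands each task to the Ray cluster to run in parallel.
    futures = [embed_chunk.submit(chunk) for chunk in chunks]
    return [future.result() for future in futures]
```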
Snorkel's move to Prefect solved multiple problems at once:
"Prefect provides reliability, observability, alerting, and reducing the amount of code we have to maintain and the components we have to build out," Shah explains.
The platform replaced a bunch of homegrown systems: the custom caching stack, the telemetry stack, and the DAG planning and orchestration layer.
One of the biggest wins was operational visibility. "When someone comes in saying these programmatic checks are not running as expected against a data point, now Prefect has nice filtering around flows at a certain timestamp and with a certain number of data points," Shah describes.
"Previously we had to scan through every single log—some in CloudWatch, some still local because they hadn't been published yet. The filtering was extremely difficult. We built out a jobs dashboard for listing jobs, filtering with important facets and retrieving logs but it was a lot of infrastructure to maintain and had weird edge cases."
With Prefect's dashboard, debugging became much simpler: "You have this GUI for the dashboard where you can see all these green and red indicators for each flow. If you click on a red one, there's obviously a failure, and you just drill down into the specific set of subtasks and subflows."
Snorkel's prompting workloads needed serious performance improvements. "We wanted to improve the throughput by 20-30x for large dataset processing with LLMs," Shah explains. "We wanted to do it in-process and run it in Prefect with an async IO loop for processing, because our workload was network bound."
Prefect was instrumental in achieving their 20x throughput improvement by providing the orchestration backbone and resiliency needed for high-volume operations. "We migrated to Prefect in a couple of places. The first flagship feature was improving the throughput of our LLM prompting jobs by 20x. Prefect provided both the resiliency (via configurable retries and periodic reruns) and incremental processing capabilities using task level caching," Shah explains.
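A minimal illustration of those two primitives together, using Prefect's built-in `task_input_hash` cache key so that a periodic rerun only re-executes work whose inputs are new (the task body is a stand-in):

```python
from datetime import timedelta

from prefect import flow, task
from prefect.tasks import task_input_hash

@task(
    retries=3,                      # resiliency: transient failures retry
    retry_delay_seconds=10,
    cache_key_fn=task_input_hash,   # incremental processing: identical
    cache_expiration=timedelta(hours=24),  # inputs reuse the cached result
)
def prompt_once(prompt: str) -> str:
    return ""  # stand-in for the LLM call

@flow
def prompting_job(prompts: list[str]) -> list[str]:
    # On a periodic rerun, only new or previously failed prompts hit the
    # API; everything else is served from the task cache.
    return [prompt_once(p) for p in prompts]
```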
The move to Prefect delivered measurable improvements across their AI operations. As Shah notes, Prefect enabled substantial scale: "It's just reliable execution of tens of thousands of flows that we run every day with Prefect. It's a workhorse for us."
By ditching homegrown systems, Snorkel cut way down on maintenance while getting enterprise features. "At a high level we are using a lot less moving parts that we have to maintain on an ongoing basis. That's obvious because Prefect provides queuing, orchestration, control and execution plane—everything. So that's the biggest outcome," Shah reflects.
Snorkel runs Prefect across several critical workflows, handling a wide range of use cases from financial document classification, including "earning reports, stock sale agreements, mortgage agreements for housing loans," to simulating and providing quality control for sophisticated multi-turn conversational datasets. One example Shah describes is "a multi-turn conversational insurance copilot where you have a conversation between a user and a chatbot" that includes contextual, personalized data from enterprise databases.
These complex datasets require careful curation. "You want responses which are fairly distinct from each other so the model can learn well instead of those being very similar," Shah explains. "There's a coverage aspect where you have queries across many domains. Assume there's an internet provider chatbot where you would have 'okay my service is down,' 'I want to sign up,' 'I want to change my plan,' 'my speed is slow,' 'how do I get a phone plan along with internet plan, do I get a discount.' You want coverage as well as diversity in the evaluation dataset."
"The main workload is large scale, dynamic data processing with LLMs in-the-loop. It is very important to us," Shah explains. "We batch process datapoints, and measure datapoint and annotator quality as the annotators generate and review data. Think of a dynamic machine where new data points are generated every minute."
These pipelines handle sophisticated data processing requirements.
Prefect's scheduling capabilities handle the complexity: "There are cases where, when you schedule let's say every minute, the first run might take more than a minute. So then there needs to be synchronization, and Prefect has these primitives that ensure you won't have two runs of the same job at the same time."
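One way to express both behaviors with Prefect's primitives is a cron-scheduled deployment whose flow holds a named concurrency slot, so a slow run delays the next one rather than overlapping it. This sketch assumes a `quality-checks` concurrency limit with a single slot has been created beforehand:

```python
from prefect import flow
from prefect.concurrency.sync import concurrency

@flow
def minutely_quality_checks():
    # Acquire the single slot on the (assumed pre-created) "quality-checks"
    # limit; a run that starts while another is active waits here instead
    # of executing concurrently.
    with concurrency("quality-checks", occupy=1):
        ...  # batch-process the datapoints generated since the last run

if __name__ == "__main__":
    # Serve the flow on a one-minute cron schedule.
    minutely_quality_checks.serve(name="every-minute", cron="* * * * *")
```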
"We use live quality checks and automated feedback for human experts when they’re creating or reviewing data points. You want fast feedback as an expert, say whether what you’ve create is relevant to the target topic and subtopic," Shah describes.
For their real-time use case, Prefect's extensibility allowed them to optimize response times: "We were using the work pools in Redis, but they polled at regular intervals (e.g. every 10 seconds), so we built some homegrown scaffolding on top, which is like a Redis pub-sub mechanism, for the Prefect worker to process the job very quickly."
Prefect handles these interactive workflows while keeping them isolated from the main app: "If we had it in our application server directly instead of in Prefect offloaded to another process, the scalability would be pretty limited. Prefect is good for isolating anything potentially long-running from the transactional stateless servers."
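From the application side, that offloading boundary can be as small as triggering a deployment and returning immediately; this sketch assumes a deployment named `live-quality-checks/default` already exists:

```python
from prefect.deployments import run_deployment

def on_datapoint_submitted(datapoint_id: str) -> str:
    # Hand the long-running check to a Prefect worker; timeout=0 means
    # "don't wait for completion," so the stateless app server returns fast.
    flow_run = run_deployment(
        name="live-quality-checks/default",  # hypothetical deployment name
        parameters={"datapoint_id": datapoint_id},
        timeout=0,
    )
    return str(flow_run.id)
```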
The platform orchestrates various computational patterns:
"There are a few flavors of jobs. There are network-bound ones like the prompting workloads, compute-bound ones like the embedding jobs, and I/O-bound jobs where you're reading a lot of data but your computation is light," Shah explains.
Snorkel runs Prefect on their own Kubernetes infrastructure, maintaining control over their orchestration platform.
Snorkel's decision to self-host Prefect aligns naturally with their existing infrastructure and business model. "The reason it's not a high lift is we had to do that for all our services and we have developed expertise as a result," Shah explains. "We have a Ray cluster, we have Prefect, we have our stateless services. We have an infrastructure team, seven or eight people who are experts in Kubernetes and Docker. What they do is make sure that our developer environment is very close to production, managing the staging and production environments, and managing the control plane."
The team already manages complex infrastructure for their core product, including private deployments for enterprise customers. "Given that we have these private deployments that we need to do anyway, it just so happens that we have the talent for it and it's needed for differentiating our product. In which case it's not a lot of lift to actually use the same thing for Prefect."
The Snorkel team uses standard Kubernetes practices to manage their Prefect deployment. "What we do is install a Kubernetes Helm chart which manages and autoscales the replicas as needed."
The setup handles significant scale: "We run about a thousand flows an hour with Prefect and we're perfectly fine since most of these are network bound with only a few compute-bound jobs."
Looking back at the decision, Shah identifies what made the difference. Importantly, he notes that the programming model itself was not the primary factor: "I wouldn't say imperative versus declarative was the biggest thing, but the big thing was just that it was much more natural to onboard onto it as a developer."
Prefect's ability to deliver immediate benefits was the decisive factor. "Prefect provides value very quickly. That's the main reason we onboarded to Prefect," Shah emphasizes.
The contrast with alternatives was stark: "We tried Airflow and it worked but it was very hard to operate. It also didn’t have the in-process execution capabilities for a gradual migration that we needed."
Developer experience was a major factor in their decision. "One of the things I would say is that we wanted to limit the freedom of expression in our workflows while still having the core abstractions right next to the business logic. It was much more natural to onboard onto Prefect compared to other frameworks," Shah explains.
The readability advantage is immediate for new team members: "If I am a new developer joining the company and I'm looking at a Prefect flow, it's so much more obvious in terms of readability." Rather than having to find deployments and correlate them with flow logic, Prefect's decorator-based approach makes the task structure self-evident.
"I would go back to Prefect being our workhorse for asynchronous processing. I would also say it's like a Swiss Army knife. It has a lot of characteristics that map well to our workloads, including scheduling, queuing, orchestration, robust integrations, having a good control plane and observability, incremental processing and caching."
As Snorkel continues to evolve, they're exploring next-generation orchestration patterns that could reshape how AI workflows are managed. "The way the future is going we have these agents which will figure out and create the dynamic graph on the fly," Shah observes. "You have these nodes and the connections are created dynamically at runtime almost like a random walk through possibilities, where the structure emerges step by step as the system explores its path."
This represents a shift from traditional DAG-based orchestration to more dynamic, agent-driven workflows. "As everything moved from local compute to serverless network-bound LLM prompting operations, the next thing is orchestration moving from static DAG workflows for LLMs to agentic orchestration," Shah explains.
Snorkel's move from homegrown orchestration to Prefect shows how the right platform can eliminate technical debt while enabling serious scale. By picking a solution that supported gradual migration and delivered immediate value, Snorkel transformed their ML operations without disrupting their core business.
The team continues to expand their use of Prefect's capabilities. "Most recently we are using the Prefect artifacts feature for metadata management," Shah notes.
"Even though we have been using Prefect for a year we are still early adopters," Shah notes. "We are still learning every day around and figuring it out on the fly. We are very much in active development and have gotten a lot of value out of Prefect."
For teams facing similar orchestration challenges, Snorkel's experience shows that the right platform choice can deliver both immediate performance gains and long-term architectural benefits.
To learn more about Prefect, explore the documentation and join the community.
Happy Engineering!