Expect Great Data: Using Great Expectations With Prefect to Ensure Data Quality
Apr. 8, 2022

By adding Great Expectations quality checks into a Prefect workflow, data teams can increase overall data reliability and confidence.

Andrew Black, Head of Partnerships

Most data teams have a common mission: get data to the people and systems who need it, on time, with high data integrity. To do this, data engineers build and run pipelines. While historically this might have been done with bespoke scripting, today most employ a workflow orchestrator such as Prefect. Workflow orchestrators not only speed development and reduce overhead through automation, they also detect and manage failures elegantly, and can alert teams when a critical step has failed. In this way, workflow orchestrators can dramatically reduce failures so downstream users have the data they need when they need it; in fact, Prefect users have reported pipeline error reductions of up to 75% (Prefect User Survey, Oct 2021).

But pipeline reliability is only part of the equation. It’s also crucial that the data these pipelines carry is accurate and timely. Most of us have experienced getting information from a dashboard or report only to be told later it is wrong or out of date. When this happens, it can lead to incorrect insights and poor decisions, not to mention frustration and distrust.

Increasingly, data teams are taking on responsibility for the quality of the data in addition to data workflows. Data quality checks let data teams or end users catch anomalies, expired or missing data, and other errors before the data is loaded into systems of record. Great Expectations is one of the leading data quality tools, built as an open, shared standard for data quality. It has quickly gained a following of thousands of active users and contributors to its open source project.

By adding Great Expectations quality checks into a Prefect workflow, data teams can increase overall data reliability and confidence. Error handling and notifications can be combined into a single Prefect workflow, so management and monitoring are easier. Observability is in one place. Common logging makes it easier to diagnose and recover from errors as well as prevent them from recurring. And because both Great Expectations and Prefect are open-source Python frameworks, they offer the flexibility and optionality many data teams want.

For data engineers using Great Expectations without orchestration, Prefect makes it possible to run validations continuously and to include Great Expectations tasks in broader workflows. This means less manual work, and a single place to operate pipelines and observe errors.
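To make the idea of a quality check inside a workflow concrete, here is a minimal sketch in plain Python. It deliberately uses neither the Prefect nor the Great Expectations API: `Check`, `validate`, `run_pipeline`, `DataQualityError`, and the sample `CHECKS` are hypothetical names chosen for illustration. It only shows the pattern described above, where a validation step sits between extract and load, and a failed check stops the run with an error the orchestrator can report.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


class DataQualityError(Exception):
    """Raised when a batch fails one or more quality checks (hypothetical name)."""


@dataclass
class Check:
    """A named, row-level rule, loosely analogous to an expectation."""
    name: str
    predicate: Callable[[Row], bool]


def validate(rows: List[Row], checks: List[Check]) -> List[str]:
    """Return one human-readable message per failed check."""
    failures = []
    for check in checks:
        bad = sum(1 for row in rows if not check.predicate(row))
        if bad:
            failures.append(f"{check.name}: {bad} of {len(rows)} rows failed")
    return failures


def run_pipeline(rows: List[Row], checks: List[Check],
                 load: Callable[[List[Row]], None]) -> int:
    """Validate first; load only if every check passes."""
    failures = validate(rows, checks)
    if failures:
        # In an orchestrated workflow, this exception is what would mark the
        # step as failed and trigger alerts instead of loading bad data.
        raise DataQualityError("; ".join(failures))
    load(rows)
    return len(rows)


# Illustrative rules only; real checks depend on the dataset.
CHECKS = [
    Check("order_id is not null", lambda r: r.get("order_id") is not None),
    Check("amount is non-negative",
          lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0),
]
```

In a real deployment, the validation step would be a Great Expectations task inside a Prefect flow, and the failure handling and notifications would come from Prefect rather than from a hand-rolled exception.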

Prefect has had an integration with Great Expectations for nearly a year, with hundreds of joint users. Last December we updated our integration for Prefect Core (now Prefect 1.0).

Today we’re excited to announce the next stage of our partnership to create even better experiences for our joint users across both open source and cloud.

There are two areas we’re focused on:

  1. Product integrations: We’ve updated and verified the Great Expectations task to work with the latest Great Expectations API (v3), while maintaining backwards compatibility. And we just released a task for Prefect 2.0, our next generation platform.

  2. Go-forward partnership: Besides making our current products work well together, we’re already working on what’s next as our platforms evolve. That means Prefect will have a task ready for Great Expectations’ Cloud offering once it’s released, and Great Expectations will be one of the first tasks available in our new Prefect 2.0 platform. You’ll also see more joint thought leadership as we learn from each other and our communities.

Of course, none of this would have been possible without the Prefect and Great Expectations community contributors. The integrations they created and the ideas they put forward helped shape our joint vision, and will continue to do so.
