Deploying Prefect flows with GitHub Actions
Apr. 27, 2022

Store & register your Prefect workflows with GitHub Actions

Jean Luciano, Solutions Engineer

Prefect allows you to write repeatable, orchestrated data workflows using a modern Python framework. One interesting aspect of the “orchestrated” part of this solution is that you can deploy your data workflows like any other code, using familiar tools from your development and CI/CD infrastructure.

For example, Prefect flows are just Python files, and as such they can be both stored and registered using GitHub. In this article I’ll demonstrate how you can leverage GitHub Actions to automate the deployment of your Prefect flows.

This manner of flow registration uses script-based storage for your flows. By default, Prefect serializes flows using cloudpickle, then unpacks the pickled flow code at runtime. With script-based storage, however, the flow object is created at runtime from the code file stored in a bucket or repository.

Using GitHub for both storage and deployment streamlines your workflow: just push your changes to the repo and they’ll be reflected in your next flow run!

Let’s step through the process of creating a GitHub Action that registers your flow definition file. Here’s how:

Create a .github/workflows directory in the root of the repo containing your flow. In this directory, add a YAML file called prefect.yml. This is your GitHub Actions workflow.

Okay, open this file and start filling out your workflow! This is what our example GitHub Action looks like:
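A minimal version of the workflow might look like the following sketch. The secret name (PREFECT_API_KEY), project name, and flow file path are placeholders — substitute your own:

```yaml
name: Register Flow using GitHub Storage

on:
  push:
    branches:
      - main

jobs:
  register-flow:
    name: Register flow
    runs-on: ubuntu-latest
    # Run inside Prefect's official image so the CLI is preinstalled
    container:
      image: prefecthq/prefect:latest
    env:
      # Placeholder secret name — store your Prefect Cloud API key under it
      KEY: ${{ secrets.PREFECT_API_KEY }}
    steps:
      # Check out the repo so the flow file is available in the container
      - uses: actions/checkout@v2
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Authenticate with Prefect Cloud
        run: prefect auth login --key $KEY
      - name: Register flow
        run: prefect register --project "my-project" --path flow.py
```

Let’s walk through each piece.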

First, give the workflow a name. Since the example registers the flow on a merge to main, we call it “Register Flow using GitHub Storage”, but of course this can be anything.

Next, define the on event. In this case, you will be registering the flow on a push to main. By setting this branch as a protected branch, your flow will only register after the code is reviewed and merged, following common Software Development Lifecycle (SDLC) practices.
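In the workflow file, that trigger looks like:

```yaml
on:
  push:
    branches:
      - main
```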

Next, define the job. We’re calling it “Register flow”. The example uses the latest Ubuntu version to run the action. The registration itself runs in a container, and Prefect provides images on Docker Hub with Prefect already installed, so the job pulls prefecthq/prefect:latest.
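That job definition might look like (the job key register-flow is an arbitrary identifier):

```yaml
jobs:
  register-flow:
    name: Register flow
    runs-on: ubuntu-latest
    container:
      image: prefecthq/prefect:latest
```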

To register the flow, you need a way to authenticate with Prefect Cloud.

In the example code, we use an API key from Prefect Cloud. These keys are tenant-scoped, allowing you to register multiple different flows for the same tenant.

You can store this key using GitHub’s Secrets feature: open your repo’s Settings on GitHub and navigate to Secrets > Actions.

After your key is stored as a secret, you can read it and expose it as an environment variable called KEY:
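Assuming you saved the secret under the name PREFECT_API_KEY (your secret name may differ), the job-level env block would be:

```yaml
    env:
      KEY: ${{ secrets.PREFECT_API_KEY }}
```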

Okay! Now let’s define our steps. Our Prefect image includes prefect and its dependencies, but chances are your flow also uses other Python dependencies. If it doesn’t, this part is not required.

Create a requirements.txt file by running pip freeze. Then use the community Pip Installer action to install the dependencies listed in the file.
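If you’d rather not depend on a third-party action, a plain run step works just as well — here’s a sketch that first checks out the repo (so both requirements.txt and your flow file are available in the container) and then installs the dependencies:

```yaml
    steps:
      - uses: actions/checkout@v2
      - name: Install dependencies
        run: pip install -r requirements.txt
```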

Our environment is now set; let’s register our flow.

First, we authenticate with Prefect Cloud. This is where we use the KEY environment variable we defined earlier.

Finally, run the prefect register command to register the flow. The --project flag registers the flow to your chosen project.
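These two steps might look like the following; the project name and flow file path are placeholders for your own:

```yaml
      - name: Authenticate with Prefect Cloud
        run: prefect auth login --key $KEY
      - name: Register flow
        run: prefect register --project "my-project" --path flow.py
```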

Now that the action is set up, push to your repo. Again, the action is set to kick off when you push to main or, if main is a protected branch, when you merge after a review. By default, Actions are enabled on all repos, so the action will run the first time you push your prefect.yml file. No other configuration needed!

By leveraging GitHub Actions to register our flows, we streamline our data pipeline development process even further. Try out the Prefect tutorial today to start automating your workflows.

Happy engineering!
