Collections in Prefect 2.0
Mar. 17, 2022

New with Prefect 2.0, we’re announcing Collections: a faster way to connect your data stack

Alex Streed, Team Lead (Integrations)
  • New with Prefect 2.0, we’re announcing Collections. Collections group tasks and flows that interact with a specific service or resource in the data stack, such as AWS, Snowflake, or Slack.

  • Collections make developing and maintaining dataflows easier and faster: each collection is a single package that can be installed and updated as one unit, and the tasks and flows it contains can be combined to create end-to-end flows.

  • Examples of collections include prefect-aws for AWS functionality and prefect-slack for Slack tasks. All the collections are listed in the Collections Catalog in the Prefect 2.0 docs.

  • While Prefect will continue to develop and maintain certain Collections, we encourage developers to contribute to this open source project by creating their own Collections. Tooling to get started, the prefect-collection-template, is available on GitHub.

Prefect orchestrates the modern data stack, and, if you’ll pardon my French, there’s beaucoup de choses (a whole lot of stuff) in the modern data stack. This means that when engineers create workflows with Prefect, there are usually a handful of other services and platforms they need to interact with to accomplish their objectives. A Prefect flow may involve extracting data from an object storage service like S3, validating that data with Great Expectations, loading the data into a Snowflake data warehouse, transforming it with dbt, and sending a notification of the result with Slack.

Until now, developers building these dataflows have relied on the Prefect Task Library to hit the ground running. Since its inception, the Prefect Task Library has been an excellent way for developers to start contributing to an open source project, and it has grown to include over 180 tasks.

With Prefect 2.0, we wanted to make task access and usage even easier. New and powerful features like async support, first-class subflows, and the ability to execute arbitrary Python code will supercharge the off-the-shelf functionality the Prefect community can build. Flows can now be shared simply by importing them from another Python module. Imagine importing a flow that loads data to an S3 bucket and populates a fact table in Snowflake, then composing it within your own flow. All the time you would have spent writing boilerplate code to connect those services can now go toward code that serves the business outcome you want to achieve.

To enable these new possibilities, we are excited to announce Prefect Collections. Prefect Collections are groupings of pre-built tasks and flows that make creating and maintaining dataflows easier. Each Prefect Collection contains tasks and flows for a given platform or service and can be installed as a Python package for use within your flows. Packaging each collection separately lets it evolve along with the service it integrates with and gives users more control over their dependencies: collections can be upgraded independently of Prefect, and vice versa.

Need to pull some data from an AWS S3 bucket? prefect-aws can handle that.

Want to post a message to Slack during a flow run? prefect-slack has your back.

You can find all available Prefect Collections in the Collections Catalog in the Prefect docs.

As with the Prefect Task Library, contributing to Prefect Collections is an excellent way to get involved in an open source project. Collections also offer the chance to own and maintain an open source project of your own. Some Prefect Collections will be owned and maintained by the Prefect team and our partners, but anyone can create and publish a collection to share with the community. To help community members get started, we have created a template that gives you everything you need to build your own Prefect Collection.

Bootstrapping a Prefect Collection with prefect-collection-template will give you the tools you need to develop, document, and publish your Prefect Collection. We’ve included tools like black, flake8, and pre-commit to enable automated formatting and linting; mkdocs to provide automated documentation; and GitHub Actions workflow templates to automate testing, building, and publishing your collection. To get started, visit the prefect-collection-template repository and follow the instructions to create your very own Prefect Collection.

We’re super excited to see what we build together.

Happy engineering!
