Scaling up Prefect with GitStorage
Feb. 28, 2022

Chris Ottinger, Senior Technologist

Prefect.io is a Python-based data engineering toolbox for building and operating data pipelines. Out of the box, Prefect provides an initial workflow for managing data pipelines that results in one container image per data pipeline job. This one-to-one relationship between jobs and container images lets data engineers craft pipelines that are loosely coupled and do not require a shared runtime environment configuration. However, as the number of data pipeline jobs grows, the default container-per-job approach starts to introduce workflow bottlenecks and lifecycle management overhead. For example, updating a software component used by the flows, such as upgrading the version of Prefect, means rebuilding and redeploying every data pipeline job image. The container-image-per-job workflow also makes data engineers wait for a pipeline's container image to be rebuilt before they can test a flow centrally in a Prefect Server or Prefect Cloud environment. Fortunately, Prefect comes to its own rescue: it lets you open up the box and use the flexibility of the underlying framework.
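
The maintenance cost described above can be made concrete with a small model. This is a minimal sketch in plain Python and does not use the Prefect API: the `Flow` dataclass, the flow names, and the registry paths are all hypothetical. It assumes the alternative layout suggested by the title, where flow code is pulled from Git at run time and all flows share a single runtime image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """A data pipeline job (hypothetical model, not a Prefect class)."""
    name: str
    image: str  # container image the flow runs in


def images_to_rebuild(flows):
    """Return the distinct images that must be rebuilt when a shared
    dependency (for example the Prefect version) changes."""
    return sorted({flow.image for flow in flows})


# Hypothetical flow names, for illustration only.
flow_names = ["ingest_orders", "load_customers", "daily_report"]

# Default layout: one image per flow, so flow code and runtime ship together.
image_per_flow = [
    Flow(name, f"registry.example/{name}:latest") for name in flow_names
]

# Assumed alternative: flow code lives in Git, all flows share one runtime image.
shared_runtime = [
    Flow(name, "registry.example/prefect-runtime:latest") for name in flow_names
]

print(len(images_to_rebuild(image_per_flow)))  # one rebuild per flow
print(len(images_to_rebuild(shared_runtime)))  # a single rebuild
```

With image-per-flow, the rebuild count grows linearly with the number of flows. With a shared runtime image it stays at one, and a change to flow code alone requires no image rebuild at all.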
