Distributed data pipelines made easy with AWS EKS and Prefect
Aug. 25, 2020

Anna shows how to build a highly available, scalable, distributed system that makes orchestrating your ETL & ML data pipelines much more enjoyable.

Anna Geller, Lead Community Engineer

Building distributed systems for ETL & ML data pipelines is hard. If you have tried implementing one yourself, you may have found that tying a workflow orchestration solution to distributed multi-node compute clusters such as Spark or Dask is difficult to set up and manage properly. By leveraging AWS and Prefect, Anna shows how to obtain a highly available, scalable, distributed system that makes orchestration fun and frees up your time to work with data and generate value.

