Prefect is rapidly becoming the standard in dataflow automation. In our last blog post, we briefly ran through using Prefect Executors to parallelize work on a single node. The question that usually follows single-node parallelization is:
How do we parallelize across multiple nodes?
The goal of this blog post is to provide a minimal example of how to parallelize across multiple nodes using an ephemeral Dask cluster, with as little external configuration as possible. In this example we will use Kubernetes for Flow scheduling, but the pattern is similar with ECS.