Getting Started with Parallel Computation
blog

Getting Started with Parallel Computation

Prefect provides many simple and expressive building blocks that users can snap together to add advanced features to their workflows. This post shows how easily Prefect tasks can be executed in parallel without requiring deep knowledge of distributed computing. In fact, it only requires two extra lines of code!

Kingsley Blatter | Chief of Staff to the CTO

January 20, 2021

At Prefect’s core is Dask. Because of the way this is natively built in to Prefect, you can take advantage of this distributed framework with some very simple steps without ever having to write any Dask code yourself.
In this post, the first part of a “Getting Started” series, I’m going to demonstrate how Prefect allows you to run parallel Python tasks on your local machine as if you were running in a distributed environment, and I’m going to start from scratch (specifically I’ll start from a fresh Mac OS install). You’ll see why users are choosing Prefect, next to other options such as scripting, no code or other Python (and other languages!) packages.
All steps below are done on Mac OS. (Also, please note, there are multiple ways to install and run Prefect. For the purposes of this blog post, I am using the Anaconda distribution)
Creating a Python environment
Your machine may already have Python installed, but in general it is not a good idea to alter your system’s default Python installation. Instead, you should use a virtual environment manager to prevent corrupting your base environment.
The Anaconda distribution allows you to set up Python environments that are easy to manage and can help ensure you have all the necessary packages and installs required to run your code in a managed environment.
  1. Download the distribution for your operating system from here (note that you can instead install Miniconda for the conda CLI alone)
  2. Once installed, confirm the default Python package is using the distribution that came with Anaconda
>> which python /Users/<username>/opt/anaconda3/bin/python
3. Create a conda environment with a Python version we currently support (3.6+)
>> conda create -n prefectenv python=3.7
4. Activate the conda environment
>> conda activate prefectenv
Lastly, we need to set up your conda environment with the open source Prefect package. This command will install Prefect along with the necessary packages required to run Prefect code:
>> conda install -c conda-forge prefect
Writing and Running your first Flow
You can execute your Python code with Prefect Cloud, Prefect Server or running locally on your machine. For the purposes of this post, I am going to show you how to run locally with flow.run() which requires no additional setup or configuration.
  1. Copy the following code into a file named example_flow.py
import prefect from prefect import task, Flow import datetime import random from time import sleep from prefect.triggers import manual_only
@task def inc(x): sleep(random.random() / 10) return x + 1
@task def dec(x): sleep(random.random() / 10) return x — 1
@task def add(x, y): sleep(random.random() / 10) return x + y
@task(name=”sum”) def list_sum(arr): logger = prefect.context.get("logger") logger.info(f"total sum : {sum(arr)}") return sum(arr)
with Flow(“getting-started-example”) as flow: incs = inc.map(x=range(10)) decs = dec.map(x=range(10)) adds = add.map(incs, decs) total = list_sum(adds)
flow.run()
2. Run the file above and observe the tasks happening all in your terminal by watching the logger statements.
>> python example_flow.py
3. In the example above the tasks are executed sequentially. I mentioned in my opening paragraph that Prefect can easily give you parallelism in your Python code. We can easily achieve this now by adding two lines of code and modifying one other!
# Please note, because we are running this on our local machine # we are using the LocalDaskExecutor. There is also a DaskExecutor # that can be used to distribute the tasks across multiple # machines. This is a more advanced use case that is beyond # the scope of this blog post. # import the line below at the top of your file from prefect.engine.executors import LocalDaskExecutor .... # in the last two lines, add the new executor and modify # flow.run to use said executor executor = LocalDaskExecutor() flow.run(executor=executor)
What you’re observing and benefiting from here is parallelism made readily available by Prefect. This is simply achieved by adding task decorators to your Python code whilst using the Dask executor. That’s all it takes.
For the next part of this Getting Started series, I am going to show you how you can visualize this in the Prefect UI.
Please continue reaching out to us with your questions and feedback — we appreciate the opportunity to work with all of you!
Happy Engineering!
— The Prefect Team