
Glue it all together with Prefect

August 13, 2023
Bill Palombi
Head of Product

Modern programming isn’t so much a matter of “architecting” your own software as it is “gluing” together other people’s code. With each year that passes, there are ever more databases, libraries, and web services to stitch together. And it’s not only code that needs to be glued together: with runtimes spread across a vast sea of clouds and containers, it’s other people’s infrastructure as well.

Python is the Glue of Software

Gluing a bunch of components together doesn’t sound fun. It sounds frustrating, and it often is. In his list of Fundamental things I believe about society and life, antirez captures the dismal logic of it all:

[m]odern programming is becoming complex, uninteresting, full of layers that just need to be glued. It is losing most of its beauty. In that sense, most programming is no longer art nor high engineering (most programs written at big and small corporations are trivial: coders just need to understand certain ad-hoc abstractions, and write some logic and glue code).

I agree on the facts, but I have a different, more optimistic interpretation: modern programming can be both interesting and beautiful because of this complexity. With the right tool for the job, solving any problem can be a joy.

Python, of course, is the right tool for the job. In his seminal paper, Glue It All Together With Python, Guido van Rossum, Python’s creator, described it as:

an advanced scripting language that is being used successfully to glue together (“steer”) large software components.

Twenty-five years later, most Python code is just that: the glue that holds together the stack, and not just the data stack but many backend stacks. That’s why it’s become one of the most popular programming languages ever, especially for data-intensive applications.

To be Pythonic is to be Readable

Why is Python so useful for gluing things together? Readability. Readability is so highly valued by the Python community that “Pythonic” is nearly synonymous with “readable.” After all, code is read many more times than it is written. As Guido put it:

You primarily write your code to communicate with other coders, and, to a lesser extent, to impose your will on the computer.

Readability is especially important for glue code. When there’s a database or a web service for everything, all that's left to do is express the process that uses those components to achieve an outcome. Python enables engineers to express those processes clearly.

If Python is so great, why is gluing things together still such a pain? Think about the last time you solved a problem with Python. What was painful about it? It probably wasn’t writing the script itself, but everything that came after “it works on my machine”: getting the code to run reliably in the right place at the right time.

Airflow Giveth, Airflow Taketh Away

Airflow was supposed to fix this. It gave engineers the ability to run Python code remotely, but at the cost of the very thing that makes Python great: its readability. To work with Airflow, an elegant script must be contorted to conform to a static, linear, sometimes incomprehensible DAG.

Let’s illustrate with a basic conditional, the humble if statement. Take this example from Airflow’s documentation:

@task.branch(task_id="branch_task")
def branch_func(ti=None):
    xcom_value = int(ti.xcom_pull(task_ids="start_task"))
    if xcom_value >= 5:
        return "continue_task"
    elif xcom_value >= 3:
        return "stop_task"
    else:
        return None

start_op = BashOperator(
    task_id="start_task",
    bash_command="echo 5",
    xcom_push=True,
    dag=dag,
)

branch_op = branch_func()

continue_op = EmptyOperator(task_id="continue_task", dag=dag)
stop_op = EmptyOperator(task_id="stop_task", dag=dag)

start_op >> branch_op >> [continue_op, stop_op]

What are xcom_pull and xcom_push? Why does each task need a task_id? What is this >> syntax? What is dag=dag doing? If you know the answers to these questions, I’m sorry. If not, hopefully you never will.

Now, let’s express the same workflow in Prefect:

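from prefect import flow, task
# The prefect-shell collection exposes its shell task as shell_run_command
from prefect_shell import shell_run_command

@task
def continue_task(value):
    pass

@task
def stop_task():
    pass

@flow
def main():
    # By default the shell task returns the last line of the command's output
    value = int(shell_run_command(command="echo 5"))
    if value >= 5:
        # Pass the value that continue_task is declared to accept
        continue_task(value)
    elif value >= 3:
        stop_task()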

With Prefect, workflows are simply Python code with some @flow and @task decorators sprinkled in. You can use not only if statements but all of Python’s control flow, including for loops, while loops, and even asyncio for concurrent code.
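To make the point about loops concrete, here is a minimal sketch of a flow that uses an ordinary for loop and a while loop around task calls. The names `double`, `halve`, and `control_flow_demo` are made up for illustration, and the `try`/`except` fallback is an assumption of this sketch (not something Prefect requires) so that it also runs where Prefect is not installed. It does not cover asyncio.

```python
try:
    from prefect import flow, task
except ImportError:
    # Assumption for this sketch only: no-op stand-ins for the decorators,
    # so the example still runs without Prefect installed.
    def task(fn):
        return fn

    def flow(fn):
        return fn


@task
def double(n: int) -> int:
    return n * 2


@task
def halve(n: int) -> int:
    return n // 2


@flow
def control_flow_demo(values: list[int]) -> list[int]:
    results = []
    # A plain for loop: one pass per input value.
    for v in values:
        current = double(v)
        # A plain while loop: keep halving until the value drops below 10.
        while current >= 10:
            current = halve(current)
        results.append(current)
    return results
```

Calling `control_flow_demo([1, 4, 30])` runs the loops exactly as plain Python would, with each `double` and `halve` call being a task call when Prefect is present.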

Importantly, though, if you removed these decorators, this code would still work because it's just Python. If you aren’t using a legacy orchestrator, your code probably looks more like the second example, so you can imagine how much easier it is to start using Prefect. In fact, folks often start using Prefect for just one feature, such as scheduling, then they add other Prefect features over time as their needs demand. What are those features?
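As a rough sketch of the "just Python" claim, here is the flow above with the decorators and the Prefect imports removed. One assumption to flag: the shell task is swapped for a small hypothetical helper, `read_value`, built on the standard library's `subprocess`, and the functions return strings only so the result is easy to inspect.

```python
import subprocess


def read_value(command: str) -> int:
    # Hypothetical stand-in for the shell task: run the command in a shell
    # and parse its standard output as an integer.
    completed = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return int(completed.stdout.strip())


def continue_task(value: int) -> str:
    return f"continuing with {value}"


def stop_task() -> str:
    return "stopping"


def main(command: str = "echo 5"):
    # Same branching as the flow above, with no orchestrator involved.
    value = read_value(command)
    if value >= 5:
        return continue_task(value)
    elif value >= 3:
        return stop_task()
    return None
```

Nothing about the control flow changes between the two versions. Putting the decorators back is what adds the features described next.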

Prefect Productionizes Python without Sacrificing Readability

The Prefect @flow and @task decorators, and the services they engage, supercharge a script with the features it needs to be depended on for production workloads:

  • Schedules and triggers ensure that the script starts at the right time and under the right conditions.
  • Caching ensures that work that has already been done isn’t repeated, so workflows run efficiently.
  • Concurrency controls ensure that resources like databases aren’t overwhelmed.
  • Retries ensure that if something unexpected happens, the workflow can attempt to recover.
  • Deployments ensure that a workflow has the dependencies and the configuration it needs to run.
  • Notifications ensure that when a human needs to get involved, they know it.
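To give a feel for one item on this list, here is a plain-Python sketch of what a retry policy does. In Prefect, retries are configured on the decorator itself (for example `@task(retries=2, retry_delay_seconds=5)`). The code below is not Prefect's implementation, only an illustration of the behavior. `with_retries` and `flaky_fetch` are hypothetical names.

```python
import time
from functools import wraps


def with_retries(retries: int = 3, delay_seconds: float = 0.0):
    """Hypothetical helper: re-run a function up to `retries` extra times."""

    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == retries:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay_seconds)

        return wrapper

    return decorator


calls = {"count": 0}


@with_retries(retries=2)
def flaky_fetch() -> str:
    # Simulated unreliable dependency: fails until the third call.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"
```

The function body stays focused on the work itself. The recovery policy lives in the decorator, which is the same separation the @task and @flow decorators aim for.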

Unadulterated Python code is the clearest expression of a workflow. Airflow made us choose between the clean, readable, Pythonic code that we want, and the robust, distributed, observable production workflows we need. It doesn’t have to be this way. Prefect delivers on Airflow’s false promise: to elevate Python scripts to repeatable, resilient, reactive workflows, without sacrificing readability.

Prefect makes complex workflows simpler, not harder. Try Prefect Cloud for free, download our open source package, join our Slack community, or talk to one of our engineers to learn more.