Workflow Orchestration

When to Run Python on Serverless Architecture

October 25, 2023
Sarah Krasnik Bedell
Director of Growth Marketing

Building and maintaining data pipelines can be challenging, especially when dealing with complex data sources, diverse data formats, high data volumes, and changing business requirements. This is why many developers choose Python for data: its flexibility.

Choosing Python doesn't eliminate the operational burden, though. You're also responsible for the infrastructure that runs your workflows: servers, networks, storage, and security. Managing all of that can be costly, time-consuming, error-prone, and complex.

What if you could simplify your approach to infrastructure? You don't always need the most robust setup possible. What if you could focus on the logic and value of your code rather than the details of how to run it? This is where serverless architecture comes in.

👉🏼 Definition
Serverless architecture
is a deployment pattern in which workflows are hosted as individual pieces of work by a cloud provider.

This means the developer doesn’t need to focus on maintaining infrastructure: no scaling up or down, provisioning resources, or any of that—you just focus on your code. There are still servers in serverless, but they are abstracted away from development.

But when do you choose a serverless approach? You need to consider cost, security, reliability, scaling, and more. We'll dive into the ins and outs, and the pros and cons, of everything serverless, especially as it concerns deploying Python functions.

What is serverless architecture?

Serverless architecture consists of:

  • Business logic
  • Monitoring
  • Security
  • Orchestration

A definition only goes so far. Let's look at how serverless architecture works in practice, both with Python functions and beyond.

Types of serverless architecture

When adopting a serverless approach, the choice of cloud provider and serverless product is crucial, as it dictates the level of customization and control you'll have over your runtime environment.

1️⃣ AWS, GCP, Azure serverless functions

This is where serverless began: with functions like AWS Lambda, Google Cloud Functions, and Azure Functions. They offer a simplified, easy-to-use environment where the serverless platform invokes user-created functions in response to events. You can consider these functions just like you’d think about invoking Python functions, just remotely hosted. These services provide environments with pre-installed Python dependencies and the ability to install additional dependencies at runtime, which is convenient for quick deployments. However, they generally do not allow for custom images, limiting the ability to tailor the environment to your specific needs.
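To make the "just a Python function, hosted remotely" idea concrete, here is a minimal Lambda-style handler. The `(event, context)` signature is how these platforms invoke your code; the event shape and field names below are illustrative assumptions for a simple JSON trigger, not a specific service contract.

```python
import json


# A minimal AWS Lambda-style handler. The platform calls this function
# with an event payload and a context object.
def handler(event, context):
    """Build a JSON response from the incoming event."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Locally, you can invoke the handler just like any other Python function.
response = handler({"name": "Prefect"}, context=None)
print(response["statusCode"])  # 200
```

Because the handler is plain Python, you can unit test it locally before deploying it to any provider.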

2️⃣ Functions with more customization and layers

To offer more flexibility, many serverless vendors offer advanced tiers that permit customization. For example, AWS Lambda Layers lets you add libraries and dependencies to Lambda functions. Azure and Google Cloud have similar features for customizing serverless functions. This lets you bundle additional dependencies ahead of time, reducing the startup time of your serverless functions while keeping the simplicity of the serverless function model.

3️⃣ Serverless containers like ECS, Cloud Run, Azure Containers

For even greater control over your runtime environment, you can opt for container-based serverless services like Amazon ECS, Google Cloud Run, and Azure Container Instances. These services allow you to deploy custom container images, giving you the freedom to include any dependencies and configurations you require at the cost of additional complexity.

Serverless business and workflow logic

Business and workflow logic in serverless Python scripts is encapsulated in functions or containers, which are typically event-driven and executed in response to triggers like HTTP requests, file uploads, or database modifications.

This modular approach makes it easy to develop, test, and deploy individual components of your apps and workflows. Conceptually, this isn't much different from traditional architecture. The code you run in serverless apps is typically similar or equivalent to the code in more traditional apps, even though the high-level architecture differs.
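As an example of event-driven logic, here is a sketch of a function reacting to a file-upload trigger. The event mimics the `Records` structure that S3 notifications use; treat the exact field names as assumptions and check your provider's event reference before relying on them.

```python
# Sketch of a handler for an S3-style file-upload event.
def on_upload(event, context):
    """Extract (bucket, key) pairs from an upload notification."""
    uploads = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        uploads.append((s3["bucket"]["name"], s3["object"]["key"]))
    return uploads


# A simulated upload notification, useful for local testing.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "raw-data"},
                "object": {"key": "2023/10/25/events.json"}}}
    ]
}
print(on_upload(sample_event, context=None))
```

The business logic (what to do with the uploaded file) stays separate from the trigger plumbing, which is what makes these components easy to test in isolation.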

Monitoring and observability of serverless

Monitoring Python serverless functions means raising and communicating exceptions while validating that functions ran when they were supposed to.

For larger applications, splitting workflows into serverless functions or containers has advantages, but it can make end-to-end monitoring more difficult. Cloud-native tools like Amazon CloudWatch, Azure Monitor, and Google Cloud Operations Suite, commonly used for monolithic apps, also provide metrics, logs, and alerts for serverless apps.

While cloud-native monitoring solutions can be hard to set up and get information out of, many third-party monitoring tools also have excellent serverless support. For instance, with Prefect Cloud, you can trigger serverless functions as you would with any API inside of a workflow, and react to them with automations.

Security with serverless

Security in a serverless environment involves configuring permissions at the function or container level. This ensures that your applications can securely connect to external services like databases or third-party APIs. Large enterprises consider security crucial even for individual Python functions, and it matters even more for the end-to-end applications that serverless architecture can power.

Serverless providers offer fine-grained access control via tools like AWS Identity and Access Management (IAM), Google Cloud IAM, and Microsoft Entra. This type of access control gives you the ability to control exactly what services can access your serverless applications, and also which services and data sources your serverless apps can access.
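A least-privilege policy for a function's execution role might look like the sketch below, expressed as the JSON document you would attach via IAM. The bucket name and ARN are hypothetical, and the action list is deliberately minimal; consult your provider's IAM reference for the exact grammar.

```python
import json

# An IAM-style policy granting a function read-only access to a single
# (hypothetical) bucket, and nothing else.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::raw-data/*",
        }
    ],
}

print(json.dumps(policy, indent=2))
```

Scoping each function to exactly the resources it touches is what makes fine-grained serverless security practical: a compromised function can only reach what its role allows.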

Orchestrating serverless workflows

It's worth noting that in many cases, only certain components of a Python workflow may be suitable for a serverless architecture. This is changing with the advent of serverless-friendly data stores like Amazon Aurora Serverless, which can scale down to zero when not in use, but there are still many cases where serverless isn't the right choice for 100% of a workflow. For instance, sometimes you don’t need actual compute power, only the ability to run a Python job and observe its success.

Serverless can be seamlessly integrated into workflows using orchestration frameworks like Prefect. Orchestration tools allow you to build, observe, and react to workflows across multiple environments making them a valuable tool when beginning a migration to serverless. Orchestration tools enable you to avoid an unwieldy mess of functions that are hard to connect to each other when something fails.
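To see what an orchestrator buys you, here is a plain-Python sketch of the pattern: run each serverless step, retry transient failures, and record outcomes so a failure has context. Tools like Prefect provide this (plus scheduling, observability, and alerting) declaratively; the function names below are hypothetical stand-ins for remote invocations.

```python
import time


def run_step(name, fn, retries=2, delay=0.0):
    """Invoke a step with retries, returning (result, attempts used)."""
    for attempt in range(1, retries + 2):
        try:
            return fn(), attempt
        except Exception as exc:
            print(f"{name} failed on attempt {attempt}: {exc}")
            if attempt > retries:
                raise
            time.sleep(delay)


calls = {"count": 0}


def flaky_extract():
    """Simulated serverless call that fails once, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 2:
        raise RuntimeError("transient timeout")
    return ["row1", "row2"]


rows, attempts = run_step("extract", flaky_extract)
print(rows, attempts)  # ['row1', 'row2'] 2
```

Without this layer, each function would need its own ad hoc retry and logging code, which is exactly the "unwieldy mess" an orchestrator helps you avoid.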

The question remains, when should you actually choose serverless functions or containers?

Pros and cons of serverless architecture

Although serverless has advantages over more traditional infrastructure, it's not the right tool for every use case. This is especially true for Python applications, so let's examine some of the pros and cons data engineers should keep in mind when considering serverless architecture:

Pros of serverless

  • No server management. In traditional architectures, you would need to spend considerable time and resources on provisioning, managing, and scaling servers. Serverless architecture eliminates these tasks, freeing engineers to focus on writing code and implementing pipeline logic. This not only accelerates development but also reduces the chances of human error in infrastructure management.
  • Cost-efficiency. Serverless computing is often more cost-effective than traditional computing models. You are billed only for the actual amount of resources consumed by your functions during their execution, down to the millisecond. This eliminates the cost of idle time, making it particularly cost-efficient for data pipelines that experience variable loads.
  • Native cloud services. Each cloud provider offers a suite of native services designed to integrate seamlessly with their serverless offerings. These services can significantly simplify the process of building complex, event-driven data pipelines. For example, Amazon EventBridge can route custom events to Lambda functions, while GCP Pub/Sub and Azure Service Bus work similarly on their respective platforms.
  • High availability and reliability. Serverless functions are inherently designed for high availability and fault tolerance. They are automatically distributed across multiple availability zones in a cloud provider's data center, ensuring that a failure in one zone doesn't bring down your entire pipeline. This is crucial for data pipelines that require high uptime, but adds complexity.
  • Workflow flexibility. Serverless allows for a modular approach to building data pipelines. You can easily provision different types of infrastructure for different stages of your pipeline, whether it's data ingestion, transformation, or analysis. This makes it easier to update or scale individual components without affecting the entire pipeline.
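The cost-efficiency point above is easy to quantify with a back-of-the-envelope calculation. The rates below are illustrative assumptions in the style of Lambda pricing ($0.0000166667 per GB-second of compute plus $0.20 per million requests), not current published prices; check your provider for real rates.

```python
# Pay-per-use cost sketch for a small function under moderate load.
memory_gb = 0.5            # 512 MB function
duration_s = 0.2           # 200 ms average execution
invocations = 1_000_000    # per month

gb_seconds = memory_gb * duration_s * invocations
compute_cost = gb_seconds * 0.0000166667          # assumed rate
request_cost = (invocations / 1_000_000) * 0.20   # assumed rate
total = compute_cost + request_cost
print(f"{gb_seconds:.0f} GB-s, ${total:.2f}/month")
```

A million short invocations costs on the order of a couple of dollars under these assumptions, while a server sized for the same peak load would bill around the clock whether or not it was busy.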

Cons of serverless

  • Cold starts and latency. One of the drawbacks of serverless is the "cold start" phenomenon. When a function is invoked after being idle, it may take some time to start up, causing latency. This can be problematic for data pipelines that require real-time processing. However, some cloud providers offer ways to mitigate this, such as provisioned concurrency in AWS Lambda.
  • Limited control. Serverless abstracts away most of the underlying infrastructure, which simplifies operations but also limits your control. For instance, you can't always optimize the operating system or choose the type of virtual machine that runs your functions. This may be a limitation for data pipelines that require specialized configurations, but using container-based serverless mitigates some of the downside.
  • Complexity and learning curve. Serverless technologies introduce a new paradigm that requires a different skill set and understanding. The event-driven, stateless nature of serverless functions can be challenging to grasp initially, especially for engineering teams accustomed to traditional, server-based architecture. Additionally, when there are many interdependent serverless functions, they can become hard to maintain and understand for new engineers on the team.
  • Scaling challenges. Serverless functions can auto-scale, but this doesn't mean they are free from scaling challenges. As your workflows grow, the interactions and dependencies between different functions can become complex, requiring careful architectural planning to ensure smooth operation. This is particularly evident when a failure occurs, and you get into debugging mode.
  • Monitoring and debugging. Serverless architectures can make monitoring and debugging more complex, especially as your workflow scales. While cloud providers offer monitoring tools, these may not provide the level of detail needed to debug intricate issues in a multi-stage data pipeline. Third-party monitoring solutions can fill this gap but add to overall complexity. With Prefect Cloud, for instance, failures are put into context and summarized.
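One common mitigation for the cold-start problem described above is to do expensive initialization at module scope, so it runs once per container rather than once per invocation. The "model loading" below is simulated with a counter; in practice this might be opening database connections or loading an ML model.

```python
INIT_COUNT = {"value": 0}


def load_model():
    """Stand-in for expensive startup work (runs at cold start only)."""
    INIT_COUNT["value"] += 1
    return {"ready": True}


MODEL = load_model()  # module scope: executed once per container


def handler(event, context):
    # Warm invocations reuse MODEL without paying the init cost again.
    return {"ready": MODEL["ready"], "inits": INIT_COUNT["value"]}


for _ in range(3):  # three warm invocations in the same container
    out = handler({}, None)
print(out)  # {'ready': True, 'inits': 1}
```

This doesn't remove the first cold start, but it keeps every subsequent warm invocation fast; features like provisioned concurrency address the cold start itself.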

So, to server or to serverless?

Considering these nuanced pros and cons puts you in a better position to decide if serverless is the right choice for your data pipeline, and how to effectively implement it if you choose to go this route. Choose serverless if you need to get started quickly with a few independent functions that must be highly available.

The pros of serverless are undeniable, which is why one fully reasonable approach is simply to mitigate the cons. That is one of the motivations behind Prefect: observing Python in any form is just as important as orchestrating it. With Prefect, you can consume external events like serverless function logs, alert on them, and act on them.

Prefect makes complex workflows simpler, not harder. Try Prefect Cloud for free for yourself, download our open source package, join our Slack community, or talk to one of our engineers to learn more.