Building and maintaining data pipelines can be challenging, especially when dealing with complex data sources, diverse data formats, high data volumes, and changing business requirements. That need for flexibility is a big reason many developers choose Python for data work.
It doesn’t stop with just Python. You're also responsible for managing the infrastructure that runs your workflows, including servers, networks, storage, and security. That work can be costly, time-consuming, error-prone, and, not least, complex.
What if you could simplify your approach to infrastructure? You don’t always need maximally robust infrastructure. What if you could focus on the logic and the value of your code rather than the details of how to run it? This is where serverless architecture comes in.
👉🏼 Definition
Serverless architecture is a deployment pattern in which a cloud provider runs individual units of work, such as functions or containers, on demand.
This means the developer doesn’t need to focus on maintaining infrastructure: no scaling up or down, provisioning resources, or any of that—you just focus on your code. There are still servers in serverless, but they are abstracted away from development.
But when do you choose a serverless approach? You need to consider cost, security, reliability, scaling, and more. We’ll dive into the ins and outs and the pros and cons of serverless, especially as it concerns deploying Python functions.
What serverless architecture consists of
Serverless architecture isn’t just a definition. We need to dive into how serverless architecture works in practice, both with Python functions and beyond.
When adopting a serverless approach, the choice of cloud provider and serverless product is crucial, as it dictates the level of customization and control you'll have over your runtime environment.
1️⃣ AWS, GCP, Azure serverless functions
This is where serverless began: with functions like AWS Lambda, Google Cloud Functions, and Azure Functions. They offer a simplified, easy-to-use environment where the serverless platform invokes user-created functions in response to events. You can consider these functions just like you’d think about invoking Python functions, just remotely hosted. These services provide environments with pre-installed Python dependencies and the ability to install additional dependencies at runtime, which is convenient for quick deployments. However, they generally do not allow for custom images, limiting the ability to tailor the environment to your specific needs.
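To make the "remotely hosted Python function" idea concrete, here is a minimal sketch of a handler in the AWS Lambda style, where the platform passes each trigger payload in as `event` along with runtime metadata in `context`. The event shape here is a simplified assumption, not a specific service's schema.

```python
import json

def handler(event, context):
    """Entry point the platform invokes once per event.

    `event` carries the trigger payload (e.g. an HTTP request body);
    `context` exposes runtime metadata such as remaining execution time.
    """
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

Locally, this is just a Python function you can call and unit test; in the cloud, the platform handles invocation, scaling, and retries around it.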
2️⃣ Functions with more customization and layers
To offer more flexibility, many serverless vendors offer advanced tiers that permit customizations. For example, AWS Lambda Layers lets you add libraries and dependencies to Lambda functions. Azure and Google Cloud have similar features for customizing serverless functions. This lets you bundle additional dependencies ahead of time, reducing the startup time of your serverless functions while keeping the simplicity of the serverless function model.
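A layer is ultimately just a zip archive with a prescribed layout; for Python Lambda layers, files must sit under a top-level `python/` folder so they land on `sys.path` when the layer is extracted under `/opt`. Here is a hedged sketch of packaging pre-installed dependencies into that layout using only the standard library (the directory names are placeholders):

```python
import zipfile
from pathlib import Path

def build_layer(package_dir: str, output_zip: str) -> None:
    """Bundle a directory of pre-installed dependencies into a layer archive.

    Lambda extracts Python layers under /opt and adds /opt/python to
    sys.path, so every file must sit beneath a top-level `python/` folder.
    """
    root = Path(package_dir)
    with zipfile.ZipFile(output_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in root.rglob("*"):
            if path.is_file():
                zf.write(path, str(Path("python") / path.relative_to(root)))
```

In practice you would point `package_dir` at a `pip install --target` output directory, then upload the resulting zip as a layer.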
3️⃣ Serverless containers like ECS, Cloud Run, Azure Containers
For even greater control over your runtime environment, you can opt for container-based serverless services like Amazon ECS, Google Cloud Run, and Azure Container Instances. These services allow you to deploy custom container images, giving you the freedom to include any dependencies and configurations you require at the cost of additional complexity.
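With container-based services, your image must bring its own web server that listens on the port the platform assigns (Cloud Run, for example, injects it via the `PORT` environment variable). A minimal sketch of such an entry point, using only the standard library:

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A real service would route requests to pipeline logic here.
        body = b"pipeline service is up\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve() -> HTTPServer:
    # Container platforms like Cloud Run inject the port via $PORT.
    port = int(os.environ.get("PORT", 8080))
    return HTTPServer(("0.0.0.0", port), Handler)

if __name__ == "__main__":
    serve().serve_forever()
```

You would package this (or a production server like gunicorn) in a custom image, which is exactly the control these services trade complexity for.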
Business and workflow logic in serverless Python scripts is encapsulated in functions or containers, which are typically event-driven and executed in response to triggers like HTTP requests, file uploads, or database modifications.
This modular approach makes it easy to develop, test, and deploy individual components of your apps and workflows. Conceptually, this isn't much different from traditional architecture. The code you run in serverless apps is typically similar to, or the same as, the code in more traditional apps, even though the high-level architecture differs.
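As an illustration of event-driven triggers, here is a hedged sketch of a handler reacting to a file-upload notification. The event shape mirrors the S3 notification format (`Records` with nested bucket and object fields), but the function body is a placeholder:

```python
def handle_upload(event: dict) -> list:
    """React to a storage upload notification (S3-style event shape).

    Returns the object paths processed, so a caller or test can verify
    which files the function acted on.
    """
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # A real pipeline would load and transform the file here.
        processed.append(f"{bucket}/{key}")
    return processed
```

Because the trigger payload is just a dict, this component is trivial to test in isolation before wiring it to a real bucket notification.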
Monitoring Python serverless functions means surfacing and communicating exceptions, and validating that functions ran when they were supposed to.
For larger applications, splitting workflows into serverless functions or containers has advantages, but it can make end-to-end monitoring more difficult. Cloud-native tools like Amazon CloudWatch, Azure Monitor, and Google Cloud Operations Suite, commonly used for monolithic apps, also provide metrics, logs, and alerts for serverless apps.
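One common pattern is to emit structured (JSON) log lines around each invocation so log aggregators can build metrics and alerts from them, while still re-raising exceptions so the platform records the run as failed. A hedged sketch of such a decorator (field names are our own convention, not a standard):

```python
import functools
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")

def observed(fn):
    """Log a structured line per run so log-based monitoring
    (e.g. CloudWatch metric filters) can track successes and failures."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            result = fn(*args, **kwargs)
            logger.info(json.dumps({"fn": fn.__name__, "status": "success",
                                    "seconds": round(time.monotonic() - start, 3)}))
            return result
        except Exception:
            logger.error(json.dumps({"fn": fn.__name__, "status": "failure"}))
            raise  # re-raise so the platform marks the invocation as failed
    return wrapper
```

Wrapping a handler with `@observed` then gives you a machine-parseable audit trail of every run without changing the handler's behavior.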
While cloud-native monitoring solutions can be hard to set up and extract information from, many third-party monitoring tools offer excellent serverless support. For instance, with Prefect Cloud you can trigger serverless functions as you would any API inside a workflow, and react to them with automations.
Security in a serverless environment involves configuring permissions at the function or container level. This ensures that your applications can securely connect to external services like databases or third-party APIs. Large enterprises treat security as crucial even for individual Python functions, and it matters even more for the end-to-end applications that serverless architecture can power.
Serverless providers offer fine-grained access control via tools like AWS Identity and Access Management (IAM), Google Cloud IAM, and Microsoft Entra. This type of access control gives you the ability to control exactly what services can access your serverless applications, and also which services and data sources your serverless apps can access.
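What fine-grained access control looks like in practice is a least-privilege policy attached to the function's identity. Here is a sketch of an AWS-style policy document built as a Python dict; the bucket name is a placeholder, and a real deployment would attach this via IAM rather than application code:

```python
import json

# Hypothetical least-privilege policy: the function may only read
# objects from one input bucket, nothing else. Names are placeholders.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::my-pipeline-input/*",
        }
    ],
}

policy_json = json.dumps(read_only_policy, indent=2)
```

Scoping the `Action` and `Resource` this narrowly means a compromised function can read its input data and nothing more.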
It's worth noting that in many cases, only certain components of a Python workflow may be suitable for a serverless architecture. This is changing with the advent of serverless-friendly data stores like Amazon Aurora Serverless, which can scale down to zero when not in use, but there are still many cases where serverless isn't the right choice for 100% of a workflow. For instance, sometimes you don’t need actual compute power, only the ability to run a Python job and observe its success.
Serverless can be seamlessly integrated into workflows using orchestration frameworks like Prefect. Orchestration tools let you build, observe, and react to workflows across multiple environments, making them valuable when beginning a migration to serverless. They also help you avoid an unwieldy mess of functions that are hard to connect to each other when something fails.
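The orchestration idea can be sketched in plain Python: chain independent steps, record each outcome, and stop downstream work when a step fails. This is an illustration of the logic only, not Prefect's API; the callables here stand in for remote function invocations that a real setup would make over HTTP.

```python
from typing import Callable, Dict

def run_workflow(steps: "Dict[str, Callable]", payload: dict) -> dict:
    """Chain independent 'serverless' steps and record each outcome,
    the way an orchestrator observes a workflow end to end."""
    results = {}
    for name, step in steps.items():
        try:
            payload = step(payload)
            results[name] = "success"
        except Exception as exc:
            results[name] = f"failed: {exc}"
            break  # downstream steps depend on this one; stop here
    return results
```

An orchestrator adds retries, scheduling, and alerting on top of exactly this kind of outcome tracking.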
The question remains, when should you actually choose serverless functions or containers?
Although serverless has advantages over more traditional infrastructure, it's not the right tool for every use case. This is especially true for Python applications, so let's examine some of the pros and cons data engineers should keep in mind when considering serverless architecture:
Considering these nuanced pros and cons puts you in a better position to decide whether serverless is the right choice for your data pipeline, and how to implement it effectively if you go this route. Choose serverless if you need to get started quickly with a few independent functions that must be highly available.
The pros of serverless are undeniable, which is why one fully reasonable approach is simply to mitigate the cons. That is one of the motivations behind Prefect: observing Python in any form is just as important as orchestrating it. With Prefect, you can consume external events like serverless function logs, alert on them, and act on them.
Prefect makes complex workflows simpler, not harder. Try Prefect Cloud for free for yourself, download our open source package, join our Slack community, or talk to one of our engineers to learn more.