Pydantic Enums: An Introduction

March 07, 2024
Jay Allen

In my recent posts, I’ve been raving about Pydantic, the most popular package for data validation and coercion in Python. In this article, I’ll dive into how Pydantic’s enum support brings better and more consistent data validation to your apps.

Why you need Pydantic enums

Are you sick of a Status field showing up as either “Fulfilled” or “Fulfillment” depending on who the caller is? If so, you need enum validation.

Let’s expand on this example. Assume you have an online order system. An order’s status can be in one of a finite number of states - let’s say Ordered, Fulfilled, Shipped, and Delivered.

The problem is that order data might come from numerous locations. It could come from multiple apps managed by multiple teams within the company - a Web app, an iOS app, an Android app, etc. You might accept orders from 3rd parties via an API or a data feed of some kind. And you may have to parse and warehouse legacy data for data analysis.

If you only save these values as strings, you’re in for a world of hurt. Without a way to enforce valid order status values, you could end up with label variations (e.g., “Fulfilled”, “Fulfillment Complete”, “FLFLD”) that you have to reconcile. Some teams might add order statuses only supported by their internal workflows.

That’s why languages like Python enable you to define these values as enums. For an order, you can encode a list of valid statuses in Python by creating a new class derived from Python’s IntEnum class.

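from enum import Enum, IntEnum

# Each member maps a readable name to an integer value.
# (No trailing commas: "ORDERED = 1," would define a one-element tuple.)
class OrderStatusEnum(IntEnum):
    ORDERED = 1
    FULFILLED = 2
    SHIPPED = 3
    DELIVERED = 4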

You could also use the Enum class to declare these as strings:

class OrderStatusStrEnum(str, Enum):
    ordered = "ordered"
    fulfilled = "fulfilled"
    shipped = "shipped"
    delivered = "delivered"
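To make the behavior of these two classes concrete, here’s a minimal sketch using only the standard library. The parse_status() helper is a hypothetical name introduced for illustration; it isn’t part of Python or Pydantic.

```python
from enum import Enum, IntEnum


class OrderStatusEnum(IntEnum):
    ORDERED = 1
    FULFILLED = 2
    SHIPPED = 3
    DELIVERED = 4


class OrderStatusStrEnum(str, Enum):
    ordered = "ordered"
    fulfilled = "fulfilled"
    shipped = "shipped"
    delivered = "delivered"


# Look a member up by its value; the member carries both a name and a value.
status = OrderStatusEnum(2)
print(status.name, status.value)


def parse_status(raw: str):
    """Hypothetical helper: return the matching member, or None if unknown."""
    try:
        return OrderStatusStrEnum(raw)
    except ValueError:
        # Calling the enum with an undefined value raises ValueError.
        return None


known = parse_status("shipped")
unknown = parse_status("FLFLD")
```

Plain enums will reject an undefined value like this, but only if every caller remembers to go through the enum class first.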

However, these structures aren’t any good if you don’t enforce them. Using Pydantic, you can check and enforce that callers are using the appropriate enum values at runtime and generate an error if the data is incorrect. This puts the responsibility on upstream callers to ensure their data is consistent and correct before you agree to process it.

The role of Pydantic in data validation

I’ve covered this in-depth in my Pydantic overview and my deep dive into data validation with Pydantic. But to recap quickly, Pydantic greatly simplifies data validation by providing several great features:

  • A data model (based on the Pydantic class BaseModel) with code that performs validation and data coercion on your data, based on Python’s support for type hints.
  • A detailed error handling mechanism so you can catch errors at validation time and report them back to your callers.
  • A built-in set of supported types - from simple types like integers to complex types like e-mail addresses and UUIDs - that you can leverage out-of-the-box as opposed to coding yourself.

Pydantic brings a consistent model for data error handling that you can leverage across your team or even across your entire organization. It makes it easy to develop highly reusable validation logic that not only keeps your data clean but makes it easier to read.
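As a rough illustration of those three features, here’s a small sketch assuming Pydantic v2. The Customer model, its fields, and the validate_customer() helper are all hypothetical names made up for this example.

```python
from datetime import datetime
from uuid import UUID

from pydantic import BaseModel, ValidationError


class Customer(BaseModel):
    # Hypothetical model, for illustration only
    customer_id: UUID
    signup_dt: datetime
    loyalty_points: int = 0


def validate_customer(payload: dict):
    """Hypothetical helper: return (model, errors) instead of raising."""
    try:
        return Customer(**payload), []
    except ValidationError as e:
        # Each error records where it happened ("loc") and a message ("msg").
        return None, [f'{err["loc"][0]}: {err["msg"]}' for err in e.errors()]


good, good_errors = validate_customer({
    "customer_id": "02e901a0-a371-4709-8190-6938522fd504",
    "signup_dt": "2024-03-07T09:00:00",
    "loyalty_points": "42",  # a numeric string is coerced to int
})

bad, bad_errors = validate_customer({
    "customer_id": "not-a-uuid",
    "signup_dt": "2024-03-07T09:00:00",
})
```

The valid payload comes back as typed Python objects (a UUID, a datetime, an int), while the invalid one produces a list of per-field messages you can hand back to the caller.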

How Pydantic enums work

Pydantic enums are straightforward. As per the package’s documentation, Pydantic relies heavily on Python’s built-in enum support.

Let’s see this in action. Say you want to validate the details of a customer’s order using Pydantic. Building off of the code above, you can create a CustomerOrder model derived from Pydantic’s BaseModel that leverages the OrderStatusEnum enum you defined above. For illustration purposes, we’ll assume you are just supporting three fields in a CustomerOrder for now: a customer ID as a UUID, an order date/time, and an order status.

from enum import Enum, IntEnum
from pydantic import BaseModel, ValidationError
from uuid import UUID
from datetime import datetime

class OrderStatusEnum(IntEnum):
    ORDERED = 1
    FULFILLED = 2
    SHIPPED = 3
    DELIVERED = 4

class CustomerOrder(BaseModel):
    customer_id: UUID
    order_dt: datetime
    # Orders that arrive without a status are treated as new orders
    status: OrderStatusEnum = OrderStatusEnum.ORDERED

You can define defaults in your model as well. In this case, if an order comes in without a status, you assume it’s a new order and set the status field to ORDERED. You can then use this code to load and validate a new order:

data = {
    "customer_id": "02e901a0-a371-4709-8190-6938522fd504",
    "order_dt": "2032-04-23T10:20:30.400+02:30",
    "status": OrderStatusEnum.ORDERED
}

try:
    order = CustomerOrder(**data)
except ValidationError as e:
    # Report the field name, the offending input, and the message for each error
    for error in e.errors():
        print(f'{error["loc"][0]}: {error["input"]} - {error["msg"]}')

Note that you can also supply a straight integer value for status, so long as it’s a defined member of OrderStatusEnum:

data = {
    "customer_id": "02e901a0-a371-4709-8190-6938522fd504",
    "order_dt": "2032-04-23T10:20:30.400+02:30",
    "status": 1
}

Since all of your data is in the correct format, running this code won’t generate any errors. But if you use a non-supported value for status, Pydantic will catch it:
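Here’s a minimal, self-contained sketch of that failure case, assuming Pydantic v2. It repeats the model from above and passes a status of 5, which isn’t a member of OrderStatusEnum. The failed_fields list is a name introduced just for this example.

```python
from datetime import datetime
from enum import IntEnum
from uuid import UUID

from pydantic import BaseModel, ValidationError


class OrderStatusEnum(IntEnum):
    ORDERED = 1
    FULFILLED = 2
    SHIPPED = 3
    DELIVERED = 4


class CustomerOrder(BaseModel):
    customer_id: UUID
    order_dt: datetime
    status: OrderStatusEnum = OrderStatusEnum.ORDERED


bad_data = {
    "customer_id": "02e901a0-a371-4709-8190-6938522fd504",
    "order_dt": "2032-04-23T10:20:30.400+02:30",
    "status": 5,  # not a defined member of OrderStatusEnum
}

failed_fields = []
try:
    order = CustomerOrder(**bad_data)
except ValidationError as e:
    for error in e.errors():
        # Collect the name of each field that failed validation
        failed_fields.append(error["loc"][0])
        print(f'{error["loc"][0]}: {error["input"]} - {error["msg"]}')
```

Only the status field is reported, since the customer ID and order date are still valid.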

Some notes and pitfalls about enums in Python and Pydantic

While enums in Pydantic are handy, there are a few things you should be aware of. First, they’re best used in circumstances like the above, where you have a defined, limited number of choices that you can control.

Second, while a Python Enum is technically a class, there are restrictions on how you can subclass it. In particular, you can only subclass an Enum that has no members. For example, you can define an Enum with defined methods but no enumerated members, and then define your enumerations in subclasses.
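A short sketch of that subclassing rule, using only the standard library. The class names (LabeledEnum, OrderStatus, PaymentStatus, ExtendedOrderStatus) and the label() method are hypothetical and exist only for illustration.

```python
from enum import Enum


class LabeledEnum(Enum):
    """Base enum with shared behavior but no members, so it can be subclassed."""

    def label(self) -> str:
        # Turn a member name like PAID_IN_FULL into "Paid In Full"
        return self.name.replace("_", " ").title()


class OrderStatus(LabeledEnum):
    ORDERED = 1
    FULFILLED = 2
    SHIPPED = 3
    DELIVERED = 4


class PaymentStatus(LabeledEnum):
    PENDING = 1
    PAID_IN_FULL = 2


# Subclassing an enum that already has members is rejected with a TypeError.
subclass_error = None
try:
    class ExtendedOrderStatus(OrderStatus):
        RETURNED = 5
except TypeError as exc:
    subclass_error = exc
```

Both OrderStatus and PaymentStatus inherit label(), but neither can be extended with additional members once it defines its own.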

Finally, serialization of enums in Python/Pydantic may not work the way you expect it to. By default, dumping a model to a dictionary keeps the enum member itself (its type, name, and value), as opposed to just the value. For example, here’s what happens if you serialize your object above to a dictionary:

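try:
    order = CustomerOrder(**data)
    # status comes back as an OrderStatusEnum member, not a bare integer
    print(order.model_dump())
except ValidationError as e:
    print(e)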

Fortunately, you can override this behavior in your Pydantic model as follows:

from pydantic import BaseModel, ConfigDict, ValidationError

class CustomerOrder(BaseModel):
    # Store the enum's value on the model instead of the enum member
    model_config = ConfigDict(use_enum_values=True)

    customer_id: UUID
    order_dt: datetime
    status: OrderStatusEnum = OrderStatusEnum.ORDERED

Alternatively, you can export as JSON using Pydantic’s model_dump_json() method, which only returns the value:

try:
    order = CustomerOrder(**data)
    # The JSON output contains only the enum's value
    print(order.model_dump_json())
except ValidationError as e:
    print(e)
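To compare these options side by side, here’s a sketch assuming Pydantic v2. CustomerOrderRaw is a hypothetical second model that differs only in setting use_enum_values. The status is passed explicitly here because, per the Pydantic documentation, use_enum_values doesn’t apply to a default value unless default validation is also enabled.

```python
import json
from datetime import datetime
from enum import IntEnum
from uuid import UUID

from pydantic import BaseModel, ConfigDict


class OrderStatusEnum(IntEnum):
    ORDERED = 1
    FULFILLED = 2
    SHIPPED = 3
    DELIVERED = 4


class CustomerOrder(BaseModel):
    customer_id: UUID
    order_dt: datetime
    status: OrderStatusEnum = OrderStatusEnum.ORDERED


class CustomerOrderRaw(BaseModel):
    # Hypothetical variant: stores the enum's value rather than the member
    model_config = ConfigDict(use_enum_values=True)

    customer_id: UUID
    order_dt: datetime
    status: OrderStatusEnum = OrderStatusEnum.ORDERED


data = {
    "customer_id": "02e901a0-a371-4709-8190-6938522fd504",
    "order_dt": "2032-04-23T10:20:30.400+02:30",
    "status": 2,
}

order = CustomerOrder(**data)
python_dump = order.model_dump()                # status stays an enum member
json_mode_dump = order.model_dump(mode="json")  # status becomes a plain value
json_text = order.model_dump_json()             # JSON string with the plain value

raw_order = CustomerOrderRaw(**data)            # status is stored as a plain int
```

Which one you pick depends on the consumer: keep enum members when the data stays in Python, and use JSON mode or use_enum_values when it leaves your process.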

Using Pydantic and enums with Prefect

Prefect is a workflow orchestration system that enables the observability of critical back-end application logic across your architecture. Using Prefect, you can write complex workflow logic in Python and run it anywhere - from on-premises or virtual servers to Docker containers and serverless functions. You also gain access to advanced workflow management capabilities, including sophisticated retry logic, rich logging, and an easy-to-use observability dashboard.

As I’ve noted before, we love Pydantic at Prefect. We view using Pydantic as a highly recommended best practice all teams should use to ensure data quality. That’s why we’ve built Pydantic into Prefect to enable easy data validation.

Let’s see how you can use Pydantic, Prefect, and the power of enums together. Say you need to create a workflow that runs some verification procedures on an order (e.g., checking for potential fraud) before sending it off for fulfillment. To create this procedure, you build a Prefect flow. Flows in Prefect are containers for workflow logic and enable you to control how your workflow behaves.

To create a flow, first install Prefect on your local system using pip: pip install -U prefect

Next, convert your previous code to run a Prefect flow, which is just a Python function marked with the @flow decorator:

from enum import Enum, IntEnum
from pydantic import BaseModel, ConfigDict, ValidationError
from uuid import UUID
from datetime import datetime
from prefect import flow

class OrderStatusEnum(IntEnum):
    ORDERED = 1
    FULFILLED = 2
    SHIPPED = 3
    DELIVERED = 4

class CustomerOrder(BaseModel):
    customer_id: UUID
    order_dt: datetime
    status: OrderStatusEnum = OrderStatusEnum.ORDERED

# log_prints=True sends print() output to the flow run's logs
@flow(log_prints=True)
def verify_customer(order: CustomerOrder):
    # Perform order checks
    return

if __name__ == "__main__":
    data = {
        "customer_id": "02e901a0-a371-4709-8190-6938522fd504",
        "order_dt": "2032-04-23T10:20:30.400+02:30",
        "status": 1
    }
    # The dictionary is validated against CustomerOrder before the flow body runs
    verify_customer(data)

Let’s break down what’s happening here:

  • Everything up until the verify_customer() function is the same as before, except you’ve added an import for Prefect’s flow decorator.
  • The verify_customer() function is your flow logic. In this simple example, it’ll run locally on your machine. But you could also configure the flow to run remotely on your own infrastructure or on infrastructure created and managed by Prefect. Your verify_customer() function takes one argument: your CustomerOrder model.
  • The __main__ block that runs your workflow passes in the order as a dictionary. Prefect converts this into your Pydantic BaseModel subclass so that your data is validated before your flow logic even runs.

Note that I’m passing the data in hard-coded for the sake of clarity in this example. In a real-world scenario, you’d supply this data via an event parameter that would pass it in dynamically when triggering the flow.

If you run this with the (valid) data above, Prefect runs your workflow without a hitch.

However, say you changed the value of the enumerated order status:

1data = {
2    "customer_id": "02e901a0-a371-4709-8190-6938522fd504",
3    "order_dt": "2032-04-23T10:20:30.400+02:30",
4    "status": 5
5}

Since Prefect instantiates your data as a Pydantic model, your model catches this and errors out before running any other code in your workflow:
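If you want to see the same “validate before the body runs” behavior without installing Prefect, here’s a rough stand-in that swaps Prefect’s @flow for Pydantic v2’s validate_call decorator. This is an analogy under that assumption, not how Prefect is implemented, and the calls list is a hypothetical name used only to record whether the function body ran.

```python
from datetime import datetime
from enum import IntEnum
from uuid import UUID

from pydantic import BaseModel, ValidationError, validate_call


class OrderStatusEnum(IntEnum):
    ORDERED = 1
    FULFILLED = 2
    SHIPPED = 3
    DELIVERED = 4


class CustomerOrder(BaseModel):
    customer_id: UUID
    order_dt: datetime
    status: OrderStatusEnum = OrderStatusEnum.ORDERED


calls = []  # records each time the function body actually runs


@validate_call  # stand-in for @flow: validates arguments against the type hints
def verify_customer(order: CustomerOrder) -> None:
    calls.append(order.status)


base = {
    "customer_id": "02e901a0-a371-4709-8190-6938522fd504",
    "order_dt": "2032-04-23T10:20:30.400+02:30",
}

# A valid dictionary is converted to a CustomerOrder before the body runs.
verify_customer({**base, "status": 1})

# An undefined status is rejected before the body runs at all.
rejected = False
try:
    verify_customer({**base, "status": 5})
except ValidationError:
    rejected = True
```

The body runs once, for the valid order only; the invalid order never reaches your checks.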

Assume you’re debugging a flow failure after the fact. You can use the Prefect CLI’s flow-run command (prefect flow-run ls) to list your flow runs and see the failed run.

You can then use prefect flow-run inspect to see the exact error as well as the original data that was passed.

You don’t need to use the command line for this, by the way. You can also get an easy-to-read version of the errors by using the Prefect Cloud UI: navigate to Flow Runs and select the name of your flow.

Consider everything we just got for free. Without Pydantic and Prefect here, you’d miss out on:

  • An easy way to validate enums and ensure data consistency
  • A clean, consistent API for sharing such validation across all teams using Python
  • Rich, extensive logging summarized using AI, generated for you without writing a single additional line of code.

To dig deeper into the power of Prefect with Pydantic, create a Prefect Cloud account and start creating your first workflows today.