Creating Human in the Loop Interactive AI Workflows

Generative AI can create new, diverse, and innovative outputs from its base models - but the output is only as good as the models are. Human in the loop feedback systems can play a pivotal role in refining the models and ensuring that the generated content meets your desired standards. This involves incorporating human judgment into AI systems to improve their accuracy, reliability, and performance.

How do you create such feedback loops? Through interactive workflows that enable seamless collaboration between a mostly automated workflow and human reviewers. In this article, I’ll show how you can use Prefect to build a robust data ingestion flow and add human-in-the-loop approval steps to it, improving accuracy in data ingestion and machine learning tasks.

Creating robust workflows with human in the loop hooks

Generative AI applications have gained popularity recently, unlocking new use cases within data science and engineering. Human in the loop workflows let you include human interaction at many steps of a machine learning and data pipeline. Human input in an automated Python workflow is unintuitive at best and, with the wrong tools, impossible at worst.

Prefect is a powerful tool that fits into the MLOps ecosystem by providing a robust platform for orchestrating and observing Python workflows. It helps teams to build, run, and monitor data pipelines at scale with ease.

Prefect's design is centered around the concept of "flows" and "tasks." This architecture enables a plug-and-play deployment system for complex data workflows that can handle dependencies, conditional logic, and stateful operations efficiently. It also includes features for infrastructure management, error handling, retries, notification systems, and event-driven workflows. These features all help ensure that ML pipelines are not only efficient but also resilient to failures.

By leveraging Prefect within MLOps practices, teams can achieve more reliable and seamless automation of their machine learning workflows. This enables faster iteration and delivery of machine learning solutions from data preparation to model training, deployment, and monitoring.

Prefect enables pausing, suspending, and resuming the execution of a flow while providing opportunities for human reviewers to intervene and provide input when necessary. With Prefect, you can set guardrails for these workflow runs in a Pythonic way to quickly deploy code.

Evolving workflows to include human in the loop

Oftentimes, the amount of data you need for any analysis or model can change, even if the input source stays consistent. It’s common to have a simple data ingestion workflow that takes in human input to help clean and manipulate the data.

You can use Prefect to suspend a workflow, accept human input, and resume the flow in seconds. Prefect calls these interactive workflows: a new way to involve human interaction within a flow execution.

You can find more examples of interactive workflows in the Prefect documentation. In this example, I’ll show how to generate a dataset of fake user data, clean it on specific features, save this object within Prefect, and conduct some analysis. You can find the entire codebase on GitHub.

For example, let’s say you want to ask the user how many users to include in the dataset. You can do this in Prefect by predefining the type of input you’re looking for. Declare the input type, pre-populate a default value that the user can accept or override, and pause the flow with the following Prefect code:

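from prefect import pause_flow_run
from prefect.input import RunInput


class UserInput(RunInput):
    number_of_users: int


# Inside a flow: pause until the reviewer submits the form.
description_md = "How many users would you like to create?"
user_input = pause_flow_run(
    wait_for_input=UserInput.with_initial_data(
        description=description_md, number_of_users=2
    )
)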

Prefect generates an input prompt from this class when you resume the paused flow run.
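Once the run resumes, the submitted value is available as user_input.number_of_users. The sketch below shows one way to consume it. It assumes a hypothetical generate_fake_users helper and made-up sample values as a stand-in for the data source used in this article, so treat it as an illustration rather than the actual implementation.

```python
import random

# Hypothetical sample values, used only for this illustration.
FIRST_NAMES = ["Annie", "Melina", "Omar", "Priya"]
COUNTRIES = ["France", "United States", "Egypt", "India"]


def generate_fake_users(number_of_users: int, seed: int = 0) -> list:
    """Return `number_of_users` fake user records (stand-in for a real source)."""
    if number_of_users < 1:
        raise ValueError("number_of_users must be at least 1")
    rng = random.Random(seed)  # seeded so repeated runs produce the same data
    return [
        {
            "id": i,
            "name": rng.choice(FIRST_NAMES),
            "country": rng.choice(COUNTRIES),
            "age": rng.randint(18, 90),
        }
        for i in range(number_of_users)
    ]


# In the flow this would be generate_fake_users(user_input.number_of_users).
users = generate_fake_users(2)
print(len(users))
```

Validating the count before generating anything keeps a mistyped value from silently producing an empty dataset.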

You can take this data consolidation stage one step further by offering an interactive step to remove unnecessary features. Refinement steps like this, such as choosing which features to drop, help clean the dataset even further.

Oftentimes, a data pipeline is a continuous cycle of cleaning data, generating insights, and then cleaning the data further. Prefect offers an interactive input that allows for a natural way to interact with your workflows.

Similarly, you can pause the execution and resume once you know which features you want pulled:

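from prefect import pause_flow_run
from prefect.blocks.system import JSON

# CleanedInput (a RunInput subclass with a features_to_drop field) and
# DEFAULT_FEATURES_TO_DROP are defined elsewhere in the codebase.
features = JSON.load("all-users-json").value
description_md = (
    "## Features available:"
    f"\n```json\n{features}\n```\n"
    "Which columns would you like to drop?"
)

# Pause until the reviewer confirms which features to drop.
user_input = pause_flow_run(
    wait_for_input=CleanedInput.with_initial_data(
        description=description_md, features_to_drop=DEFAULT_FEATURES_TO_DROP
    )
)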

While the workflow is paused, you can choose which features to remove so that the final dataset fits the table layout you need.

From these two interactions, you control how many users to pull from the API and which features to keep in the final dataset.
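After the run resumes, the reviewer's selection still has to be applied to the records. Here is a minimal sketch of that step, assuming the submitted features_to_drop value is a list of top-level column names. The drop_features helper and the sample records are hypothetical.

```python
def drop_features(records: list, features_to_drop: list) -> list:
    """Return copies of the records without the named top-level keys."""
    to_drop = set(features_to_drop)
    return [
        {key: value for key, value in record.items() if key not in to_drop}
        for record in records
    ]


users = [
    {"name": "Annie", "email": "annie@example.com", "age": 41},
    {"name": "Omar", "email": "omar@example.com", "age": 29},
]

# In the flow this would be drop_features(users, user_input.features_to_drop).
cleaned = drop_features(users, ["email"])
print(cleaned)
# [{'name': 'Annie', 'age': 41}, {'name': 'Omar', 'age': 29}]
```

The helper returns new dictionaries instead of mutating the input, so the original records remain available if the reviewer wants to rerun the step with a different selection. Column names that do not exist are ignored.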

Persisting the data within Prefect

Now, let me show you how to save this dataset to Prefect so you can visualize the data within the UI.

There are many ways you can persist data within Prefect for later use:

Blocks are a Prefect primitive that enables storing of configuration and provides an interface for interacting with external systems. You can use the JSON block to house the user dataset you’ll use throughout your workflow.
An alternate storage method is Artifacts. Artifacts are persisted outputs such as tables, markdown, or links that make it easy to track data lineage across events and flows.
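Table artifacts accept a list of dictionaries, and they are easiest to read when each row is flat. If your records are nested, you may want to flatten them before saving. This sketch assumes nested records like the sample shown. The flatten helper and the dotted key naming are illustrative choices, not part of Prefect.

```python
def flatten(record: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Collapse nested dictionaries into one level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries and merge their flattened keys.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


users = [
    {"name": {"first": "Annie", "last": "Gordon"}, "location": {"city": "Medford"}},
]
rows = [flatten(user) for user in users]
print(rows)
# [{'name.first': 'Annie', 'name.last': 'Gordon', 'location.city': 'Medford'}]
```

The resulting rows can be stored in a JSON block or passed as the table argument when you create a table artifact.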

You can combine both ideas by pulling the JSON block value into the input box to help facilitate a data quality check simply through the UI. Additionally, you can offer a way to create the artifact as a final input from the user. (Ordinarily, data quality checks can be very cumbersome and require a user to juggle many different platforms to ensure the data has come in as expected.)

You can surface the newly created JSON block to the user and ask them whether to create an artifact for it.

To create this prompt, change the input to a toggle button to provide a better UX for interacting with our data ingestion flow.

from prefect import flow, get_run_logger, pause_flow_run
from prefect.artifacts import create_table_artifact
from prefect.blocks.system import JSON
from prefect.input import RunInput
from pydantic import Field


class CreateArtifact(RunInput):
    create_artifact: bool = Field(description="Would you like to create an artifact?")
...
 
@flow(name="Create Artifact")
def create_artifact():
    description_md = f"""
    Information pulled: {JSON.load("all-users-json")}
    Would you like to create an artifact?
    """
    logger = get_run_logger()
    # Pause until the reviewer sets the toggle and resumes the run.
    create_artifact_input = pause_flow_run(
        wait_for_input=CreateArtifact.with_initial_data(
            description=description_md, create_artifact=False
        )
    )
    if create_artifact_input.create_artifact:
        logger.info("Report approved! Creating artifact...")
        create_table_artifact(key="name-table", table=JSON.load("all-users-json").value)
    else:
        # Raising fails the flow run so the rejection is visible upstream.
        raise Exception("User did not approve")

Your code can handle whichever toggle value the user chooses and propagate exceptions upward to the flow level. This enables easy notification on user input and provides a mechanism for managing flow failures.
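The approve-or-raise pattern can be tried out without Prefect. In this sketch, ApprovalDenied, publish_if_approved, and run_pipeline are hypothetical names. The custom exception type is a design choice that lets callers tell a reviewer's rejection apart from an unexpected error.

```python
class ApprovalDenied(Exception):
    """Raised when the reviewer declines the prompt."""


def publish_if_approved(approved: bool, rows: list) -> int:
    """Return the number of rows published, or raise if the reviewer said no."""
    if not approved:
        raise ApprovalDenied("User did not approve")
    # In the flow, this is where the table artifact would be created.
    return len(rows)


def run_pipeline(approved: bool) -> str:
    """Stand-in for a parent flow that reacts to the reviewer's decision."""
    try:
        count = publish_if_approved(approved, [{"name": "Annie"}])
    except ApprovalDenied as exc:
        # A real flow would usually re-raise here so the run ends as failed.
        return f"failed: {exc}"
    return f"published {count} row(s)"


print(run_pipeline(True))   # published 1 row(s)
print(run_pipeline(False))  # failed: User did not approve
```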

Artifacts are a centralized way to house markdown, tables, and other data types in a collaborative way. They offer revision history, type information, parent flow name, and other relevant metadata to understand the timeline of this artifact.

Learn more about your data through Marvin

Even after pulling data from various sources and cleaning it for a machine learning job, you can often enrich the dataset with new features based on exploratory data analysis (EDA) or other factors.

Marvin is a toolkit that enables using generative AI within your codebase without overhauling a large framework to suit your use cases. Marvin empowers data engineers by bringing tightly-scoped "AI magic" into any traditional software project with just a few extra lines of code.

You can use Marvin in your Prefect-powered human in the loop data pipeline to simplify any data enrichment and updates to your dataset without further cleaning steps. Refer to the Marvin documentation for setup. Additionally, you can find the test files used in this article in my GitHub repository.

Once set up, use the extract function from Marvin to pull any necessary information from your dataset. You can provide other user inputs if needed.

import marvin
from prefect import flow, get_run_logger, pause_flow_run
from prefect.blocks.system import JSON

DEFAULT_EXTRACT_QUERY = (
    "Create a table of a users name, location, coordinates, and continent the user is located"
)
 
# InputQuery is a RunInput subclass with an input_instructions field, defined elsewhere.
@flow(name="Extract User Insights")
def extract_information():
    logger = get_run_logger()
    description_md = f"""
    The most recent user information: {JSON.load("all-users-json")}
    What would you like to gain insights on?
    """
    # Pause until the reviewer confirms or edits the query.
    user_input = pause_flow_run(
        wait_for_input=InputQuery.with_initial_data(
            description=description_md,
            input_instructions=DEFAULT_EXTRACT_QUERY,
        )
    )

    result = marvin.extract(
        JSON.load("all-users-json"),
        target=str,
        instructions=user_input.input_instructions,
    )
    logger.info(f"Query results: {result}")
    return result

Use the extract method to generate insights on the dataset you saved with Prefect. Additionally, you can generate new features, such as continent, based on the location values provided from the original data:

import marvin_extension as ai_functions
...
if __name__ == "__main__":
    list_of_names = create_names()
    create_artifact()
    ai_functions.extract_information()

Here’s what you’ll see as the output:

12:04:08.977 | INFO    | Flow run 'ultramarine-python' - Pausing flow, execution will continue when this flow run is resumed.
12:04:40.776 | INFO    | Flow run 'ultramarine-python' - Resuming flow run execution!
12:04:41.020 | INFO    | Flow run 'ultramarine-python' - 
    Extracting user insights... 
 
    User input: Create a table of a users name, location, coordinates, and continent the user is located
 
12:04:48.394 | INFO    | Flow run 'ultramarine-python' - Query results: ["Name: Ms M\\u00e9lina Leclercq, Location: 721 Place de L'Europe, Nice, Haute-Loire, France, Coordinates: Latitude -16.5068, Longitude 74.2098, Continent: Europe", 'Name: Mrs Annie Gordon, Location: 2500 E Pecan St, Medford, Nevada, United States, Coordinates: Latitude 81.6889, Longitude 150.2452, Continent: North America']
12:04:49.179 | INFO    | Flow run 'ultramarine-python' - Finished in state Completed()

Feel free to change the input query and see what else you can gain from this dataset!
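Because the extraction target is str, each result comes back as free text. If you want structured rows, for example to build a table artifact, you could parse the strings. This sketch assumes every result follows the "Name: ..., Location: ..., Coordinates: ..., Continent: ..." layout seen in the log output. That layout comes from the model and is not guaranteed, so the hypothetical parse_result helper returns None when a line does not match.

```python
import re
from typing import Optional

# Pattern for the layout seen in the example log; model output may differ.
RESULT_PATTERN = re.compile(
    r"Name: (?P<name>.+?), Location: (?P<location>.+?), "
    r"Coordinates: Latitude (?P<latitude>-?\d+(?:\.\d+)?), "
    r"Longitude (?P<longitude>-?\d+(?:\.\d+)?), Continent: (?P<continent>.+)"
)


def parse_result(line: str) -> Optional[dict]:
    """Turn one result string into a dict, or None if the layout differs."""
    match = RESULT_PATTERN.fullmatch(line)
    if match is None:
        return None
    row = match.groupdict()
    row["latitude"] = float(row["latitude"])
    row["longitude"] = float(row["longitude"])
    return row


sample = (
    "Name: Mrs Annie Gordon, Location: 2500 E Pecan St, Medford, Nevada, "
    "United States, Coordinates: Latitude 81.6889, Longitude 150.2452, "
    "Continent: North America"
)
row = parse_result(sample)
print(row["continent"])  # North America
```

A more robust alternative is to ask Marvin for a structured target type instead of str, which avoids string parsing altogether.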

Human in the loop with Prefect: A cohesive data match

Human in the loop interactive workflows coupled with Prefect offer a robust and scalable way to manage complex data pipelines. By using Prefect's scheduling and automation capabilities within interactive environments, organizations can enhance operational efficiency, reduce manual errors, and foster a culture of data-driven decision-making. This integration not only simplifies the management of intricate workflows but also lets developers and data scientists focus on strategic tasks rather than being bogged down by the intricacies of workflow orchestration.

By leveraging blocks, Prefect enables the encapsulation of reusable logic, allowing users to build, share, and execute complex workflows with unprecedented ease. This not only streamlines the development process but also fosters a modern approach to workflow design - one that enhances maintainability and enables scaling across infrastructures.

Furthermore, Prefect’s artifacts enrich this ecosystem by offering a mechanism to generate, store, and visualize outputs from various stages of the workflow. This capability ensures that insights derived from data are easily accessible, promoting transparency and aiding in informed decision-making across teams.

The integration of these Prefect features within interactive workflows empowers organizations to navigate the complexities of data orchestration with confidence, driving innovation and operational excellence in an increasingly data-centric world.

Prefect makes complex workflows simpler, not harder. Try Prefect Cloud for free for yourself or talk to one of our engineers to learn more.