Custom prediction routines

Custom prediction routines (CPR) lets you build custom containers with preprocessing and postprocessing code, without dealing with the details of setting up an HTTP server or building a container from scratch. You can use preprocessing to normalize and transform the inputs or make calls to external services to get additional data, and use postprocessing to format the model prediction or run business logic.

The following diagram depicts the user workflow both with and without custom prediction routines.

The main differences are:

  • You don't need to write a model server or a Dockerfile. The model server, which is the HTTP server that hosts the model, is provided for you.

  • You can deploy and debug the model locally, speeding up the iteration cycle during development.

Build and deploy a custom container

This section describes how to use CPR to build a custom container with preprocessing and postprocessing logic and deploy it to both a local endpoint and a Vertex AI online endpoint.

Setup

You must have the Vertex AI SDK for Python and Docker installed in your environment.
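
For example, you can typically install the SDK together with its local prediction components by using the prediction extra; the exact version pin may differ for your environment:

pip install --upgrade "google-cloud-aiplatform[prediction]"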

Write custom Predictor

Implement the Predictor interface.

from abc import ABC, abstractmethod
from typing import Any


class Predictor(ABC):
    """Interface of the Predictor class for Custom Prediction Routines.
    The Predictor is responsible for the ML logic for processing a prediction request.
    Specifically, the Predictor must define:
    (1) How to load all model artifacts used during prediction into memory.
    (2) The logic that should be executed at predict time.
    When using the default PredictionHandler, the Predictor will be invoked as follows:
      predictor.postprocess(predictor.predict(predictor.preprocess(prediction_input)))
    """

    @abstractmethod
    def load(self, artifacts_uri: str) -> None:
        """Loads the model artifact.
        Args:
            artifacts_uri (str):
                Required. The value of the environment variable AIP_STORAGE_URI.
        """
        pass

    def preprocess(self, prediction_input: Any) -> Any:
        """Preprocesses the prediction input before doing the prediction.
        Args:
            prediction_input (Any):
                Required. The prediction input that needs to be preprocessed.
        Returns:
            The preprocessed prediction input.
        """
        return prediction_input

    @abstractmethod
    def predict(self, instances: Any) -> Any:
        """Performs prediction.
        Args:
            instances (Any):
                Required. The instance(s) used for performing prediction.
        Returns:
            Prediction results.
        """
        pass

    def postprocess(self, prediction_results: Any) -> Any:
        """Postprocesses the prediction results.
        Args:
            prediction_results (Any):
                Required. The prediction results.
        Returns:
            The postprocessed prediction results.
        """
        return prediction_results

For example, see Sklearn's Predictor implementation.
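
As an illustration, a minimal scikit-learn Predictor might look like the following sketch. The artifact name model.joblib, the use of joblib, and the response format are assumptions; adapt them to how your model was saved and what your clients expect.

import joblib
import numpy as np

from google.cloud.aiplatform.prediction.predictor import Predictor
from google.cloud.aiplatform.utils import prediction_utils


class SklearnCprPredictor(Predictor):
    """Hypothetical Predictor for a scikit-learn model saved as model.joblib."""

    def load(self, artifacts_uri: str) -> None:
        # Copy the artifacts referenced by AIP_STORAGE_URI into the working
        # directory, then deserialize the model.
        prediction_utils.download_model_artifacts(artifacts_uri)
        self._model = joblib.load("model.joblib")

    def preprocess(self, prediction_input: dict) -> np.ndarray:
        # The default handler passes the parsed JSON request body here.
        return np.asarray(prediction_input["instances"])

    def predict(self, instances: np.ndarray) -> np.ndarray:
        return self._model.predict(instances)

    def postprocess(self, prediction_results: np.ndarray) -> dict:
        # Return the response body that Vertex AI sends back to the client.
        return {"predictions": prediction_results.tolist()}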

Write custom Handler (optional)

Custom handlers have access to the raw request object, and are therefore useful in the rare cases where you need to customize web server logic, such as supporting additional request and response headers or deserializing prediction requests that aren't JSON formatted.

Here is a sample notebook that implements both Predictor and Handler.

Although it isn't required, for better code organization and reusability, we recommend that you implement the web server logic in the Handler and the ML logic in the Predictor as shown in the default handler.
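
For example, a handler that only adds a response header might subclass the default PredictionHandler and wrap its handle method, as in the following sketch. It assumes the default handler exposes an async handle(request) coroutine that returns a FastAPI/Starlette response; the header name is illustrative.

from fastapi import Request, Response

from google.cloud.aiplatform.prediction.handler import PredictionHandler


class CustomHeaderHandler(PredictionHandler):
    """Hypothetical handler that keeps the ML logic in the Predictor but
    customizes web server behavior by adding a response header."""

    async def handle(self, request: Request) -> Response:
        # Delegate preprocess -> predict -> postprocess to the default handler.
        response = await super().handle(request)
        # Web-server-level customization: attach an extra header (example only).
        response.headers["X-Model-Version"] = "v1"
        return response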

Build custom container

Put your custom code in a directory, along with a requirements.txt file if you need to install any additional packages in your image.
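
For example, the directory might contain a predictor.py with your Predictor class, an optional handler.py, and a requirements.txt listing any extra packages; the file names here are only illustrative.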

Use the Vertex AI SDK for Python to build the custom container as follows:

from google.cloud.aiplatform.prediction import LocalModel

# {import your predictor and handler}

local_model = LocalModel.build_cpr_model(
    {PATH_TO_THE_SOURCE_DIR},
    f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{IMAGE}",
    predictor={PREDICTOR_CLASS},
    handler={HANDLER_CLASS},
    requirements_path={PATH_TO_REQUIREMENTS_TXT},
)

You can inspect the container specification to get useful information such as image URI and environment variables.

local_model.get_serving_container_spec()
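
For example, assuming the returned specification exposes image_uri and env fields, you can print them:

spec = local_model.get_serving_container_spec()
print(spec.image_uri)  # URI of the image that was built
print(spec.env)        # environment variables set on the serving container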

Run the container locally (optional)

This step is required only if you want to run and test the container locally, which is useful for faster iteration. In the following example, you deploy to a local endpoint and send a prediction request (see the documentation on the format for the request body).

with local_model.deploy_to_local_endpoint(
    artifact_uri={GCS_PATH_TO_MODEL_ARTIFACTS},
    credential_path={PATH_TO_CREDENTIALS},
) as local_endpoint:
    health_check_response = local_endpoint.run_health_check()
    predict_response = local_endpoint.predict(
        request_file={PATH_TO_INPUT_FILE},
        headers={ANY_NEEDED_HEADERS},
    )

Print out the health check and prediction response.

print(health_check_response, health_check_response.content)
print(predict_response, predict_response.content)

Print out all the container logs.

local_endpoint.print_container_logs(show_all=True)

Upload to Vertex AI Model Registry

Your model will need to access your model artifacts (the files from training), so make sure you've uploaded them to Google Cloud Storage.
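
If your artifacts are still on your local machine, one way to upload them is with the Cloud Storage Python client; the bucket, object path, and file name below are placeholders:

from google.cloud import storage

# Placeholder bucket and paths; replace with your own.
client = storage.Client()
bucket = client.bucket("my-bucket")
bucket.blob("model_artifacts/model.joblib").upload_from_filename("model.joblib")
# The model's artifact_uri would then be "gs://my-bucket/model_artifacts/".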

Push the image to Artifact Registry.

local_model.push_image()

Then, upload the model to Vertex AI Model Registry.

from google.cloud import aiplatform

model = aiplatform.Model.upload(
    local_model=local_model,
    display_name={MODEL_DISPLAY_NAME},
    artifact_uri={GCS_PATH_TO_MODEL_ARTIFACTS},
)

After your model is uploaded to Model Registry, you can use it to get batch predictions or deploy it to a Vertex AI endpoint to get online predictions.
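
For example, a batch prediction job could be started roughly as follows; the display name, Cloud Storage paths, and machine type are placeholders, and the exact parameters you need depend on your input format:

batch_prediction_job = model.batch_predict(
    job_display_name="cpr-batch-prediction",                    # placeholder
    gcs_source="gs://my-bucket/batch_inputs/instances.jsonl",   # placeholder
    gcs_destination_prefix="gs://my-bucket/batch_outputs",      # placeholder
    machine_type="n1-standard-4",
)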

Deploy to Vertex AI endpoint

endpoint = model.deploy(machine_type="n1-standard-4")

Once your model is deployed, you can get online predictions.
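
For example (the instance payload is a placeholder; its shape depends on what your Predictor's preprocess method expects):

prediction = endpoint.predict(instances=[[1.0, 2.0, 3.0, 4.0]])
print(prediction.predictions)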

Notebook samples

The samples showcase the different ways you can deploy a model with custom preprocessing and postprocessing using Vertex AI Prediction.