This multipart series will dive into the crucial aspects of making Generative AI (GenAI) applications robust and ready for real-world use. API Management plays a central role in controlling access to LLMs, ensuring security, and managing performance within a GenAI ecosystem.
Modern platforms thrive on delivering lightning-fast, personalized experiences. For ecommerce features like search and recommendations, Google Cloud's Vertex AI provides powerful machine learning capabilities. But to handle demanding production workloads, seamlessly scale, and ensure high availability, you need a robust API management solution. Keeping abusive users from hammering a model is a constant headache for platform administrators, and this is exactly where an API management platform can help.
GCP Load Balancers and Cloud Armor provide a strong foundation for Vertex AI endpoint network access and WAF security, but this alone might not be sufficient for comprehensive defense.
Why aren't Load Balancer and Cloud Armor alone enough to protect Vertex AI endpoints?
Let's recap what GCP Load Balancer and Cloud Armor do: the Load Balancer distributes incoming traffic across backends and terminates TLS at the edge, while Cloud Armor layers WAF rules and DDoS protection on top of it.
The Gaps: Why It's Not Enough
Apigee is the API Management solution that complements GCP Load Balancer and Cloud Armor, adding essential layers.
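For example, the rate-limiting gap called out above can be closed with a traffic-management policy such as SpikeArrest. The sketch below is illustrative; the policy name, the 10-requests-per-second rate, and the use of `client_id` as the identifier are assumptions, not values from a real deployment:

```xml
<!-- Illustrative sketch: smooth traffic toward the Vertex AI backend.
     The 10ps rate and the client_id identifier are example choices. -->
<SpikeArrest name="SA-Protect-VertexAI">
  <!-- Allow roughly 10 requests per second, per identified client -->
  <Rate>10ps</Rate>
  <Identifier ref="client_id"/>
</SpikeArrest>
```

Attached to the proxy's request flow, this throttles each consumer independently, so one abusive caller can't starve the model for everyone else.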
2. Apigee as a Load Balancer for Vertex AI Endpoints
Let's see how this works:
Code Sample: Vertex AI Load Balancing (Target Endpoint)
<TargetEndpoint name="vertex-endpoint-us-east1">
<HTTPTargetConnection>
<URL>https://us-east1-aiplatform.googleapis.com/v1/projects/my-project/locations/us-east1/endpoints/1234567890:predict</URL>
</HTTPTargetConnection>
</TargetEndpoint>
<TargetEndpoint name="vertex-endpoint-us-west1">
<HTTPTargetConnection>
<URL>https://us-west1-aiplatform.googleapis.com/v1/projects/my-project/locations/us-west1/endpoints/1234567890:predict</URL>
</HTTPTargetConnection>
</TargetEndpoint>
<RouteRule name="default">
<TargetEndpoint>vertex-endpoint-us-east1</TargetEndpoint>
<TargetEndpoint>vertex-endpoint-us-west1</TargetEndpoint>
</RouteRule>
This Apigee configuration declares Target Endpoints for Vertex AI in two regions (e.g., 'us-east1' and 'us-west1'). Note that a RouteRule ultimately resolves to a single Target Endpoint per request; for genuine round-robin distribution with health checks and failover, Apigee's named Target Servers combined with a LoadBalancer element are the intended mechanism, ensuring your users receive responses from available Vertex AI endpoints.
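A Target Server–based version might look like the sketch below. The server names (`vertex-us-east1`, `vertex-us-west1`) are assumptions, and each would be defined in the Apigee environment with its regional hostname. Note one caveat, discussed further in the comments: the Vertex AI resource path also embeds the region, which a single static `Path` cannot vary per server.

```xml
<!-- Sketch: round-robin across named Target Servers defined in the
     environment. Server names and the path are illustrative assumptions. -->
<TargetEndpoint name="vertex-predictions-lb">
  <HTTPTargetConnection>
    <LoadBalancer>
      <Algorithm>RoundRobin</Algorithm>
      <Server name="vertex-us-east1"/>
      <Server name="vertex-us-west1"/>
      <MaxFailures>3</MaxFailures>
    </LoadBalancer>
    <!-- Caveat: the region in this path is fixed, even though the
         Target Server hostname rotates per request. -->
    <Path>/v1/projects/my-project/locations/us-east1/endpoints/1234567890:predict</Path>
  </HTTPTargetConnection>
</TargetEndpoint>
```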
Apigee's Health Check and Failover Process
Let's explore how Apigee incorporates health checks and failover mechanisms, and how to reflect those in code.
Within your Target Endpoint configuration, you'd include a HealthMonitor section. Note that health monitoring applies when the target routes to named Target Servers through a LoadBalancer, rather than to a hard-coded URL.
Code Sample: Health Monitor
<TargetEndpoint name="vertex-predictions">
...
<HealthMonitor>
<IsEnabled>true</IsEnabled>
<IntervalInSec>10</IntervalInSec>
<TCPMonitor>
<ConnectTimeoutInSec>5</ConnectTimeoutInSec>
<Port>443</Port>
</TCPMonitor>
</HealthMonitor>
</TargetEndpoint>
Apigee also supports more advanced health checks, such as an HTTPMonitor that issues probe requests and validates expected response codes or headers.
Apigee's health checks work seamlessly with its load-balancing mechanisms. You can view real-time health status within the Apigee UI or integrate health check data with external monitoring tools.
Limitations
Keep in mind that Apigee's health checks may not immediately detect failures deep within your Vertex AI model, so robust monitoring of the model itself remains essential for more granular health assessments.
A sluggish experience can lead to frustrated users and lost sales. That's where prefetching and predictive search, powered by Apigee caching and Vertex AI, can help. By anticipating user queries and caching results, you can dramatically boost the responsiveness of search.
The Prefetching and Predictive Search Workflow
Code Sample: Apigee's caching policies
<TargetEndpoint name="search-predictions">
... <HTTPTargetConnection> ... </HTTPTargetConnection>
<PreFlow name="PreFlow">
<Request>
<Step>
<Name>CacheLookup</Name>
</Step>
</Request>
<Response/>
</PreFlow>
<PostFlow name="PostFlow">
<Request/>
<Response>
<Step>
<Name>CachePopulate</Name>
</Step>
</Response>
</PostFlow>
</TargetEndpoint>
In the snippet above, the `CacheLookup` step references a LookupCache policy that runs before the request reaches Vertex AI, and the `CachePopulate` step references a PopulateCache policy that stores the response for future requests. The cache key itself (e.g., the `q` query parameter) is defined inside those policies, not in the flow steps.
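The two referenced policies might be defined along these lines. This is a sketch: the cache resource name `search-cache` and the 300-second expiry are assumptions you would tune for your traffic.

```xml
<!-- Sketch of the LookupCache policy, keyed on the search query "q".
     The "search-cache" resource and scope are illustrative assumptions. -->
<LookupCache name="CacheLookup">
  <CacheResource>search-cache</CacheResource>
  <Scope>Exclusive</Scope>
  <CacheKey>
    <KeyFragment ref="request.queryparam.q"/>
  </CacheKey>
  <AssignTo>cachedresponse</AssignTo>
</LookupCache>

<!-- Matching PopulateCache policy: same key, stores the response
     with an assumed 300s time-to-live. -->
<PopulateCache name="CachePopulate">
  <CacheResource>search-cache</CacheResource>
  <Scope>Exclusive</Scope>
  <CacheKey>
    <KeyFragment ref="request.queryparam.q"/>
  </CacheKey>
  <ExpirySettings>
    <TimeoutInSec>300</TimeoutInSec>
  </ExpirySettings>
  <Source>response</Source>
</PopulateCache>
```

Because both policies build the key from the same fragment, a repeated query is served from the cache without invoking the Vertex AI endpoint at all.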
Benefits of Apigee Caching with Vertex AI
Apigee Developer Portal is a game-changer for managing and consuming Vertex AI endpoints. It streamlines processes, fosters collaboration, and drives innovation within your organization.
The Problem with Non-Self Service Approaches
How an Apigee Developer Portal Helps
The Best Defense is Layered. Think of it this way:
With Apigee as your API management layer and Vertex AI handling the machine learning intelligence, you build a platform designed for scale, performance, and the personalized experiences that drive customer satisfaction and loyalty.
Ready to unlock the full potential of Apigee and Vertex AI in your environment? Explore Google Cloud's documentation and experiment with Apigee’s capabilities that best suit your unique business needs.
Hi everyone,
The post mentions Apigee's ability to load balance between Vertex AI backends in multiple regions. While the concept is accurate, the provided code sample seems to have a limitation: The RouteRule can only specify one TargetEndpoint at a time. Even if we list multiple endpoints, only one will be chosen.
The suggestion to use separate Target Servers for each region with a load balancer seems promising. However, the Vertex AI API URL requires specifying the region in both the hostname (us-east1-aiplatform.googleapis.com) and resource path (/locations/us-east1).
Unfortunately, Apigee doesn't allow dynamic modification of the region within the URL based on the chosen load balancer target (access to loadbalancing.targetserver variable is limited to the response flow).
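For context, one partial workaround I've considered is pinning each request to a region-specific Target Endpoint via conditional RouteRules, though this pushes region selection onto the caller and gives up Apigee-managed round-robin. The `x-region` header name here is just an example:

```xml
<!-- Illustrative workaround: caller selects the region via a header.
     "x-region" is an example header, not an established convention. -->
<RouteRule name="route-us-west1">
  <Condition>request.header.x-region = "us-west1"</Condition>
  <TargetEndpoint>vertex-endpoint-us-west1</TargetEndpoint>
</RouteRule>
<!-- Fallback: everything else goes to us-east1 -->
<RouteRule name="default">
  <TargetEndpoint>vertex-endpoint-us-east1</TargetEndpoint>
</RouteRule>
```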
Given this limitation, what are alternative approaches for achieving load balancing across regions with Vertex AI backends in Apigee while making use of other benefits such as backend health checks?