Kubernetes Clusters Running at Just 10% CPU Utilization, Cast AI Report Finds
Cloud waste persists in Kubernetes environments, while automation offers both cost savings and operational benefits.

One of the core promises of virtualization is that it helps to promote better utilization of hardware resources.
That promise does not always hold, however, in how organizations deploy the Kubernetes container orchestration platform.
Cast AI's 2025 Kubernetes Cost Benchmark Report reveals persistent inefficiencies in cloud resource utilization and provides insights into GPU availability across major cloud providers.
The research, based on data from over 2,100 organizations across AWS, Google Cloud, and Microsoft Azure, offers critical insights for IT operations teams seeking to optimize Kubernetes deployments. The methodology involved analyzing production clusters with at least 50 CPUs, focusing on resource utilization patterns, cost optimization opportunities, and GPU availability across regions and availability zones. The report excludes data from organizations already using Cast AI's automation tools to provide an unbiased view of typical Kubernetes environments.
Key findings include:
Average CPU utilization across Kubernetes clusters has declined to 10%, down from 13% in the previous year's report.
Memory utilization remains suboptimal at 23%, up just three percentage points year over year.
5.7% of containers experienced memory underprovisioning within a 24-hour period, leading to application instability.
Partial implementation of spot instances reduces compute costs by 59%, while full implementation yields 77% savings.
Laurent Gil, co-founder and president of Cast AI, said the biggest surprise in the report was the occasional underprovisioning of memory. Even though CPU and memory are vastly overprovisioned most of the time, he noted, occasional memory underprovisioning is a real problem, and one that is much more common than initially thought.
"Over a 24-hour period, 5.7% of containers exceed their requested memory at some point," Gil told ITPro Today. "This leads to instability, out-of-memory errors, and frequent restarts — significantly more disruption than anticipated — because these workloads simply lack the resources they need to run reliably."
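The behavior Gil describes maps directly to Kubernetes resource requests and limits: a container that exceeds its memory limit is OOM-killed and restarted. A minimal, hypothetical pod spec (names and values are illustrative, not from the report) shows the fields involved:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app        # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0   # placeholder image
    resources:
      requests:
        memory: "256Mi"    # the scheduler reserves this much per container
        cpu: "250m"
      limits:
        memory: "512Mi"    # exceeding this triggers an OOM kill and a restart
```

If the limit is hit, `kubectl get pod example-app -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'` reports `OOMKilled`, which is the kind of instability and restart churn the report quantifies.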
Kubernetes Resource Utilization Remains Problematic
Despite the maturing cloud-native ecosystem, the report reveals persistent inefficiencies in how organizations manage their Kubernetes environments. The decline in average CPU utilization from 13% to 10% indicates that overprovisioning issues are actually worsening, leading to significant cloud waste.
Gil noted that many enterprise organizations are moving to Kubernetes. In his view, they cannot hire experienced DevOps engineers fast enough, and that talent shortage drives the decline in efficiency.
The results also reinforce Cast AI's own value proposition, which is to use agentic frameworks to help autonomously manage Kubernetes workloads.
"These AI engines predict and prevent overprovisioning and underprovisioning before they impact performance and cost," he said.
Spot Instances Offer Substantial Cost Savings
The report presents compelling evidence of financial benefits from utilizing spot instances in Kubernetes environments.
Organizations that partially leverage spot instances achieved an average compute cost reduction of 59%, while those running exclusively on spot instances saw an even greater cost reduction of 77%.
These findings suggest that IT teams could significantly reduce their cloud expenditure without sacrificing performance by incorporating spot instance strategies into their Kubernetes deployments.
In Gil's view, managing spot instances should be done autonomously.
"Any manual management of spot instances leads to frequent failures and outages," he said. "The only way to balance cost with reliability is to use smart autonomous automation."
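In practice, steering a Kubernetes workload onto spot capacity typically comes down to node selectors and tolerations. A hedged sketch using GKE's spot-node label (AWS and Azure use provider-specific equivalents; the workload name is hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker       # hypothetical, interruption-tolerant workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      nodeSelector:
        cloud.google.com/gke-spot: "true"   # schedule only onto spot nodes
      tolerations:
      - key: cloud.google.com/gke-spot      # tolerate the spot-node taint
        operator: Equal
        value: "true"
        effect: NoSchedule
```

The hard part Gil points to is not this manifest but what happens when the provider reclaims the node, which is why he argues the failover and rebalancing must be automated rather than managed by hand.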
GPU Availability and Cost Optimization
A new addition to this year's benchmark is an analysis of GPU availability and pricing across different cloud providers. The report examines various regions and availability zones to identify where specific GPU chips are most readily available and compares the cost of running workloads on high-demand GPUs.
The research reveals substantial cost variation depending on location. Organizations that can strategically place their workloads in more cost-effective locations can achieve:
2x to 7x savings compared to the average spot instance price worldwide
3x to 10x savings compared to the average on-demand instance price
The Best Automation Techniques to Optimize Kubernetes Costs
According to Gil, rightsizing at each workload level, combined with a Kubernetes autoscaler, is the key to Kubernetes optimization.
"Our findings show that both autoscalers need to work in sync so that as soon as a workload autoscales, it immediately triggers a node autoscale event," he said. "When these two autoscalers are working together in a platform, we see organizations reaping extreme benefits, with costs reduced by 5x in the case of a large car manufacturer."
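The workload-level half of that pairing is a Horizontal Pod Autoscaler; when the new replicas it creates cannot be scheduled, a node autoscaler adds capacity, which is the "in sync" behavior Gil describes. A minimal, hypothetical HPA manifest (names and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app              # the workload being rightsized
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU exceeds 70% of requests
```

Pods that scale out beyond current cluster capacity sit in Pending until the node autoscaler provisions a node, so tight coordination between the two loops determines both cost and responsiveness.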