GPU autoscaling on Kubernetes with KEDA: Building an external scaler
If you run GPU workloads on Kubernetes — vLLM, Triton, training jobs, or the newer agentic inference stacks — you’ve probably hit a familiar problem: the default autoscaling path still reasons about CPU and memory, while...

