1 post tagged "autoscaling"
LLM Inference on OVH MKS: Prometheus, Grafana, and KEDA
Scrape vLLM and DCGM metrics with kube-prometheus-stack, visualise TTFT and tokens/s in Grafana, and autoscale to zero with KEDA. Part 4 of 6.
8 minutes reading time