LLM Inference on OVH MKS: Prometheus, Grafana, and KEDA

Scrape vLLM and DCGM metrics with kube-prometheus-stack, visualise time to first token (TTFT) and tokens/s in Grafana, and autoscale to zero with KEDA. Part 4 of 6.