LLM Inference on OVH MKS: LiteLLM API Gateway

LiteLLM gateway on top of vLLM: per-user API keys, budget limits, and automatic fallback to commercial APIs when the local GPU node is cold. Part 6 of 6.

8 minutes reading time

LLM Inference on OVH MKS: Prometheus, Grafana, and KEDA

Scrape vLLM and DCGM metrics with kube-prometheus-stack, visualise TTFT and tokens/s in Grafana, and autoscale to zero with KEDA. Part 4 of 6.

8 minutes reading time

LLM Inference on OVH MKS: Models, AWQ, and OpenAI API

Which models fit on a 16 GB GPU, why AWQ is required for 7B+ models on the RTX5000-28, and how to use the OpenAI-compatible API from Python. Part 3 of 6.

9 minutes reading time

SigNoz on OVH MKS: Metrics, Traces & Logs with Istio Ambient

Send Istio metrics, OTLP traces, and pod logs to SigNoz on OVH MKS. Verify the ClickHouse S3 cold tier and estimate 7-year storage costs. Part 2 of 3.

9 minutes reading time

SigNoz on OVH MKS: Access Log Reports with Vector and ClickHouse

Collect Envoy access logs with Vector into ClickHouse, generate monthly awffull reports, and serve them via an Istio ExternalName route to S3. Part 3 of 3.

8 minutes reading time