1 post with tag #huggingface

Architecture diagram: LLM inference pipeline on OVH Managed Kubernetes Service

vllm llm kubernetes ovh gpu quantization awq openai inference huggingface

LLM Inference on OVH MKS: Models, AWQ, and OpenAI API

Which models fit on a 16 GB GPU, why AWQ is required for 7B+ models on the RTX5000-28, and how to use the OpenAI-compatible API from Python. Part 3 of 6.

2026-06-02 · 9 min
