1 post with tag awq
LLM Inference on OVH MKS: Models, AWQ, and OpenAI API
Which models fit on a 16 GB GPU, why AWQ is required for 7B+ models on the RTX5000-28, and how to use the OpenAI-compatible API from Python. Part 3 of 6.
9 minutes reading time