GoModel can use vLLM through vLLM’s OpenAI-compatible HTTP server. Flow:
Client -> GoModel -> vLLM
Before you start
- Start vLLM with its OpenAI-compatible server.
- Note the vLLM base URL, including /v1.
- Decide whether vLLM should require an upstream API key (see the sketch after this list).
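For reference, a minimal sketch of starting the upstream server with vLLM's CLI; the model name, port, and key value below are placeholder assumptions:

```bash
# Start vLLM's OpenAI-compatible server on port 8000.
# Omit --api-key for a keyless server; if you pass it, GoModel must be
# configured with the same value (see VLLM_API_KEY below).
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --port 8000 \
  --api-key my-vllm-key
```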
1. Configure GoModel
For a single keyless vLLM server, Docker env vars are enough. Set VLLM_API_KEY
only when the upstream vLLM server was started with --api-key.
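A minimal sketch of such a setup; the GoModel image name (gomodel/gomodel) and
listening port (8080) are assumptions, not documented values:

```bash
# Register a single vLLM provider through environment variables.
# VLLM_API_KEY is only needed when vLLM was started with --api-key.
# On Linux, add --add-host=host.docker.internal:host-gateway so the
# container can reach the host.
docker run --rm -p 8080:8080 \
  -e VLLM_BASE_URL=http://host.docker.internal:8000/v1 \
  -e VLLM_API_KEY=my-vllm-key \
  gomodel/gomodel
```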
VLLM_BASE_URL registers provider vllm. VLLM_TEST_BASE_URL registers
provider vllm-test. The suffix is normalized to lowercase and underscores
become hyphens.
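For example, a second base URL could register a second provider under a
generated name (the second port is an assumption):

```bash
# The _TEST suffix is lowercased and the underscore becomes a hyphen,
# so VLLM_TEST_BASE_URL registers a provider named "vllm-test" alongside "vllm".
docker run --rm -p 8080:8080 \
  -e VLLM_BASE_URL=http://host.docker.internal:8000/v1 \
  -e VLLM_TEST_BASE_URL=http://host.docker.internal:8001/v1 \
  gomodel/gomodel
```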
Use YAML only when generated names such as vllm-test are not enough or you
need larger structured provider blocks.
These examples assume GoModel runs in Docker and your vLLM server is reachable
from the host at
localhost:8000, so GoModel uses
host.docker.internal:8000 to reach it. If both services run in the same
Docker network, replace host.docker.internal with the vLLM service name.
If GoModel runs directly on the host, use http://localhost:8000/v1.
2. Verify the model registry
Check the registered models through GoModel's /models endpoint. Hugging Face
model IDs can contain slashes. With a provider named vllm-test, a vLLM model
such as meta-llama/Llama-3.1-8B-Instruct is exposed through GoModel as
vllm-test/meta-llama/Llama-3.1-8B-Instruct: vllm-test is the provider and
meta-llama/Llama-3.1-8B-Instruct remains the upstream model ID.
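A quick way to verify, assuming GoModel listens on localhost:8080 (the port is
an assumption):

```bash
# List the registered models through GoModel; the vLLM models should show up
# with provider-qualified IDs such as vllm-test/meta-llama/Llama-3.1-8B-Instruct.
curl http://localhost:8080/models
```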
3. Send a chat request
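A minimal sketch, assuming GoModel exposes an OpenAI-compatible
/v1/chat/completions route on localhost:8080 (the port is an assumption; add
whatever client authentication your GoModel deployment requires):

```bash
# Chat through GoModel using the provider-qualified model ID.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vllm-test/meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Say hello."}]
  }'
```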
4. Use vLLM passthrough
vLLM passthrough is enabled by default. Use it for vLLM-specific endpoints such
as /tokenize, /detokenize, /pooling, and /rerank. GoModel adds
VLLM_API_KEY to upstream requests when configured.
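A sketch of one such call using the /p/vllm prefix described below; the GoModel
port and the use of the upstream (unqualified) model ID are assumptions:

```bash
# Tokenize through GoModel's vLLM passthrough. The request body is forwarded
# to vLLM's native /tokenize endpoint.
curl http://localhost:8080/p/vllm/tokenize \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "prompt": "Hello from GoModel"
  }'
```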
Passthrough routes are provider-type scoped at
/p/vllm/.... When you need
to target one named vLLM instance in a multi-provider setup, use translated
/v1/... requests with provider-qualified model IDs such as
vllm-test/meta-llama/Llama-3.1-8B-Instruct.
Current support
Integrated:
- chat completions
- streaming chat completions
- Responses API
- streaming Responses API
- embeddings
- provider-native passthrough
- native vLLM batch APIs
- OpenAI-compatible files lifecycle
- Responses lifecycle utility endpoints