# Benchmark snapshot
This page is a short reference for one public benchmark run comparing GoModel and LiteLLM on OpenAI-compatible traffic. The full article contains the complete write-up, all charts, and the original discussion: *GoModel vs LiteLLM Benchmark: Speed, Throughput, and Resource Usage*.

This benchmark is a point-in-time snapshot published on March 5, 2026. Treat it as data, not dogma. Gateway performance depends on workload, provider mix, deployment setup, and tuning.
## Visual snapshot

*(Charts from this run appear in the full article; `plot_benchmark_charts.py` below regenerates them from the JSON results.)*
## At a glance

In this benchmark run, GoModel came out ahead on the main operational signals most teams care about:

- Added latency
- Throughput under concurrency
- CPU overhead
- Memory overhead
## Test shape

The comparison used a simple like-for-like setup:

- OpenAI-compatible `/v1/chat/completions` (as sketched below)
- The same prompt and request shape on both sides
- Concurrency levels of `1`, `4`, and `8`
- A focus on clean runs with `0%` errors
- Metrics including req/s, latency percentiles, CPU usage, and RSS memory
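For reference, a minimal request of the kind this setup describes. The gateway URL, port, and model name here are illustrative assumptions, not values from the published run:

```bash
# Hypothetical endpoint and model; both gateways expose the same
# OpenAI-compatible chat completions route.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "llama-3.1-8b-instant",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```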
## Reference table
| Gateway | Concurrency | Success | Error % | Req/s | p50 ms | p95 ms | p99 ms | CPU avg % | RSS avg MB |
|---|---|---|---|---|---|---|---|---|---|
| GoModel | 1 | 12/12 | 0.00 | 9.61 | 86.4 | 141.1 | 144.4 | 0.81 | 45.4 |
| GoModel | 4 | 12/12 | 0.00 | 44.66 | 56.1 | 139.5 | 139.5 | 0.23 | 46.0 |
| GoModel | 8 | 12/12 | 0.00 | 52.75 | 98.4 | 130.6 | 131.1 | 1.13 | 46.0 |
| LiteLLM | 1 | 12/12 | 0.00 | 8.64 | 96.2 | 190.3 | 213.9 | 9.21 | 320.3 |
| LiteLLM | 4 | 12/12 | 0.00 | 36.82 | 104.7 | 149.5 | 149.5 | 5.20 | 320.8 |
| LiteLLM | 8 | 12/12 | 0.00 | 35.81 | 188.7 | 244.4 | 244.9 | 5.95 | 321.5 |
## Key readouts

Some useful reads from that March 5, 2026 run:

- Lower p95 latency at every tested concurrency level.
- Higher throughput across the benchmark matrix.
- GoModel held roughly `45-46 MB` RSS, while LiteLLM stayed near `320-321 MB`.
- Less CPU in these runs.
- At concurrency 8, GoModel reached 52.75 req/s versus LiteLLM at 35.81 req/s, roughly 47% more throughput.
## Reproduce it yourself

All the tooling used in the published benchmark is available in this repository.

### Prerequisites

- Go 1.26.3+
- Python 3.10+ with `matplotlib` and `numpy`
- `jq`, `curl`
- A Groq API key (or any OpenAI-compatible provider; adjust the script)
- `litellm[proxy]` (`pip install "litellm[proxy]"`)
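The Python pieces can be installed in one step; this assumes `pip` targets your Python 3.10+ environment:

```bash
# Installs the LiteLLM proxy plus the chart dependencies listed above.
pip install "litellm[proxy]" matplotlib numpy
```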
## Scripts

The benchmark suite lives in `docs/about/benchmark-tools/`:

| File | Purpose |
|---|---|
| `compare.sh` | Builds GoModel, starts both gateways, runs the full benchmark matrix, and writes a `REPORT.md` |
| `bench_main.go` | Source for the bench CLI that sends requests and collects latency + process metrics |
| `plot_benchmark_charts.py` | Generates per-metric charts and a combined dashboard from the JSON results |
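To regenerate charts from an existing run, the invocation is presumably along these lines; the positional argument is an assumption on my part, so check the script's argument handling before relying on it:

```bash
# Hypothetical usage; verify the expected arguments in plot_benchmark_charts.py.
python plot_benchmark_charts.py benchmark-results/
```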
## Quick start

Run `compare.sh` from `docs/about/benchmark-tools/`. It produces a `benchmark-results/` directory containing JSON result files, gateway logs, and a `REPORT.md` with the results table.
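A minimal end-to-end run might look like the following; the API-key variable name is a guess, so check `compare.sh` for what it actually reads:

```bash
# Assumes the repository root as the working directory.
cd docs/about/benchmark-tools

# Hypothetical variable name; compare.sh may expect a different one.
export GROQ_API_KEY="sk-..."

# Builds GoModel, starts both gateways, and runs the full benchmark matrix.
./compare.sh
```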
## Tuning

You can override defaults via environment variables; see `compare.sh` for the full list of knobs.
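As an illustration only, an override run could look like this; the variable names below are hypothetical placeholders, not the script's actual knobs:

```bash
# Hypothetical knobs for illustration; the real names live in compare.sh.
CONCURRENCY_LEVELS="1 4 8" REQUESTS_PER_LEVEL=12 ./compare.sh
```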