Intermediate | Server / VPS | 11 min read
Nginx Reverse Proxy for Local LLM APIs
Put Ollama or llama.cpp behind Nginx with TLS, rate limiting, and a stable /v1 endpoint for your apps.
Nginx, API, TLS, Ollama, llama.cpp
Basic proxy to Ollama
Ollama exposes an OpenAI-compatible API under /v1, including /v1/chat/completions, on port 11434. Proxy it with long timeouts, because a completion can take much longer to arrive than Nginx's 60-second default allows.
nginx
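server {
    listen 443 ssl;
    server_name llm.example.com;
    ssl_certificate /etc/letsencrypt/live/llm.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/llm.example.com/privkey.pem;

    location /v1/ {
        # Trailing slashes on both sides map /v1/... to the backend's /v1/...
        proxy_pass http://127.0.0.1:11434/v1/;
        # The default is 60s between reads; a non-streamed completion can take longer.
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
        # Leave room for long prompts and chat histories (the default is 1m).
        client_max_body_size 10m;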
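        # Optional sketch, not part of the original recipe: when a client sends
        # "stream": true, the reply arrives as server-sent events. Nginx buffers
        # proxied responses by default, so turning buffering off here should let
        # tokens reach the client as they are generated. Assumes your apps
        # actually use streaming; otherwise this line can be left out.
        proxy_buffering off;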
    }
}
Rate limiting
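Add a limit_req zone to prevent abuse on a public-facing VPS. The example below allows 10 requests per minute per client IP and lets up to 5 extra requests through immediately as a burst; anything beyond that is rejected. Adjust the rate for your expected users.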
nginx
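# In the http context, outside any server block:
limit_req_zone $binary_remote_addr zone=llm:10m rate=10r/m;

# Inside the server block, add limit_req to the existing location:
location /v1/ {
    limit_req zone=llm burst=5 nodelay;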
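    # Optional sketch, not part of the original recipe: rejected requests get
    # a 503 by default. Returning 429 Too Many Requests instead is an
    # assumption about what API clients handle best; drop it to keep the default.
    limit_req_status 429;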
    proxy_pass http://127.0.0.1:11434/v1/;
    proxy_read_timeout 300s;
}
Deployment guides are educational. Each model is subject to its own license; read the official Hugging Face model card before downloading or deploying.