Intermediate · Docker · 9 min read
Docker: Ollama with NVIDIA GPU Passthrough
Containerised Ollama with GPU access — isolate models, pin versions, and run alongside other services.
docker-compose.yml
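This setup requires the NVIDIA driver and the NVIDIA Container Toolkit on the host. The deploy.resources.reservations.devices block reserves one NVIDIA GPU for the container (count: 1; use count: all to expose every GPU). Models are stored in the named volume ollama_data, so they survive container restarts and re-creation. To pin the Ollama version, replace the latest tag with a specific release tag.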
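Before the first docker compose up, it can help to confirm that the host side of the GPU chain works. The sketch below wraps the usual NVIDIA Container Toolkit smoke test (running nvidia-smi in a throwaway ubuntu container started with --gpus all) in a shell function. The function name gpu_preflight is hypothetical, and the last check pulls the ubuntu image if it is not already present.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper: checks the host driver, the Docker CLI,
# and GPU access from inside a container. Returns non-zero on the first failure.
gpu_preflight() {
  # The NVIDIA driver installs nvidia-smi on the host
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: install the NVIDIA driver first" >&2
    return 1
  fi
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found: install Docker Engine with the Compose v2 plugin" >&2
    return 1
  fi
  # Smoke test: with the Container Toolkit configured, a throwaway container
  # started with --gpus all can run nvidia-smi (pulls ubuntu if missing)
  if ! docker run --rm --gpus all ubuntu nvidia-smi; then
    echo "containers cannot see the GPU: check the NVIDIA Container Toolkit setup" >&2
    return 1
  fi
  echo "GPU preflight passed"
}
```

Run it as gpu_preflight && docker compose up -d so the stack only starts once containers can see the GPU. A failure at the last step while nvidia-smi works on the host points at the Container Toolkit or Docker runtime configuration rather than the driver.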
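```yaml
services:
  ollama:
    image: ollama/ollama:latest     # swap latest for a release tag to pin the version
    container_name: ollama
    ports:
      - "11434:11434"               # Ollama HTTP API
    volumes:
      - ollama_data:/root/.ollama   # downloaded models are stored here
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1              # or all to reserve every GPU
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  ollama_data:
```

Run and pull models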
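Use Compose v2 (docker compose, not the legacy docker-compose command) on Linux. On Windows, enable the WSL 2 backend in Docker Desktop settings and make sure GPU support is available (a current NVIDIA driver with WSL 2 support) before starting the stack.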
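docker compose up -d returns as soon as the container has started, which can be slightly before the API answers, and scripted calls to the HTTP API on port 11434 need their JSON quoted carefully. The sketch below defines three hypothetical helpers: wait_for_ollama polls GET /api/version until it responds, build_generate_payload builds the /api/generate request body with jq (assumed to be installed on the host) so prompts containing quotes or newlines stay valid JSON, and ollama_generate sends a non-streaming request and prints only the response field.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around the Ollama HTTP API; assumes curl and jq on the host.

# Poll GET /api/version until the server answers or attempts run out.
# $1 = base URL (default http://localhost:11434), $2 = attempts (default 30)
wait_for_ollama() {
  local url="${1:-http://localhost:11434}"
  local tries="${2:-30}"
  local i=1
  while [ "$i" -le "$tries" ]; do
    # -f: HTTP errors count as failures; -sS: silent except for errors
    if curl -fsS --max-time 2 "$url/api/version" >/dev/null 2>&1; then
      echo "Ollama is up at $url"
      return 0
    fi
    # Wait one second between attempts, but not after the last one
    if [ "$i" -lt "$tries" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  echo "Ollama did not respond at $url after $tries attempts" >&2
  return 1
}

# Build the /api/generate body; jq escapes quotes, backslashes and newlines.
build_generate_payload() {
  jq -c -n --arg model "$1" --arg prompt "$2" \
    '{model: $model, prompt: $prompt, stream: false}'
}

# Send a non-streaming request and print only the "response" field.
# $1 = model tag, $2 = prompt
ollama_generate() {
  local body reply
  body=$(build_generate_payload "$1" "$2") || return 1
  reply=$(curl -fsS http://localhost:11434/api/generate \
    -H 'Content-Type: application/json' -d "$body") || return 1
  printf '%s\n' "$reply" | jq -r '.response'
}
```

For example, call wait_for_ollama right after docker compose up -d, and once the model has been pulled, ollama_generate llama3.1:8b "Why is the sky blue?" prints just the reply text instead of the full JSON object.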
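```bash
# Start the stack in the background
docker compose up -d

# Download the model into the ollama_data volume
docker exec -it ollama ollama pull llama3.1:8b

# Interactive chat in the terminal (type /bye to exit)
docker exec -it ollama ollama run llama3.1:8b

# API test: one non-streamed response, called from the host
curl http://localhost:11434/api/generate -d '{"model":"llama3.1:8b","prompt":"Hello","stream":false}'
```

Deployment guides are educational. Each model is subject to its own license; read the official Hugging Face model card before downloading or deploying.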