Beginner | Mac / Apple | 6 min read
Mac M3 Max: The Ultimate Local LLM Setup
Maximise your Apple Silicon with Ollama. Run multiple models, set up an OpenAI-compatible API, and tune Metal GPU layers.
Ollama, Mac, Apple Silicon, Metal, GGUF
Install Ollama
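One-command install. Ollama detects Apple Silicon automatically and uses Metal GPU acceleration. The install script was written for Linux, and older versions of it exit on macOS; if that happens, install the macOS app from https://ollama.com/download or run brew install ollama instead.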
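Before installing, it can be worth confirming that the shell is running natively on Apple Silicon, since a terminal running under Rosetta reports an Intel architecture. A small sketch using uname; `is_apple_silicon` is a hypothetical helper name, not part of Ollama:

```shell
# Hypothetical helper: succeeds when the OS is macOS (Darwin) and the CPU arch is arm64.
# Arguments default to the values reported by uname and can be overridden for testing.
is_apple_silicon() {
  os="${1:-$(uname -s)}"
  arch="${2:-$(uname -m)}"
  [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]
}

if is_apple_silicon; then
  echo "Apple Silicon detected"
else
  echo "Not running natively on Apple Silicon"
fi
```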
bash
curl -fsSL https://ollama.com/install.sh | sh
Pull and run a model
Most default tags in the Ollama library are Q4_K_M GGUF quantizations, a 4-bit format that balances output quality against memory use and suits most use cases.
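As a rough sanity check on download size, a GGUF file is approximately the parameter count times the average bits per weight. A back-of-the-envelope sketch, assuming about 4.85 bits per weight for Q4_K_M (an approximate figure, since the format mixes quantization types across tensors) and 8.03 billion parameters for Llama 3.1 8B; `estimate_gguf_gb` is a hypothetical helper, not an Ollama command:

```shell
# Hypothetical helper: estimate GGUF size in GB from parameter count and bits per weight.
# $1 = parameters in billions, $2 = average bits per weight (assumed, approximate).
estimate_gguf_gb() {
  awk -v params="$1" -v bpw="$2" 'BEGIN { printf "%.1f\n", params * bpw / 8 }'
}

# Llama 3.1 8B at an assumed ~4.85 bits per weight for Q4_K_M
estimate_gguf_gb 8.03 4.85
# Same model at 16 bits per weight (FP16), for comparison
estimate_gguf_gb 8.03 16
```

The first call prints 4.9 and the second 16.1, so the 4-bit file is less than a third the size of the FP16 weights. Actual memory use at runtime is higher once the context and KV cache are allocated.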
bash
# Pull Llama 3.1 8B (default Q4_K_M, ~4.9 GB)
ollama pull llama3.1:8b
# Or pull Qwen2.5 7B for stronger multilingual support
ollama pull qwen2.5:7b
# Run interactively
ollama run llama3.1:8b
OpenAI-compatible API
Ollama exposes an OpenAI-compatible API on port 11434 under the /v1 path, so most apps built on the OpenAI SDK can use it by changing only the base URL.
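For tools built on the official OpenAI SDKs, switching to the local server is often just a matter of environment variables. A minimal sketch, assuming the tool reads OPENAI_BASE_URL and OPENAI_API_KEY (the official Python and Node SDKs do; other clients may need the base URL passed explicitly). The key value is a placeholder, since Ollama does not check it:

```shell
# Assumption: the client honours these variables (the official OpenAI Python/Node SDKs do).
export OPENAI_BASE_URL="http://localhost:11434/v1"
# The SDK requires a key to be set; Ollama ignores its value, so any placeholder works.
export OPENAI_API_KEY="ollama"
echo "OpenAI clients started from this shell will target: $OPENAI_BASE_URL"
```

The same endpoint can also be called directly over HTTP: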
bash
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"llama3.1:8b","messages":[{"role":"user","content":"Hello!"}]}'
Deployment guides are educational. Each model is subject to its own license; read the official Hugging Face model card before downloading or deploying.