Llama 3.2 11B Vision Instruct

11B

Meta Llama 3.2

Multimodal Llama model with image understanding. The vision encoder adds about 2 GB of VRAM overhead.

Consumer GPU

Mac / Apple Silicon

131K

Max Context

Quant Variants

GGUF Q8_0

Best Quality

99.5%

Accuracy Retained

Quantization Variants

Per-quantization VRAM, quality loss, and inference speed on an RTX 4090

Format	Level	BPW	VRAM	PPL Loss	Speed
GGUF	Q4_K_M	4.85	9.5 GB	3.5%	88 tok/s
GGUF	Q8_0	8.5	14.2 GB	0.5%	72 tok/s
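
The BPW and VRAM columns can be related with a little arithmetic. The sketch below is an illustration, not the method used to produce the table: it estimates the memory taken by the quantized weights alone as parameters × BPW / 8, then subtracts that from the listed VRAM to show how much is left for everything else (the vision encoder overhead mentioned above, plus runtime buffers). It assumes "11B" means exactly 11e9 parameters and that 1 GB is 1e9 bytes; neither is stated on the page, and the helper names `weight_gb` and `remainder_gb` are hypothetical.

```python
# Rough weight-memory estimate from bits per weight (BPW).
# Assumptions (not from the source page): "11B" is taken as exactly 11e9
# parameters, and 1 GB is taken as 1e9 bytes.

PARAMS = 11e9  # nominal parameter count (assumption)

# (level, bits per weight, VRAM in GB as listed in the table)
VARIANTS = [
    ("Q4_K_M", 4.85, 9.5),
    ("Q8_0", 8.5, 14.2),
]


def weight_gb(params: float, bpw: float) -> float:
    """Memory for the quantized weights alone, in GB (1e9 bytes)."""
    return params * bpw / 8 / 1e9


def remainder_gb(listed_vram_gb: float, params: float, bpw: float) -> float:
    """Listed VRAM minus the weight estimate: everything that is not weights."""
    return listed_vram_gb - weight_gb(params, bpw)


if __name__ == "__main__":
    for level, bpw, listed in VARIANTS:
        w = weight_gb(PARAMS, bpw)
        r = remainder_gb(listed, PARAMS, bpw)
        print(f"{level}: weights ~{w:.2f} GB, remainder ~{r:.2f} GB of {listed} GB")
```

Under these assumptions the weights come to roughly 6.7 GB for Q4_K_M and 11.7 GB for Q8_0, leaving about 2.8 GB and 2.5 GB respectively. That remainder is in the same range as the ~2 GB vision encoder overhead plus some working memory, but the exact split is not given on the page.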