Llama 3.2 11B Vision Instruct

11B

Meta Llama 3.2

Multimodal Llama model with image understanding. The vision encoder adds about 2 GB of VRAM overhead.

Consumer GPU, Mac / Apple Silicon

Max Context: 131K

Quant Variants: 2

Best Quality: GGUF Q8_0

Accuracy Retained: 99.5%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

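| Format | Level  | BPW  | VRAM    | PPL Loss | Speed    |
|--------|--------|------|---------|----------|----------|
| GGUF   | Q4_K_M | 4.85 | 9.5 GB  | 3.5%     | 88 tok/s |
| GGUF   | Q8_0   | 8.5  | 14.2 GB | 0.5%     | 72 tok/s |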
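
As a rough sanity check on the figures above, the sketch below estimates weight size from bits per weight (BPW) and compares it with the listed VRAM. It makes several assumptions that the page does not confirm: that the nominal 11B parameter count applies uniformly at the listed BPW, that the ~2 GB vision-encoder overhead is added on top of it, that 1 GB means 10^9 bytes, and that "Accuracy Retained" is simply 100% minus the PPL loss (which does match the 99.5% shown for Q8_0). The names `weight_gb` and `summarize` are illustrative only.

```python
# Figures copied from the table above. PARAMS is the nominal "11B" size from
# the page; the real parameter count may differ slightly (assumption).
PARAMS = 11e9
VISION_OVERHEAD_GB = 2.0  # "~2GB" per the page description (approximate)

QUANTS = {
    "Q4_K_M": {"bpw": 4.85, "vram_gb": 9.5, "ppl_loss_pct": 3.5, "tok_s": 88},
    "Q8_0": {"bpw": 8.5, "vram_gb": 14.2, "ppl_loss_pct": 0.5, "tok_s": 72},
}


def weight_gb(params: float, bpw: float) -> float:
    """Approximate weight size in GB (10^9 bytes): params * bits / 8."""
    return params * bpw / 8 / 1e9


def summarize(name: str) -> dict:
    """Split the listed VRAM into estimated weights and a remainder.

    The remainder (KV cache, buffers, runtime overhead, or error in the
    assumptions above) is NOT broken down by the source page.
    """
    q = QUANTS[name]
    weights = weight_gb(PARAMS, q["bpw"])
    return {
        "weights_gb": round(weights, 2),
        "remainder_gb": round(q["vram_gb"] - weights - VISION_OVERHEAD_GB, 2),
        # Assumed relation: accuracy retained = 100% - PPL loss.
        "accuracy_retained_pct": round(100 - q["ppl_loss_pct"], 1),
    }


if __name__ == "__main__":
    for name in QUANTS:
        print(name, summarize(name))
```

Under these assumptions the weights alone come to roughly 6.7 GB for Q4_K_M and 11.7 GB for Q8_0, leaving well under 1 GB of the listed VRAM for everything else. Treat the split as an estimate, not as the page's own accounting.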