Llama 3.3 70B Instruct

70B

Meta Llama 3.3

The latest 70B model from Meta, with improved multilingual support. A drop-in upgrade from Llama 3.1 70B.

31.0K HF downloads · 121 likes · unsloth/Llama-3.3-70B-Instruct-GGUF · stats from 6/24/2026
Pro GPU · Mac / Apple Silicon

Max Context: 131K
Quant Variants: 3
Best Quality: GGUF Q5_K_M
Accuracy Retained: 99.0%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format  Level   BPW   VRAM     PPL Loss  Speed
GGUF    Q4_K_M  4.85  43.5 GB  2.6%      38 tok/s
GGUF    Q5_K_M  5.68  50.1 GB  1.0%      32 tok/s
AWQ     INT4    4     38.2 GB  3.7%      54 tok/s
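
As a rough sanity check on the VRAM column, the weights-only footprint can be estimated from the bits-per-weight (BPW) figure as parameters × BPW / 8. The sketch below assumes a nominal 70e9 parameters (the page gives only "70B", not an exact count) and decimal gigabytes; `weight_gb` is an illustrative helper, not part of any library.

```python
# Rough weights-only memory estimate from bits per weight (BPW).
# Assumption: 70e9 parameters (nominal "70B"); the page gives no exact count.
N_PARAMS = 70e9

# (format, level, BPW, listed VRAM in GB), copied from the table above
VARIANTS = [
    ("GGUF", "Q4_K_M", 4.85, 43.5),
    ("GGUF", "Q5_K_M", 5.68, 50.1),
    ("AWQ", "INT4", 4.0, 38.2),
]


def weight_gb(n_params: float, bpw: float) -> float:
    """Weights-only size in decimal GB: params * bits / 8 bits per byte / 1e9."""
    return n_params * bpw / 8 / 1e9


for fmt, level, bpw, listed in VARIANTS:
    est = weight_gb(N_PARAMS, bpw)
    gap = listed - est
    print(f"{fmt} {level}: ~{est:.1f} GB weights, listed {listed} GB, gap {gap:.1f} GB")
```

Under these assumptions, each listed VRAM figure comes out somewhat above the weights-only estimate. The page does not say what the VRAM column includes, so the gap may reflect KV cache, runtime buffers, or a slightly different parameter count.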