DeepSeek-R1-Distill-Llama-70B

70B

DeepSeek

DeepSeek-R1 reasoning distilled into the Llama 70B architecture. A top open reasoning model for dual-GPU setups.

47.7K HF downloads · 42 likes · bartowski/DeepSeek-R1-Distill-Llama-70B-GGUF · stats from 6/24/2026
Pro GPU

Max Context: 131K

Quant Variants: 2

Best Quality: GGUF Q4_K_M

Accuracy Retained: 97.6%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format  Level   BPW   VRAM     PPL Loss  Speed
GGUF    Q4_K_M  4.85  43.5 GB  2.4%      36 tok/s
AWQ     INT4    4     38.2 GB  3.5%      52 tok/s
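
As a rough sanity check on the VRAM column, the weights-only footprint can be estimated from the parameter count and bits per weight (BPW). The sketch below assumes a nominal 70 billion parameters and decimal gigabytes; the function name and constants are illustrative and not taken from this page. The listed VRAM figures come out higher than the weights-only estimate, presumably because they include runtime overhead, though the page does not say how they were measured.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only footprint in decimal GB (1 GB = 1e9 bytes)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# Assumption: nominal "70B" parameter count; the exact count is not given here.
PARAMS_B = 70.0

# (name, BPW, listed VRAM in GB) as shown in the table above.
VARIANTS = [
    ("GGUF Q4_K_M", 4.85, 43.5),
    ("AWQ INT4", 4.0, 38.2),
]

for name, bpw, listed_gb in VARIANTS:
    est = weight_memory_gb(PARAMS_B, bpw)
    print(f"{name}: ~{est:.1f} GB weights-only vs {listed_gb} GB listed")
```

Under these assumptions both variants need well over the 24 GB of a single RTX 4090 for weights alone, which is consistent with the dual-GPU framing above.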