DeepSeek-R1-Distill-Llama-70B

70B

DeepSeek

DeepSeek-R1 reasoning distilled into the Llama 70B architecture. A top open reasoning model for dual-GPU setups.

47.7K HF downloads · 42 likes · bartowski/DeepSeek-R1-Distill-Llama-70B-GGUF · stats from 6/24/2026

Pro GPU

Max context: 131K

Quant Variants

Best quality: GGUF Q4_K_M

Accuracy retained: 97.6%
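Running at the full 131K-token context costs a lot of memory because of the key/value (KV) cache. The short sketch below estimates that cost. It rests on several assumptions: the public Llama 3 70B architecture (80 layers, 8 grouped-query KV heads, head dimension 128), an unquantized FP16/BF16 cache, and reading "131K" as 131,072 tokens. Check these values against the model's own config.json. The function name `kv_cache_bytes` is hypothetical.

```python
# Rough KV-cache sizing sketch.
# ASSUMPTIONS: Llama 3 70B architecture values (80 layers, 8 KV heads,
# head_dim 128) and a 2-byte FP16/BF16 cache; verify against config.json.

def kv_cache_bytes(tokens: int,
                   n_layers: int = 80,
                   n_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for `tokens` positions.

    The leading factor of 2 covers storing both K and V at every layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * tokens


GIB = 1024 ** 3
per_token = kv_cache_bytes(1)
full_ctx = kv_cache_bytes(131_072)  # "131K" read as 131,072 tokens (assumption)
print(f"{per_token / 1024:.0f} KiB per token")
print(f"{full_ctx / GIB:.1f} GiB at 131,072 tokens")
```

Under these assumptions the cache takes about 320 KiB per token, or roughly 40 GiB at full context. The per-quant VRAM figures in the table presumably assume a much shorter context.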

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format	Level	Bits/weight	VRAM	PPL loss	Speed
GGUF	Q4_K_M	4.85	43.5 GB	2.4%	36 tok/s
AWQ	INT4	4	38.2 GB	3.5%	52 tok/s
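You can sanity-check the VRAM column against the bits-per-weight column, because weight memory is roughly parameters × BPW / 8 bytes. The sketch below makes three assumptions:

- The model has about 70.6 billion parameters. Read the exact count from the model files.
- "Accuracy Retained" means 100% minus the perplexity loss. This matches the Q4_K_M row: 100 − 2.4 = 97.6.
- The total simply halves across a two-GPU split.

The helper names `weight_gb` and `accuracy_retained` are hypothetical. The gap between the table's VRAM and the weights-only estimate is treated loosely as runtime overhead (KV cache, buffers).

```python
# Back-of-envelope check of the quantization table.
# ASSUMPTION: ~70.6e9 parameters; the exact count is in the model's files.
PARAMS = 70.6e9


def weight_gb(bpw: float, params: float = PARAMS) -> float:
    """Decimal gigabytes needed just for the quantized weights."""
    return params * bpw / 8 / 1e9


def accuracy_retained(ppl_loss_pct: float) -> float:
    """ASSUMPTION: the page's 'Accuracy Retained' is 100% minus PPL loss."""
    return 100.0 - ppl_loss_pct


# (label, bits per weight, table VRAM in GB, PPL loss in %) copied from the table
rows = [("GGUF Q4_K_M", 4.85, 43.5, 2.4), ("AWQ INT4", 4.0, 38.2, 3.5)]

for name, bpw, table_vram, ppl_loss in rows:
    w = weight_gb(bpw)
    print(f"{name}: weights ~{w:.1f} GB, "
          f"overhead ~{table_vram - w:.1f} GB, "
          f"naive per-GPU share on 2 GPUs ~{table_vram / 2:.1f} GB, "
          f"accuracy retained {accuracy_retained(ppl_loss):.1f}%")
```

A real two-way layer split is rarely perfectly even, and each device allocates its own scratch buffers. Treat the per-GPU figure as approximate.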