Back to Quant Hub

Zephyr 7B Beta

7B

HuggingFaceH4

DPO-aligned Mistral 7B. Classic choice for helpful, harmless chat baselines.

Consumer GPUMac / Apple SiliconCPU / VPS

33K

Max Context

2

Quant Variants

GGUF Q6_K

Best Quality

99.0%

Accuracy Retained

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

FormatLevelBPWVRAMPPL LossSpeedActions
GGUFQ4_K_M4.855.2 GB3.3%155 tok/s
GGUFQ6_K6.566.8 GB1.0%132 tok/s