Jamba 1.5 Mini

12B

AI21 Labs

Hybrid SSM-Transformer with a 256K-token context window. Efficient long-document QA on 16 GB of memory.

Consumer GPU, Mac / Apple Silicon

262K

Max Context

2

Quant Variants

GGUF Q4_K_M

Best Quality

96.6%

Accuracy Retained

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format  Level   BPW   VRAM    PPL Loss  Speed
GGUF    Q4_K_M  4.85  8.5 GB  3.4%      95 tok/s
AWQ     INT4    4     7.5 GB  4.8%      125 tok/s
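
As a rough sanity check on the figures above, the size of the quantized weights can be estimated as parameters × bits per weight / 8. The sketch below is illustrative only and rests on three assumptions not stated on this page: that "12B" is the parameter count the VRAM column is based on, that the remaining VRAM goes to runtime overhead (KV cache, activations, buffers), and that "Accuracy Retained" is simply 100% minus the PPL loss. The helper name `weight_size_gb` is hypothetical.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in decimal GB."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 12e9  # assumption: "12B" as listed on this page

# (format, level, bits per weight, listed VRAM in GB, listed PPL loss in %)
VARIANTS = [
    ("GGUF", "Q4_K_M", 4.85, 8.5, 3.4),
    ("AWQ", "INT4", 4.0, 7.5, 4.8),
]

for fmt, level, bpw, vram_gb, ppl_loss in VARIANTS:
    weights = weight_size_gb(N_PARAMS, bpw)
    overhead = vram_gb - weights   # assumption: everything that is not weights
    retained = 100.0 - ppl_loss    # assumption: how "Accuracy Retained" is derived
    print(
        f"{fmt} {level}: weights ~{weights:.2f} GB, "
        f"overhead ~{overhead:.2f} GB, retained ~{retained:.1f}%"
    )
```

Under these assumptions the Q4_K_M weights come to roughly 7.3 GB of the listed 8.5 GB, and its 3.4% PPL loss corresponds to the 96.6% figure shown at the top of the page.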