Open Source · Edge Deployment · Geek-First

Quantize
Everything.

Bridge the gap between research papers and real-world deployment. Run state-of-the-art LLMs on consumer hardware.

GGUF

AWQ

EXL2

GPTQ

HQQ

& more

10 Models Indexed

5 Formats Tracked

33 GPUs in Database

99.2% Avg Accuracy Retained

Today's Quant Feed

Latest community quantization releases

NEW
Qwen2.5-72B-Instruct · GGUF
Q4_K_M · 43.6 GB · bartowski
RTX 4090 ×2 · 2h ago
HOT
DeepSeek-R1-Distill-Qwen-14B · EXL2
4.65bpw · 9.8 GB · turboderp
RTX 4090 · 5h ago
NEW
Llama-3.3-70B-Instruct · GGUF
Q5_K_M · 50.1 GB · unsloth
A100 80G · 8h ago
UPD
Mistral-Small-24B-Instruct · AWQ
INT4 · 14.2 GB · city96
RTX 3090 · 12h ago
HOT
Qwen2.5-Coder-32B-Instruct · GGUF
Q4_K_M · 22.0 GB · bartowski
RTX 4090 · 18h ago

Community adoption · this week vs last week

Get precise memory requirements for any model × quant × context combination, with a red/yellow/green verdict for your hardware.
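As a rough illustration of how such an estimate can be put together, here is a minimal sketch. It assumes total memory is the quantized weight size plus an FP16 KV cache plus a flat overhead margin. The `ModelShape` fields, the 10% overhead, the 90% "yellow" threshold, and the layer/head numbers in the usage example are all illustrative assumptions, not values taken from this site.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    """Architecture numbers needed for the KV cache (hypothetical helper type)."""
    n_layers: int
    n_kv_heads: int
    head_dim: int


def kv_cache_bytes(shape: ModelShape, context_len: int, bytes_per_elem: int = 2) -> int:
    # One K and one V tensor per layer; each stores one head_dim vector
    # per KV head per token. bytes_per_elem=2 assumes an FP16 cache.
    return (2 * shape.n_layers * shape.n_kv_heads * shape.head_dim
            * context_len * bytes_per_elem)


def estimate_total_bytes(weight_bytes: float, shape: ModelShape, context_len: int,
                         overhead_fraction: float = 0.10) -> int:
    # Weights + KV cache, padded by an assumed margin for activations and buffers.
    base = weight_bytes + kv_cache_bytes(shape, context_len)
    return int(base * (1 + overhead_fraction))


def verdict(required_bytes: float, vram_bytes: float, tight_ratio: float = 0.9) -> str:
    # red: does not fit; yellow: fits with little headroom; green: fits comfortably.
    if required_bytes > vram_bytes:
        return "red"
    if required_bytes > tight_ratio * vram_bytes:
        return "yellow"
    return "green"


if __name__ == "__main__":
    # A 9.8 GB quantized file on a 24 GB card at 8192 tokens of context.
    # The shape values below are assumed for illustration only.
    shape = ModelShape(n_layers=48, n_kv_heads=8, head_dim=128)
    needed = estimate_total_bytes(9.8e9, shape, context_len=8192)
    print(f"{needed / 1e9:.1f} GB needed:", verdict(needed, 24e9))
```

A real calculator would also need to account for runtime-specific buffers and any KV cache quantization, which this sketch ignores.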

Generate ready-to-run commands for llama.cpp, Ollama, and vLLM, as a one-liner or a Docker Compose file.
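A command generator of this kind could look roughly like the sketch below. It builds one-liners for the three runtimes using only widely documented options: `-m`, `-c`, and `-ngl` for `llama-cli`, `ollama run`, and `--quantization` and `--max-model-len` for `vllm serve`. The function names, default values, and the model paths and tags in the usage example are hypothetical, and Docker Compose output is not covered here.

```python
import shlex
from typing import Optional


def llama_cpp_command(model_path: str, context_len: int = 4096,
                      gpu_layers: int = 99) -> str:
    # -m: GGUF file, -c: context size, -ngl: number of layers offloaded to the GPU.
    return shlex.join(["llama-cli", "-m", model_path,
                       "-c", str(context_len), "-ngl", str(gpu_layers)])


def ollama_command(model_tag: str) -> str:
    # Ollama pulls the model if needed and then starts an interactive session.
    return shlex.join(["ollama", "run", model_tag])


def vllm_command(model_id: str, quantization: Optional[str] = None,
                 max_model_len: Optional[int] = None) -> str:
    # Optional flags are only appended when the caller sets them.
    args = ["vllm", "serve", model_id]
    if quantization:
        args += ["--quantization", quantization]
    if max_model_len:
        args += ["--max-model-len", str(max_model_len)]
    return shlex.join(args)


if __name__ == "__main__":
    # Placeholder paths and tags; substitute real ones before running.
    print(llama_cpp_command("models/example-Q4_K_M.gguf", context_len=8192))
    print(ollama_command("example-model:latest"))
    print(vllm_command("example-org/example-model-AWQ",
                       quantization="awq", max_model_len=8192))
```

Using `shlex.join` rather than string concatenation keeps paths with spaces or shell metacharacters correctly quoted in the generated command.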

Compare quantization formats across six key dimensions.