Back to Cookbook
IntermediateServer / VPS 8 min read

TabbyAPI: ExLlamaV2 with a Web UI

Wrap ExLlamaV2 in TabbyAPI for a polished OpenAI-compatible server with streaming and model hot-swap.

TabbyAPIExLlamaV2APIEXL2

Install TabbyAPI

TabbyAPI is the most popular ExLlamaV2 server wrapper with a built-in UI.

bash
git clone https://github.com/theroyallab/tabbyAPI
cd tabbyAPI
pip install -r requirements.txt

Configure and run

Place EXL2 models in the models/ directory and start the API server.

bash
python main.py --port 5000
# API: http://localhost:5000/v1/chat/completions
Deployment guides are educational. Each model is subject to its own license — read the official Hugging Face model card before downloading or deploying.