Servary

Run open-weight LLMs on your own infrastructure.

VM or Kubernetes, one console
Your weights, your hardware
60+ models, day-one support

How it works

Three steps from a model name on Hugging Face to a stable endpoint your clients can hit. Servary owns everything in between.

1. Register a model

Point Servary at Hugging Face, an S3-compatible bucket, or your private registry. Every revision is content-addressed and reproducible across environments.

2. Pick an environment

Target a single VM for a quick proof, or a managed Kubernetes cluster for production. The control plane is the same; the runner adapts.

3. Ship a stable URL

Servary handles the rollout, the gateway, the audit log, and the lifecycle. Your clients integrate once with a URL that survives pod churn and model upgrades.

Features

The pieces of an LLM deployment, in one place.

One registry, every model

Pull from Hugging Face or any S3-compatible bucket. Every revision is content-addressed and reproducible: the same hash deploys identically across clusters, regions, and rollbacks.
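Content addressing means a revision is named by a hash of its bytes, not by a mutable tag. Here is a minimal sketch. It assumes a revision is a directory of weight and config files, hashed with SHA-256 over each file's relative path and contents in sorted order. Servary's actual hashing scheme isn't described here, and `revision_digest` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def revision_digest(model_dir: str) -> str:
    """Hypothetical content address for a model revision stored as a directory.

    Every file's relative path and bytes are hashed in a fixed order, so the
    same files produce the same digest no matter where the directory lives.
    """
    root = Path(model_dir)
    files = [p for p in root.rglob("*") if p.is_file()]
    # Sort by POSIX-style relative path so ordering is stable across machines.
    files.sort(key=lambda p: p.relative_to(root).as_posix())
    h = hashlib.sha256()
    for path in files:
        rel = path.relative_to(root).as_posix().encode("utf-8")
        # Length-prefix path and content so different file layouts
        # can never concatenate into the same byte stream.
        h.update(len(rel).to_bytes(8, "big"))
        h.update(rel)
        h.update(path.stat().st_size.to_bytes(8, "big"))
        with path.open("rb") as f:
            # Stream in 1 MiB chunks: weight shards can be many gigabytes.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
    return "sha256:" + h.hexdigest()
```

Two copies of the same revision pulled into different clusters hash identically. Flip one byte in a shard and the digest changes. That property is what lets a deploy or a rollback refer to one exact revision.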
Secrets, scoped and rotated

Encrypted at rest, attached to environments by reference, and rotated without rebuilding deployments. HF tokens, model API keys, and registry credentials never sit in YAML, git, or a teammate's clipboard.
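Attaching a secret "by reference" means the deployment spec names the secret instead of embedding its value. Rotating the value therefore leaves the spec untouched. Here is a minimal sketch. It assumes an in-memory store and a hypothetical `{"secret_ref": name}` marker inside the spec. Encryption at rest and Servary's real reference syntax are not modeled.

```python
from typing import Any


class SecretStore:
    """Hypothetical secret store: values keyed by name, versioned on rotation."""

    def __init__(self) -> None:
        self._values: dict[str, str] = {}
        self.versions: dict[str, int] = {}

    def put(self, name: str, value: str) -> None:
        # Rotation is simply a new value under the same name.
        self._values[name] = value
        self.versions[name] = self.versions.get(name, 0) + 1

    def get(self, name: str) -> str:
        return self._values[name]


def resolve(spec: Any, store: SecretStore) -> Any:
    """Return a copy of spec with each {"secret_ref": name} replaced by its value.

    The stored spec only ever holds the reference, never the secret itself.
    """
    if isinstance(spec, dict):
        if set(spec) == {"secret_ref"}:
            return store.get(spec["secret_ref"])
        return {k: resolve(v, store) for k, v in spec.items()}
    if isinstance(spec, list):
        return [resolve(v, store) for v in spec]
    return spec
```

Calling `store.put("hf-token", ...)` again changes what `resolve` returns at the next rollout. The stored spec, and anything hashed from it, stays byte-for-byte the same.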
Deployments that stay in sync

Each deployment carries a spec hash, and Servary's reconciler keeps the cluster in lockstep with that spec. Drift is detected in seconds, fixed automatically, and surfaced in the UI before customers notice.
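A spec hash plus a reconciler can be sketched as a single reconcile pass. Hash the desired spec canonically, compare it with the hash recorded on what is actually running, and re-apply on mismatch. This is a minimal sketch of that general reconciliation pattern, not Servary's code. It assumes canonical JSON (sorted keys, no whitespace) as the hashing input, and a hypothetical `apply` callback stands in for the runner.

```python
import hashlib
import json
from typing import Any, Callable, Optional


def spec_hash(spec: dict[str, Any]) -> str:
    """Hash a spec canonically so key order and whitespace don't matter."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(
    desired: dict[str, Any],
    observed_hash: Optional[str],
    apply: Callable[[dict[str, Any]], None],
) -> str:
    """One reconcile pass; returns the hash the deployment should now carry.

    If the running deployment's recorded hash differs from the desired spec's
    hash (drift, or a newly submitted spec), the desired spec is re-applied.
    """
    want = spec_hash(desired)
    if observed_hash != want:
        apply(desired)  # hypothetical callback: roll out the desired state
    return want
```

A real reconciler would run this on a timer or on watch events. It would also read `observed_hash` from the live object, for example an annotation. Both of those details are assumptions here.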
A live view of every deployment

Pods, logs, events, and metrics on one tab. The pieces you'd otherwise stitch together from kubectl, Grafana, and a terminal sit where you ship from, with warm-up traces and status changes pushed live.
Endpoints your clients can trust

Every deployment gets a gateway-managed URL that survives pod churn, model upgrades, and environment moves. Your clients integrate once; you replace what's behind the URL whenever you need to.
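The stable URL works through indirection. Clients hold a fixed route, and the gateway maps it to whichever backend is current. Here is a minimal sketch of that mapping. It assumes a hypothetical in-process route table in which each route name points at one backend URL, and an upgrade swaps the pointer under a lock. A production gateway would also health-check the new backend and drain in-flight requests; this sketch omits both.

```python
import threading
from typing import Optional


class RouteTable:
    """Hypothetical gateway route table: stable route name -> current backend."""

    def __init__(self) -> None:
        self._routes: dict[str, str] = {}
        self._lock = threading.Lock()

    def promote(self, route: str, backend: str) -> Optional[str]:
        """Point `route` at `backend`; return the previous backend for rollback."""
        with self._lock:
            previous = self._routes.get(route)
            self._routes[route] = backend
            return previous

    def resolve(self, route: str) -> str:
        """Called by the gateway per request; clients never see this value."""
        with self._lock:
            return self._routes[route]
```

A client keeps calling the same route (say `llama-prod`, a made-up name) before and after `promote`. Only the backend behind it changes. The returned previous backend is what a rollback would promote again.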
Runs anywhere you do

Spin up Servary against a single VM for a quick proof, or against a managed Kubernetes cluster for production. The console and the API don't change; only the runner adapts.
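"The console and the API don't change; only the runner adapts" describes a common plug-in shape. One control-plane code path talks to an interface, and each target environment supplies its own implementation. Here is a minimal sketch. It assumes hypothetical `VMRunner` and `KubernetesRunner` classes that only describe what they would do. Real runners would drive a container runtime or the Kubernetes API instead of returning strings.

```python
from typing import Protocol


class Runner(Protocol):
    """What the control plane needs from any target environment (hypothetical)."""

    def deploy(self, name: str, image: str, gpus: int) -> str: ...


class VMRunner:
    def __init__(self, host: str) -> None:
        self.host = host

    def deploy(self, name: str, image: str, gpus: int) -> str:
        # Stand-in for starting a container on a single host.
        return f"vm:{self.host}: run {image} as {name} with {gpus} GPU(s)"


class KubernetesRunner:
    def __init__(self, namespace: str) -> None:
        self.namespace = namespace

    def deploy(self, name: str, image: str, gpus: int) -> str:
        # Stand-in for applying a workload that requests nvidia.com/gpu.
        return f"k8s:{self.namespace}: apply {name} ({image}, nvidia.com/gpu={gpus})"


def roll_out(runner: Runner, name: str, image: str, gpus: int) -> str:
    """Control-plane code: identical whichever runner is passed in."""
    return runner.deploy(name, image, gpus)
```

Moving a project from one VM to a cluster then comes down to passing a different runner to the same `roll_out` call.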
Audit-ready by default

Every API call lands in an immutable audit log: who, what, when, which spec hash. Compliance, postmortems, and security reviews get the trail they need with no extra tooling.
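Immutability is usually enforced by the storage layer, such as append-only or write-once buckets. A log can also be made tamper-evident by chaining: each entry stores the hash of the previous one, so editing any past entry breaks every link after it. Here is a minimal sketch of that hash-chaining technique, using the fields listed above (who, what, when, spec hash). Whether Servary chains its log this way is an assumption, and `AuditLog` is a hypothetical name.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


class AuditLog:
    def __init__(self) -> None:
        self.entries: list[dict[str, str]] = []

    @staticmethod
    def _digest(entry: dict[str, str]) -> str:
        # Hash every field except the stored hash itself.
        body = {k: v for k, v in entry.items() if k != "hash"}
        return hashlib.sha256(
            json.dumps(body, sort_keys=True).encode("utf-8")
        ).hexdigest()

    def append(self, who: str, what: str, spec_hash: str) -> dict[str, str]:
        entry = {
            "who": who,
            "what": what,
            "when": datetime.now(timezone.utc).isoformat(),
            "spec_hash": spec_hash,
            "prev": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry["hash"] = self._digest(entry)
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every link; False if any entry was altered or reordered."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(entry):
                return False
            prev = entry["hash"]
        return True
```

Chaining only detects edits. Someone with write access could still rewrite the whole chain, unless the latest hash is anchored outside the log or signed.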
And much more…

Per-deployment metrics, multi-runtime support, traffic management, multi-LoRA, scale-to-zero, and cost estimation are all on the way.

Supported models

Any open-weight LLM supported by vLLM or SGLang runs on day one, with more runtimes on the way. The list below is what the team uses in production; the long tail of community models follows the same code path.

Llama 3.3 70B
Llama 4 Scout / Maverick
Qwen 3 0.6B → 235B MoE
DeepSeek V3 / R1
Mistral Small 3.1
Mistral Large 2
Phi-4 14B
Gemma 3 1B → 27B
Command-R+

Need something not listed? Open an issue or message us. Most additions are a runtime config change, not a code change.
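Most new models are a config change rather than a code change because, in both runtimes, the model and its parallelism are launch arguments. Here is a minimal sketch. It assumes a hypothetical per-model config dict of the kind Servary might keep. The emitted commands use vLLM's `vllm serve` CLI and SGLang's `python -m sglang.launch_server`, with flags as those projects document them; check them against the version you run.

```python
def launch_command(cfg: dict) -> list[str]:
    """Turn a hypothetical per-model runtime config into a server command line."""
    model = cfg["model"]
    tp = str(cfg.get("tensor_parallel", 1))
    port = str(cfg.get("port", 8000))
    if cfg["runtime"] == "vllm":
        cmd = ["vllm", "serve", model, "--tensor-parallel-size", tp, "--port", port]
        if "max_model_len" in cfg:
            cmd += ["--max-model-len", str(cfg["max_model_len"])]
        return cmd
    if cfg["runtime"] == "sglang":
        return [
            "python", "-m", "sglang.launch_server",
            "--model-path", model, "--tp-size", tp, "--port", port,
        ]
    raise ValueError(f"unsupported runtime: {cfg['runtime']}")
```

Supporting a larger checkpoint then means adding one more config entry, for example a higher `tensor_parallel`. `launch_command` itself doesn't change.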

Questions

What is Servary, exactly?

Servary is a self-hosted control plane for serving open-weight LLMs on your own infrastructure. Point it at a model registry and a target environment, and it handles the full lifecycle: registration, rollout, a stable gateway, and an audit trail. No SaaS, no proxy, no shared tenancy. Your weights and traffic never leave your network.

Which models can I serve?

Any open-weight LLM supported by the inference runtimes Servary drives today, vLLM and SGLang, with more runtimes on the roadmap. That covers Llama 3.3 / 4, the full Qwen 3 family including the MoE variants, DeepSeek V3 / R1, Mistral Small / Large, Phi-4, Gemma 3, Command-R+, and many more. The supported-models list is updated continuously.

Do I need Kubernetes?

No. Single-VM installs and managed Kubernetes are both first-class targets. The same control plane drives both, so a project can graduate from one to the other without changing tools.

Is Servary open source?

Not on day one, but it is on our roadmap. Servary is self-hosted by default, so from the start you keep the weights, the traffic, and the audit log. We plan to open up the core over time and will share specifics as we get closer.

When can I use it?

We're finalising the public preview. Click "Notify me" at the top and you'll hear from us the day it ships.
