Your models. Your hardware. No meter running.
Workstations and home servers for developers, researchers, prosumers and homelab builders, engineered around what you actually want to run.
Local inference is an engineering problem: parameters, quantization, context and concurrency translate into memory, bandwidth and thermals. We build machines around those numbers — not around marketing tiers.
What do you want to run?
Three answers. We translate them into memory, bandwidth and thermals — and recommend the right class of machine.
Guidance, not gospel: real requirements depend on architecture, KV-cache settings and serving stack. Every recommendation is confirmed by an engineer before you commit.
Approximate memory needed for model weights
| Model size | Q4 | Q5–Q6 | Q8 | FP16 |
|---|---|---|---|---|
| 13B | ~8 GB | ~10 GB | ~14 GB | ~26 GB |
| 34B | ~20 GB | ~25 GB | ~36 GB | ~68 GB |
| 70B | ~40 GB | ~50 GB | ~74 GB | ~140 GB |
| 120B+ | ~68 GB+ | ~85 GB+ | ~126 GB+ | ~240 GB+ |
Weights only — add headroom for KV cache (grows with context length and concurrent sessions), the OS and your serving stack. That headroom is exactly what we engineer for.
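For a rough sense of where these figures come from, here is a minimal back-of-envelope sketch in Python. It is an illustration, not our sizing tool. The effective bits per weight (about 4.5 for Q4, 16 for FP16) and the 70B-class model shape (80 layers, 8 KV heads, head dimension 128, 16-bit cache values) are assumptions chosen for the example. The function names are hypothetical, and real numbers vary with architecture and serving stack.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, sessions: int = 1,
                bytes_per_value: int = 2) -> float:
    """Approximate KV-cache memory in GB.

    Stores one key and one value tensor (hence the factor 2) per layer,
    per token, per concurrent session.
    """
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens * sessions / 1e9


# Assumed example: 70B parameters at ~4.5 effective bits per weight (Q4-class).
q4_weights = weights_gb(70, 4.5)  # 39.375, in line with the ~40 GB table entry

# Assumed model shape: 80 layers, 8 KV heads, head_dim 128, 8192-token context.
one_session = kv_cache_gb(80, 8, 128, 8192)               # about 2.7 GB
four_sessions = kv_cache_gb(80, 8, 128, 8192, sessions=4)  # about 10.7 GB

print(f"weights: {q4_weights:.1f} GB")
print(f"KV cache, 1 session: {one_session:.1f} GB")
print(f"KV cache, 4 sessions: {four_sessions:.1f} GB")
```

Under these assumptions, four concurrent 8K-token sessions add roughly a quarter of the Q4 weight footprint again, which is why context length and concurrency matter as much as model size when choosing a machine.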
Edge case? Good.
Unusual models, strict noise limits, rack constraints, multi-node ambitions — tell an engineer what you're building.
