APEX SYSTEMS


AI & local LLM systems

Your models. Your hardware. No meter running.

Workstations and home servers for developers, researchers, prosumers and homelab builders, engineered around what you actually want to run.

Local inference is an engineering problem: parameters, quantization, context and concurrency translate into memory, bandwidth and thermals. We build machines around those numbers — not around marketing tiers.
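Here is one example of that translation. When a dense model generates a single stream, each new token has to read roughly every weight from memory once, so memory bandwidth sets a ceiling on decode speed. The Python sketch below assumes a dense (non-MoE) model and ignores KV-cache reads and kernel overhead. The 1,000 GB/s figure describes a hypothetical device, not the spec of any machine we build:

```python
def max_decode_tokens_per_s(weight_gb: float, bandwidth_gb_s: float) -> float:
    """Rough ceiling on single-stream decode speed for a dense model.

    Assumes every generated token streams all weights from memory once, so
    bandwidth, not compute, is the limit. Real throughput is lower, and
    batching several sessions changes the picture.
    """
    if weight_gb <= 0 or bandwidth_gb_s <= 0:
        raise ValueError("weight size and bandwidth must be positive")
    return bandwidth_gb_s / weight_gb


# ~40 GB of Q4 weights for a 70B model on a hypothetical 1,000 GB/s device.
print(f"~{max_decode_tokens_per_s(40, 1000):.0f} tokens/s ceiling")
```

Under this model, halving the weight footprint through quantization roughly doubles the ceiling. That is why we weigh quantization choice and memory bandwidth together.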

Guided selection

What do you want to run?

Three answers. We translate them into memory, bandwidth and thermals — and recommend the right class of machine.

Guidance, not gospel: real requirements depend on architecture, KV-cache settings and serving stack. Every recommendation is confirmed by an engineer before you commit.

The honest math

Approximate memory needed for model weights

Model size   Q4        Q5–Q6     Q8        FP16
13B          ~8 GB     ~10 GB    ~14 GB    ~26 GB
34B          ~20 GB    ~25 GB    ~36 GB    ~68 GB
70B          ~40 GB    ~50 GB    ~74 GB    ~140 GB
120B+        ~68 GB+   ~85 GB+   ~126 GB+  ~240 GB+
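The table follows from a simple rule: weight memory ≈ parameters × bits per weight ÷ 8. The Python sketch below applies that rule. The bits-per-weight values are our assumptions for typical quantized formats, since stored quantization scales push real formats slightly above their nominal bit width. The results therefore land close to the table, but not exactly on it:

```python
# Assumed effective bits per weight, including quantization scales.
# Illustrative values only; real formats vary by scheme and model.
BITS_PER_WEIGHT = {"Q4": 4.5, "Q5-Q6": 5.7, "Q8": 8.5, "FP16": 16.0}


def weight_gb(params_billion: float, fmt: str) -> float:
    """Weights-only memory in GB (1 GB = 1e9 bytes)."""
    bits = BITS_PER_WEIGHT[fmt]
    # params_billion * 1e9 weights * bits / 8 bytes, divided by 1e9 bytes per GB
    return params_billion * bits / 8


for size in (13, 34, 70, 120):
    row = "  ".join(f"{fmt}: ~{weight_gb(size, fmt):.0f} GB" for fmt in BITS_PER_WEIGHT)
    print(f"{size}B  {row}")
```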

Weights only — add headroom for KV cache (grows with context length and concurrent sessions), the OS and your serving stack. That headroom is exactly what we engineer for.
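KV-cache headroom can be estimated the same way. Keys and values are stored for every layer, KV head and token, and separately for each concurrent session. The minimal sketch below assumes an FP16 cache and a hypothetical 70B-class configuration with grouped-query attention (80 layers, 8 KV heads, head dimension 128). Your model's own config file has the real values:

```python
def kv_cache_gb(
    layers: int,
    kv_heads: int,
    head_dim: int,
    context_tokens: int,
    sessions: int = 1,
    bytes_per_elem: int = 2,  # 2 bytes per element = FP16 cache (assumption)
) -> float:
    """Approximate KV-cache size in GB (1 GB = 1e9 bytes)."""
    # Factor 2: one key vector and one value vector per layer, KV head and token.
    total_bytes = 2 * layers * kv_heads * head_dim * context_tokens * sessions * bytes_per_elem
    return total_bytes / 1e9


weights = 40.0  # ~Q4 70B weights, from the table above
for sessions in (1, 4):
    kv = kv_cache_gb(80, 8, 128, context_tokens=8192, sessions=sessions)
    print(f"{sessions} session(s) at 8K context: KV ~{kv:.1f} GB, weights + KV ~{weights + kv:.1f} GB")
```

With these assumptions, each 8K-token session adds roughly 2.7 GB, and the OS and serving-stack overhead still come on top of that.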

Edge case? Good.

Unusual models, strict noise limits, rack constraints, multi-node ambitions — tell an engineer what you're building.