Local AI

Private AI, on your own hardware.

Cloud AI transmits your data to a third party's servers. Local AI runs on infrastructure you own, on your premises. Nothing leaves the building. We select the model, specify the hardware, handle the installation, and train your team.

Bound by secrecy

If you can't paste it
into ChatGPT.

Wealth managers, fiduciaries, lawyers, doctors: anyone bound by professional secrecy (for lawyers and doctors, Art. 321 StGB) cannot simply send client data to a cloud model hosted abroad. A model that runs on your own premises gives you the same AI leverage while the confidential data never leaves the building.

Built for regulated Swiss work: Wealth managers · Fiduciaries · Legal · Healthcare · Finance

001 Local vs cloud

Same answers.
Different trust.

Cloud AI is operated by a third party on infrastructure you don't control. Local AI runs on infrastructure you own. For organisations handling sensitive, regulated, or confidential data, where the model runs is the deciding factor.

Local (on-prem) vs cloud (API), dimension by dimension:

Where data lives
  • Local: In your building, always. Never transmitted.
  • Cloud: On a vendor's servers, often abroad. Cross-border transfer.
Cost model
  • Local: One-time hardware and setup. No per-token bill.
  • Cloud: Pay per token on every request, for as long as you use it. Scales with usage.
Compliance
  • Local: Air-gappable and audit-friendly. Suited to health, legal, and finance.
  • Cloud: Depends on the provider's terms and region. Shared responsibility.
Connectivity
  • Local: Operates fully offline. No internet required.
  • Cloud: Requires a live connection for every request. Dependent on provider uptime.
Control
  • Local: You own the model and the box. No silent changes.
  • Cloud: The model can be changed or retired on the vendor's schedule. You follow the vendor's roadmap.
Best suited to
  • Local: Privacy and predictable cost. Steady, sensitive workloads.
  • Cloud: Access to the absolute frontier model. Variable, non-sensitive tasks.
Our recommendation: frontier cloud models remain the most capable overall, but for confidential data, a capable model you own is a better choice than a stronger one you rent.
002 Models for on-premise

One default,
a full range.

For most clients our go-to is Qwen 3.6 — frontier-class quality on a single server. Need lighter? We scale down to Gemma. Need the absolute ceiling? We scale up to DeepSeek or Kimi. All open-weight, all yours, with real 4-bit memory figures shown.

Our go-to
Qwen 3.6 (Alibaba)
Apache 2.0 · unrestricted commercial use

The balanced flagship and our pick for the best open all-rounder that fits on one machine. Frontier-class reasoning, coding, and 100+ languages, running on a single high-memory server rather than a cluster. The default we deploy unless your needs push smaller or bigger.

  • 235B-A22B · 4-bit · server · 120 GB
  • 235B-A22B · FP16 · server · 470 GB
↓ Scale down · smaller
Gemma 3 (Google)
Gemma terms · commercial use OK

When a full server is overkill. Google's efficient, multimodal open family runs on a laptop or a single accelerator — capable enough for most everyday tasks on hardware you likely already have.

  • 12B: lightweight assistant · 4-bit · laptop · 8 GB
  • 27B: best small footprint · 4-bit · workstation · 14 GB
↑ Scale up · bigger
DeepSeek · Kimi (heavyweights)
MIT (DeepSeek) · modified MIT (Kimi) · commercial use OK

When you want the absolute ceiling. Models of 685B to 1T parameters for the highest-stakes reasoning; Kimi K2.5 currently tops the Onyx Self-Hosted LLM Leaderboard. These need dedicated server hardware.

  • DeepSeek V3.2 685B: frontier reasoning · 4-bit · server · 351 GB
  • Kimi K2.5 1T: leaderboard #1 · 4-bit · server · 542 GB
Sizing & rankings: 4-bit VRAM and benchmarks per the Onyx Self-Hosted LLM Leaderboard · final model selection is part of the audit.
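Where do the memory figures come from? As a rough rule of thumb (our simplification, not the leaderboard's method), model weights alone need about parameters × bits per weight ÷ 8 bytes. For a mixture-of-experts model such as 235B-A22B, all 235B parameters must sit in memory even though only about 22B are active per token. The sketch below uses a hypothetical helper, `weight_memory_gb`, to reproduce the order of magnitude. The 4-bit figures listed above run somewhat higher, plausibly because 4-bit formats store extra scaling data and real deployments also need room for the KV cache.

```python
# Rough weight-memory estimate: parameters x bits per weight / 8.
# Simplifying assumption for illustration: ignores quantization scales,
# the KV cache, and runtime overhead, so treat the result as a floor.


def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # 1e9 parameters x bytes per weight / 1e9 bytes per GB: the 1e9 cancels.
    return params_billion * bytes_per_weight


# Total parameter counts (in billions) taken from the model list above.
models = {
    "Qwen 235B-A22B": 235,
    "Gemma 3 27B": 27,
    "DeepSeek V3.2": 685,
    "Kimi K2.5": 1000,
}

for name, params in models.items():
    q4 = weight_memory_gb(params, 4)
    fp16 = weight_memory_gb(params, 16)
    print(f"{name}: ~{q4:.1f} GB at 4-bit, ~{fp16:.1f} GB at FP16")
```

For the Qwen model this gives about 117.5 GB at 4-bit and 470 GB at FP16, in line with the 120 GB and 470 GB listed.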
003 Private install · pricing

Installed,
not rented.

Fixed-price engagements to put a private model into production on your premises. Hardware is procured by you to our specification, so it is yours from day one. Choose the depth of engagement you need.

Tier 01

Setup

Up and running

CHF 3'500
  • We spec the hardware, you procure it
  • Model installed & tuned on your box
  • Private chat interface for your team
  • Runs fully on-premise, offline-ready
Not included: workflow audit · team training · first agent
Tier 03

Automate

Adopt + first agent live

CHF 7'500
  • Everything in Adopt, plus:
  • One agent built for your workflow, live in production
Every engagement: Runs on-premise · Open-weight models you own · Hardware to our specification, procured by you · No per-token fees
[ Next step ]

Get in touch now.
