↳ Local AI

Private AI, on your own hardware.

Cloud AI transmits your data to a third party's servers. Local AI runs on infrastructure you own, on your premises. Nothing leaves the building. We select the model, specify the hardware, handle the installation, and train your team.


↳ Bound by secrecy

If you can't paste it
into ChatGPT.

Lawyers, doctors and other professionals bound by professional secrecy under Art. 321 StGB, as well as wealth managers and fiduciaries with their own confidentiality duties, cannot simply send client data to a cloud model hosted abroad. A model that runs on your own premises gives you the same AI leverage while the confidential data never leaves the building.

Built for regulated Swiss work: Wealth managers · Fiduciaries · Legal · Healthcare · Finance

001 Local vs cloud

Same answers.
Different trust.

Cloud AI is operated by a third party on infrastructure you don't control. Local AI runs on infrastructure you own. For organisations handling sensitive, regulated, or confidential data, where the model runs is the deciding factor.

Dimension | Local · on-prem | Cloud · API
--- | --- | ---
Where data lives | In your building, always. Never transmitted. | On a vendor's servers, often abroad. Cross-border transfer.
Cost model | One-time hardware + setup, plus power and upkeep. No per-token bill. | Pay per token, every request, forever. Scales with usage.
Compliance | Air-gappable and audit-friendly. Suits health, legal, finance. | Depends on the provider's terms and region. Shared responsibility.
Connectivity | Operates fully offline. No internet required. | Requires a live connection for every request. Dependent on provider uptime.
Control | You own the model and the box. No silent changes. | The model can change or be retired on the vendor's schedule. Vendor's roadmap.
Best suited to | Privacy and predictable cost. Steady, sensitive workloads. | Access to the absolute frontier model. Variable, non-sensitive tasks.

Our recommendation: frontier cloud models remain the most capable overall, but for confidential data, a capable model you own is worth more than a stronger one you rent.
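The cost-model row can be made concrete with simple break-even arithmetic: the one-time outlay divided by what you save each month by not paying per token. The sketch below is purely illustrative. The hardware price, the monthly power and upkeep, the token volume and the blended per-token rate are all hypothetical assumptions, not quotes; only the CHF 5'000 Adopt fee comes from our pricing. Real cloud APIs usually price input and output tokens differently, which this sketch folds into one blended rate.

```python
from typing import Optional


def cloud_monthly_chf(tokens_per_month: int, chf_per_million_tokens: float) -> float:
    """Usage-based cloud spend with one blended per-token rate (an assumption;
    real APIs usually price input and output tokens differently)."""
    return tokens_per_month / 1_000_000 * chf_per_million_tokens


def months_to_break_even(one_time_chf: float,
                         local_monthly_chf: float,
                         cloud_monthly: float) -> Optional[float]:
    """Months until the one-time local outlay is recovered by monthly savings.

    Returns None when local running costs are not lower than cloud spend,
    i.e. the local install never pays for itself on cost alone.
    """
    monthly_saving = cloud_monthly - local_monthly_chf
    if monthly_saving <= 0:
        return None
    return one_time_chf / monthly_saving


if __name__ == "__main__":
    # All inputs below are HYPOTHETICAL, for illustration only.
    hardware_chf = 20_000        # assumed server price
    setup_chf = 5_000            # Adopt tier fee
    local_running_chf = 200      # assumed monthly power and upkeep
    cloud = cloud_monthly_chf(400_000_000, 3.0)  # assumed volume and blended rate

    months = months_to_break_even(hardware_chf + setup_chf, local_running_chf, cloud)
    print(f"Cloud spend: CHF {cloud:,.0f}/month")
    print(f"Break-even after {months:.1f} months")
```

With these made-up inputs the script reports a cloud spend of CHF 1,200/month and a break-even after 25.0 months. Low, bursty usage pushes the break-even out or removes it entirely, which is why the table pairs local with steady workloads.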

002 Models for on-premise

One default,
a full range.

For most clients our go-to is Qwen 3.6: frontier-class quality on a single server. Need lighter? We scale down to Gemma. Need the absolute ceiling? We scale up to DeepSeek or Kimi. All are open-weight and all are yours, each listed with its 4-bit memory footprint.

Our go-to

Qwen 3.6 · Alibaba

Apache 2.0 · unrestricted commercial use

Our balanced flagship and a strong open all-rounder: frontier-class reasoning, coding, and 100+ languages, running on a single high-memory server rather than a cluster. It is the default we deploy unless your needs push smaller or bigger.

235B-A22B · 4-bit server · 120 GB

235B-A22B · FP16 server · 470 GB

↓ Scale down · smaller

Gemma 3 · Google

Gemma terms · commercial use OK

When a full server is overkill. Google's efficient, multimodal open family runs on a laptop or a single accelerator — capable enough for most everyday tasks on hardware you likely already have.

12B · lightweight assistant · 4-bit laptop · 8 GB
27B · best small footprint · 4-bit workstation · 14 GB

↑ Scale up · bigger

DeepSeek & Kimi · Heavyweights

MIT (DeepSeek) · modified MIT (Kimi) · commercial use OK

When you want the absolute ceiling. Trillion-parameter-class models for the highest-stakes reasoning; at the time of writing, Kimi K2.5 tops the Onyx Self-Hosted LLM Leaderboard. These need dedicated server hardware.

DeepSeek V3.2 · 685B · frontier reasoning · 4-bit server · 351 GB
Kimi K2.5 · 1T · leaderboard #1 · 4-bit server · 542 GB

Sizing & rankings: 4-bit memory figures and benchmarks are taken from the Onyx Self-Hosted LLM Leaderboard. Final model selection is part of the audit.
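The 4-bit figures above follow a rule of thumb: weight memory is roughly the parameter count times the bits stored per weight, divided by eight. For a mixture-of-experts model such as 235B-A22B, only about 22B parameters are active per token, but all 235B must still be held in memory. The sketch below is a back-of-the-envelope estimate, not a sizing tool; the function name and treating "4-bit" as exactly 4 bits per weight are our simplifying assumptions. Real quantized files come out somewhat larger because they store scaling factors and often keep some layers at higher precision, and the context window (KV cache) needs memory on top.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight-only memory in GB (1 GB = 10**9 bytes).

    (params_billion * 1e9 weights) * (bits / 8 bytes) / 1e9 bytes per GB
    simplifies to params_billion * bits / 8. Excludes quantization
    metadata, KV cache and runtime overhead.
    """
    return params_billion * bits_per_weight / 8


# Parameter counts and listed figures as quoted on this page.
MODELS = [
    # (name, billions of parameters, bits per weight, listed GB)
    ("Qwen 3.6 235B-A22B", 235, 16, 470),
    ("Qwen 3.6 235B-A22B", 235, 4, 120),
    ("Gemma 3 12B", 12, 4, 8),
    ("Gemma 3 27B", 27, 4, 14),
    ("DeepSeek V3.2", 685, 4, 351),
    ("Kimi K2.5", 1000, 4, 542),
]

if __name__ == "__main__":
    for name, params, bits, listed in MODELS:
        estimate = weight_memory_gb(params, bits)
        print(f"{name:<20} {bits:>2}-bit  estimate {estimate:6.1f} GB  listed {listed} GB")
```

The estimates land at or slightly below every listed figure (117.5 GB against 120 GB for Qwen at 4-bit, for example), consistent with the overhead described above; the FP16 figure of 470 GB matches exactly because plain 16-bit weights carry no quantization metadata.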

003 Private install · pricing

Installed,
not rented.

Fixed-price engagements to put a private model into production on your premises. Prices exclude hardware: you procure it to our specification, so it is yours from day one. Choose the depth of engagement you need.

Tier 01

Setup

Up and running

CHF 3'500

We spec the hardware, you procure it
Model installed & tuned on your box
Private chat interface for your team
Runs fully on-premise, offline-ready
Not included: workflow audit · team training · first agent


Tier 02 · Most chosen

Adopt

Setup + audit + training

CHF 5'000

Everything in Setup, plus:
Workflow audit: where to use it
Hands-on training for your team
Not included: first agent


Tier 03

Automate

Adopt + first agent live

CHF 7'500

Everything in Adopt, plus:
One agent built for your workflow, live in production


Every engagement: Runs on-premise · Open-weight models you own · Hardware to our specification, procured by you · No per-token fees

[ Next step ]

Get in touch now.

hello@mont3.ch