By Kevin Wells · 22 September 2025 · Estimated read time: 12-15 minutes
This article documents a compact, CPU-only Artificial Intelligence (AI) stack deployed on an Intel NUC server.
It is written for IT professionals who want something reliable, maintainable, and security-aware – without handing attackers a treasure map. Identifying details (hostnames, IPs, cert paths) are intentionally generalized.
Why a NUC AI Server?
The Intel NUC is small, quiet, and cheap to run. With 16 GB RAM it will not brute-force 70B models, but it will serve 7B/8B models comfortably for local workflows. The goal here is a workable, production-style setup that respects basic security and good sys ops practice.
Target Architecture
[Client Browser over LAN]
        │ HTTPS (TLS)
        ▼
[Reverse Proxy: Apache or Nginx]
        │ proxy_pass →
        ▼
[Open WebUI container]
        │ HTTP (local)
        ▼
[Ollama container]
        │
        └─ [Docker volume: ollama – AI engine & LLM models]

[Knowledge Base / RAG] Ingested via Open WebUI; attached to a model profile so users do not need to prefix chats with #kb tags.
Sensitive items such as internal DNS names, IPs, and certificate locations are omitted by design.
You’ll need to substitute your own safe values.
Core Components
Host OS
- Ubuntu LTS (any recent stable release). Patches applied. Unnecessary services disabled.
- Docker Engine + Docker Compose v2.
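A quick sanity pass over those prerequisites before deploying anything (a sketch; package and service names will vary with your build):
# Confirm Docker Engine and Compose v2 are installed and current
docker --version
docker compose version
# Apply outstanding OS patches
sudo apt update && sudo apt full-upgrade -y
# Review listening services and disable anything that is not needed
sudo ss -tulpn
sudo systemctl disable --now <unneeded-service>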
Ollama AI Engine
- Runs LLMs locally; API on port 11434 (container-internal).
- Models stored in a dedicated Docker volume to avoid brittle host paths.
- 7B/8B CPU models recommended for 16 GB RAM systems.
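For example, pulling a quantised 8B model into the container-managed volume (the tag below is illustrative; substitute whichever 7B/8B model suits your workload):
# Pull an 8B model and confirm it is stored in the ollama volume
docker exec -it ollama ollama pull llama3.1:8b
docker exec -it ollama ollama list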
Open WebUI
- Front-end for chat, RAG, prompts, and model profiles.
- Configured to talk to the Ollama API over the Docker bridge network.
- Knowledge Base (RAG) enabled and bound at the profile level.
Reverse Proxy (TLS)
- Apache HTTP Server (or Nginx) terminates TLS and proxies to Open WebUI.
- Use a private CA or trusted certificates to avoid noisy browser warnings.
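If you do not yet run a private CA, a minimal openssl workflow (bash) looks roughly like the sketch below; names, paths, and lifetimes are placeholders, and the CA key should be kept offline per your own policy.
# Create a private CA
openssl req -x509 -newkey rsa:4096 -sha256 -days 1825 -nodes \
  -keyout ca.key -out ca.crt -subj "/CN=Internal Lab CA"
# Issue a server certificate for the proxy hostname
openssl req -newkey rsa:2048 -nodes \
  -keyout ai.example.internal.key -out ai.example.internal.csr \
  -subj "/CN=ai.example.internal"
openssl x509 -req -in ai.example.internal.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -days 825 -sha256 \
  -extfile <(printf "subjectAltName=DNS:ai.example.internal") \
  -out ai.example.internal.crt
Distribute ca.crt to managed clients so browsers trust the proxy without warnings.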
Docker Compose: Baseline
Declare your stack in a single file and keep state in volumes. Redact sensitive values before committing to source control.
# docker-compose.yml (example - customise safely)
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama-data:/root/.ollama
    networks: [ ai-net ]

  openwebui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: openwebui
    restart: unless-stopped
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    # Expose only to reverse proxy, not the whole LAN
    networks: [ ai-net ]

networks:
  ai-net:
    driver: bridge

volumes:
  ollama-data:
    name: ollama
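Before the first start it is worth validating the file; Compose v2 can do this without creating anything:
# Parse and validate docker-compose.yml (prints nothing when valid)
docker compose config --quiet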
Reverse Proxy & TLS (Apache)
Terminate TLS at the reverse proxy and forward to Open WebUI. Replace placeholders with your internal DNS name and certificate locations.
<VirtualHost *:443>
    ServerName ai.example.internal

    # TLS (either private CA issued or a well-known CA)
    SSLEngine on
    SSLCertificateFile      /path/to/certs/ai.example.internal.crt
    SSLCertificateKeyFile   /path/to/private/ai.example.internal.key
    SSLCertificateChainFile /path/to/certs/ca-chain.crt

    # Hardened defaults (adjust to your baseline)
    Header always set Strict-Transport-Security "max-age=63072000; includeSubDomains; preload"
    SSLProtocol all -SSLv3 -TLSv1 -TLSv1.1
    SSLHonorCipherOrder on

    ProxyPreserveHost On
    ProxyRequests Off
    # "openwebui" resolves only if Apache itself runs on the ai-net Docker network;
    # if Apache runs on the host, point these at a loopback-published port instead.
    ProxyPass        / http://openwebui:8080/
    ProxyPassReverse / http://openwebui:8080/
    # If WebSocket paths need proxying, also enable mod_proxy_wstunnel and map them to ws://openwebui:8080/

    # Optional: block direct file listings, etc.
    <Location />
        Require all granted
    </Location>
</VirtualHost>
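On a Debian/Ubuntu host the supporting modules and the site itself still need enabling (the site file name below is a placeholder):
# Enable TLS, header, and proxy modules, then the vhost
sudo a2enmod ssl headers proxy proxy_http
sudo a2ensite ai.example.internal.conf
# Validate the configuration before reloading
sudo apachectl configtest
sudo systemctl reload apache2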
Models, Profiles, and Knowledge Base
Model guidance
- RAM budget: 16 GB limits you to 7B/8B models on CPU for sensible latency. Larger models will thrash swap and regress throughput.
- Pull models explicitly and prune unused ones. Keep an offline backup (see below) before removing.
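Before pruning, check how much space the model volume actually consumes:
# Per-image and per-volume disk usage; look for the "ollama" volume
docker system df -v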
Profiles (also called wrappers)
- Create CPU-tuned profiles in Open WebUI (threads, context size) instead of hard-coding flags per session.
- Maintain clear naming (e.g., cpu8b-standard, cpu8b-rag) and retire obsolete entries to avoid user confusion.
Knowledge Base (RAG) without chat prefixes
- Convert and upload documents via Open WebUI’s Knowledge feature.
- Open the relevant profile, enable Knowledge, and attach the KB set.
- Save and make that profile the default for your workspace. Users no longer need #kb tags.
Operations: Run-Book
Lifecycle
# Bring stack up / down
docker compose up -d
docker compose ps
docker compose logs -f ollama
docker compose logs -f openwebui
Model administration
# Run these inside the container if the ollama CLI is not installed on the host
# (e.g. docker exec -it ollama ollama list)
# List models
ollama list
# Show details for a model
ollama show <model:tag>
# Pull / remove (prune after backup)
ollama pull <model:tag>
ollama rm <model:tag>
Locate the model volume safely
# Identify the volume that stores /root/.ollama (inside the container)
docker inspect -f '{{range .Mounts}}{{if eq .Destination "/root/.ollama"}}{{.Name}}{{end}}{{end}}' ollama
docker volume inspect <that-volume-name> -f '{{.Mountpoint}}'
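The two commands chain naturally if you want the mountpoint in one step (bash sketch):
VOL=$(docker inspect -f '{{range .Mounts}}{{if eq .Destination "/root/.ollama"}}{{.Name}}{{end}}{{end}}' ollama)
docker volume inspect -f '{{.Mountpoint}}' "$VOL"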
Backups (safe snapshot)
Stop the model service while copying its blobs to avoid half-written files and index drift.
# 1) Stop Ollama only
docker compose stop ollama
# 2) Copy the volume to external storage (example path)
sudo rsync -aH --info=progress2 \
/var/lib/docker/volumes/ollama/_data/ \
/mnt/backups/llm_models/
# 3) Restart
docker compose start ollama
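Restores follow the same pattern in reverse and are worth rehearsing before you ever need one (paths mirror the backup example above):
# Rehearse a restore: stop Ollama, copy blobs back into the volume, restart
docker compose stop ollama
sudo rsync -aH --info=progress2 \
    /mnt/backups/llm_models/ \
    /var/lib/docker/volumes/ollama/_data/
docker compose start ollama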
Prune models with ollama rm (not by deleting files inside the volume) so internal indices remain consistent.
Health checks and troubleshooting
# Is the API listening?
# (If 11434 is not published on the host, run this from a container attached to ai-net,
#  or publish the port on loopback in the compose file first.)
curl -s http://localhost:11434/api/tags | jq .
# Common incident triage
# - Web UI loads but no responses: is ollama up? (compose ps, logs)
# - TLS warnings: client trust not installed for your CA/leaf.
# - Slow responses: you are oversizing the model for the hardware.
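An end-to-end probe through the reverse proxy is also worth scripting (hostname is a placeholder; drop -k once the CA is trusted by the client running the check):
# Expect 200 from Open WebUI via the proxy
curl -sk -o /dev/null -w '%{http_code}\n' https://ai.example.internal/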
Performance Reality (CPU-Only)
- Model size: 7B/8B is the practical ceiling on 16 GB RAM for workable latency.
- Context window: Large contexts multiply CPU time. Use what you need, not what looks impressive.
- Threads: Match physical cores; experiment, do not guess. Measure end-to-end latency, not just token rate.
- RAG: Retrieval and chunking beat brute-forcing bigger models. Clean documents produce cleaner answers.
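A crude but honest way to measure end-to-end latency is to time a real generation call, assuming the API is reachable as in the health check above (the model tag is illustrative):
# Time a full non-streaming request; eval_count/eval_duration report tokens generated and generation time (ns)
time curl -s http://localhost:11434/api/generate \
  -d '{"model":"llama3.1:8b","prompt":"Summarise RAG in one sentence.","stream":false}' \
  | jq '{eval_count, eval_duration}'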
Security Precautions
- Expose only the reverse proxy on the LAN. Keep Open WebUI and Ollama on an internal Docker network.
- Use a private CA for internal TLS and distribute its root to managed clients.
- Keep Docker and base OS patched. Remove unused images and stale containers.
- Run admin commands as a normal user with sudo. Avoid interactive root shells unless necessary.
- Segment the host with a local firewall; permit only expected ingress/egress.
- Log and rotate reverse-proxy access logs; avoid logging sensitive prompts verbatim.
- Back up model blobs and knowledge stores; test restore procedures quarterly.
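For the firewall point, ufw is sufficient on a single Ubuntu host; a sketch assuming a 192.168.0.0/24 management LAN (substitute your own ranges and ports):
# Default-deny inbound; allow SSH and HTTPS from the LAN only
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow from 192.168.0.0/24 to any port 22 proto tcp
sudo ufw allow from 192.168.0.0/24 to any port 443 proto tcp
sudo ufw enable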
Upgrades and Housekeeping
# Update images deliberately, then roll forward
docker compose pull
docker compose up -d
# Re-check profile → KB bindings after major WebUI upgrades
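Once the upgraded stack is confirmed healthy, reclaim space from the superseded images:
# Removes dangling images only; pause before anything more aggressive
docker image prune -f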
Keep a CHANGELOG. If something breaks, you want a single source of truth about what changed, when, and why.
Roadmap: Sensible Next Steps
- Profile cleanup: retire duplicate or legacy entries; standardise naming.
- Attach KB by default: ensure the main profile has Knowledge enabled so users do not need chat prefixes.
- Gateway for multi-node: introduce a lightweight gateway (for example, HAProxy) to route requests to the best node as your fleet grows.
- GPU node: add a discrete GPU box and route heavy jobs there while the NUC handles lighter tasks and UI.
Appendix
A. Nginx vhost variant (optional)
server {
    listen 443 ssl http2;
    server_name ai.example.internal;

    ssl_certificate     /path/to/certs/ai.example.internal.crt;
    ssl_certificate_key /path/to/private/ai.example.internal.key;

    location / {
        # As with Apache: "openwebui" resolves only if Nginx joins the ai-net network;
        # a host-installed Nginx should target a loopback-published port instead.
        proxy_pass http://openwebui:8080/;
        proxy_set_header Host              $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Real-IP         $remote_addr;

        # WebSocket upgrades (used by Open WebUI's realtime features)
        proxy_http_version 1.1;
        proxy_set_header Upgrade    $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
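As with Apache, validate and reload after editing:
sudo nginx -t && sudo systemctl reload nginx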
B. Quick sanity checklist
- Reverse proxy answers on HTTPS with a certificate the clients trust.
- Open WebUI reachable only via proxy, not exposed on host ports.
- Ollama reachable from WebUI; models pulled and listed.
- Profile created and set as default; Knowledge attached and working.
- Backups exist, restores tested, and pruning policy defined.
Redactions: internal hostnames, IP subnets, certificate paths, and any inventory identifiers have been intentionally generalized to avoid increasing your exposure. Swap your own values in – and keep them out of screenshots and public commits.