By Kevin Wells · 22 September 2025 · Estimated read time: 12-15 minutes
This article documents a compact, CPU-only Artificial Intelligence (AI) stack deployed on an Intel NUC server.
It is written for IT professionals who want something reliable, maintainable, and security-aware – without handing attackers a treasure map. Identifying details (hostnames, IPs, cert paths) are intentionally generalized.
Why a NUC AI Server?
The Intel NUC is small, quiet, and cheap to run. With 16 GB RAM it will not brute-force 70B models, but it will serve 7B/8B models comfortably for local workflows. The goal here is a workable, production-style setup that respects basic security and good sys ops practice.
Target Architecture
[Client Browser over LAN]
        │ HTTPS (TLS)
        ▼
[Reverse Proxy: Apache or Nginx]
        │ proxy_pass →
        ▼
[Open WebUI container]
        │ HTTP (local)
        ▼
[Ollama container]
        │
        └─ [Docker volume: ollama – AI engine & LLM models]

[Knowledge Base / RAG] Ingested via Open WebUI; attached to a model profile so users do not need to prefix chats with #kb tags.
Sensitive items such as internal DNS names, IPs, and certificate locations are omitted by design.
You’ll need to substitute your own safe values.
Core Components
Host OS
- Ubuntu LTS (any recent stable release). Patches applied. Unnecessary services disabled.
- Docker Engine + Docker Compose v2.
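A quick sanity pass over those prerequisites before deploying anything (a sketch; package and service names will vary with your build):
# Confirm Docker Engine and Compose v2 are installed and current
docker --version
docker compose version
# Apply outstanding OS patches
sudo apt update && sudo apt full-upgrade -y
# Review listening services and disable anything that is not needed
sudo ss -tulpn
sudo systemctl disable --now <unneeded-service>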
Ollama AI Engine
- Runs LLMs locally; API on port 11434 (container-internal).
- Models stored in a dedicated Docker volume to avoid brittle host paths.
- 7B/8B CPU models recommended for 16 GB RAM systems.
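For example, pulling a quantised 8B model into the container-managed volume (the tag below is illustrative; substitute whichever 7B/8B model suits your workload):
# Pull an 8B model and confirm it is stored in the ollama volume
docker exec -it ollama ollama pull llama3.1:8b
docker exec -it ollama ollama list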
Open WebUI
- Front-end for chat, RAG, prompts, and model profiles.
- Configured to talk to the Ollama API over the Docker bridge network.
- Knowledge Base (RAG) enabled and bound at the profile level.
Reverse Proxy (TLS)
- Apache HTTP Server (or Nginx) terminates TLS and proxies to Open WebUI.
- Use a private CA or trusted certificates to avoid noisy browser warnings.
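If you do not yet run a private CA, a minimal openssl workflow (bash) looks roughly like the sketch below; names, paths, and lifetimes are placeholders, and the CA key should be kept offline per your own policy.
# Create a private CA
openssl req -x509 -newkey rsa:4096 -sha256 -days 1825 -nodes \
  -keyout ca.key -out ca.crt -subj "/CN=Internal Lab CA"
# Issue a server certificate for the proxy hostname
openssl req -newkey rsa:2048 -nodes \
  -keyout ai.example.internal.key -out ai.example.internal.csr \
  -subj "/CN=ai.example.internal"
openssl x509 -req -in ai.example.internal.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -days 825 -sha256 \
  -extfile <(printf "subjectAltName=DNS:ai.example.internal") \
  -out ai.example.internal.crt
Distribute ca.crt to managed clients so browsers trust the proxy without warnings.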
Docker Compose: Baseline
Declare your stack in a single file and keep state in volumes. Redact sensitive values before committing to source control.
# docker-compose.yml (example - customise safely)
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama-data:/root/.ollama
    networks: [ ai-net ]

  openwebui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: openwebui
    restart: unless-stopped
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    # Expose only to reverse proxy, not the whole LAN
    networks: [ ai-net ]

networks:
  ai-net:
    driver: bridge

volumes:
  ollama-data:
    name: ollama
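Before the first start it is worth validating the file; Compose v2 can do this without creating anything:
# Parse and validate docker-compose.yml (prints nothing when valid)
docker compose config --quiet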
Reverse Proxy & TLS (Apache)
Terminate TLS at the reverse proxy and forward to Open WebUI. Replace placeholders with your internal DNS name and certificate locations.
<VirtualHost *:443>
    ServerName ai.example.internal

    # TLS (either private CA issued or a well-known CA)
    SSLEngine on
    SSLCertificateFile      /path/to/certs/ai.example.internal.crt
    SSLCertificateKeyFile   /path/to/private/ai.example.internal.key
    SSLCertificateChainFile /path/to/certs/ca-chain.crt

    # Hardened defaults (adjust to your baseline)
    Header always set Strict-Transport-Security "max-age=63072000; includeSubDomains; preload"
    SSLProtocol all -SSLv3 -TLSv1 -TLSv1.1
    SSLHonorCipherOrder on

    ProxyPreserveHost On
    ProxyRequests Off
    # "openwebui" resolves only if Apache itself runs on the ai-net Docker network;
    # if Apache runs on the host, point these at a loopback-published port instead.
    ProxyPass        / http://openwebui:8080/
    ProxyPassReverse / http://openwebui:8080/
    # If WebSocket paths need proxying, also enable mod_proxy_wstunnel and map them to ws://openwebui:8080/

    # Optional: block direct file listings, etc.
    <Location />
        Require all granted
    </Location>
</VirtualHost>
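On a Debian/Ubuntu host the supporting modules and the site itself still need enabling (the site file name below is a placeholder):
# Enable TLS, header, and proxy modules, then the vhost
sudo a2enmod ssl headers proxy proxy_http
sudo a2ensite ai.example.internal.conf
# Validate the configuration before reloading
sudo apachectl configtest
sudo systemctl reload apache2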
Models, Profiles, and Knowledge Base
Model guidance
- RAM budget: 16 GB limits you to 7B/8B models on CPU for sensible latency. Larger models will thrash swap and regress throughput.
- Pull models explicitly and prune unused ones. Keep an offline backup (see below) before removing.
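Before pruning, check how much space the model volume actually consumes:
# Per-image and per-volume disk usage; look for the "ollama" volume
docker system df -v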
Profiles (also called wrappers)
- Create CPU-tuned profiles in Open WebUI (threads, context size) instead of hard-coding flags per session.
- Maintain clear naming (e.g., cpu8b-standard, cpu8b-rag) and retire obsolete entries to avoid user confusion.
Knowledge Base (RAG) without chat prefixes
- Convert and upload documents via Open WebUI’s Knowledge feature.
- Open the relevant profile, enable Knowledge, and attach the KB set.
- Save and make that profile the default for your workspace. Users no longer need #kb tags.
Operations: Run-Book
Lifecycle
# Bring stack up / down
docker compose up -d
docker compose ps
docker compose logs -f ollama
docker compose logs -f openwebui
Model administration
# Run these inside the container if the ollama CLI is not installed on the host
# (e.g. docker exec -it ollama ollama list)
# List models
ollama list
# Show details for a model
ollama show <model:tag>
# Pull / remove (prune after backup)
ollama pull <model:tag>
ollama rm <model:tag>
Locate the model volume safely
# Identify the volume that stores /root/.ollama (inside the container)
docker inspect -f '{{range .Mounts}}{{if eq .Destination "/root/.ollama"}}{{.Name}}{{end}}{{end}}' ollama
docker volume inspect <that-volume-name> -f '{{.Mountpoint}}'
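The two commands chain naturally if you want the mountpoint in one step (bash sketch):
VOL=$(docker inspect -f '{{range .Mounts}}{{if eq .Destination "/root/.ollama"}}{{.Name}}{{end}}{{end}}' ollama)
docker volume inspect -f '{{.Mountpoint}}' "$VOL"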
Backups (safe snapshot)
Stop the model service while copying its blobs to avoid half-written files and index drift.
# 1) Stop Ollama only
docker compose stop ollama
# 2) Copy the volume to external storage (example path)
sudo rsync -aH --info=progress2 \
/var/lib/docker/volumes/ollama/_data/ \
/mnt/backups/llm_models/
# 3) Restart
docker compose start ollama
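Restores follow the same pattern in reverse and are worth rehearsing before you ever need one (paths mirror the backup example above):
# Rehearse a restore: stop Ollama, copy blobs back into the volume, restart
docker compose stop ollama
sudo rsync -aH --info=progress2 \
    /mnt/backups/llm_models/ \
    /var/lib/docker/volumes/ollama/_data/
docker compose start ollama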
Prune models with ollama rm (not by deleting files inside the volume) so internal indices remain consistent.
Health checks and troubleshooting
# Is the API listening?
# (If 11434 is not published on the host, run this from a container attached to ai-net,
#  or publish the port on loopback in the compose file first.)
curl -s http://localhost:11434/api/tags | jq .
# Common incident triage
# - Web UI loads but no responses: is ollama up? (compose ps, logs)
# - TLS warnings: client trust not installed for your CA/leaf.
# - Slow responses: you are oversizing the model for the hardware.
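An end-to-end probe through the reverse proxy is also worth scripting (hostname is a placeholder; drop -k once the CA is trusted by the client running the check):
# Expect 200 from Open WebUI via the proxy
curl -sk -o /dev/null -w '%{http_code}\n' https://ai.example.internal/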
Performance Reality (CPU-Only)
- Model size: 7B/8B is the practical ceiling on 16 GB RAM for workable latency.
- Context window: Large contexts multiply CPU time. Use what you need, not what looks impressive.
- Threads: Match physical cores; experiment, do not guess. Measure end-to-end latency, not just token rate.
- RAG: Retrieval and chunking beat brute-forcing bigger models. Clean documents produce cleaner answers.
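A crude but honest way to measure end-to-end latency is to time a real generation call, assuming the API is reachable as in the health check above (the model tag is illustrative):
# Time a full non-streaming request; eval_count/eval_duration report tokens generated and generation time (ns)
time curl -s http://localhost:11434/api/generate \
  -d '{"model":"llama3.1:8b","prompt":"Summarise RAG in one sentence.","stream":false}' \
  | jq '{eval_count, eval_duration}'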
Security Precautions
- Expose only the reverse proxy on the LAN. Keep Open WebUI and Ollama on an internal Docker network.
- Use a private CA for internal TLS and distribute its root to managed clients.
- Keep Docker and base OS patched. Remove unused images and stale containers.
- Run admin commands as a normal user with sudo. Avoid interactive root shells unless necessary.
- Segment the host with a local firewall; permit only expected ingress/egress.
- Log and rotate reverse-proxy access logs; avoid logging sensitive prompts verbatim.
- Back up model blobs and knowledge stores; test restore procedures quarterly.
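For the firewall point, ufw is sufficient on a single Ubuntu host; a sketch assuming a 192.168.0.0/24 management LAN (substitute your own ranges and ports):
# Default-deny inbound; allow SSH and HTTPS from the LAN only
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow from 192.168.0.0/24 to any port 22 proto tcp
sudo ufw allow from 192.168.0.0/24 to any port 443 proto tcp
sudo ufw enable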
Upgrades and Housekeeping
# Update images deliberately, then roll forward
docker compose pull
docker compose up -d
# Re-check profile → KB bindings after major WebUI upgrades
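Once the upgraded stack is confirmed healthy, reclaim space from the superseded images:
# Removes dangling images only; pause before anything more aggressive
docker image prune -f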
Keep a CHANGELOG. If something breaks, you want a single source of truth about what changed, when, and why.
Roadmap: Sensible Next Steps
- Profile cleanup: retire duplicate or legacy entries; standardise naming.
- Attach KB by default: ensure the main profile has Knowledge enabled so users do not need chat prefixes.
- Gateway for multi-node: introduce a lightweight gateway (for example, HAProxy) to route requests to the best node as your fleet grows.
- GPU node: add a discrete GPU box and route heavy jobs there while the NUC handles lighter tasks and UI.
Appendix
A. Nginx vhost variant (optional)
server {
    listen 443 ssl http2;
    server_name ai.example.internal;

    ssl_certificate     /path/to/certs/ai.example.internal.crt;
    ssl_certificate_key /path/to/private/ai.example.internal.key;

    location / {
        # As with Apache: "openwebui" resolves only if Nginx joins the ai-net network;
        # a host-installed Nginx should target a loopback-published port instead.
        proxy_pass http://openwebui:8080/;
        proxy_set_header Host              $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Real-IP         $remote_addr;

        # WebSocket upgrades (used by Open WebUI's realtime features)
        proxy_http_version 1.1;
        proxy_set_header Upgrade    $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
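As with Apache, validate and reload after editing:
sudo nginx -t && sudo systemctl reload nginx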
B. Quick sanity checklist
- Reverse proxy answers on HTTPS with a certificate the clients trust.
- Open WebUI reachable only via proxy, not exposed on host ports.
- Ollama reachable from WebUI; models pulled and listed.
- Profile created and set as default; Knowledge attached and working.
- Backups exist, restores tested, and pruning policy defined.
Redactions: internal hostnames, IP subnets, certificate paths, and any inventory identifiers have been intentionally generalized to avoid increasing your exposure. Swap your own values in – and keep them out of screenshots and public commits.