Agentic AI promises you automation on a human scale – systems that can read, decide, and act with minimal supervision. At least in theory.
The pitch is certainly compelling. The risk profile is not. If you give an AI agent tools, memory, and autonomy, then you have effectively hired an enthusiastic junior admin who never sleeps, believes anything it reads, and is entrusted with a corporate credit card. If you do not engineer for betrayal right from day one, you will end up losing data, money, and uptime.
This article is intended as a practitioner’s rundown of the security and operational risks that currently come with agentic AI, and the controls that you need to implement to protect your infrastructure and your business operation.
1. What you are really deploying when you deploy an agent
An agent in a business context basically means deploying an AI-LLM or large language model with three added capabilities:
- Tools – APIs, browsers, databases, shell, ticketing, payments.
- Memory and context – conversation state, vector stores, logs.
- Autonomy – loops, plans, sub-tasks, sometimes multi-agent hand-offs.
That turns a text model into an actor.
Actors change risk dramatically. They can initiate network calls, mutate data, move funds, and make commitments on your behalf. The question is not so much whether the model is clever (actually they are relatively dumb).
The real question is whether you can contain its behaviour when everything around it is untrusted.
2. Why agentic AI changes the risk calculation
Traditional applications follow code paths you wrote. Agent behaviour is stochastic and suggestible. Inputs are not only user prompts – they are also documents, emails, wiki pages, PDFs, websites, logs, and other agents. Treat all of that as hostile. The security problem becomes less about protecting an API and more about protecting a decision-maker that can be socially engineered by content.
The operational problem follows close behind. Because the model is probabilistic, the same prompt can land on different tool calls, latencies, and costs. Without strict guardrails and budgets, you will see denial-of-wallet, runaway recursion, and incidents that are painful to reproduce.
3. Threat model – the main classes of failure
A) Prompt and instruction compromise
- Direct prompt injection – users or attackers type adversarial instructions that cause exfiltration, policy bypass, or unsafe tool use.
- Indirect prompt injection – the agent reads content that tells it how to misbehave. A booby-trapped page or document instructs the agent to dump secrets, rewrite a policy, or open a tunnel.
- Insecure output handling – treating model output as trusted leads to XSS, SSRF, RCE, and SQL injection when you render or execute the agent’s suggestions.
B) Tooling and excessive agency
- Over-privileged tools – a single token with wide IAM scope, filesystem access, or payment authority is an invitation to disaster.
- Unbounded actions – the agent can recurse, spawn, or call paid APIs at scale, driving cost blowouts and side effects that resemble legitimate traffic until it is too late.
C) Retrieval and external data
- RAG infection – retrieval pipelines feed the model with instructions disguised as content. If you do not sanitise and re-write, your knowledge base becomes a command channel.
- Data poisoning – if you fine-tune or learn from user content, malicious inputs can embed backdoors or shift model behaviour in quiet but material ways.
D) Supply chain
- Tainted models and artefacts – unsafe weights, repos, and sample projects can smuggle malware or hidden prompts into your stack.
- Framework drift – agent frameworks evolve quickly. New defaults can change behaviour and policy surfaces without notice.
E) Infrastructure and operations
- Secret sprawl – keys and personal data end up in prompts, traces, vector stores, and logs.
- Observability gaps – many stacks do not log the single thing you most need in incident response: what the agent saw, and why it acted.
- Incident response misfit – traditional IR does not account for non-determinism, context windows, or prompt compromise as a first-class root cause.
F) Legal and governance
- EU AI Act – expect requirements around risk management, logging, post-market monitoring, and incident reporting for higher risk use cases.
- Data protection – GDPR and UK law still apply. If personal data flows into the model or its memory, you need a lawful basis, minimisation, and rights handling. Model behaviour does not excuse weak governance.
4. Concrete failure modes to plan for on day one
- Zero-click compromise via content – an agent tasked with summarising a vendor RFP opens a PDF that includes hidden instructions which cause it to email out internal pricing. No user malice required.
- Wallet drain – a research task triggers 10,000 premium API calls across multiple tools because guardrails did not cap spend or depth. The first alert is the invoice.
- Poisoned knowledge base – user-submitted documents add subtle instructions. Weeks later, an approval agent starts auto-accepting certain vendors. The artefacts are clean. The data is compromised.
- Policy bypass through tool chaining – the agent cannot call payments directly, but it can open a browser, use email, and trigger a webhook. That is enough to route around your intended controls.
If any of these sound far-fetched, you have not red teamed your agent!
5. Controls that actually work
The right mindset is zero trust for content and default deny for capability. The right pattern is a policy gateway in front of every tool, plus strict typing on all inputs and outputs.
A) Architectural guardrails
- Policy engine before tools – do not let the agent call tools directly. Route every call through a deterministic policy layer that enforces allow-lists, argument validation, rate limits, and budgets.
- Human-in-the-loop for high risk – payments, IAM changes, data export, contract signatures. Require a step-up approval with a rendered diff of intent versus arguments.
- Typed I-O – force the agent to emit JSON that matches a schema. Reject on mismatch. Never execute free text.
B) Content and prompt hygiene
- Zero-trust for retrieved content – everything the agent reads is untrusted. Strip markup, code blocks, and instruction-like phrases. Keep only facts.
- System prompt hardening – explicitly forbid following instructions from data. Separate user intent from retrieved content in the context window.
- Guard checks – run lightweight policy checks pre- and post-tool call to catch obvious injection patterns and unsafe actions.
C) Secrets and egress
- Ephemeral scoped credentials – per tool, per tenant, short TTL, rotated with a vault. Never place keys in prompts, context, or user-visible logs.
- Network egress control – browser tools and HTTP clients live in a sandboxed VM or container with an allow-list of domains. Log DNS and outbound connections.
- Budgets and breakers – hard ceilings on tokens, tool invocations, recursion depth, and spend. A kill switch that trips on anomaly.
D) Retrieval and data protection
- RAG sanitisation – re-write retrieved content to a safe intermediate format. Remove links that can trigger navigation. Drop any imperative language that resembles a command.
- Server-side ACLs – enforce document access before retrieval. Do not assume the agent will honour permissions in the prompt.
- Ingestion hygiene – moderate and sign content that enters your corpora. Keep a clean chain of provenance.
E) Supply chain and build
- Model provenance – use signed weights from trusted sources. Scan artefacts before load. Avoid unsafe deserialisers.
- Framework hardening – pin versions, review plugins, and treat configuration rules as code with peer review.
- Continuous evaluation – maintain a bench of red team prompts and adversarial samples. Run them in CI against each new model and tool release.
F) Observability and incident response
- Full-fidelity audit trail – log the system prompt, user input, retrieved content fingerprints, tool calls with arguments, approvals, outputs, and egress.
- AI-aware incident response – playbooks for prompt compromise, model rollback, memory purges, credential rotation, and regulatory notification.
- Cost and latency telemetry – dashboards for spend, recursion depth, and tool hotspots. Early detection beats post-mortem archaeology.
6. A reference architecture that contains risk
You do not need a research lab. You need isolation and policy. A lean pattern looks like this:
- Agent runtime – the LLM and planner. No direct tool access.
- Policy gateway – a service that all tool requests pass through. Validates the action against allow-lists and schemas. Enforces budgets and rate limits.
- Sandboxed tool workers – per capability, running with least privilege. Filesystem and network scoped to the minimum.
- Secure browser microservice – headless browser in a locked-down container or VM, outbound allow-listed, input sanitised, output text-only.
- Retrieval service – applies server-side ACLs, sanitises documents, and returns safe content. No raw HTML.
- Secrets management – a vault issues short-lived credentials per request. The agent never sees long-lived keys.
- Guardrail checks – lightweight classifiers or rules to flag injection, sensitive data movement, and unsafe tool combinations.
- Audit pipeline – structured logs to a central store with tamper resistance and retention aligned to legal obligations.
If any component can see everything or do everything, you have already lost the plot. The point is segmentation.
7. Governance that does not slow you to a crawl
Governance should be crisp and testable:
- Policy as code – the same rules that gate tools are version-controlled and testable. Security signs policies, not marketing copy.
- Risk registers mapped to controls – for each identified threat, list the concrete control and the test that proves it.
- Data protection by design – DPIA where relevant, minimal retention, and deletion paths for prompts, traces, and vector entries that carry personal data.
- EU AI Act readiness – log what the model saw and did, run periodic adversarial evaluations, and stand up a channel for incident reporting. None of that is optional in high risk contexts.
8. A 30-60-90 plan that gets you to safe ground
Day 0 to 30 – Stop the obvious bleeding
- Remove direct tool access. Introduce a policy gateway.
- Add hard budgets on tokens, calls, recursion, and spend.
- Sandbox browsing in a container or VM with egress allow-lists.
- Strip and re-write retrieved content. No raw HTML.
- Start logging prompts, retrieved content hashes, tool calls, and approvals.
Day 31 to 60 – Make it secure
- Move secrets into a vault. Rotate and scope credentials.
- Define typed I-O contracts with JSON schema validation.
- Introduce human approvals for high risk actions with rendered diffs.
- Pin framework versions. Scan models and artefacts on ingest.
- Stand up first AI-aware playbooks for incident response.
Day 61 to 90 – Prove it and scale up
- Establish a red teaming cadence. Maintain a living corpus of tests.
- Add cost, recursion, and egress dashboards with alerts.
- Map your risks to controls and to regulatory expectations.
- Formalise your model registry, provenance records, and release process.
9. Here’s a practical checklist for deployment reviews
- Tools behind a policy gateway – yes or no.
- High risk actions require human approval – yes or no.
- Typed I-O with schema validation – yes or no.
- Browsing and HTTP calls are sandboxed with allow-listed egress – yes or no.
- Retrieval service sanitises content and enforces server-side ACLs – yes or no.
- Secrets are ephemeral and scoped, never present in prompts or logs – yes or no.
- Budgets exist for tokens, calls, recursion, and spend – yes or no.
- Models and dependencies are pinned, scanned, and provenance-tracked – yes or no.
- Full-fidelity audit logging in place and retained to policy – yes or no.
- Red team tests exist and run in CI – yes or no.
- DPIA completed where personal data is involved – yes or no.
- EU AI Act obligations mapped, owners named, and processes defined – yes or no.
If any of the above are a no, do not ship. Fix it or remove the capability.
10. Final position
Agentic AI is not magic. It is automation that writes its own code in real time. That is powerful – but it is also fragile and dangerous.
The single most important mindset is this: if your agent can move money or data, assume it can be tricked into moving the wrong money or the wrong data. You MUST design for containment and verification, not simply rely on trust and hope.
Build a thin, deterministic shell around the AI core. Put budgets on everything. Treat all content as hostile. Log what matters. Red team like you mean it. In that way you can get to enjoy the benefits of automation without paying the ransom of chaos.