What is AI Traffic Governance? A Technical Primer

AI traffic governance is the emerging category for controlling the unmanaged flood of LLM requests leaving enterprise endpoints. Here is what it means, why existing tools miss it, and how we are thinking about the architecture.

The pattern is familiar by now. An engineer runs pip install openai, pastes a customer transcript into a prompt to debug a classifier, and the data is gone. Not stolen. Volunteered. Three quarters later, a compliance review asks for an audit of AI usage and finds the log does not exist.

The 2023 Samsung incident is the version of this story that made the news. Employees pasted semiconductor source code and internal meeting notes into ChatGPT. The fallout was a company-wide ban on generative AI, which is the blunt-instrument response most orgs default to when they realize they have no visibility. Bans do not hold. Work routes around them.

AI traffic governance is the category forming to fill that gap. If the term is new to you, that is partly because most vendors are still deciding what to call the thing. This post is how we think about it at Themisto Labs.

A working definition

AI traffic governance is the discipline of observing, classifying, and enforcing policy on every AI request leaving a managed endpoint — regardless of which provider, which model, or whether anyone approved it.

Three parts matter:

  1. Observing. Every request is visible: model, endpoint, prompt, response, token count, tool calls.
  2. Classifying. You know what is in the request. PII, source code, customer records, credentials, financial figures.
  3. Enforcing. You can act: redact, block, log, alert, reroute.

A stack that does the first two but not the third is an observability tool. A stack that does the third but not the first two is a firewall that gets bypassed the first time a developer discovers a new endpoint.
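To make the three parts concrete, here is a minimal sketch of how they might fit together. Everything in it is an assumption for illustration: the ObservedRequest fields mirror the list above, the two regex detectors are deliberately crude, and the label-to-action policy table is hypothetical rather than a description of any shipping product.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    # Ordered so that a higher value means a stricter response.
    LOG = 1
    REDACT = 2
    BLOCK = 3


@dataclass
class ObservedRequest:
    """Observing: the per-request fields listed above."""
    model: str
    endpoint: str
    prompt: str
    token_count: int
    tool_calls: list = field(default_factory=list)


# Classifying: hypothetical detectors, far cruder than a real classifier.
DETECTORS = {
    "pii_email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "credential": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

# Enforcing: hypothetical policy table mapping a label to an action.
POLICY = {"pii_email": Action.REDACT, "credential": Action.BLOCK}


def classify(req: ObservedRequest) -> set:
    """Return the set of labels whose detector matches the prompt."""
    return {label for label, pat in DETECTORS.items() if pat.search(req.prompt)}


def enforce(labels: set) -> Action:
    """Pick the strictest action among the labels; log-only if none matched."""
    actions = [POLICY.get(label, Action.LOG) for label in labels]
    return max(actions, key=lambda a: a.value, default=Action.LOG)


if __name__ == "__main__":
    req = ObservedRequest(
        model="gpt-4",
        endpoint="api.openai.com",
        prompt="Why does the classifier fail for jane@example.com?",
        token_count=11,
    )
    print(classify(req), enforce(classify(req)))
```

The point of the sketch is the dependency: enforce has nothing to decide on unless classify ran, and classify has nothing to read unless the request was observed in the first place.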

Why existing tools miss it

The reflex, when a security team first confronts this problem, is to reach for something already in the stack. Three categories of tools come up, and each has a specific gap:

DLP was designed for email attachments and file transfers. It inspects files, not the body of a POST /v1/chat/completions that happens to carry a customer's address inside a user-authored prompt. Most endpoint DLP cannot cleanly decrypt outbound TLS without breaking half the SaaS ecosystem it sits next to.

CASBs govern cloud applications by domain. AI usage is not a domain problem anymore. A developer running a LangChain script is hitting eight different hosts in a single session, and the list changes weekly. A local Ollama process does not register in a CASB at all.

API gateways govern the APIs you have deployed and the outbound calls explicitly routed through them. They do not see the curl a data scientist runs from a Jupyter notebook.

The shared failure is one of layer: all three sit too far from where the traffic is produced.

Process-level resolution

The central technical claim is that you cannot govern what you cannot attribute. A log entry saying "a request to api.openai.com left the network at 3:47pm" is noise. A log entry saying "the Cursor binary on Sarah's managed laptop, running under her SSO identity, sent a 2,300-token prompt containing two customer email addresses to GPT-4" is governance.

To get the second one, the control plane has to resolve:

  • Who. OS user and the SSO identity mapped to them.
  • What. The specific binary or interpreter, down to a content hash.
  • Where. Managed laptop, CI runner, production container, bastion host.
  • When. With enough session context to reconstruct intent during an incident.

With that resolution, policy stops being "allow or block the model" — which is a bad policy in either direction — and starts being useful: approved users on managed devices get redacted egress, unapproved devices get blocked, and everything produces an audit trail the compliance team can actually use.
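As a rough illustration of that shift, the sketch below models the who/what/where/when record and a policy function over it. The field names, the APPROVED_USERS set, and the default-deny treatment of unapproved users on managed devices are all assumptions made for the example; the post itself only specifies the first two outcomes.

```python
import hashlib
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Attribution:
    os_user: str         # Who: local OS account
    sso_identity: str    # Who: SSO identity mapped to that account
    binary_sha256: str   # What: content hash of the binary or interpreter
    device_class: str    # Where: e.g. "managed_laptop", "ci_runner"
    device_managed: bool # Where: is the device enrolled?
    timestamp: str       # When: ISO 8601 string


def hash_binary(data: bytes) -> str:
    """Content hash used to identify the calling binary."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical allow-list; a real deployment would pull this from the IdP.
APPROVED_USERS = {"sarah@example.com"}


def decide(attr: Attribution) -> dict:
    """Return an audit record containing the action and the full attribution."""
    if not attr.device_managed:
        action = "block"
    elif attr.sso_identity in APPROVED_USERS:
        action = "redact_and_forward"
    else:
        # Assumption: unapproved users on managed devices are denied by default.
        action = "block"
    return {"action": action, **asdict(attr)}


if __name__ == "__main__":
    attr = Attribution(
        os_user="sarah",
        sso_identity="sarah@example.com",
        binary_sha256=hash_binary(b"placeholder for the binary's bytes"),
        device_class="managed_laptop",
        device_managed=True,
        timestamp="2025-01-15T15:47:00Z",
    )
    print(decide(attr))
```

Note that every branch returns a record, including the blocked ones. The audit trail is a by-product of making the decision, not a separate logging step.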

Why the transport has to be mTLS

Once a governance proxy is in the path of every AI request, it becomes a high-value target. Two failure modes matter:

  1. A client that bypasses the proxy and talks to the model directly.
  2. A fake proxy that impersonates the real one and reads traffic in plaintext.

Bearer tokens over TLS — the default for most API gateways — are weak against both. Mutual TLS solves them together: the client proves it is the authorized agent on an enrolled device, and the proxy proves it is the real policy engine. Neither side is taking the other's word for it.
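For readers who have not configured mutual TLS before, here is a minimal sketch using Python's standard ssl module. It is not our implementation. The function names and file-path parameters are hypothetical, and certificate issuance, rotation, and revocation are left out entirely.

```python
import ssl


def proxy_side_context() -> ssl.SSLContext:
    """TLS settings for the policy engine: clients must present a certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Server contexts do not ask for client certificates by default.
    # CERT_REQUIRED makes the handshake fail without a valid one.
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def agent_side_context() -> ssl.SSLContext:
    """TLS settings for the endpoint agent: the proxy must prove its identity."""
    # PROTOCOL_TLS_CLIENT enables certificate and hostname verification by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def load_identity(ctx: ssl.SSLContext, cert_file: str, key_file: str, ca_file: str) -> None:
    """Attach this side's own certificate and the CA it trusts for the peer.

    The three paths are placeholders; both sides call this with their own files.
    """
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    ctx.load_verify_locations(cafile=ca_file)


if __name__ == "__main__":
    print(proxy_side_context().verify_mode, agent_side_context().check_hostname)
```

The asymmetry with bearer-token setups is in proxy_side_context: the credential is checked during the handshake, and the private key behind it never travels with the request the way a token does.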

My cofounder is writing a more detailed post on the cryptographic design, so I will not go deeper here. The short version: any AI governance product that transports prompts over plain TLS with a bearer token is one leaked token away from replayed traffic, and the token will leak.

Regulatory pressure, concretely

The reason this category exists now, and did not exist eighteen months ago, is that the regulatory floor moved.

  • EU AI Act: entered into force on 1 August 2024, with the prohibited-practice provisions of Article 5 applying since 2 February 2025. Penalties in Article 99 reach up to €35M or 7% of global annual turnover, whichever is higher, for the most serious violations. The next tier down, covering most other obligations, is still up to €15M or 3%, which is enough that legal departments are paying attention.
  • NIST AI Risk Management Framework: now the default reference point for U.S. federal guidance and a growing number of enterprise security questionnaires.
  • ISO/IEC 42001: the first AI management system standard. Certification programs are rolling out during 2025 and 2026.

None of these require a specific product. All of them require something you cannot produce with screenshots from a vendor dashboard: an actual inventory of which data went to which model, attributable to a person and a device. That is traffic data. You have to capture it or you cannot report it.
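As a sketch of what "capture it so you can report it" might look like, the snippet below rolls hypothetical audit records up into an inventory keyed by model, data class, user, and device. The record shape and the sample values are invented for illustration; none of the frameworks above prescribe a schema.

```python
from collections import Counter

# Hypothetical audit records, one per governed request.
RECORDS = [
    {"user": "sarah@example.com", "device": "laptop-0142",
     "model": "gpt-4", "data_classes": ["pii_email"]},
    {"user": "sarah@example.com", "device": "laptop-0142",
     "model": "gpt-4", "data_classes": ["pii_email", "source_code"]},
    {"user": "ci-bot@example.com", "device": "ci-runner-07",
     "model": "local-llama", "data_classes": []},
]


def inventory(records: list) -> Counter:
    """Count requests per (model, data class, user, device) combination."""
    counts = Counter()
    for rec in records:
        for data_class in rec["data_classes"]:
            counts[(rec["model"], data_class, rec["user"], rec["device"])] += 1
    return counts


if __name__ == "__main__":
    for key, n in sorted(inventory(RECORDS).items()):
        print(key, n)
```

The aggregation is trivial. The hard part is that every field in the key has to have been captured at request time, because none of it can be reconstructed afterwards.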

The anatomy of a governed request

Here is what a single prompt looks like under this architecture:

  1. A developer asks a question in their IDE's AI assistant.
  2. The local agent intercepts the outbound request before DNS resolution.
  3. It attaches a device certificate and an SSO-signed user identity.
  4. The request hits the policy engine over mTLS.
  5. The engine classifies the payload and applies policy — redact the AWS key, strip the customer email, allow the rest.
  6. The sanitized request goes to the selected model.
  7. The response returns, is logged with full lineage, and reaches the developer.

In steady state, the whole round trip adds latency in the low tens of milliseconds. The developer's workflow does not change. The audit trail is complete.
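Step 5 is the one people ask about most, so here is a toy version of it. This is a sketch under heavy assumptions: two regular expressions stand in for a real classifier, the AWS pattern only catches access key IDs with the common AKIA prefix, and the placeholder format is made up.

```python
import re

# Rough patterns for illustration only; real detection needs far more than regex.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

PATTERNS = (("aws_access_key_id", AWS_KEY_ID), ("email", EMAIL))


def sanitize(prompt: str) -> tuple:
    """Replace each match with a labeled placeholder.

    Returns the sanitized prompt and a list of (label, count) findings
    that can be written to the audit trail.
    """
    findings = []
    for label, pattern in PATTERNS:
        prompt, count = pattern.subn(f"[REDACTED:{label}]", prompt)
        if count:
            findings.append((label, count))
    return prompt, findings


if __name__ == "__main__":
    raw = "Deploy fails with key AKIAABCDEFGHIJKLMNOP when jane@example.com logs in"
    clean, found = sanitize(raw)
    print(clean)
    print(found)
```

The rest of the prompt passes through untouched, which is what "allow the rest" means in practice: the model still gets enough context to answer, and the findings list records what was removed.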

Where Themisto Labs fits

We are building this because the two places other vendors have tried to solve it — the SaaS layer and the network layer — both leak. The SaaS-layer approach only covers tools you have explicitly integrated. The network-layer approach cannot see inside encrypted traffic without breaking it. The defensible place to intercept is the OS, as close to the process as you can get without being inside it.

That is the bet. Upcoming posts will cover the specifics: how we resolve process identity across macOS, Windows, and Linux, how policy compiles into something that runs in the hot path without adding perceptible latency, and how the mTLS design handles rotation and revocation.

If you take one thing from this piece: if you cannot attribute a request to a process on a device, you are not governing anything. You are watching.
