Shadow AI: The Problem Security Teams Are Quietly Dealing With

Shadow AI is the set of AI tools and workflows employees use without the security team's knowledge, approval, or instrumentation. It is the direct descendant of shadow IT, except it moves faster because there is no procurement step. A browser tab is enough.

The structural shape in a typical company looks something like this. A subset of engineers are paying for personal ChatGPT Plus and pasting code into it. Product managers are routing through Claude, Poe, Perplexity, and a handful of wrappers built on top. A sales team has quietly standardized on an AI presentation tool that nobody has signed a DPA with. Two or three data scientists are running local models on laptops with company data mounted. Most of this is productive. None of it is visible.

The reason it matters more than shadow SaaS did is that the blast radius is different in kind, not degree.

The three structural reasons shadow AI is worse

Data leaves permanently. Once a prompt containing PII is sent, it sits in the retention log of a vendor you have no contract with. Depending on the provider tier, retention windows range from 30 days to indefinite. You cannot recall it. Detection has to happen before the paste, not after.
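To make "before the paste" concrete, here is a minimal sketch of a pre-send check. The two patterns and the function names (find_pii, should_block) are illustrative assumptions, not a production classifier; real detection needs validation logic and far broader coverage.

```python
import re

# Illustrative patterns only (assumed for this sketch); a real classifier
# would validate matches and cover many more data types.
PII_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_pii(text: str) -> list[str]:
    """Return the sorted names of every pattern that matches the text."""
    return sorted(name for name, pat in PII_PATTERNS.items() if pat.search(text))


def should_block(prompt: str) -> bool:
    """Decide before the prompt leaves the device, not after."""
    return bool(find_pii(prompt))
```

The point of the sketch is the ordering: the check runs on the prompt text while it is still local, which is the only moment at which a decision can still change the outcome.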

Models can memorize. Prompts submitted on some consumer tiers may be used for training, and even where they are not, providers often retain prompts for safety review, abuse detection, or evaluation. Samsung's 2023 incident is the textbook case: engineers pasted semiconductor source code into ChatGPT; Samsung then banned generative AI tools internally and, reportedly, moved to build an internal model. The ban is the tell. When the response to a leak is "ban the category," the control gap is real.

One prompt exfiltrates more than a year of email could. A developer asking an assistant to "summarize this architecture document" hands over the entire architecture document. The analog in the shadow SaaS era would be printing the document, faxing it to a stranger, and asking for a summary back.

Network-level detection does not help, because every one of these tools speaks TLS to a generic API endpoint. api.openai.com in your egress logs tells you nothing about who sent the request, what was in it, or whether the data in the payload was regulated.

Why the obvious responses do not work

When a security team first tries to address shadow AI, it cycles through three options:

Blanket blocking. Firewall rules against major LLM domains. Survives about a week before engineering escalates, and never covers the long tail of tools nobody remembered to list. Blocking also pushes usage further underground.
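The long-tail problem is easy to see in a sketch of what a domain blocklist actually does. The suffix list and the wrapper hostname below are assumptions chosen for illustration.

```python
# Hypothetical blocklist for illustration; any real list is a snapshot in time.
BLOCKED_SUFFIXES = ("openai.com", "anthropic.com", "perplexity.ai")


def is_blocked(host: str) -> bool:
    """Match a hostname against the blocklist by exact name or subdomain."""
    host = host.lower().rstrip(".")
    return any(host == s or host.endswith("." + s) for s in BLOCKED_SUFFIXES)


# A listed endpoint is stopped...
print(is_blocked("api.openai.com"))
# ...but a wrapper nobody remembered to list goes straight through,
# even if it forwards the same prompt to the same model server-side.
print(is_blocked("slides-ai.example"))
```

The rule is only as good as the list, and the list is written by people who do not yet know which tools are in use.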

Policy-based trust. Write a policy, have everyone sign it, call it governance. This is a "no diving" sign at a pool with no lifeguard. Necessary, not sufficient.

Approved-tool programs. "You can use Copilot, nothing else." This handles the tools you named. It does not touch the workflows already running, and it changes the incentive for employees: once the honest answer to "which AI tools do you use?" gets them in trouble, the answer becomes "nothing, I promise."

The thing that actually moves the needle is a technical control that is invisible when the request is fine and firm when it is not. Everything else is documentation.

The bar for a working control

A governance layer that closes the shadow AI gap needs three properties. Not two. Not a different three. These three:

Coverage without a known endpoint list. If the tool's value depends on you having a current map of which AI endpoints exist, the tool is always one week behind reality. The control has to be agnostic to the destination, recognizing AI traffic by its shape rather than by a list of known destinations.
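One way to read "traffic shape" is as a structural check on the request body instead of a lookup on the hostname. The sketch below assumes the control can already see the plaintext body, and the heuristics (a chat-style "messages" array, or a "prompt" string alongside a "model" field) are illustrative guesses at common request layouts, not a complete detector.

```python
import json


def looks_like_llm_request(body: bytes) -> bool:
    """Heuristic: does this request body have the shape of an LLM call?

    The destination host is deliberately not an input.
    """
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    if not isinstance(payload, dict):
        return False

    # Chat-style shape: a non-empty list of {"role": ..., "content": ...} items.
    messages = payload.get("messages")
    if isinstance(messages, list) and messages and all(
        isinstance(m, dict) and "role" in m and "content" in m for m in messages
    ):
        return True

    # Completion-style shape: a prompt string sent alongside a model name.
    return isinstance(payload.get("prompt"), str) and "model" in payload
```

Because the function never consults a hostname, a brand-new wrapper that uses a familiar request layout is caught on its first day rather than after someone adds it to a list.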

Resolution to the person and the device. "Someone on the VPN sent a prompt" is useless. "Sarah's managed laptop sent a prompt containing a customer SSN" is actionable. The log has to be specific enough that the response is a conversation with Sarah, not a company-wide bulletin.
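As a rough sketch of what "specific enough" means, the record below carries the fields the paragraph above implies. The class name, field names, and the is_actionable rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class PromptEvent:
    """One observed prompt, attributed to a person and a device (hypothetical schema)."""

    user: str                     # e.g. a directory identity
    device_id: str                # e.g. a managed-device asset tag
    process: str                  # the local program that sent the request
    destination: str              # the host the request went to
    data_classes: tuple[str, ...]  # labels such as "us_ssn"
    observed_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def is_actionable(self) -> bool:
        """True only when we know who, on what device, and what kind of data."""
        return bool(self.user and self.device_id and self.data_classes)


event = PromptEvent("sarah", "LT-0042", "chrome", "api.openai.com", ("us_ssn",))
print(event.is_actionable())
```

An event missing the user or device still counts as telemetry, but it cannot start the one-to-one conversation the paragraph describes.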

Interception at the layer closest to the source. Network-layer interception depends on TLS inspection, which breaks certificate-pinned applications. Browser extensions miss non-browser traffic. SaaS integrations miss local tools. The OS layer is the only place that sees everything without breaking the things users care about.

Where we would start

If we were advising a security team walking into this fresh, the first move would be visibility, not control. Not policy. Not blocking. Not procurement. Just visibility, captured cleanly enough that a conversation with leadership or legal is possible.

Once the traffic is visible, two things usually happen. The first is that the scope of usage is larger than expected, often by an order of magnitude. The second is that the shape of the risk becomes specific enough to do something about: a particular team pasting a particular kind of data into a particular tool. From there, controls are targeted rather than company-wide.
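A minimal sketch of how captured events turn into that kind of specificity: count occurrences of each (team, data class, tool) combination and look at the top of the list. The event dictionaries and their keys are hypothetical.

```python
from collections import Counter


def top_risks(events, n=3):
    """Rank (team, data_class, tool) combinations by how often they occur."""
    counts = Counter()
    for e in events:
        for data_class in e["data_classes"]:
            counts[(e["team"], data_class, e["tool"])] += 1
    return counts.most_common(n)


# Made-up sample events, for illustration only.
events = [
    {"team": "sales", "tool": "slides-tool", "data_classes": ["customer_pii"]},
    {"team": "sales", "tool": "slides-tool", "data_classes": ["customer_pii"]},
    {"team": "eng", "tool": "chat-assistant", "data_classes": ["source_code"]},
]
for (team, data_class, tool), count in top_risks(events):
    print(f"{team} -> {tool}: {data_class} x{count}")
```

The output of a query like this is what makes a targeted control possible: the conversation is with one team about one tool, not with the whole company about AI in general.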

The broader point: every regulatory and board-level push we are watching (EU AI Act Article 99 penalties, NIST AI RMF adoption, SOC 2 expansions, ISO 42001) requires producing an inventory of what data went to which model. That inventory does not exist until the traffic is captured. Capture is the precondition for every serious AI governance program, and almost no company has it yet.

That is the problem Themisto Labs is built to solve. The architecture is covered in other posts on this blog. The short version: an on-device agent, mTLS to a policy engine, process-level attribution, and classification at egress. If anything in this post describes a situation you are already in, the demo is 30 minutes and the product does what this post argues should exist.