Limiting your "blast radius"
Three ways we keep AI tools from going off-script in data-sensitive environments
By: Emily Burak
AI coding assistants can be useful in the modern engineering workflow. They can do junior-level tasks effectively, write tedious boilerplate code, unblock engineers as a "pair programmer," and even catch mistakes and perform functional code reviews. But in data engineering, there's a catch. Your systems aren't just code -- they include sensitive data, access controls, and credentials that can cause real harm if something goes wrong.
A developer's local environment may be able to interact with production data, fire off queries against important data, and modify IAM policies (the rules that control who can access what). While one engineer might not have all of these capabilities, together they are powerful tools that can pose a security and governance threat when AI agents are given access to a developer's environment.
When you give an AI assistant the ability to run shell commands and issue API calls, it inherits whatever access the developer's environment has, and it may be able to escalate from there. That is why an AI assistant working in local development needs constraints on what it can do.
It helps to think in terms of blast radius: how much damage can a failure cause before it's stopped? This post is framed around AI-assisted coding workflows (all the rage among engineers and seemingly where the industry as a whole is going, for a while at least), and there your blast radius comes down to three things:
What the AI can read
What commands it can run
What its authorization status is from the perspective of your cloud (or your on-premise systems, if you roll like that) -- who is it: the developer's personal account, or something else?
Sandboxing shrinks the blast radius. It means narrowing these capabilities and statuses as much as possible, proactively stopping issues in their tracks. Let's talk about how sandboxing can look in practice through three layers. You can surely use more, but this post details three.
Layer 1: Isolated environments (devcontainers)
First, an "easy" win: stop running AI tooling directly on your host. Use containers (isolated, self-contained software environments) as your development environment, and you give the AI a clean, reproducible environment without root access, which are basic qualities of a satisfactory sandbox. Put barriers between the AI and the host file system, your credentials, and any system binaries it could use to escalate. This layer primarily tackles the question of what the AI can read. Plus, a well-designed developer sandbox setup standardizes engineering environments across your team.
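As a rough sketch of what this layer can look like, the Python below builds a devcontainer-style config and runs a small audit over it. The image name, user name, and the list of risky mount hints are placeholders chosen for illustration, and the audit rules are assumptions about what a team might want to flag, not a complete policy.

```python
import json

# Hypothetical devcontainer config. The image, user, and name are
# placeholders; swap in whatever your team actually uses.
DEVCONTAINER = {
    "name": "ai-sandbox",
    "image": "mcr.microsoft.com/devcontainers/python:3.12",
    "remoteUser": "vscode",  # run as a non-root user inside the container
    "runArgs": ["--cap-drop=ALL", "--security-opt=no-new-privileges"],
    "mounts": [],  # no extra host paths beyond the workspace itself
}

# Illustrative substrings that suggest host credentials or the Docker
# socket are being mounted into the sandbox.
RISKY_MOUNT_HINTS = (".aws", ".ssh", ".config/gcloud", "docker.sock")


def audit(config: dict) -> list[str]:
    """Return human-readable findings for settings that widen the blast radius."""
    findings = []
    # Treat an unset user conservatively: many images default to root.
    if config.get("remoteUser", "root") == "root":
        findings.append("container user is root (or unset)")
    if "--privileged" in config.get("runArgs", []):
        findings.append("--privileged gives the container host-level access")
    for mount in config.get("mounts", []):
        if any(hint in str(mount) for hint in RISKY_MOUNT_HINTS):
            findings.append(f"host credential or socket mounted: {mount}")
    return findings


def render(config: dict) -> str:
    """Serialize the config as the JSON you would save to devcontainer.json."""
    return json.dumps(config, indent=2)
```

Running audit(DEVCONTAINER) returns an empty list for the config above. A check like this could run in CI so that a later change which mounts host credentials into the sandbox gets flagged before it spreads across the team.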
Layer 2: Impersonation (minimal permissions in an identity)
Even running inside a container, you may have a use case for your AI to issue cloud API calls. This might be unavoidable or just helpful if it's done in a secure way.
Don't run these calls with personal credentials; route them through a dedicated service account. Think of it as a purpose-built robot identity, not a person's login. The key to narrowing scope is letting the agent read schemas or submit certain queries while denying risky operations like deleting data. This also helps with logging and auditing: a distinct service account leaves its own distinct trail through your systems, rather than actions that look like they came from your credentials. Make destructive things structurally impossible at the IAM layer, and you eliminate some gnarly risks.
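To make "structurally impossible" a little more concrete, here is a minimal sketch of a least-privilege policy for such a service account, written in the shape of an AWS IAM policy document, plus a toy evaluator. The specific actions are an assumed example for a Glue and Athena setup, and the evaluator is a deliberate simplification: it ignores resources and conditions and only models "explicit deny wins, everything else is denied by default".

```python
from fnmatch import fnmatchcase

# Assumed example policy for an AI agent's service account: read schemas
# and run queries, with destructive and IAM actions explicitly denied.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetTable",
                "athena:StartQueryExecution",
                "athena:GetQueryResults",
            ],
            "Resource": "*",
        },
        {
            "Effect": "Deny",
            "Action": ["s3:Delete*", "glue:Delete*", "iam:*"],
            "Resource": "*",
        },
    ],
}


def is_allowed(policy: dict, action: str) -> bool:
    """Simplified check: an explicit Deny wins, and no match means denied."""
    allowed = False
    for statement in policy["Statement"]:
        patterns = statement["Action"]
        # Compare in lowercase so the match is case-insensitive.
        matched = any(
            fnmatchcase(action.lower(), pattern.lower()) for pattern in patterns
        )
        if not matched:
            continue
        if statement["Effect"] == "Deny":
            return False
        allowed = True
    return allowed
```

With this policy, is_allowed(POLICY, "glue:GetTable") is True, while "s3:DeleteObject" and "iam:PutRolePolicy" are refused by the Deny statement and "s3:GetObject" is refused simply because nothing allows it. The real enforcement happens in your cloud provider's IAM engine; a local check like this is only useful for reviewing or testing the policy document itself.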
Layer 3: Hooks / Command interception
With containers and IAM limits out of the way, you're in a much better place already. Let's go a bit further with the shell, which the AI assistant has to have at least some access to, and intercept commands you want to block before they execute. Claude Code, for example, is an ecosystem already full of helpful safety nets in the form of hooks.
If you want to block attempts to read .env files -- which can happen even if you tell the AI not to, or if it somehow gets around your isolated environment (and AIs are great at getting around barriers to technically complete their tasks) -- hooks are a fitting tool. Hooks are deterministic: they have no context or sense of intent, but they can pattern-match and block or change commands. Probabilistic safeguards, such as relying on the AI itself (for example, telling it in an instruction file "don't do bad things, also make no mistakes"), lack the hard barriers of a deterministic block. Lay out in code what permissions the AI should have, and don't forget to share that among developers.
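Here is a minimal sketch of the decision logic such a hook might contain, using the .env example. It assumes the hook receives a JSON event whose tool_input carries either a file_path or a shell command, and that exiting with code 2 tells the assistant the call was blocked; check both assumptions against your tool's hook documentation. The function names and the deny-list pattern are hypothetical.

```python
import json
import re
import shlex

# Hypothetical deny-list. Matches .env, .env.local, config/.env.prod, etc.
BLOCKED_PATTERNS = [
    re.compile(r"(^|/)\.env(\.[\w.-]+)?$"),
]


def is_blocked_path(token: str) -> bool:
    """True if a path or command token matches a blocked pattern."""
    return any(pattern.search(token) for pattern in BLOCKED_PATTERNS)


def decide(event: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for one tool-call event."""
    tool_input = event.get("tool_input")
    if not isinstance(tool_input, dict):
        tool_input = {}
    candidates = []
    # File tools are assumed to carry a path; shell tools a command string.
    file_path = tool_input.get("file_path")
    if isinstance(file_path, str):
        candidates.append(file_path)
    command = tool_input.get("command")
    if isinstance(command, str):
        try:
            candidates.extend(shlex.split(command))
        except ValueError:
            # Unbalanced quotes: fail closed rather than guess.
            return False, "could not parse command"
    for token in candidates:
        if is_blocked_path(token):
            return False, f"access to {token!r} is blocked by policy"
    return True, ""


def run_hook(raw: str) -> tuple[int, str]:
    """Map a raw JSON event to (exit_code, message); 2 means block (assumed)."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return 2, "unreadable hook event"
    if not isinstance(event, dict):
        return 2, "unexpected hook event shape"
    allowed, reason = decide(event)
    return (0, "") if allowed else (2, reason)
```

Saved as a script, the entry point would read the event from standard input, call run_hook, print the message to standard error, and exit with the returned code. Token matching like this is easy to evade (a redirect such as cat<.env parses as a single token and slips through), which is one reason a hook works best as one layer among several.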
Blast Radius Control
These three layers help address many of the governance concerns raised by AI touching data-sensitive environments, concerns that we are very mindful of here at CTA as we move into an agentic era. If you care about data security, credential hygiene, auditing (for when your AI agent needs to survive that audit six months from now), and change control, you can up your game on all of these by putting these three layers in place.
These aren't guarantees. Even deterministic systems can be routed around, or can fail to account for scenarios and commands you just didn't think of when designing them. But through defense in depth, using a layered approach, you get multiple chances to catch an issue before it becomes a problem. You decrease your risk with each layer: isolation, impersonation, interception. That layered security is a great goal for organizations to work towards when adopting agentic workflows.
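A back-of-the-envelope way to see why layering helps: if each layer independently misses a bad action some fraction of the time, the chance that all of them miss is the product of those fractions. The miss rates below are invented purely for illustration, and real layers are rarely fully independent, so treat this as intuition rather than measurement.

```python
from math import prod

# Made-up miss rates for illustration only: the chance that each layer
# fails to stop a given bad action. Real values are unknown.
MISS_RATES = {
    "isolation": 0.20,
    "impersonation": 0.30,
    "interception": 0.25,
}


def residual_risk(miss_rates: list[float]) -> float:
    """Chance that every layer misses, assuming the layers are independent."""
    return prod(miss_rates)


def cumulative_risk(layers: dict[str, float]) -> list[tuple[str, float]]:
    """Residual risk after adding each layer in order."""
    results, rates = [], []
    for name, rate in layers.items():
        rates.append(rate)
        results.append((name, residual_risk(rates)))
    return results
```

With these invented numbers, the residual risk drops from 20% with isolation alone to about 6% with impersonation added and about 1.5% with all three. The exact figures mean nothing; the shape of the curve is the point.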
We use systems like these, and more, here at CTA. We know no system is perfect, but the best time to secure an AI is yesterday, and the second best time is today.
Curious how CTA handles this in practice? We're happy to show you what it looks like. Reach out to learn more!