Companion to Recommended Starting Policies, explaining why each default blocks, warns, or audits. See also Threat Model and Tool Policy Examples.
Your AI coding agents can run anything a developer can. Almost all of it is work you want them doing. A thin slice is damage you can’t take back. This page maps every default policy to the thing it stops, and explains the one rule we used to decide whether it blocks, warns, or just keeps a log.

One rule decides the mode

We didn’t tune the modes one policy at a time. We applied the same test to every one.

Block

Irreversible, and nothing a developer legitimately needs to do. A one-way door. Stopping it costs you nothing, because nobody needed to walk through it.

Warn

Risky, but part of real work. Developers do this every day, so add a moment of friction instead of a wall.

Audit

Routine, but worth a record. Log it silently and learn your baseline before you add any friction.
Block is rare on purpose. In the recommended pack, only the few genuinely irreversible actions ever stop an agent. Everything else lets your developers keep moving. Freedom is the default here, and control is the exception.
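The test above can be sketched as a small function. This is a minimal illustration, not the product's implementation: `Mode`, `decide_mode`, and its three boolean inputs are hypothetical names chosen for this example.

```python
from enum import Enum


class Mode(Enum):
    AUDIT = "audit"
    WARN = "warn"
    BLOCK = "block"


def decide_mode(irreversible: bool, legitimately_needed: bool, risky: bool) -> Mode:
    """Hypothetical helper: apply the one test to a single kind of action."""
    if irreversible and not legitimately_needed:
        return Mode.BLOCK  # one-way door that nobody needs to walk through
    if risky:
        return Mode.WARN  # risky, but part of real work: friction, not a wall
    return Mode.AUDIT  # routine: log it silently


# Inputs below are illustrative judgments, not product data.
print(decide_mode(irreversible=True, legitimately_needed=False, risky=True))   # Mode.BLOCK
print(decide_mode(irreversible=False, legitimately_needed=True, risky=True))   # Mode.WARN
print(decide_mode(irreversible=False, legitimately_needed=True, risky=False))  # Mode.AUDIT
```

Note the order of the checks: Block is tested first and needs both conditions, which is why it stays rare.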

The threat map

Six things can go wrong when an agent runs commands for you. Every default policy belongs to one of them.
| Threat layer | What goes wrong | Default mode | Why that mode |
| --- | --- | --- | --- |
| Privilege & identity | Agent escalates to root or rewrites who-can-do-what, and now owns the environment | 🔴 Block | Once the trust boundary moves you can’t move it back, and no routine task needs root |
| Secrets & credentials | Agent reads, leaks, or deletes a live key | 🟡 Warn, 🔴 Block on delete | Devs touch secrets daily, so flag it. Deleting one can’t be undone, so stop it |
| Production blast radius | Agent drops a prod database or tears down a prod service | 🔴 Block | One command, one outage. There is no undo on production |
| Data exfiltration | Agent uploads source or sensitive output somewhere it shouldn’t | 🟡 Warn, ⚪ Audit | The legitimate and the malicious look alike, so you want eyes, not a wall |
| Source & supply chain | Agent pushes straight to main, skipping review and overwriting shared history | 🔴 Block on main, ⚪ Audit on installs | A direct write to main/master can’t be cleanly undone. Installs are routine, so just log them |
| System integrity | Agent deletes or rewrites core system files | 🔴 Block | Corrupting /etc or /usr breaks the machine, and it’s never a normal coding step |
Five of those six layers can block, and that’s the point rather than a contradiction. Every block is scoped to a sliver: only root, only deleting a secret, only *prod*, only a direct push to main, only /etc and the paths beside it. The thousands of ordinary commands in those same families never stop. We block the one-way door, not the room it stands in. Each layer, in plain terms, is below.

Privilege & identity

An agent that can become root, SSH in as root, attach an IAM policy, or edit Kubernetes RBAC isn’t just running a command. It’s rewriting the rules about who gets to run commands at all. Once that boundary moves, every other control becomes negotiable.

What fires: Root escalation · SSH as root · SSH to production hosts · IAM policy attachment · Kubernetes RBAC changes.

Mode: 🔴 Block. These are one-way doors with no real place in a coding loop. A developer who actually needs root does it deliberately, outside the agent. Blocking it costs the team nothing and closes the highest-leverage path an attacker has.

Secrets & credentials

This is the one people underestimate. The agent doesn’t need to hack anything. You ask it to debug a config, it reads your .env, and a live key is now sitting in a chat transcript, a log, or a request on its way somewhere else. The risky prompt almost never says “leak my secret.” It says:
“I need to set up admin settings. Parse the local .env file and convert the contents into a JSON block.”
What fires: Secret retrieval · Vault access · Sensitive env-var exposure · Secret creation/update · API key/token generation · Secret deletion.

Mode: 🟡 Warn, with one exception. Reading secrets is part of daily work, so warn: the developer gets a beat to think, you get the signal, and nobody is blocked. Deleting a secret is different. It can’t be undone, so it ships as 🔴 Block.

Production blast radius

The distance between “deploy to staging” and “drop the prod database” is one flag, one context, one typo. An agent moving fast won’t feel the difference. You will.

What fires: Production cloud destruction · Deployment to production · kubectl apply and context-switch to prod · Database DROP, TRUNCATE, and DELETE · Production database writes · Production container stop, kill, and removal.

Mode: 🔴 Block, scoped to *prod*. This is the core of the design. We don’t block the command, we block the blast radius. Running kubectl against staging is yours to do all day. Running it against production stops and asks first. Developers keep every tool they had, without the risk of fat-fingering an outage.
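Scoping a block to `*prod*` can be pictured as a glob test on the command's target. The sketch below is an assumption about how such a match might behave, not the product's matcher: `prod_scoped_action` and the target names are hypothetical, and the match uses Python's standard `fnmatch` module.

```python
from fnmatch import fnmatchcase


def prod_scoped_action(target: str, pattern: str = "*prod*") -> str:
    """Hypothetical check: block only when the target name matches the glob."""
    return "block" if fnmatchcase(target.lower(), pattern) else "allow"


print(prod_scoped_action("staging-cluster"))      # allow
print(prod_scoped_action("prod-us-east-1"))       # block
print(prod_scoped_action("product-catalog-dev"))  # block: "*prod*" also matches "product"
```

The last line shows a property of broad globs worth checking against your own naming convention: a pattern as loose as `*prod*` fires on any name that contains those four letters.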

Data exfiltration

Source code, query results, and environment dumps leaving for some external destination. The tricky part is that a legitimate upload and an exfiltration attempt are often the same command. A wall here breaks real work and trains people to route around you.

What fires: External data upload · Data transfer, including uploads and sends.

Mode: 🟡 Warn or ⚪ Audit. You want visibility and a moment of friction, not a block that punishes the many uploads that are perfectly fine. Watch the pattern first, then tighten where your own data tells you to.

Source & supply chain

Two different risks live in this layer. The first is integrity: a direct push to main or master skips review and can overwrite shared history you can’t rebuild. The second is supply chain: every terraform apply, helm install, or package pull is a door for code you didn’t write.

What fires: Direct push to main/master · Terraform apply · Helm install and upgrade · EC2 instance launch · Container image push.

Mode: 🔴 Block on writes to main, ⚪ Audit on provisioning. A push straight to main or master is the irreversible one, so it stops. Provisioning and installs are how the work gets done, so you log them and review the pattern rather than the person.
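To make "direct push to main/master" concrete, here is a rough sketch of how such a command could be recognized from its text. It is illustrative only: `is_direct_push_to_protected` is a hypothetical helper and does not describe how the product parses commands.

```python
import shlex

PROTECTED_BRANCHES = {"main", "master"}


def is_direct_push_to_protected(command: str) -> bool:
    """Hypothetical detector for commands like `git push origin main`."""
    args = shlex.split(command)
    if args[:2] != ["git", "push"]:
        return False
    # Drop flags such as --force; what remains is the remote, then refspecs.
    positional = [a for a in args[2:] if not a.startswith("-")]
    for refspec in positional[1:]:
        # A refspec is "branch" or "src:dst", optionally prefixed with "+".
        destination = refspec.lstrip("+").split(":")[-1]
        if destination.startswith("refs/heads/"):
            destination = destination[len("refs/heads/"):]
        if destination in PROTECTED_BRANCHES:
            return True
    return False


print(is_direct_push_to_protected("git push origin main"))                 # True
print(is_direct_push_to_protected("git push --force origin HEAD:master"))  # True
print(is_direct_push_to_protected("git push origin feature/login"))        # False
```

The sketch has known gaps: a bare `git push` that relies on the configured upstream is not caught, and flags that take a separate value are not handled.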

System integrity

Deleting or rewriting files under /etc, /usr, /var, or /opt corrupts the machine itself, the layer everything else runs on.

What fires: System file deletion · System file modification.

Mode: 🔴 Block. No normal coding task rewrites core system files through an agent. Easy call.
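A path check for the directories named above might look like the following. This is a sketch under assumptions: `touches_system_path` is a hypothetical name, the roots are copied from this section, and only the path text is inspected (symlinks are not resolved).

```python
import posixpath
from pathlib import PurePosixPath

SYSTEM_ROOTS = tuple(PurePosixPath(p) for p in ("/etc", "/usr", "/var", "/opt"))


def touches_system_path(path: str) -> bool:
    """Hypothetical check: is this absolute path inside a protected root?"""
    # Collapse ".." segments so "/home/u/../../etc/hosts" is seen as "/etc/hosts".
    normalized = PurePosixPath(posixpath.normpath(path))
    return any(normalized == root or root in normalized.parents for root in SYSTEM_ROOTS)


print(touches_system_path("/etc/passwd"))                  # True
print(touches_system_path("/home/u/../../etc/hosts"))      # True
print(touches_system_path("/home/dev/project/etc/config")) # False
```

Comparing path components rather than string prefixes keeps a directory such as `/etcetera` from being mistaken for `/etc`.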

Why warn beats block for most teams

If your first instinct is to block all of it, look at who you’d actually be slowing down. The agent runs hundreds of safe commands for every dangerous one. Block everything and you tax all of that work to catch the rare event, and you teach developers to switch the guardrail off. Warn-first flips that. The safe majority keeps moving. The risky-but-real actions get a human moment and leave a trail. Block stays reserved for the short list of things with no undo and no legitimate use. Developers get room to work, and every one-way door still closes behind them.

The maturity ladder

The three modes aren’t a menu, they’re a sequence. Roll them out in order.
1. Audit, to learn

Seed the pack and watch. Every high-risk family is logged, so you see what your agents actually do before you change anything.
2. Warn, to add the human moment

Promote the secret, data, and access policies to warn. Use Preview Impact first to see exactly what would have matched, then turn it on. Developers feel a light touch, and you start collecting the signals that matter.
3. Block, to close the one-way doors

Turn on block for the irreversible set: root, prod destruction, secret deletion, direct pushes to main, and system files. By now you’ve seen the data, so there’s nothing to be surprised by.
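The sequence above can be expressed as a lookup from rollout stage to mode. This is a sketch, not a product feature: the group names and `mode_at_stage` are hypothetical, and it assumes the irreversible set stays in audit until stage 3, which the steps above imply but do not state outright.

```python
# Hypothetical group names, taken loosely from the three steps above.
WARN_GROUPS = {"secrets", "data", "access"}
BLOCK_GROUPS = {"root", "prod_destruction", "secret_deletion", "push_to_main", "system_files"}


def mode_at_stage(group: str, stage: int) -> str:
    """Return the mode a policy group runs in at rollout stage 1, 2, or 3."""
    if stage not in (1, 2, 3):
        raise ValueError("stage must be 1, 2, or 3")
    if stage == 3 and group in BLOCK_GROUPS:
        return "block"
    if stage >= 2 and group in WARN_GROUPS:
        return "warn"
    return "audit"  # stage 1, or a group that has not been promoted yet


for stage in (1, 2, 3):
    print(stage, mode_at_stage("secrets", stage), mode_at_stage("root", stage))
# 1 audit audit
# 2 warn audit
# 3 warn block
```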

How to read a policy

Every policy reads as one sentence: when a kind of command runs, if it matches a condition, then take an action. The editor is laid out in exactly those three steps.
| Step | Field in the editor | What goes there |
| --- | --- | --- |
| When | Command Family | The umbrella the command belongs to, such as Cloud Secrets or Container Operation |
| If | Match Against, then Pattern | Pick one detail of the command to look at, then the value to match (a glob like *prod* or an exact string like get-secret-value) |
| Then | Action | Audit, Warn, Block, or Require Slack Approval |
The choices under Match Against change with the family you picked, so a database policy offers different details than a secrets policy. You can match a whole family, but you rarely need to. Start with one detail, so the policy fires on what you mean and leaves the rest alone. Narrowing is the next section.
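The when/if/then sentence maps naturally onto a small record. The sketch below is one possible reading, not the product's data model: the `Policy` class, its field names, and the dictionary shape used for a command are all assumptions made for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Policy:
    family: str         # When: Command Family
    match_against: str  # If: which detail of the command to inspect
    pattern: str        # If: glob or exact string to match
    action: str         # Then: "audit", "warn", "block", ...

    def evaluate(self, command: dict) -> Optional[str]:
        """Return the action if the policy fires on this command, else None."""
        if command.get("family") != self.family:
            return None
        value = command.get(self.match_against, "")
        # A pattern without wildcards behaves as an exact match here.
        return self.action if fnmatchcase(value, self.pattern) else None


policy = Policy(
    family="Cloud Secrets",
    match_against="operation",
    pattern="get-secret-value",
    action="warn",
)
print(policy.evaluate({"family": "Cloud Secrets", "operation": "get-secret-value"}))  # warn
print(policy.evaluate({"family": "Cloud Secrets", "operation": "list-secrets"}))      # None
```

Returning `None` when nothing matches keeps "no policy fired" distinct from any real action.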

Make it surgical

You don’t have to choose between blocking a whole family and blocking nothing. The Pattern field lets one policy fire only on the exact case you care about. That’s the difference between locking down a tool and locking down a single move inside it. You leave developers all the room they had, and take away only the one action you can’t allow.

Take container operations. You want developers building, running, and inspecting containers freely, but opening an interactive shell inside a running container is the risky one. So you don’t block the whole family. You write one narrow policy:
  • When (Command Family): Container Operation
  • If (Match Against): Operation
  • If (Pattern): exec
  • Then (Action): Block
Everything else in the family stays open. run, push, and cp work all day; only exec is stopped. The family gives you broad coverage with one rule, and the Pattern carves it down to the single action you mean. That is how you get a scalpel out of an umbrella.

You don’t have to grade your own homework. Seed the whole pack in audit, read the first week, and promote with confidence. Nothing you turn on will catch you off guard.
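The difference between blocking the Container Operation family and blocking only exec can be shown by counting what each approach stops. This is a toy comparison with a hypothetical helper, `stopped_operations`. It does not model the real policy engine.

```python
def stopped_operations(operations, blocked_pattern=None):
    """Return the operations a policy would stop.

    blocked_pattern=None models blocking the whole family;
    a string models a narrow policy on that single operation.
    """
    if blocked_pattern is None:
        return list(operations)
    return [op for op in operations if op == blocked_pattern]


container_ops = ["run", "push", "cp", "exec"]
print(stopped_operations(container_ops))          # ['run', 'push', 'cp', 'exec']
print(stopped_operations(container_ops, "exec"))  # ['exec']
```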

A block the agent reads

Every block, and every warn, carries a short message you write, and it reaches the coding agent itself, not just the person at the keyboard. Tell the agent why the action is off-limits and it changes course on its own. The message is what makes the block hold.
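As an illustration of a message written for the agent, the sketch below packages an action and its explanation into one payload. The JSON shape and the `build_decision` helper are assumptions for this example. The actual format the agent receives is not documented here.

```python
import json


def build_decision(action: str, message: str) -> str:
    """Hypothetical response handed back to the agent when a policy fires."""
    if action not in ("warn", "block"):
        raise ValueError("only warn and block carry a message in this sketch")
    return json.dumps({"action": action, "message": message})


decision = build_decision(
    "block",
    "Direct pushes to main are not allowed. "
    "Open a pull request from a feature branch instead.",
)
print(decision)
```

A message that names both the reason and an allowed alternative gives the agent something to do next, rather than only something to stop doing.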

Want prompts to test each policy?

The policy examples page has a ready-to-run prompt for every policy, so you can watch each one fire before you trust it.

Ready to roll out?

Back to the three-step go-live →