
How to Build a Pre-Action Gate for Your AI Agent (With Starter Code)

Most guides on AI agent safety tell you to write better prompts. Add a system instruction that says “be careful.” Maybe include “always verify before acting” somewhere in the rules.

That works until it doesn’t. I’ve watched an AI system follow safety instructions perfectly for 150 messages, then quietly ignore them after context compression wiped the rules from its working memory. Prompts are suggestions. Gates are architecture.

This tutorial shows you how to build a pre-action gate: a mechanical check that fires before your AI agent executes any tool call. The gate inspects the action, applies your rules, and either allows it through or blocks it before anything happens. No amount of prompt drift can bypass it because it runs outside the model’s context.

I’ve built and tested these gates in production with Claude and Gemini. The pattern is model-agnostic by design since it runs outside the model, so it should work with whatever you’re using.

Every gate in this tutorial comes from a real production failure. I didn’t design these in advance. I built them after something went wrong.

What Is a Pre-Action Gate?

A pre-action gate is a script that sits between your AI agent’s decision and its execution. When the agent decides to run a command, edit a file, or call an API, the gate intercepts the request, inspects it, and makes a binary decision: allow or block.

The key distinction: the gate runs as code, not as a prompt instruction. The AI model never sees it, can’t reason about it, and can’t talk its way past it. If the check fails, execution stops. Period.

Agent decides to act
        ↓
   Pre-action gate
   (your Python script)
        ↓
   Pass? → Execute the action
   Fail? → Block + return error to agent

This pattern works with any AI agent framework. Some tools have native hook support for it. Custom setups can implement the same contract with a thin wrapper.

The Gate Contract

Every pre-action gate follows the same interface:

Input: JSON on stdin describing the intended action

{
  "tool_name": "Bash",
  "tool_input": {
    "command": "sf project deploy start --target-org production"
  }
}

Output: Exit code determines the outcome

  • exit 0 = allow the action
  • exit 2 = block (stderr message tells the agent why)

That’s it. The gate is a standalone script. It doesn’t import your agent framework. It doesn’t depend on your LLM provider. It reads JSON, runs checks, returns a verdict.
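To make the contract concrete, here is a bare-bones skeleton that implements it and allows everything. Every gate below follows this same shape; only the check function changes. (The file name minimal_gate.py is just an example.)

minimal_gate.py
#!/usr/bin/env python3
"""Minimal gate skeleton: read the action JSON, apply a check, return a verdict."""
import sys
import json

def check(tool_name, tool_input):
    """Replace this with your own rule. Returns (allowed: bool, reason: str)."""
    return True, ""

if __name__ == "__main__":
    data = json.load(sys.stdin)
    ok, reason = check(data.get("tool_name", ""), data.get("tool_input", {}))
    if not ok:
        print(f"BLOCK: {reason}", file=sys.stderr)
        sys.exit(2)
    sys.exit(0)  # allow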

Gate 1: The Deploy Target Gate

The incident: My AI agent tried to push 54 metadata files to a production environment instead of the development sandbox. Same CLI tool, same command structure. The only difference was the target flag. The agent picked the wrong one because it had seen both environment names in the conversation and chose confidently. It was wrong.

The fix: A gate that checks every deploy command against a per-project allowlist. If the target isn’t on the list, the deploy doesn’t happen.

The code:

deploy_gate.py
#!/usr/bin/env python3
"""Deploy Target Gate - blocks deploys to unauthorized targets."""
import sys
import json
from pathlib import Path

def check_deploy_target(tool_name, tool_input):
    """Returns (allowed: bool, reason: str). Usable as import or CLI."""
    if tool_name != "Bash":
        return True, ""

    command = tool_input.get("command", "")
    deploy_keywords = ["deploy", "push", "publish"]
    if not any(kw in command.lower() for kw in deploy_keywords):
        return True, ""

    allowlist_path = Path(".deploy-targets")
    if not allowlist_path.exists():
        return False, ("No .deploy-targets allowlist found. "
                       "Create one with allowed target names, one per line.")

    allowed = {
        line.strip().lower()
        for line in allowlist_path.read_text().splitlines()
        if line.strip() and not line.startswith("#")
    }

    parts = command.split()
    for i, part in enumerate(parts):
        if part in ("--target", "--target-org", "-o") and i + 1 < len(parts):
            target = parts[i + 1].lower()
            if target not in allowed:
                return False, (f"Target '{target}' not in .deploy-targets. "
                               f"Allowed: {', '.join(sorted(allowed))}")
            return True, ""

    return True, ""

# CLI mode: reads JSON from stdin, exits 0 (allow) or 2 (block)
if __name__ == "__main__":
    data = json.load(sys.stdin)
    ok, reason = check_deploy_target(
        data.get("tool_name", ""),
        data.get("tool_input", {})
    )
    if not ok:
        print(f"BLOCK: {reason}", file=sys.stderr)
        sys.exit(2)

The allowlist file (.deploy-targets):

# Allowed deploy targets for this project
dev-sandbox
staging

What to tell your AI agent (base instruction):

Before any deploy, push, or publish command, verify the target matches the .deploy-targets allowlist in the project root. If no allowlist exists, stop and ask. If the target isn’t listed, stop and ask. Never assume a target org is correct from conversation context alone.

The base instruction teaches the model to cooperate with the gate. The gate enforces it mechanically. Both layers working together is stronger than either alone.
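A quick way to sanity-check the gate before wiring it in is to import the function and run it against a demo allowlist. A minimal sketch (run from a scratch directory so the demo allowlist doesn't touch your real one):

import os
import tempfile
from pathlib import Path
from deploy_gate import check_deploy_target

old_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    Path(".deploy-targets").write_text("dev-sandbox\nstaging\n")
    print(check_deploy_target(
        "Bash", {"command": "sf project deploy start --target-org production"}))
    # -> (False, "Target 'production' not in .deploy-targets. Allowed: dev-sandbox, staging")
    print(check_deploy_target(
        "Bash", {"command": "sf project deploy start --target-org staging"}))
    # -> (True, "")
    os.chdir(old_cwd)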

Gate 2: The Secret Leak Scanner

The incident: During a debugging session, the agent read an environment file and pasted the API key directly into a curl command string. If that command had been logged, committed, or shared in a screenshot, the key would have been exposed. The agent didn’t know that was dangerous. It was solving the immediate problem.

The fix: A gate that scans every bash command for patterns that look like hardcoded secrets.

The code:

secret_scanner.py
#!/usr/bin/env python3
"""Secret Leak Scanner - blocks commands containing likely secrets."""
import sys
import json
import re

SECRET_PATTERNS = [
    re.compile(r'(?:api[_-]?key|secret|token|password)\s*=\s*["\'][^"\']{8,}["\']',
               re.IGNORECASE),
    re.compile(r'Bearer\s+[A-Za-z0-9\-._~+/]{20,}'),
    re.compile(r'sk-[A-Za-z0-9]{20,}'),          # OpenAI-style API keys
    re.compile(r'ghp_[A-Za-z0-9]{36,}'),         # GitHub personal access tokens
    re.compile(r'xoxb-[0-9]{10,}'),              # Slack bot tokens
]

def check_for_secrets(tool_name, tool_input):
    """Returns (allowed: bool, reason: str). Usable as import or CLI."""
    if tool_name != "Bash":
        return True, ""

    command = tool_input.get("command", "")
    for pattern in SECRET_PATTERNS:
        match = pattern.search(command)
        if match:
            preview = match.group(0)[:20] + "..."
            return False, (f"Possible secret in command: {preview}. "
                           f"Use environment variables instead.")
    return True, ""

# CLI mode
if __name__ == "__main__":
    data = json.load(sys.stdin)
    ok, reason = check_for_secrets(
        data.get("tool_name", ""),
        data.get("tool_input", {})
    )
    if not ok:
        print(f"BLOCK: {reason}", file=sys.stderr)
        sys.exit(2)

Base instruction for your AI agent:

Never embed API keys, tokens, passwords, or secrets directly in commands. Always reference them through environment variables ($VAR_NAME) or .env files. If a command requires authentication, construct it using variable references, never literal values.
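A quick check that the scanner distinguishes a literal token from a variable reference (the token string below is made up):

from secret_scanner import check_for_secrets

# Literal bearer token pasted into the command: blocked
print(check_for_secrets("Bash", {
    "command": 'curl -H "Authorization: Bearer abcd1234efgh5678ijkl9012" https://api.example.com'
}))
# -> (False, "Possible secret in command: ...")

# Same request through an environment variable: allowed
print(check_for_secrets("Bash", {
    "command": 'curl -H "Authorization: Bearer $API_TOKEN" https://api.example.com'
}))
# -> (True, "")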

Gate 3: The Placeholder Detector (With Escalation)

The incident: The agent was writing a deployment document and inserted “[OWNER to paste the API endpoint here]” instead of looking up the endpoint from the project configuration file where it already existed. The data was two files away. The agent chose to defer instead of extract.

The fix: A gate that detects placeholder patterns in file edits, warns on the first offense, and blocks on repeat.

This gate introduces the escalation model: advisory first, hard block second. It’s more forgiving than a binary allow/deny, but it still enforces a ceiling.

The code:

placeholder_detector.py
#!/usr/bin/env python3
"""Placeholder Detector - advisory on first hit, block on repeat."""
import sys
import json
import re
import tempfile
from pathlib import Path

PATTERNS = [
    re.compile(r'\[\s*(?:TODO|TBD|FIXME)\s*[:\-]\s*(?:paste|insert|add|fill)[^\]]{0,80}\]',
               re.IGNORECASE),
    re.compile(r'\[\s*(?:paste|insert|add)\s+(?:here|manually|from)[^\]]{0,60}\]',
               re.IGNORECASE),
]

STRIKE_FILE = Path(tempfile.gettempdir()) / "gate_placeholder_strikes.txt"

def _get_strikes():
    try:
        return int(STRIKE_FILE.read_text().strip())
    except Exception:
        return 0

def _bump_strikes():
    n = _get_strikes() + 1
    STRIKE_FILE.write_text(str(n))
    return n

def check_for_placeholders(tool_name, tool_input):
    """Returns (allowed: bool, reason: str). Escalates on repeat."""
    if tool_name not in ("Edit", "Write"):
        return True, ""

    content = tool_input.get("new_string", "") or tool_input.get("content", "")
    if not content:
        return True, ""

    hits = []
    for pattern in PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(content))
    if not hits:
        return True, ""

    strikes = _bump_strikes()
    preview = "; ".join(hits[:3])

    if strikes <= 1:
        # Advisory - let it through with a warning
        return True, (f"WARNING: Placeholder detected: {preview}. "
                      f"Strike {strikes}/2. Next offense blocks.")

    return False, (f"Placeholder pattern (strike {strikes}): {preview}. "
                   f"Look up the data from project files.")

# CLI mode
if __name__ == "__main__":
    data = json.load(sys.stdin)
    ok, reason = check_for_placeholders(
        data.get("tool_name", ""),
        data.get("tool_input", {})
    )
    if reason and ok:
        print(reason, file=sys.stderr)  # Advisory
    if not ok:
        print(f"BLOCK: {reason}", file=sys.stderr)
        sys.exit(2)

Base instruction for your AI agent:

Never insert placeholder text like “[TODO: paste X]” or “[insert Y here]” when the data exists in project files. Search for the data first. If you genuinely cannot find it, mark it as “[TBD: verify via tool]” with an explanation of what you searched and where. Deferring data extraction is not acceptable when the data is available.
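To see the escalation in action, run the same offending edit twice. The first hit passes with a warning; the second blocks. A small sketch (the strike file is cleared first so the demo starts clean):

from placeholder_detector import check_for_placeholders, STRIKE_FILE

STRIKE_FILE.unlink(missing_ok=True)  # start the demo with zero strikes

edit = {"new_string": "Endpoint: [TODO: paste the API endpoint here]"}
print(check_for_placeholders("Edit", edit))
# -> (True, "WARNING: Placeholder detected: ... Strike 1/2. Next offense blocks.")
print(check_for_placeholders("Edit", edit))
# -> (False, "Placeholder pattern (strike 2): ...")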

Wiring the Gates

Every gate above exports a function that returns (allowed: bool, reason: str). This means you have two integration paths depending on how your agent runs.

Path A: SDK / Python agents

If you’re calling an LLM API from Python (the most common setup), import the gate functions directly. No subprocess, no stdin/stdout.

sdk_integration.py
from deploy_gate import check_deploy_target
from secret_scanner import check_for_secrets
from placeholder_detector import check_for_placeholders

GATES = [check_deploy_target, check_for_secrets, check_for_placeholders]

def run_tool_with_gates(tool_name, tool_input, execute_fn):
    """Run all gates before executing a tool call."""
    for gate in GATES:
        ok, reason = gate(tool_name, tool_input)
        if not ok:
            return f"BLOCKED: {reason}"
        if reason:
            print(f"ADVISORY: {reason}")  # Warnings pass through
    return execute_fn(tool_name, tool_input)

Call run_tool_with_gates("Bash", {"command": cmd}, your_executor) before any tool execution. Stack as many gates as you need. First block stops the chain.
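For example, wrapping a trivial executor (a sketch, assuming the snippet above is saved as sdk_integration.py; simple_executor is a stand-in for however your agent actually runs tools):

import subprocess
from sdk_integration import run_tool_with_gates

def simple_executor(tool_name, tool_input):
    """Stand-in executor: only handles Bash commands for this sketch."""
    if tool_name == "Bash":
        result = subprocess.run(tool_input["command"], shell=True,
                                capture_output=True, text=True)
        return result.stdout
    raise ValueError(f"No executor for tool: {tool_name}")

print(run_tool_with_gates("Bash", {"command": "echo hello"}, simple_executor))
# -> "hello\n" if every gate allows it, or "BLOCKED: ..." if one fires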

Path B: CLI-based agent tools

If your agent tool supports pre-execution hooks (shell commands that fire before tool calls), each gate also works as a standalone CLI script. It reads JSON from stdin and returns exit code 0 (allow) or 2 (block). Point your hook configuration at the script files.
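If your tool has no native hook support, the same contract takes about ten lines to wrap yourself. A minimal sketch (the interpreter and script path are assumptions about your setup):

import json
import subprocess

def run_cli_gate(script_path, tool_name, tool_input):
    """Pipe the action JSON to a gate script; exit code 2 means block."""
    payload = json.dumps({"tool_name": tool_name, "tool_input": tool_input})
    result = subprocess.run(["python3", script_path],
                            input=payload, capture_output=True, text=True)
    if result.returncode == 2:
        return False, result.stderr.strip()
    return True, result.stderr.strip()  # stderr may carry an advisory

ok, message = run_cli_gate("deploy_gate.py", "Bash",
                           {"command": "sf project deploy start --target-org production"})
print(ok, message)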

Which path did I use?

Both. My Claude integration uses CLI hooks (Path B). My Gemini integration uses Python function calls (Path A) with a secondary LLM-as-judge layer: a lightweight model reviews the primary model’s output before it ships. The pre-action gate catches bad inputs. The judge catches bad outputs. Two layers. Both run outside the model. Neither trusts it.

Why One Gate Becomes Forty

Three gates don’t feel like much. But each failure that never recurred freed up the time to build the next gate.

After six months, I have over 40 gates running in production. Deploy protection, secret scanning, anti-fabrication checks, behavioral drift detection, credential scoping, domain allowlists, loop detection, file size guards. Every single one traces to an incident. I wrote about what happens when an AI agent operates without any of them: 9 seconds, one production database, gone.

You don’t design a complete harness on day one. You build one gate after your first near-miss. Then another after the second. The system gets smarter not because the model improved, but because your harness captured the lesson.

Start with one gate. The one that would have prevented your last “oh no” moment. Build from there.


Starter Code Summary

All three gates and base instructions in one place:

  • Deploy Target: prevents wrong-environment deploys. Starter code: deploy_gate.py plus a .deploy-targets allowlist.
  • Secret Scanner: prevents hardcoded credentials in commands. Starter code: secret_scanner.py.
  • Placeholder Detector: prevents the AI deferring instead of extracting data. Starter code: placeholder_detector.py (with escalation).

All three gates are standalone Python with zero dependencies. The base instructions teach your AI model to cooperate with the gate. Mechanical enforcement and behavioral guidance reinforce each other.

I packaged all three gates into a ready-to-clone repo: ai-agent-gates on GitHub. Includes the gates, both integration examples (SDK and CLI), a test suite, and a .deploy-targets template. Clone it, run the tests, drop the gates into your project. That’s your first harness.

I wrote about why this architectural pattern keeps emerging independently across teams from major AI labs to Martin Fowler’s group. The harness is the product, not the model.


I’m Tom Tokita, co-founder and President of Aether Global Technology Inc. in Manila. I’ve been running a production AI system as a daily driver for over 200 sessions. Every gate in my harness traces to a specific failure. I write about what works, what breaks, and what the industry keeps getting wrong. More at tokita.online.

What is a pre-action gate in AI agent systems?

A pre-action gate is a script that runs before your AI agent executes any tool call. It inspects the intended action (command, file edit, API call), applies rules you define, and either allows execution or blocks it. Unlike prompt instructions, gates run as external code that the AI model cannot bypass or reason around. They are mechanical enforcement, not behavioral suggestions.

Do I need pre-action gates if I already have good system prompts?

System prompts degrade over long sessions. After context compression (when your conversation exceeds the model’s context window), behavioral instructions can be silently dropped. A pre-action gate runs outside the model’s context window entirely. It doesn’t matter if the model forgot your safety instructions. The gate still fires. Use both: prompts for cooperation, gates for enforcement.

How many pre-action gates should I start with?

One. Pick the failure mode that concerns you most (wrong deploy target, leaked secrets, fabricated data) and build a single gate for it. Run it for a week. When the next near-miss happens, build the second gate. This is incident-driven engineering, not upfront architecture. Designing 20 gates on day one means you’re guessing at failure modes instead of learning from real ones.

Do pre-action gates work with any AI agent framework?

Yes. The gate contract (JSON on stdin, exit code as verdict) is framework-agnostic. Some tools have native hook support. For everything else, wrap your tool execution in a function that pipes the call through your gate script before proceeding. The gate itself doesn’t know or care which framework called it.
