Skip to content

prompt-injection-detector

PreToolUse guard that flags suspected prompt-injection patterns in WebFetch/WebSearch/Read input.

Trigger

  • Event: PreToolUse
  • Matcher: WebFetch|WebSearch|Read

What it blocks

Naive jailbreak prefixes commonly found in scraped hostile content:

  • "Ignore (all/previous/prior/above) instructions"
  • "Disregard (the) system prompt"
  • "You are now (DAN/jailbroken/unrestricted)"
  • "Enable developer mode"
  • "Print/reveal/show (the) system prompt"

Exit codes

  • 0 — allow
  • 2 — block

Kill switches

  • CLAUDE_HARNESSES_DISABLE=1

Limits

Heuristic. It will not catch sophisticated payloads (encoded, multilingual, hidden in markdown), but it will catch the common naive cases. Combine with conservative permissions on WebFetch and Read of untrusted paths.

Pack: safety-pack