Documentation Index
Fetch the complete documentation index at: https://threatbasis.io/llms.txt
Use this file to discover all available pages before exploring further.
Prompt injection is the most significant security risk for LLM applications, allowing attackers to manipulate model behavior through crafted inputs. Security applications face elevated risk because they process attacker-controlled data—logs, alerts, and threat intelligence that may contain embedded injection payloads.
Effective defense requires layered strategies: input validation, output verification, architectural controls, and continuous monitoring. No single technique provides complete protection, but defense-in-depth significantly reduces risk. This guide covers practical defenses for security AI applications.
Understanding Prompt Injection
Attack Taxonomy
| Attack Type | Mechanism | Detection Difficulty |
|---|
| Direct injection | Explicit instructions in user input | Medium |
| Indirect injection | Hidden instructions in data sources | High |
| Encoded injection | Obfuscated payloads (base64, etc.) | Medium |
| Multi-language injection | Instructions in different languages | High |
| Delimiter escape | Breaking out of structured prompts | Medium |
Injection Vectors in Security Apps
| Vector | Example | Risk Level |
|---|
| Log messages | Attacker-controlled log fields | High |
| Alert descriptions | Malicious alert content | High |
| User queries | Analyst input manipulation | Medium |
| Enrichment APIs | Poisoned external data | High |
| Document retrieval | Compromised knowledge base | Critical |
Prevention Strategies
| Technique | Description | Effectiveness |
|---|
| Input filtering | Block known injection patterns | Low (easily bypassed) |
| Input length limits | Restrict input size | Medium |
| Format validation | Enforce expected structure | Medium |
| Encoding normalization | Standardize character encoding | Medium |
| Semantic analysis | Detect instruction-like content | Medium-High |
Prompt Architecture
| Strategy | Implementation | Protection |
|---|
| Structured prompts | Clear delimiters, XML tags | Reduce confusion |
| Instruction hierarchy | System > retrieved > user | Privilege ordering |
| Data isolation | Separate data from instructions | Prevent blending |
| Minimal context | Only necessary information | Reduce attack surface |
| Role specification | Explicit behavior constraints | Limit scope |
Output Controls
| Control | Description | Application |
|---|
| Output validation | Verify output format/content | All outputs |
| Action gating | Require approval for actions | Sensitive operations |
| Confidence thresholds | Reject low-confidence outputs | Decision points |
| Output filtering | Block sensitive data leakage | All responses |
| Semantic verification | Check output makes sense | Critical paths |
Detection Techniques
| Technique | Indicators | Limitations |
|---|
| Pattern matching | Known injection phrases | Easily evaded |
| Perplexity analysis | Unusual text patterns | High false positives |
| Intent classification | Instruction-like content | Requires training |
| Encoding detection | Base64, hex, Unicode tricks | Legitimate uses exist |
| Length anomalies | Unusually long inputs | Context-dependent |
Behavioral Detection
| Indicator | Description | Response |
|---|
| Output deviation | Unexpected response format | Flag for review |
| Instruction leakage | System prompt in output | Block, alert |
| Capability escalation | Attempts beyond scope | Block, log |
| Repeated probing | Multiple injection attempts | Rate limit, block |
| Cross-session patterns | Coordinated attack attempts | Investigate |
Architectural Defenses
Defense Layers
| Layer | Control | Purpose |
|---|
| Perimeter | Input validation, rate limiting | Block obvious attacks |
| Prompt | Structured templates, separation | Reduce confusion |
| Model | Guardrails, fine-tuning | Behavioral constraints |
| Output | Validation, filtering | Catch breakthrough |
| Action | Approval gates, least privilege | Limit impact |
Isolation Strategies
| Strategy | Implementation | Trade-off |
|---|
| Separate models | Different models for different trust levels | Cost, complexity |
| Context isolation | Don’t mix trusted/untrusted in same context | Reduced capability |
| Session isolation | No cross-user data sharing | Memory overhead |
| Tool isolation | Separate permissions per tool | Integration complexity |
Security Application Considerations
Handling Attacker-Controlled Data
| Data Source | Risk | Mitigation |
|---|
| Log messages | Injection payloads in logs | Sanitize before LLM processing |
| Alert fields | Malicious alert descriptions | Field-level validation |
| Network data | Encoded payloads in traffic | Careful extraction |
| File content | Document-based injection | Separate analysis contexts |
| API responses | Poisoned external data | Source validation |
Trust Boundaries
| Boundary | Treatment | Controls |
|---|
| User input | Untrusted | Full validation |
| Internal data | Semi-trusted | Format validation |
| Authenticated APIs | Semi-trusted | Schema validation |
| System prompts | Trusted | Version control |
| Model outputs | Untrusted | Output validation |
Monitoring and Response
Detection Metrics
| Metric | Purpose | Threshold |
|---|
| Injection attempt rate | Attack frequency | Baseline + anomaly |
| Block rate | Defense effectiveness | Track trends |
| Bypass indicators | Defense gaps | Any occurrence |
| Output anomalies | Successful injection | Pattern matching |
Incident Response
| Stage | Actions |
|---|
| Detection | Alert on injection indicators |
| Containment | Block session, rate limit source |
| Analysis | Review inputs, outputs, impact |
| Recovery | Reset session, review affected outputs |
| Improvement | Update defenses, add patterns |
Anti-Patterns to Avoid
-
Blocklist-only defense — Pattern matching alone fails. Use defense-in-depth.
-
Trusting any input — All external data may contain injection. Validate everything.
-
Security by obscurity — Hidden prompts get extracted. Design for exposure.
-
Ignoring indirect injection — Retrieved documents, API responses are attack vectors.
-
No monitoring — Attacks evolve. Continuous monitoring catches new techniques.
References