Skip to main content

Documentation Index

Fetch the complete documentation index at: https://threatbasis.io/llms.txt

Use this file to discover all available pages before exploring further.

Prompt injection is the most significant security risk for LLM applications, allowing attackers to manipulate model behavior through crafted inputs. Security applications face elevated risk because they process attacker-controlled data—logs, alerts, and threat intelligence that may contain embedded injection payloads. Effective defense requires layered strategies: input validation, output verification, architectural controls, and continuous monitoring. No single technique provides complete protection, but defense-in-depth significantly reduces risk. This guide covers practical defenses for security AI applications.

Understanding Prompt Injection

Attack Taxonomy

Attack TypeMechanismDetection Difficulty
Direct injectionExplicit instructions in user inputMedium
Indirect injectionHidden instructions in data sourcesHigh
Encoded injectionObfuscated payloads (base64, etc.)Medium
Multi-language injectionInstructions in different languagesHigh
Delimiter escapeBreaking out of structured promptsMedium

Injection Vectors in Security Apps

VectorExampleRisk Level
Log messagesAttacker-controlled log fieldsHigh
Alert descriptionsMalicious alert contentHigh
User queriesAnalyst input manipulationMedium
Enrichment APIsPoisoned external dataHigh
Document retrievalCompromised knowledge baseCritical

Prevention Strategies

Input Validation

TechniqueDescriptionEffectiveness
Input filteringBlock known injection patternsLow (easily bypassed)
Input length limitsRestrict input sizeMedium
Format validationEnforce expected structureMedium
Encoding normalizationStandardize character encodingMedium
Semantic analysisDetect instruction-like contentMedium-High

Prompt Architecture

StrategyImplementationProtection
Structured promptsClear delimiters, XML tagsReduce confusion
Instruction hierarchySystem > retrieved > userPrivilege ordering
Data isolationSeparate data from instructionsPrevent blending
Minimal contextOnly necessary informationReduce attack surface
Role specificationExplicit behavior constraintsLimit scope

Output Controls

ControlDescriptionApplication
Output validationVerify output format/contentAll outputs
Action gatingRequire approval for actionsSensitive operations
Confidence thresholdsReject low-confidence outputsDecision points
Output filteringBlock sensitive data leakageAll responses
Semantic verificationCheck output makes senseCritical paths

Detection Techniques

Input Analysis

TechniqueIndicatorsLimitations
Pattern matchingKnown injection phrasesEasily evaded
Perplexity analysisUnusual text patternsHigh false positives
Intent classificationInstruction-like contentRequires training
Encoding detectionBase64, hex, Unicode tricksLegitimate uses exist
Length anomaliesUnusually long inputsContext-dependent

Behavioral Detection

IndicatorDescriptionResponse
Output deviationUnexpected response formatFlag for review
Instruction leakageSystem prompt in outputBlock, alert
Capability escalationAttempts beyond scopeBlock, log
Repeated probingMultiple injection attemptsRate limit, block
Cross-session patternsCoordinated attack attemptsInvestigate

Architectural Defenses

Defense Layers

LayerControlPurpose
PerimeterInput validation, rate limitingBlock obvious attacks
PromptStructured templates, separationReduce confusion
ModelGuardrails, fine-tuningBehavioral constraints
OutputValidation, filteringCatch breakthrough
ActionApproval gates, least privilegeLimit impact

Isolation Strategies

StrategyImplementationTrade-off
Separate modelsDifferent models for different trust levelsCost, complexity
Context isolationDon’t mix trusted/untrusted in same contextReduced capability
Session isolationNo cross-user data sharingMemory overhead
Tool isolationSeparate permissions per toolIntegration complexity

Security Application Considerations

Handling Attacker-Controlled Data

Data SourceRiskMitigation
Log messagesInjection payloads in logsSanitize before LLM processing
Alert fieldsMalicious alert descriptionsField-level validation
Network dataEncoded payloads in trafficCareful extraction
File contentDocument-based injectionSeparate analysis contexts
API responsesPoisoned external dataSource validation

Trust Boundaries

BoundaryTreatmentControls
User inputUntrustedFull validation
Internal dataSemi-trustedFormat validation
Authenticated APIsSemi-trustedSchema validation
System promptsTrustedVersion control
Model outputsUntrustedOutput validation

Monitoring and Response

Detection Metrics

MetricPurposeThreshold
Injection attempt rateAttack frequencyBaseline + anomaly
Block rateDefense effectivenessTrack trends
Bypass indicatorsDefense gapsAny occurrence
Output anomaliesSuccessful injectionPattern matching

Incident Response

StageActions
DetectionAlert on injection indicators
ContainmentBlock session, rate limit source
AnalysisReview inputs, outputs, impact
RecoveryReset session, review affected outputs
ImprovementUpdate defenses, add patterns

Anti-Patterns to Avoid

  • Blocklist-only defense — Pattern matching alone fails. Use defense-in-depth.
  • Trusting any input — All external data may contain injection. Validate everything.
  • Security by obscurity — Hidden prompts get extracted. Design for exposure.
  • Ignoring indirect injection — Retrieved documents, API responses are attack vectors.
  • No monitoring — Attacks evolve. Continuous monitoring catches new techniques.

References