
navi-sanitize

Deterministic input sanitization for untrusted text --- invisible characters, homoglyphs, and encoding tricks, handled before your code sees them. Zero dependencies, no ML. Legitimate Unicode preserved by design.

navi-sanitize removes invisible attacks from untrusted text before it reaches your application. It doesn't detect attacks --- it removes them. It implements the pipeline recommended by the OWASP LLM Prompt Injection Prevention Cheat Sheet.



See the invisible

from navi_sanitize import clean

evil = "system\u200b\u200cprompt"  # looks like "systemprompt" but has 2 hidden chars
len(evil)           # 14 (not 12!)
clean(evil)         # "systemprompt" — hidden chars stripped
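To see which characters are hiding in a string before sanitizing it, a stdlib-only check can help. The sketch below is not part of navi-sanitize; `reveal_hidden` is a hypothetical helper name, and it only reports Unicode format characters (category `Cf`), which covers zero-width characters like the two above but not every invisible character.

```python
import unicodedata


def reveal_hidden(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each format (Cf) character.

    Hypothetical helper for illustration; not a navi-sanitize function.
    """
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


evil = "system\u200b\u200cprompt"
hidden = reveal_hidden(evil)
for entry in hidden:
    print(entry)
# (6, 'U+200B', 'ZERO WIDTH SPACE')
# (7, 'U+200C', 'ZERO WIDTH NON-JOINER')
```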

Features

  • 6-stage pipeline --- null bytes, invisible characters, NFKC normalization, homoglyph replacement, re-NFKC for idempotency, pluggable escaper
  • OWASP aligned --- implements the NFKC + zero-width + control character sanitization recommended by the LLM Prompt Injection Prevention Cheat Sheet
  • Only maintained option --- both confusable_homoglyphs and homoglyphs are archived; navi-sanitize is the only maintained Python library covering homoglyph replacement
  • Deterministic --- same input always produces the same output; no probabilistic models, no heuristics
  • Zero dependencies --- Python 3.12+ stdlib only; no third-party dependency risk
  • Pluggable escapers --- built-in Jinja2 and path traversal escapers; write your own in three lines
  • Recursive sanitization --- walk() sanitizes every string in nested dicts and lists
  • Transparent logging --- warnings include counts ("Stripped 3 invisible character(s)")
  • Opt-in utilities --- decode_evasion() for nested encoding, detect_scripts() / is_mixed_script() for mixed-script analysis --- not enabled by default
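The stage order listed above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the library's implementation: `sketch_clean`, `sketch_walk`, `strip_braces`, and the two tiny lookup tables are hypothetical names and data chosen for the example, and the real character tables are far larger.

```python
import unicodedata

# Assumption: tiny illustrative tables, not the library's real data.
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
HOMOGLYPHS = {"\u041d": "H", "\u043e": "o", "\u0430": "a", "\u0435": "e"}


def sketch_clean(text, escaper=None):
    """Apply the six stages in order (illustrative sketch only)."""
    text = text.replace("\x00", "")                           # 1. null bytes
    text = "".join(ch for ch in text if ch not in INVISIBLE)  # 2. invisible characters
    text = unicodedata.normalize("NFKC", text)                # 3. NFKC normalization
    text = "".join(HOMOGLYPHS.get(ch, ch) for ch in text)     # 4. homoglyph replacement
    text = unicodedata.normalize("NFKC", text)                # 5. re-NFKC for idempotency
    return escaper(text) if escaper else text                 # 6. pluggable escaper


def sketch_walk(value, escaper=None):
    """Recursively sanitize every string value in nested dicts and lists."""
    if isinstance(value, str):
        return sketch_clean(value, escaper)
    if isinstance(value, dict):
        return {k: sketch_walk(v, escaper) for k, v in value.items()}
    if isinstance(value, list):
        return [sketch_walk(v, escaper) for v in value]
    return value  # non-string leaves pass through unchanged


def strip_braces(text):
    """Hypothetical three-line escaper: drop template braces."""
    return text.replace("{", "").replace("}", "")
```

In this sketch an escaper is just a function from `str` to `str` that runs last, so it always sees text that has already been stripped and normalized. Because each stage is a pure function of its input, running `sketch_clean` twice on the examples here gives the same result as running it once.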

Quick Start

pip install navi-sanitize

from navi_sanitize import clean

clean("Неllo Wоrld")      # "Hello World" — Cyrillic Н/о replaced
clean("price:\u200b 0")   # "price: 0" — zero-width space stripped
clean("file\x00.txt")     # "file.txt" — null byte removed
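The homoglyph case is hard to spot by eye, so it can be useful to list the non-ASCII letters in a string. The following stdlib-only sketch is not a navi-sanitize API; `non_ascii_letters` is a hypothetical helper, and the spoofed string is written with explicit escapes so the Cyrillic code points are unambiguous.

```python
import unicodedata


def non_ascii_letters(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each non-ASCII letter.

    Hypothetical helper for illustration; not a navi-sanitize function.
    """
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ch.isalpha() and not ch.isascii()
    ]


# Renders like "Hello World", but uses Cyrillic capital En and small o.
spoofed = "\u041dello W\u043erld"
found = non_ascii_letters(spoofed)
for entry in found:
    print(entry)
# (0, 'U+041D', 'CYRILLIC CAPITAL LETTER EN')
# (7, 'U+043E', 'CYRILLIC SMALL LETTER O')
```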

Documentation

| Page | Description |
| --- | --- |
| Why This Matters | Use cases: LLM pipelines, web apps, config ingestion, logs, anti-phishing |
| Comparison | How navi-sanitize compares to Unidecode, ftfy, confusable_homoglyphs, etc. |
| Getting Started | Installation, basic usage, logging setup |
| API Reference | Complete function and type reference |
| Pipeline Architecture | The 6 stages in depth, with data flow |
| Threat Model | What's covered, what's not, design philosophy |
| Writing Custom Escapers | How to extend with your own escapers |
| Character Reference | Full invisible character and homoglyph tables |
| Performance | Benchmarks and optimization tips |