
navi-sanitize

Deterministic input sanitization for untrusted text. Zero dependencies. No ML. Legitimate Unicode preserved by design.

navi-sanitize removes invisible attacks from untrusted text before it reaches your application. It doesn't detect attacks; it removes them. Every input produces clean output, every time.



See the invisible

from navi_sanitize import clean

evil = "system\u200b\u200cprompt"  # looks like "systemprompt" but has 2 hidden chars
len(evil)           # 14 (not 12!)
clean(evil)         # "systemprompt" — hidden chars stripped
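
To see for yourself what is hiding in a string like the one above, the standard library is enough. The following is a minimal sketch, not part of navi-sanitize: `reveal_hidden` is a hypothetical helper, and it assumes that "hidden" means Unicode general category `Cf` (format characters). The library's own invisible-character table may differ from this, and `Cf` also includes characters with legitimate uses, such as the zero-width joiner in emoji sequences.

```python
import unicodedata


def reveal_hidden(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each format-class character.

    Hypothetical helper for inspection only; it does not modify the text.
    """
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"  # assumption: Cf approximates "invisible"
    ]


evil = "system\u200b\u200cprompt"
hidden = reveal_hidden(evil)
for index, codepoint, name in hidden:
    print(index, codepoint, name)
# 6 U+200B ZERO WIDTH SPACE
# 7 U+200C ZERO WIDTH NON-JOINER
```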

Features

  • 6-stage pipeline: null bytes, invisible characters, NFKC normalization, homoglyph replacement, re-NFKC for idempotency, pluggable escaper
  • Deterministic: same input always produces the same output; no probabilistic models, no heuristics
  • Zero dependencies: Python 3.12+ stdlib only
  • Pluggable escapers: built-in Jinja2 and path traversal escapers; write your own in three lines
  • Recursive sanitization: walk() sanitizes every string in nested dicts and lists
  • Transparent logging: warnings include counts ("Stripped 3 invisible character(s)")
  • Opt-in utilities: decode_evasion() for nested encoding, detect_scripts() / is_mixed_script() for mixed-script analysis (not enabled by default)
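
The stage order listed above can be illustrated with the standard library alone. This is a rough sketch under stated assumptions, not the library's implementation: `sketch_clean`, `sketch_walk`, `strip_braces`, and the two tiny lookup tables are all hypothetical names and data invented for illustration, and the real tables and escaper interface are described in the Character Reference and Writing Custom Escapers pages.

```python
import unicodedata
from typing import Any, Callable, Optional

# Illustrative tables only (assumption); the real tables are far larger.
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
HOMOGLYPHS = {"\u041d": "H", "\u043e": "o", "\u0430": "a", "\u0435": "e"}

Escaper = Callable[[str], str]


def sketch_clean(text: str, escaper: Optional[Escaper] = None) -> str:
    """Apply the six stages in order to a single string."""
    text = text.replace("\x00", "")                           # 1. null bytes
    text = "".join(ch for ch in text if ch not in INVISIBLE)  # 2. invisible characters
    text = unicodedata.normalize("NFKC", text)                # 3. NFKC normalization
    text = "".join(HOMOGLYPHS.get(ch, ch) for ch in text)     # 4. homoglyph replacement
    text = unicodedata.normalize("NFKC", text)                # 5. re-NFKC after replacement
    return escaper(text) if escaper else text                 # 6. optional escaper


def sketch_walk(data: Any, escaper: Optional[Escaper] = None) -> Any:
    """Sanitize every string value in nested dicts and lists.

    Assumption: dict keys and non-string values are left untouched here.
    """
    if isinstance(data, str):
        return sketch_clean(data, escaper)
    if isinstance(data, dict):
        return {key: sketch_walk(value, escaper) for key, value in data.items()}
    if isinstance(data, list):
        return [sketch_walk(item, escaper) for item in data]
    return data


def strip_braces(text: str) -> str:
    """Toy escaper (hypothetical); not what the built-in Jinja2 escaper does."""
    return text.replace("{", "").replace("}", "")


print(sketch_clean("\uff28ello"))                    # Hello (fullwidth H folded by NFKC)
print(sketch_walk({"user": ["sys\u200btem", 3]}))    # {'user': ['system', 3]}
print(sketch_clean("{{ 7*7 }}\x00", strip_braces))   # " 7*7 " without the quotes
```

In this sketch the homoglyph table maps only to ASCII, so the second NFKC pass changes nothing; it is kept to mirror the stage order, where a replacement could in principle produce text that normalizes further.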

Quick Start

pip install navi-sanitize

from navi_sanitize import clean

clean("Неllo Wоrld")      # "Hello World" — Cyrillic Н/о replaced
clean("price:\u200b 0")   # "price: 0" — zero-width space stripped
clean("file\x00.txt")     # "file.txt" — null byte removed
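
The first example above mixes Cyrillic and Latin letters inside what looks like plain English. One rough way to spot that with the standard library is to read the script from each letter's Unicode name. This is only a sketch: `scripts_in` is a hypothetical helper, the first word of a character name is a heuristic rather than the Unicode Script property, and it is not how the library's detect_scripts() is implemented.

```python
import unicodedata


def scripts_in(text: str) -> set[str]:
    """Approximate the scripts used by the letters in text.

    Heuristic (assumption): the first word of a letter's Unicode name,
    e.g. "LATIN SMALL LETTER E" -> "LATIN".
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }


spoofed = "\u041dello W\u043erld"        # Cyrillic capital En and small O
print(sorted(scripts_in(spoofed)))        # ['CYRILLIC', 'LATIN']
print(sorted(scripts_in("Hello World")))  # ['LATIN']
```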

Documentation

Page                     Description
Why This Matters         Use cases: LLM pipelines, web apps, config ingestion, logs, anti-phishing
Comparison               How navi-sanitize compares to Unidecode, ftfy, confusable_homoglyphs, etc.
Getting Started          Installation, basic usage, logging setup
API Reference            Complete function and type reference
Pipeline Architecture    The 6 stages in depth, with data flow
Threat Model             What's covered, what's not, design philosophy
Writing Custom Escapers  How to extend with your own escapers
Character Reference      Full invisible character and homoglyph tables
Performance              Benchmarks and optimization tips