Gate 0 --- Non-interference
Status: Proven by gates. Passes on Mistral-7B-Instruct-v0.2.
The adapter produces bit-identical tokens and logits under deterministic greedy decoding with and without instrumentation installed. Per-step/per-layer record bijection verified across 32 layers. The observer does not perturb the system.
See Gate Discipline for the full methodology and Adapter Discipline for the non-interference invariant.
What "bit-identical" means
Under deterministic greedy decoding (do_sample=False, use_cache=False), with deterministic CUDA controls (torch.backends.cudnn.deterministic=True, torch.backends.cudnn.benchmark=False, seeded RNG), the model produces the same output tokens on every run. Gate 0 first verifies this baseline determinism: two consecutive greedy runs without instrumentation must produce token sequences that compare equal under torch.equal.
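The baseline check can be pictured as a small harness that runs before any instrumented comparison. The sketch below is framework-agnostic and illustrative only: `generate_fn` stands in for a greedy generation call, and `assert_baseline_deterministic` and `fake_greedy` are hypothetical names, not part of the project's test suite.

```python
from typing import Callable, List, Sequence


def assert_baseline_deterministic(
    generate_fn: Callable[[str], Sequence[int]],
    prompt: str,
    runs: int = 2,
) -> List[int]:
    """Run uninstrumented generation `runs` times and require identical token ids.

    Returns the reference sequence so later checks can compare against it.
    """
    reference = list(generate_fn(prompt))
    for run in range(1, runs):
        tokens = list(generate_fn(prompt))
        if tokens != reference:
            raise AssertionError(
                f"baseline is not deterministic: run {run} differs from run 0"
            )
    return reference


# Stand-in for a deterministic greedy decoder (hypothetical, for illustration).
def fake_greedy(prompt: str) -> List[int]:
    return [ord(ch) % 50 for ch in prompt]


reference_tokens = assert_baseline_deterministic(fake_greedy, "hello")
```

If this step fails, a later mismatch between instrumented and uninstrumented runs could not be attributed to the adapter, which is why it runs first.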
With that baseline established, Gate 0 compares:
- Token identity: Generate with and without the adapter installed. Output token sequences must satisfy torch.equal --- not "close", not "within tolerance", identical tensors. Tested across three prompt types at 30 tokens each.
- Logit identity: A single forward pass with and without instrumentation. Full logit tensors compared via torch.testing.assert_close(rtol=0, atol=0). Token identity alone is insufficient because two different logit distributions can produce the same greedy-decoded token. Logit identity proves the adapter genuinely does not perturb the computation, not just that it gets lucky on argmax.
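The gap between token identity and logit identity can be seen with plain Python floats. This is a toy illustration, not the gate's actual test: the logit values and the size of the perturbation are invented for the example.

```python
from typing import List


def argmax(values: List[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical logits over a 4-token vocabulary, for illustration only.
logits_plain = [0.10, 2.50, -1.00, 0.70]
# The same logits with a tiny perturbation, as an interfering hook might cause.
logits_hooked = [0.10, 2.50 + 1e-6, -1.00, 0.70]

# Greedy decoding picks the same token in both cases...
same_token = argmax(logits_plain) == argmax(logits_hooked)
# ...but an exact, zero-tolerance comparison still detects the perturbation.
same_logits = logits_plain == logits_hooked
```

Here `same_token` is true while `same_logits` is false, which is the case a token-only check would miss.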
Per-step/per-layer record bijection
Every generation step must produce exactly num_layers StepRecords, one per layer, with no duplicates and no gaps. For a generation of max_new_tokens steps across 32 layers, the total record count must be exactly 32 * max_new_tokens.
The test enforces this structurally:
- Group records by step_idx. The set of step indices must equal \( \{0, 1, \ldots, \text{max\_new\_tokens} - 1\} \).
- Within each step, the set of layer_idx values must equal \( \{0, 1, \ldots, 31\} \).
- Within each step, the record count must equal 32 (catches duplicates a set check alone would miss).
- Each record's per_head_delta must have length 32 (one per attention head).
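The structural checks above can be sketched as a single function. This is a minimal sketch under stated assumptions: the field names step_idx, layer_idx and per_head_delta come from the text, but the `StepRecord` dataclass shape and the helper name `check_record_bijection` are hypothetical and may not match the real test.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class StepRecord:
    # Field names follow the text; the class shape itself is an assumption.
    step_idx: int
    layer_idx: int
    per_head_delta: Sequence[float]


def check_record_bijection(
    records: Sequence[StepRecord],
    max_new_tokens: int,
    num_layers: int = 32,
    num_heads: int = 32,
) -> None:
    """Raise AssertionError unless there is exactly one record per (step, layer)."""
    by_step: Dict[int, List[StepRecord]] = defaultdict(list)
    for rec in records:
        by_step[rec.step_idx].append(rec)

    # No missing or extra generation steps.
    assert set(by_step) == set(range(max_new_tokens)), "step indices have gaps or extras"

    for step, recs in by_step.items():
        # Every layer appears at least once...
        layers_seen = {r.layer_idx for r in recs}
        assert layers_seen == set(range(num_layers)), f"step {step}: layer set mismatch"
        # ...and exactly once: the count catches duplicates that the set hides.
        assert len(recs) == num_layers, f"step {step}: got {len(recs)} records"
        for r in recs:
            assert len(r.per_head_delta) == num_heads, (
                f"step {step}, layer {r.layer_idx}: wrong head count"
            )

    # The per-step checks imply the total, but stating it keeps the intent explicit.
    assert len(records) == num_layers * max_new_tokens


# Usage: a well-formed capture of 3 steps across 32 layers passes silently.
good = [StepRecord(s, l, [0.0] * 32) for s in range(3) for l in range(32)]
check_record_bijection(good, max_new_tokens=3)
```

The set comparison and the count comparison are both needed: a duplicated layer record leaves the set unchanged but pushes the count past 32.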
Why this matters
The non-interference invariant is the foundation of the entire instrument. If installing capture hooks changes the model's behavior --- even by a single logit value --- then the SAD deltas we measure are deltas of a different system than the uninstrumented model. Every downstream measurement depends on Gate 0 passing. The adapter achieves this by being a verbatim copy of the upstream forward with three observation-only insertions that read tensors but never write to them.
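The idea of an observation-only insertion can be shown with a toy layer stack in plain Python. This is not the adapter's code: `forward`, the two lambda layers, and the observer callback are hypothetical stand-ins, and the toy has one observation point per layer rather than the adapter's three insertions.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def forward(
    x: Sequence[float],
    layers: Sequence[Callable[[Vector], Vector]],
    observer: Optional[Callable[[int, Vector], None]] = None,
) -> Vector:
    """Toy layer stack; the observer sees each layer output but cannot change it."""
    hidden: Vector = tuple(x)
    for layer_idx, layer in enumerate(layers):
        hidden = layer(hidden)
        if observer is not None:
            # Observation-only insertion: the return value is ignored and the
            # tuple is immutable, so the computation path is unchanged.
            observer(layer_idx, hidden)
    return hidden


layers = [
    lambda h: tuple(v * 0.5 + 1.0 for v in h),
    lambda h: tuple(v * v for v in h),
]

captured: List[Tuple[int, Vector]] = []
plain = forward([1.0, 2.0, 3.0], layers)
observed = forward([1.0, 2.0, 3.0], layers, lambda i, h: captured.append((i, h)))
```

Comparing `plain` and `observed` for exact equality is the toy counterpart of the Gate 0 check: the observer collects one entry per layer while the output stays the same.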