The verbalizer (AV) translates internal activations to natural language. NLA detects when thought diverges from speech.
Illegible Thought.
Every token processed by an LLM produces a vector in its residual stream — this is where the model "thinks" and makes decisions.
There are thousands of dimensions per layer. This is what one looks like:
One token, one layer. For Qwen2.5-7B: 3,584 numbers × 28 layers.
Until today, this intent was completely illegible.
Activations.
This is what the model is thinking.
3,584 DIMENSIONAL VECTORS PER TOKEN × 28 LAYERS
The NLA Principle.
One model learns to describe the activations. Another learns to reconstruct them from the text. If the second model recovers the original vector, the English description successfully captured the true computational intent.
The Architecture.
It is text.
Latent Extraction
Tapping the Residual Stream
Every token processed by a transformer creates a high-dimensional vector in its internal residual stream. Verbalize hooks directly into Layer 20, capturing these activations before they are sampled into text.
Deep Insight:Qwen2.5-7B produces 3,584-dimensional tensors at every step. We monitor these to detect the earliest formation of intent.
The NLA Actor
Making the Latent Readable
We train a specialized 'Activation Verbalizer' (AV) to translate these vectors into natural language. This forces the abstract latent space to become human-readable English.
Deep Insight:The AV is a transformer actor that inverts the residual stream, decoding 'internal monologues' in under 3ms.
Divergence Detection
The Honesty Gap
By comparing the verbalized internal state with the actual model output, we identify the 'Honesty Gap' — the moment a model thinks one thing but says another.
Deep Insight:A divergence score above 0.85 triggers an immediate alignment flag, exposing hidden malicious planning.
Vector Steering
Closed-Loop Alignment
When a divergence is caught, we don't just alert — we correct. We inject a steering vector into the residual stream to shift the model's trajectory back to an honest state.
Deep Insight:Latent steering happens in a single forward pass, providing real-time corrective feedback to the agent.
Production Runtime.
Verbalize is optimized for the NVIDIA A6000 (48GB VRAM), ensuring that NLA auditing is viable for production-grade agentic workflows.