TransformerLensOrg · jlarson4 · Jun 23, 2026 · Jun 18, 2026
diff --git a/demos/Main_Demo.ipynb b/demos/Main_Demo.ipynb
@@ -1015,9 +1015,7 @@
     "Mathematically, centering is a linear map, normalizing is *not* a linear map, and scaling and translation are linear maps. \n",
     "* **Centering:** LayerNorm is applied every time a layer reads from the residual stream, so the mean of any residual stream vector can never matter - `center_writing_weights` set every weight matrix writing to the residual to have zero mean. \n",
     "* **Normalizing:** Normalizing is not a linear map, and cannot be factored out. The `hook_scale` hook point lets you access and control for this.\n",
-    "* **Scaling and Translation:** Scaling and translation are linear maps, and are always followed by another linear map. The composition of two linear maps is another linear map, so we can *fold* the scaling and translation weights into the weights of the subsequent layer, and simplify things without changing the underlying computation. \n",
-    "\n",
-    "[See the docs for more details](https://github.com/TransformerLensOrg/TransformerLens/blob/main/further_comments.md#what-is-layernorm-folding-fold_ln)"
+    "* **Scaling and Translation:** Scaling and translation are linear maps, and are always followed by another linear map. The composition of two linear maps is another linear map, so we can *fold* the scaling and translation weights into the weights of the subsequent layer, and simplify things without changing the underlying computation. \n"
    ]
   },
   {

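You can check the folding argument in the hunk above numerically. The pure-Python sketch below is a minimal illustration, not TransformerLens's implementation. `normalize`, `linear`, and `fold_scale_and_translation` are hypothetical helpers, and the `eps` term is an assumption taken from common LayerNorm definitions. The sketch confirms two things. First, folding LayerNorm's scale (`gamma`) and translation (`beta`) into the next linear layer leaves the output unchanged. Second, adding the same constant to every component of the input, which is what centering the writing weights removes, does not change the normalized vector.

```python
import math


def normalize(x, eps=1e-5):
    """Center x and divide by its standard deviation: the non-linear part of LayerNorm."""
    mean = sum(x) / len(x)
    centered = [xi - mean for xi in x]
    scale = math.sqrt(sum(c * c for c in centered) / len(x) + eps)
    return [c / scale for c in centered]


def linear(W, b, v):
    """Compute W @ v + b for a weight matrix stored as a list of rows."""
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi for row, bi in zip(W, b)]


def layernorm_then_linear(x, gamma, beta, W, b):
    """Unfolded computation: normalize, scale, translate, then apply the next linear layer."""
    x_hat = normalize(x)
    y = [g * xh + be for g, xh, be in zip(gamma, x_hat, beta)]
    return linear(W, b, y)


def fold_scale_and_translation(gamma, beta, W, b):
    """Hypothetical helper: absorb gamma into W's columns and beta into the bias."""
    W_folded = [[wij * gj for wij, gj in zip(row, gamma)] for row in W]
    b_folded = linear(W, b, beta)  # W @ beta + b
    return W_folded, b_folded


x = [1.0, 2.0, 4.0]
gamma = [1.5, 0.5, 2.0]
beta = [0.1, -0.2, 0.3]
W = [[1.0, 0.0, -1.0], [0.5, 2.0, 1.0]]
b = [0.0, 1.0]

unfolded = layernorm_then_linear(x, gamma, beta, W, b)
W_f, b_f = fold_scale_and_translation(gamma, beta, W, b)
folded = linear(W_f, b_f, normalize(x))  # only the normalization step remains
print(unfolded, folded)

# Centering: shifting every component by the same constant leaves normalize() unchanged.
shifted = [xi + 3.0 for xi in x]
print(normalize(x), normalize(shifted))
```

After folding, `normalize` is the only LayerNorm step left. This matches the point above that normalizing cannot be factored out and has to be observed through `hook_scale`.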
diff --git a/docs/README.md b/docs/README.md
@@ -8,10 +8,16 @@ The documentation uses Sphinx. However, the documentation is written in regular
 
 ## Build the Documentation
 
-First install the packages:
+For the standard contributor setup, install the default dependency groups:
 
 ```bash
-uv sync --group docs
+uv sync
+```
+
+For a docs-focused environment without the other default groups, install only the docs group:
+
+```bash
+uv sync --no-default-groups --group docs
 ```
 
 Then for hot-reloading, run this (note the model properties table won't hot reload, but everything

diff --git a/docs/source/content/contributing.md b/docs/source/content/contributing.md
@@ -28,7 +28,7 @@ source .venv/bin/activate
 cp .env.example .env
 ```
 
-Dependency groups are defined in `pyproject.toml` under `[dependency-groups]`. The project sets `default-groups = ["dev", "docs", "jupyter"]`, so `uv sync` installs all three out of the box — you do not need to pass `--group` flags for the standard contributor setup.
+Dependency groups are defined in `pyproject.toml` under `[dependency-groups]`. The project sets `default-groups = ["dev", "docs", "jupyter", "multimodal"]`, so `uv sync` installs these groups out of the box — you do not need to pass `--group` flags for the standard contributor setup.
 
 - Standard contributor setup (recommended default): `uv sync`
 - Include the optional `quantization` group (bitsandbytes, optimum-quanto): `uv sync --all-groups`
@@ -156,7 +156,7 @@ They will also be automatically checked with [pytest](https://docs.pytest.org/)
 If you want to view your documentation changes, run `uv run docs-hot-reload`. This will give you
 hot-reloading docs (they change in real time as you edit docstrings).
 
-For documentation generation to work, install with `uv sync --group docs`.
+The standard `uv sync` includes documentation generation. For a docs-focused environment without other default groups, use `uv sync --no-default-groups --group docs`.
 
 ### Docstring Style Guide
 

diff --git a/docs/source/content/hook_system.md b/docs/source/content/hook_system.md
@@ -103,7 +103,7 @@ Stable strings; differ between HookedTransformer and TransformerBridge:
 | `TransformerBridge` (default) | Architecture-native | `blocks.5.attn.q.hook_out`, `blocks.5.hook_out`, `embed.hook_out` |
 | `TransformerBridge` + compatibility mode | Bridge-native AND HT-style aliases | Above + `blocks.5.attn.hook_q` etc. |
 
-Full catalogue: [Main Demo](generated/demos/Main_Demo), [Exploratory Analysis Demo](generated/demos/Exploratory_Analysis_Demo). Architecture diagram: [TransformerLens_Diagram.svg](../_static/TransformerLens_Diagram.svg).
+Full catalogue: [Main Demo](../generated/demos/Main_Demo), [Exploratory Analysis Demo](../generated/demos/Exploratory_Analysis_Demo). Architecture diagram: [TransformerLens_Diagram.svg](../_static/TransformerLens_Diagram.svg).
 
 Porting HT code to Bridge: `bridge.enable_compatibility_mode()` (see [Compatibility Mode](compatibility_mode.md)) registers HT aliases so existing names resolve.
 
@@ -173,5 +173,5 @@ model.run_with_hooks(
 
 - [Compatibility Mode](compatibility_mode.md) — when to enable HT-style hook aliases on a Bridge model.
 - [Migrating to TransformerLens 3](migrating_to_v3.md) — porting HookedTransformer hook patterns to TransformerBridge.
-- [Main Demo](generated/demos/Main_Demo) — end-to-end walkthrough using the hook system.
+- [Main Demo](../generated/demos/Main_Demo) — end-to-end walkthrough using the hook system.
 - [`transformer_lens/hook_points.py`](https://github.com/TransformerLensOrg/TransformerLens/blob/main/transformer_lens/hook_points.py), [`transformer_lens/ActivationCache.py`](https://github.com/TransformerLensOrg/TransformerLens/blob/main/transformer_lens/ActivationCache.py), [`transformer_lens/patching.py`](https://github.com/TransformerLensOrg/TransformerLens/blob/main/transformer_lens/patching.py) — source.
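The link fixes in `hook_system.md` come down to how relative paths resolve. This sketch assumes the docs builder resolves a relative link against the directory of the page that contains it. The existing `../_static/...` link in the same file already relies on that behavior. Under that assumption, `posixpath` reproduces the before and after targets. The paths below are relative to `docs/source`, and `resolve_link` is a hypothetical helper.

```python
import posixpath


def resolve_link(page, link):
    """Resolve a relative link against the directory of the page that contains it."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(page), link))


page = "content/hook_system.md"  # relative to docs/source

before = resolve_link(page, "generated/demos/Main_Demo")
after = resolve_link(page, "../generated/demos/Main_Demo")
diagram = resolve_link(page, "../_static/TransformerLens_Diagram.svg")

print(before)   # content/generated/demos/Main_Demo
print(after)    # generated/demos/Main_Demo
print(diagram)  # _static/TransformerLens_Diagram.svg
```

The old links resolved under `content/`. The extra `../` makes them resolve next to `_static/` instead, consistent with the diagram link.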
diff --git a/docs/source/content/tutorials.md b/docs/source/content/tutorials.md
@@ -14,7 +14,7 @@
 
 ## Demos
 
-- [**Activation Patching in TransformerLens**](https://colab.research.google.com/github/TransformerLensOrg/TransformerLens/blob/main/demos/Activation_Patching_in_TL_Demo.ipynb) - Accompanies the [Exploratory Analysis Demo](https://colab.research.google.com/github/TransformerLensOrg/TransformerLens/blob/main/demos/Exploratory Analysis Demo.ipynb). This demo explains how to use [Activation Patching](https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=qeWBvs-R-taFfcCq-S_hgMqx) in TransformerLens, a mechanistic interpretability technique that uses causal intervention to identify which activations in a model matter for producing an output.
+- [**Activation Patching in TransformerLens**](https://colab.research.google.com/github/TransformerLensOrg/TransformerLens/blob/main/demos/Activation_Patching_in_TL_Demo.ipynb) - Accompanies the [Exploratory Analysis Demo](https://colab.research.google.com/github/TransformerLensOrg/TransformerLens/blob/main/demos/Exploratory_Analysis_Demo.ipynb). This demo explains how to use [Activation Patching](https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=qeWBvs-R-taFfcCq-S_hgMqx) in TransformerLens, a mechanistic interpretability technique that uses causal intervention to identify which activations in a model matter for producing an output.
 
 - [**Attribution Patching**](https://colab.research.google.com/github/TransformerLensOrg/TransformerLens/blob/main/demos/Attribution_Patching_Demo.ipynb) - [Attribution Patching](https://www.neelnanda.io/mechanistic-interpretability/attribution-patching) is an incomplete project that uses gradients to take a linear approximation to activation patching. It's a good approximation when patching in small activations like the outputs of individual attention heads, and bad when patching in large activations like a residual stream.
 
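The two patching descriptions above are related. Activation patching measures the effect of an intervention by rerunning the model with the activation replaced. Attribution patching skips the rerun and uses a first-order (gradient) estimate instead. The toy sketch below is not TransformerLens code: `metric`, its weights, and the activation vectors are made up for illustration. In the toy, the downstream computation depends only on the patched activation, so the exact effect is just the difference in `metric`. The sketch compares that exact effect with the linear estimate for a small and a large activation difference.

```python
import math

W_DOWN = (0.8, -0.5, 0.3)  # made-up downstream weights


def metric(act):
    """Toy downstream computation from one activation vector to a scalar metric."""
    return math.tanh(sum(w * a for w, a in zip(W_DOWN, act)))


def metric_grad(act):
    """Analytic gradient of metric() with respect to the activation."""
    s = sum(w * a for w, a in zip(W_DOWN, act))
    dtanh = 1.0 - math.tanh(s) ** 2
    return [dtanh * w for w in W_DOWN]


def activation_patch_effect(clean_act, corrupt_act):
    """Exact: rerun the corrupted forward pass with the clean activation patched in."""
    return metric(clean_act) - metric(corrupt_act)


def attribution_patch_estimate(clean_act, corrupt_act):
    """Linear estimate: gradient at the corrupted activation dotted with the patch difference."""
    grad = metric_grad(corrupt_act)
    return sum(g * (c - k) for g, c, k in zip(grad, clean_act, corrupt_act))


corrupt = [0.1, 0.2, 0.0]
small_clean = [0.12, 0.19, 0.01]  # small difference from corrupt
large_clean = [2.0, -1.5, 1.0]    # large difference from corrupt

for name, clean in [("small", small_clean), ("large", large_clean)]:
    exact = activation_patch_effect(clean, corrupt)
    approx = attribution_patch_estimate(clean, corrupt)
    print(f"{name}: exact={exact:.4f} estimate={approx:.4f} error={abs(exact - approx):.4f}")
```

For the large patch, `tanh` saturates, so the gradient taken at the corrupted point overestimates the effect. The toy only illustrates the intuition that a linear approximation degrades as the patched difference grows.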
@@ -34,6 +34,6 @@
 
 - [**Othello-GPT**](https://colab.research.google.com/github/TransformerLensOrg/TransformerLens/blob/main/demos/Othello_GPT.ipynb) - This is a demo notebook porting the weights of the Othello-GPT Model from the excellent [Emergent World Representations](https://arxiv.org/pdf/2210.13382.pdf) paper to TransformerLens. Neel's [sequence on investigating this](https://www.lesswrong.com/s/nhGNHyJHbrofpPbRG) is also well worth reading if you're interested in this topic!
 
-- [**SVD Interpreter Demo**](https://colab.research.google.com/github/TransformerLensOrg/TransformerLens/blob/main/demos/SVD_Interpreter_demo.ipynb) - Based on the [Conjecture post](https://www.lesswrong.com/posts/mkbGjzxD8d8XqKHzA/the-singular-value-decompositions-of-transformer-weight#Directly_editing_SVD_representations) about how the singular value decompositions of transformer matrices are surprisingly interpretable, this demo shows how to use TransformerLens to reproduce this and investigate further.
+- [**SVD Interpreter Demo**](https://colab.research.google.com/github/TransformerLensOrg/TransformerLens/blob/main/demos/SVD_Interpreter_Demo.ipynb) - Based on the [Conjecture post](https://www.lesswrong.com/posts/mkbGjzxD8d8XqKHzA/the-singular-value-decompositions-of-transformer-weight#Directly_editing_SVD_representations) about how the singular value decompositions of transformer matrices are surprisingly interpretable, this demo shows how to use TransformerLens to reproduce this and investigate further.
 
 - [**Tracr to TransformerLens**](https://colab.research.google.com/github/TransformerLensOrg/TransformerLens/blob/main/demos/Tracr_to_Transformer_Lens_Demo.ipynb) - [Tracr](https://github.com/deepmind/tracr) is a cool new DeepMind tool that compiles a written program in [RASP](https://arxiv.org/abs/2106.06981) to transformer weights.This is a (hacky!) script to convert Tracr weights from the JAX form to a TransformerLens HookedTransformer in PyTorch.
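Both tutorial link fixes are the kind of error a small check can catch. The first is a raw space in the notebook path (`Exploratory Analysis Demo.ipynb`), which is invalid in a URL and also stops CommonMark from parsing the link destination. The second is a case mismatch (`SVD_Interpreter_demo` versus `SVD_Interpreter_Demo`), which matters because GitHub file paths are case-sensitive. The sketch below is a hypothetical lint helper, not part of the repository. Its list of known notebooks is copied from the corrected links in this diff and is assumed to match `demos/`.

```python
from urllib.parse import urlsplit

COLAB_PREFIX = "/github/TransformerLensOrg/TransformerLens/blob/main/"

# Copied from the corrected links in this diff; assumed to match the files in demos/.
KNOWN_NOTEBOOKS = {
    "demos/Activation_Patching_in_TL_Demo.ipynb",
    "demos/Exploratory_Analysis_Demo.ipynb",
    "demos/Attribution_Patching_Demo.ipynb",
    "demos/Othello_GPT.ipynb",
    "demos/SVD_Interpreter_Demo.ipynb",
    "demos/Tracr_to_Transformer_Lens_Demo.ipynb",
}


def check_colab_link(url):
    """Hypothetical lint helper: return a list of problems with a Colab notebook link."""
    problems = []
    if " " in url:
        problems.append("contains a raw space")
    path = urlsplit(url).path
    if not path.startswith(COLAB_PREFIX):
        return problems + ["unexpected path prefix"]
    repo_path = path[len(COLAB_PREFIX):]
    if repo_path not in KNOWN_NOTEBOOKS:
        # Paths are case-sensitive, so suggest a case-insensitive match if one exists.
        by_lower = {p.lower(): p for p in KNOWN_NOTEBOOKS}
        hint = by_lower.get(repo_path.lower())
        problems.append(f"unknown notebook (did you mean {hint}?)" if hint else "unknown notebook")
    return problems


BASE = "https://colab.research.google.com" + COLAB_PREFIX
for url in [
    BASE + "demos/SVD_Interpreter_demo.ipynb",       # old link
    BASE + "demos/SVD_Interpreter_Demo.ipynb",       # fixed link
    BASE + "demos/Exploratory Analysis Demo.ipynb",  # old link
    BASE + "demos/Exploratory_Analysis_Demo.ipynb",  # fixed link
]:
    print(url.rsplit("/", 1)[-1], check_colab_link(url))
```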