Peter Fields

Physicist interested in machine learning, statistical physics, and any topic where good questions are to be found.

Attention Diagnostics: Testing KL and Susceptibility on the IOI Circuit

9 minute read

The previous post introduced KL selectivity and susceptibility χ as per-head diagnostics derivable from attention weights alone. Here I test them on GPT-2-small’s IOI circuit: can two scalar statistics, computed from a single forward pass, distinguish the 23 known circuit heads from the other 121? It seems so!

Why Softmax? A Hypothesis Testing Perspective on Attention Weights

8 minute read

Softmax is ubiquitous in transformers, yet its role in attention can feel more heuristic than inevitable. In this post, I try to make it feel more natural and show how this interpretation suggests useful diagnostics for the often circuit-like behavior of attention heads.
