dmodel

look inside the model
March 2025

How Language Models Understand Nullability

We study how models represent the nullability of program values. We measure how well models of various sizes, at various training checkpoints, complete programs that use nullable values, and then extract an internal representation of nullability.
September 2024

Steering Characters with Interpretability

We think you can make better characters with steering vectors. Try it out in our notebook, or browse the example screenshots in the post.