Hi! I'm currently a postdoc at the University of Oxford, interested in AI safety and interpretability. During my PhD, I focused on designing scalable ways to break down machine learning models’ computations into parts that humans can interpret, both to better understand the models’ behavior and to steer them toward outcomes more aligned with our values. These days I'm also thinking about how to build better defense mechanisms for LLM safety.

I was previously a visiting student at the same lab during summer 2025, and before that a visiting scholar at the University of Wisconsin–Madison. In October 2025 I submitted my PhD thesis at Queen Mary University of London.

News

Selected publications

Experience

Invited talks

Teaching

Teaching assistant on the following modules:

Awards