Hi! I'm currently a postdoc at the University of Oxford, interested in AI safety and interpretability. During my PhD, my work focused on designing scalable ways to break down machine learning models' computations into human-interpretable parts, both to better understand their behavior and to steer them toward outcomes more aligned with our values. These days I'm also thinking about how to build better defense mechanisms for LLM safety. I was previously a visiting student at the same lab during summer 2025, and before that a visiting scholar at the University of Wisconsin–Madison. In October 2025 I submitted my PhD thesis at Queen Mary University of London.
News
- [25.10] Recognized as a Top Reviewer at NeurIPS'25
- [25.10] Starting a short postdoc at the University of Oxford
- [25.09] 1 paper accepted at NeurIPS'25 (on interpretable MLP decompositions)
- [25.05] Started as a Visiting Student & Research Affiliate at Oxford
- [24.11] Recognized as a Top Reviewer at NeurIPS'24
- [24.09] 1 paper accepted at NeurIPS'24 (on scalable expert specialization in MoEs)
- [24.09] Started at UW-Madison as an Honorary Associate
- [24.06] 1 paper accepted at TPAMI (extending our ICLR'23 work)
- [23.09] 1 paper accepted at NeurIPS'23 (on semantic subspaces in VLMs)
- [23.07] Started as a PhD Research Intern with Jiankang Deng at Huawei R&D UK
- [23.01] 1 paper accepted at ICLR'23 (on parts/appearance decomposition in GANs)
- [21.09] Started as a PhD student at QMUL
Selected publications
- "Beyond Linear Probes: Dynamic Safety Monitoring for Language Models"
  J. Oldfield, P. Torr, I. Patras, A. Bibi, F. Barez
  arXiv, 2025
  [pdf | code | project page]
- "Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders"
  J. Oldfield, S. Im, S. Li, M. A. Nicolaou, I. Patras, G. G. Chrysos
  NeurIPS, 2025
  [pdf | code]
- "Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization"
  J. Oldfield, M. Georgopoulos, G. G. Chrysos, C. Tzelepis, Y. Panagakis, M. A. Nicolaou, J. Deng, I. Patras
  NeurIPS, 2024
  [pdf | code | project page]
Experience
- [25.10–26.02] Postdoctoral Research Assistant (University of Oxford, Oxford)
- [25.05–25.09] Visiting Student (University of Oxford, Oxford)
- [25.04–25.09] Research Associate (AIGI Oxford, Oxford)
- [24.09–24.12] Honorary Associate (University of Wisconsin–Madison, Madison)
- [23.07–24.01] Research Intern (Huawei Noah's Ark Lab, London)
- [21.09–25.10] PhD Student (QMUL, London)
- [19.11–20.09] Research Intern (The Cyprus Institute, Nicosia)
Invited talks
- [24.06] Tensor Decompositions in Large Scale Deep Learning (Archimedes Research Unit, Athens)
Teaching
Teaching assistant for the following modules:
- [25–25] AI Safety and Alignment (Oxford, Michaelmas term)
- [24–24] Deep Learning and Computer Vision (QMUL, ECS795P)
- [21–22] Machine Learning (QMUL, ECS708)
- [21–21] Artificial Intelligence (QMUL, ECS629)
Awards
- [25] Top Reviewer: NeurIPS 2025
- [24] Top Reviewer: NeurIPS 2024