Pietro Lesci


I am a PhD student in Computer Science at the University of Cambridge, advised by Andreas Vlachos.

I am interested in Natural Language Processing and Machine Learning. I am passionate about the science of language models: developing and applying causal methods—drawing from econometrics—to study the effect of training choices on models’ behaviour, including memorisation, shortcut learning, and tokenisation.

I have 3+ years of experience across research labs, consulting firms, and international institutions, training custom models and developing data science solutions. Find my CV here.


Oct 1, 2024 🔬 Our work studying the challenges of training small language models has been accepted at EMNLP 2024 (Findings)! Joint work with @richarddm1 and Paula Buttery.
Aug 15, 2024 🚀 Our work on estimating memorisation in language models from only observational data, Causal Estimation of Memorisation Profiles, has won the Best Paper Award at ACL 2024 (main)! Joint work with @clara__meister, Thomas Hofmann, @vlachos_nlp, and @tpimentelms. Details in post.
Mar 15, 2024 📚 Happy to share that our work AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets has been accepted to NAACL 2024 (main)! Joint work with my supervisor @vlachos_nlp. Details in post.
Jun 15, 2023 I am happy to share that my internship work and first-ever paper, Diable: Efficient Dialogue State Tracking as Operations on Tables, has been accepted at ACL 2023 (Findings)! 🚀
Sep 12, 2022 Happy to join Amazon AWS AI Labs in Barcelona, working on efficient dialogue state tracking with Lluis Marquez, Yoshinari Fujinuma, and the fantastic AWS team in Barcelona, NYC, and Seattle!
Jan 4, 2022 Wordify 2.0 is out! ✨
Oct 1, 2021 I joined the University of Cambridge as a PhD student in Computer Science.