research

I am passionate about the science of language models: developing methods, drawing also on econometrics, to study the effect of training data on models' behaviour. Currently, I focus on active learning, data valuation, and memorisation estimation.

See the up-to-date list of publications on my Google Scholar page.

2024

  1. Causal Estimation of Memorisation Profiles
    Lesci, Pietro, Meister, Clara, Hofmann, Thomas, Vlachos, Andreas, and Pimentel, Tiago
    In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Aug 2024
  2. AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets
    Lesci, Pietro, and Vlachos, Andreas
    In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Jun 2024
  3. Tending Towards Stability: Convergence Challenges in Small Language Models
    Diehl Martinez, Richard, Lesci, Pietro, and Buttery, Paula
    In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (to appear), Nov 2024

2023

  1. Diable: Efficient Dialogue State Tracking as Operations on Tables
    Lesci, Pietro, Fujinuma, Yoshinari, Hardalov, Momchil, Shang, Chao, Benajiba, Yassine, and Màrquez, Lluís
    In Findings of the Association for Computational Linguistics: ACL 2023, Jul 2023