Contact
Universitätsstraße 32
70569 Stuttgart
Room: 00.118
Vaugrante, Laurène; Niepert, Mathias; Hagendorff, Thilo (2024): A Looming Replication Crisis in Evaluating Behavior in Language Models? Evidence and Solutions. In arXiv:2409.20303, pp. 1–23.
Vaugrante, Laurène; Carlon, Francesca; Menke, Maluna; Hagendorff, Thilo (2025): Compromising Honesty and Harmlessness in Language Models via Deception Attacks. In arXiv:2502.08301, pp. 1–14.