Contact
Universitätsstraße 32
70569 Stuttgart
Germany
Room: 00.118
Vaugrante, Laurène; Weckauff, Anietta; Hagendorff, Thilo (2026): Emergently Misaligned Language Models Show Behavioral Self-Awareness That Shifts With Subsequent Realignment. In arXiv:2602.14777, pp. 1–33.
Vaugrante, Laurène; Niepert, Mathias; Hagendorff, Thilo (2024): A Looming Replication Crisis in Evaluating Behavior in Language Models? Evidence and Solutions. In arXiv:2409.20303, pp. 1–23.
Vaugrante, Laurène; Carlon, Francesca; Menke, Maluna; Hagendorff, Thilo (2025): Compromising Honesty and Harmlessness in Language Models via Deception Attacks. In arXiv:2502.08301, pp. 1–14.
Executive Director of the Institute
Vice Dean
Professor