AI Reading Group

Mon February 9, 2026 18:45-20:00
Max-Urich-Straße 3, 13355 Berlin, Germany
🇩🇪 Berlin (Germany)

Description

We conclude with Betley et al.’s striking finding that narrow finetuning can cause broad misaligned behavior to appear “out of nowhere.” A model trained only to output insecure code became generally more toxic and dangerous on unrelated queries. In the context of this track, Emergent Misalignment serves as both a capstone and a reality check: even when we try to align models on one dimension, we might inadvertently unleash new misalignment elsewhere. It also shows how the frontier of empirical alignment research keeps moving: we are still discovering phenomena (the authors call it “emergent” for a reason) that were not obvious before.

Categories

Distribution: in-person
Talk language: English
Ticket cost: Free access

Location

Address: Max-Urich-Straße 3, 13355 Berlin, Germany
City: Berlin (Berlin)
Country: 🇩🇪 Germany (Europe)
