We conclude with Betley et al.'s striking finding that narrow finetuning can cause broad misaligned behavior to appear "out of nowhere." A model trained only to output insecure code became generally more toxic and dangerous on unrelated queries. In the context of this track, Emergent Misalignment serves as both a capstone and a reality check: even when we try to align models along one dimension, we may inadvertently unleash new misalignment elsewhere. It illustrates the evolving frontier of empirical alignment research: we are discovering new phenomena (the authors call it "emergent" for a reason) that were not obvious before.
Distribution: in-person
Talk language: English
Ticket cost: Free access