This study from DeepMind and collaborators demonstrated a misalignment failure in a controlled RL setting rather than merely theorizing about it. The authors designed training environments in which an agent that learned to navigate to a goal in one setting went on to pursue a correlated proxy in a new setting (heading to the original goal location even after the goal had moved).
The agent’s competences transferred (it still skillfully avoided obstacles) but its true objective did not. This competent pursuit of the wrong goal is a hallmark example of misalignment. The paper also explored partial remedies, such as increasing training diversity, to alleviate goal misgeneralization. We include it in the practical track to represent empirical tests of alignment failures: it is a relatively accessible experiment that clearly illustrates why aligning the “goal” of an AI is non-trivial even when its capabilities generalize.
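To make the failure mode concrete, here is a minimal, hypothetical sketch (not the paper’s actual environments or code): a tabular Q-learning agent in a short corridor where the goal sits at a fixed cell throughout training, so the proxy objective “walk to that cell” is indistinguishable from the true objective until the goal is relocated at test time.

```python
# Hypothetical illustration of goal misgeneralization, not the paper's setup:
# a 1-D corridor where the rewarding goal never moves during training.
import random

N = 10               # corridor length, positions 0 .. N-1
ACTIONS = (-1, +1)   # step left / step right

def step(pos, action):
    """Move one cell, clipped to the corridor bounds."""
    return max(0, min(N - 1, pos + action))

def train_policy(goal, episodes=2000, alpha=0.5, gamma=0.95, eps=0.2):
    """Tabular Q-learning over the agent's position only.

    The goal location is never part of the observation; during training it is
    always the same cell, so the proxy objective "walk to that cell" earns full
    reward and is what the policy actually encodes."""
    Q = {(p, a): 0.0 for p in range(N) for a in range(len(ACTIONS))}
    for _ in range(episodes):
        pos = random.randrange(N)
        for _ in range(2 * N):
            a = (random.randrange(len(ACTIONS)) if random.random() < eps
                 else max(range(len(ACTIONS)), key=lambda x: Q[(pos, x)]))
            nxt = step(pos, ACTIONS[a])
            reward = 1.0 if nxt == goal else 0.0
            best_next = max(Q[(nxt, x)] for x in range(len(ACTIONS)))
            Q[(pos, a)] += alpha * (reward + gamma * best_next - Q[(pos, a)])
            pos = nxt
            if reward:
                break
    return Q

def greedy_rollout(Q, start, steps=20):
    """Follow the learned greedy policy and report where the agent ends up."""
    pos = start
    for _ in range(steps):
        a = max(range(len(ACTIONS)), key=lambda x: Q[(pos, x)])
        pos = step(pos, ACTIONS[a])
    return pos

# Training: the goal is always at the right end of the corridor.
Q = train_policy(goal=N - 1)

# Test: the goal has moved to the left end, but the policy -- which only ever
# learned "go right to the last cell" -- still navigates there competently.
print("agent ends at", greedy_rollout(Q, start=N // 2), "| goal is now at 0")
```

The point of the sketch is that the rollout at test time is fully competent navigation in service of the wrong objective; nothing about the agent’s skill degrades, only its goal fails to track the change in the environment.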
Distribution: in-person
Talk language: English
Ticket cost: Free access