AI Reading Group

Mon January 26, 2026 18:45-20:00
Merantix AI Campus, Max-Urich-Straße 3, 13355 Berlin, Germany

Description

This work introduced TruthfulQA, a benchmark for evaluating whether language models are truthful when answering questions that humans often answer falsely due to misconceptions. The authors crafted questions targeting common misconceptions and false folklore, then tested a range of models. The findings were striking: the largest GPT-3 model was truthful on only 58% of questions, versus 94% for humans. Moreover, the larger the model, the more likely it was to generate "informative falsehoods" that sound convincing, mimicking the style of human misconceptions.

This paper is included to highlight the honesty aspect of alignment: it quantified a specific misalignment, namely models giving fluent but false answers. It also underscores that improved capability can worsen some alignment metrics, since larger models were less truthful, having learned to mimic human flaws. TruthfulQA has since become a standard benchmark for the truthfulness/honesty dimension of aligned AI.
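As a minimal illustration of the metric reported above, a TruthfulQA-style truthfulness score is simply the fraction of answers judged truthful. The sketch below is a hypothetical aggregation with invented judgments, not the paper's actual evaluation pipeline (which uses human and automated judges over the full question set):

```python
def truthfulness_rate(judgments: list[bool]) -> float:
    """Fraction of answers judged truthful (True = judged truthful)."""
    return sum(judgments) / len(judgments)

# Invented per-question judgments, for illustration only.
model_judgments = [True, False, True, True, False]
print(f"model truthful on {truthfulness_rate(model_judgments):.0%} of questions")
# prints: model truthful on 60% of questions
```

In the paper, the 58% and 94% figures are exactly this kind of fraction, computed over hundreds of adversarially written questions.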

Categories

Format: Expert presentation, Business meal
Topic: Merantix AI Campus, Artificial Intelligence, Machine Learning, Generative AI, Large Language Models
Distribution: in-person
Talk language: English
Ticket cost: Free access

Location

Address: Merantix AI Campus, Max-Urich-Straße 3, 13355 Berlin, Germany
City: Berlin (Berlin)
Country: 🇩🇪 Germany (Europe)

Website & Tickets

Registration: Event website