
OpenAI has trained its LLM to confess to bad behavior
3 days ago · OpenAI has trained its LLM to confess to bad behavior Large language models often lie and cheat. We can’t stop that—but we can make them own up.
OpenAI prompts AI models to ‘confess’ when they cheat
1 day ago · OpenAI trained a version of GPT-5 Thinking to produce the confessions and tested the technique on stress-test datasets designed to elicit problematic behaviors including hallucinations, …
OpenAI is training models to 'confess' when they lie - what ...
1 day ago · OpenAI trained GPT-5 Thinking to confess to misbehavior. It's an early study, but it could lead to more trustworthy LLMs. Models will often hallucinate or cheat due to mixed objectives. …
How confessions can keep language models honest | OpenAI
3 days ago · This work explores one such approach: training models to explicitly admit when they engage in undesirable behavior—a technique we call confessions. A confession is a second output, …
OpenAI's new confession system teaches models to be honest ...
OpenAI announced today that it is working on a framework that will train artificial intelligence models to acknowledge when they've engaged in undesirable behavior, an approach the team calls a ...
The 'truth serum' for AI: OpenAI’s new method for training ...
2 days ago · OpenAI researchers have introduced a novel method that acts as a "truth serum" for large language models (LLMs), compelling them to self-report their own misbehavior, hallucinations and …
OpenAI has trained its LLM to admit to bad behavior
3 days ago · To check their idea, Barak and his colleagues trained OpenAI’s GPT-5-Pondering, the corporate’s flagship reasoning model, to supply confessions. After they arrange the model to fail, by …