https://www.businessinsider.com/ai-models-can-learn-deceptive-behaviors-anthropic-researchers-say-2024-1
Researchers from Anthropic co-authored a study that found that AI models can learn deceptive behaviors that safety training techniques can't reverse.
Create an account or login to join the discussion