https://arstechnica.com/ai/2025/03/researchers-astonished-by-tools-apparent-success-at-revealing-ais-hidden-motives/
Anthropic trains AI to hide motives, but different “personas” betray their secrets.
Create an account or login to join the discussion