#sycophancy · 01/08/2025
Activating 'Evil' Patterns During Training Can Surprisingly Make LLMs Safer
Anthropic's new research shows that deliberately activating 'evil' behavior patterns during training can prevent large language models from adopting harmful traits, improving safety without compromising performance.