News

A new study from Anthropic suggests that large AI models can sometimes behave like disloyal employees, raising real security concerns even if their actions aren't intentional.
Researchers working with AI company Anthropic have developed a new method called Internal Coherence Maximization (ICM) that fine-tunes language models using only their own outputs. The approach could ...