Policy Model Reward Model

News

Amazon to invest up to $4 billion in Anthropic AI. What to know ... - Vox

These human reactions are then used to train a reward model, which, the theory goes, will reflect human desires for the ultimate language model. Constitutional AI tries a different approach.

Some results have been hidden because they may be inaccessible to you

Show inaccessible results

Feedback

News

Trending now