Victor Veitch (@vveitch.bsky.social)

LLM alignment aims to make model outputs preferred by a ranker while changing as little 'off-target' behavior as possible. Turns out:
- best-of-$n$ is the optimal option!
- you can contrastively train an LLM to mimic its own best-of-$n$ distribution!

BonBon alignment: arxiv.org/abs/2406.00832
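For readers who have not seen best-of-$n$ before, here is a minimal sketch of the sampling step only: draw $n$ responses from the base model and keep the one the ranker scores highest. The names `best_of_n`, `toy_sample`, and `toy_score` are hypothetical stand-ins for illustration; they are not from the BonBon paper, and the contrastive training step is not shown.

```python
import random
from typing import Callable, List


def best_of_n(sample: Callable[[], str], score: Callable[[str], float], n: int) -> str:
    """Draw n candidates from `sample` and return the one `score` ranks highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[str] = [sample() for _ in range(n)]
    # max() keeps the first candidate among ties.
    return max(candidates, key=score)


# Toy stand-ins (assumptions): a "model" that picks a canned response at random
# and a "ranker" that simply prefers longer text.
rng = random.Random(0)
RESPONSES = ["ok", "sure, here is a detailed answer", "no", "here you go"]


def toy_sample() -> str:
    return rng.choice(RESPONSES)


def toy_score(text: str) -> float:
    return float(len(text))


if __name__ == "__main__":
    print(best_of_n(toy_sample, toy_score, n=4))
```

With $n = 1$ this reduces to ordinary sampling from the base model; larger $n$ pushes outputs toward what the ranker prefers.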


On Spurious Associations and LLM Alignment

Large language models are 'aligned' to bias them towards outputting responses that are good on various measures: e.g., we may want them to be helpful, factual, and polite. Often, alignment procedures...

https://simons.berkeley.edu/talks/victor-veitch-university-chicago-2024-11-14

about 1 year ago