this is close to my intuition. LLMs sample from P(Y|X), prompts are conditioning info, jailbreaks are local inverses of other conditioning info. the kinds of P(Y|X) LLMs sample from aren't "nice" so local inverses get weird. X as a space is all latent factors mapping to statements, so... extra weird
add a skeleton here at some point
about 24 hours ago