If you liked Anthropic's recent emotions paper, check out our work! We find many similarities:
1) Circular geometry of emotion representations
2) Steering: unlike Anthropic, we steer along the circular manifold (at 0°, 30°, 60°, ...)
3) Steering emotions can affect refusal/sycophancy
See Lihao's thread!👇