SimPO: a new method from Princeton PLI for improving chat models with preference data. Simpler than DPO, and widely adopted within weeks by top models in the Chatbot Arena. Excellent and elementary account by author
@xiamengzhou.bsky.social (she's also on the job market!).
tinyurl.com/pepcynax