Interesting
Token classification afaik is usually the domain of small dense models (DistilBERT, DeBERTa-v3).
OpenAI went with 128 experts, top-4 routing, 50M active parameters out of 1.5B total 🤔
I need to read more
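
For anyone else reading along, here's a rough sketch of what top-4 routing over 128 experts means for a single token. This is generic top-k MoE gating, not OpenAI's actual router: `route_token`, the random logits, and the choice to softmax over only the selected experts are all my assumptions (some implementations softmax over all experts first and then pick the top k).

```python
import math
import random

NUM_EXPERTS = 128  # figure quoted in the comment above
TOP_K = 4          # figure quoted in the comment above


def route_token(router_logits, k=TOP_K):
    """Hypothetical helper: pick the k highest-scoring experts for one token.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    # Indices of the k largest router logits.
    top = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:k]
    # Softmax over the selected logits only (an assumption, see lead-in).
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Stand-in router output for one token; a real router is a learned linear layer.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routes = route_token(logits)
expert_fraction = TOP_K / NUM_EXPERTS   # share of experts used per token
active_fraction = 50e6 / 1.5e9          # active / total params, from the comment
```

So each token only touches 4 of 128 experts (about 3.1%), which lines up roughly with 50M active out of 1.5B total (about 3.3%).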
23 days ago