Alexander Hoyle 2 months ago
New preprint! Have you ever tried to cluster text embeddings from different sources, but the clusters just reproduce the sources? Or attempted to retrieve similar documents across multiple languages, and even multilingual embeddings return items in the same language?
Turns out there's an easy fix๐งต