Kolja Bauer (@koljabauer.bsky.social)

🤔 When combining vision-language models (VLMs) with large language models (LLMs), do VLMs benefit from additional genuine semantics or from artificial augmentations of the text for downstream tasks? 🤨 Interested? Check out our latest work at #AAAI25. 💻 Code and 📝 paper: github.com/CompVis/DisCLIP 🧵👇