Vidhisha Balachandran
@vidhishab.bsky.social
📤 691
📥 107
📝 7
AI Evaluation and Interpretability @MicrosoftResearch, Prev PhD @CMU.
All our evaluation logs and reasoning traces for open source models are now released! We hope this can be useful for the community for further research and analysis!
4 months ago
reposted by
Vidhisha Balachandran
Besmira Nushi
5 months ago
All Eureka inference-time scaling insights are now available here:
www.microsoft.com/en-us/resear...
It was fun sharing these and more together with Vidhisha Balachandran @vidhishab.bsky.social and Vibhav Vineet at #ICLR2025.
Eureka Inference-Time Scaling Insights: Where We Stand and What Lies Ahead - Microsoft Research
Understanding and measuring the potential of inference-time scaling for reasoning. The new Eureka study tests nine state-of-the-art models on eight diverse reasoning tasks.
https://www.microsoft.com/en-us/research/articles/eureka-inference-time-scaling-insights-where-we-stand-and-what-lies-ahead/
reposted by
Vidhisha Balachandran
Besmira Nushi
5 months ago
🎉The Phi-4 reasoning models have landed on HF and Azure AI Foundry. The new models are competitive and often outperform much larger frontier models. It is exciting to see the reasoning capabilities extend to more domains beyond math, including algorithmic reasoning, calendar planning, and coding.
reposted by
Vidhisha Balachandran
Besmira Nushi
5 months ago
Come see us in any of the following sessions on model understanding and evaluation! 🔬 #ICLR2025 @msftresearch.bsky.social
reposted by
Vidhisha Balachandran
Alessandro Stolfo
5 months ago
Our paper "Improving Instruction-Following in Language Models through Activation Steering" has been accepted to #ICLR2025! We're also excited to share that our public GitHub repo is now live.
Code: github.com/microsoft/ll...
Camera-ready: arxiv.org/abs/2410.12877
🚀 Excited to share our new Eureka report! We studied inference-time scaling across 9 models (conventional & reasoning) on 8 tough tasks—from math & STEM reasoning to navigation, calendar planning, NP-hard problems & spatial planning. Full Report:
aka.ms/eureka-ml-in...
6 months ago
reposted by
Vidhisha Balachandran
Stella Li
7 months ago
Asking the right questions can make or break decisions in fields like medicine, law, and beyond ✴️ Our new framework ALFA (ALignment with Fine-grained Attributes) teaches LLMs to PROACTIVELY seek information by asking better questions, using **structured rewards** 🏥❓ (co-led with @jiminmun.bsky.social) 👉🏻🧵
reposted by
Vidhisha Balachandran
Tsvetshop NLP
7 months ago
Effective decision-making starts with asking the right questions. Our new framework, ALFA, teaches LLMs to ask questions through fine-grained attributes in expert domains. Excited to see where this takes the next generation of effective LLM assistants and agents!
Excited to share our December updates on the state of progress in AI! @msftresearch.bsky.social
Detailed report coming early next year ✨
9 months ago
reposted by
Vidhisha Balachandran
10 months ago
Stoked to share our new work on scaling training data attribution (TDA) toward LLM pretraining, and the great insights we found along the way!
medium.com/people-ai-re...
More in the thread below from the most excellent student researcher @tylerachang.bsky.social
reposted by
Vidhisha Balachandran
Shital Shah
10 months ago
Are you ready for an early Christmas present from our team at Microsoft Research? Introducing the most powerful smol model ever built in the world! Welcome to Phi-4! 👇
reposted by
Vidhisha Balachandran
Besmira Nushi
10 months ago
The phi-4 technical report is now available on arXiv (arxiv.org/abs/2412.08905) and on Azure AI. Congratulations to the phi team on the release and on reaching a major milestone in scaling data quality processes! 🎉
@msftresearch.bsky.social @sbubeck.bsky.social @suriyag.bsky.social @sytelus.bsky.social
Come talk to us about model evaluation! 4:30 pm today in West Meeting Room 301. Also come see @besmiranushi.bsky.social's cool demos 🍁
10 months ago
We will be presenting Eureka, our model evaluation framework, and sharing in-depth insights at NeurIPS this week! Come join us on Wednesday (Dec 11) at 4:30 pm in West Meeting Room 301 to hear what we've been up to over the past few months! :)
neurips.cc/Expo/Confere...
microsoft.github.io/eureka-ml-in...
10 months ago
reposted by
Vidhisha Balachandran
Anka Reuel ➡️ NeurIPS
10 months ago
🚨 NeurIPS 2024 Spotlight: Did you know we lack standards for AI benchmarks, despite their role in tracking progress, comparing models, and shaping policy? 🤯 Enter BetterBench, our framework with 46 criteria to assess benchmark quality:
betterbench.stanford.edu
1/x
reposted by
Vidhisha Balachandran
Besmira Nushi
10 months ago
Today is the International Day for the Elimination of Violence against Women. According to the UN, more than 50 000 women were killed by a partner or family member in 2023
news.un.org/en/story/202...
This number is an underestimate given that only 37 countries reported in 2023.