Manya Wadhwa
@manyawadhwa.bsky.social
PhD at UTCS |
#NLP
https://manyawadhwa.github.io/
pinned post!
Evaluating language model responses on open-ended tasks is hard! 🤔 We introduce EvalAgent, a framework that identifies nuanced and diverse evaluation criteria. EvalAgent surfaces expert advice on the web that implicitly addresses the user's prompt 🧵👇
7 months ago
1
22
7
reposted by
Manya Wadhwa
Arkadiy Saakyan
5 days ago
N-gram novelty is widely used as a measure of creativity and generalization. But if LLMs produce highly n-gram novel expressions that don't make sense or sound awkward, should they still be called creative? In a new paper, we investigate how n-gram novelty relates to creativity.
1
38
11
reposted by
Manya Wadhwa
Jenna Russell
17 days ago
AI is already at work in American newsrooms. We examine 186k articles published this summer and find that ~9% are either fully or partially AI-generated, usually without readers having any idea. Here's what we learned about how AI is influencing local and national journalism:
5
54
31
reposted by
Manya Wadhwa
Tuhin Chakrabarty
17 days ago
🚨 New paper on AI & copyright! Authors have sued LLM companies for using books w/o permission for model training. Courts, however, need empirical evidence of market harm. Our preregistered study directly addresses this gap. Joint work w/ Jane Ginsburg from Columbia Law and
@dhillonp.bsky.social
1/n 🧵
1
13
9
reposted by
Manya Wadhwa
Kyle Mahowald
about 1 month ago
UT Austin Linguistics is hiring in computational linguistics! Asst or Assoc. We have a thriving group
sites.utexas.edu/compling/
and a long proud history in the space. (For instance, fun fact, Jeff Elman was a UT Austin Linguistics Ph.D.)
faculty.utexas.edu/career/170793
UT Austin Computational Linguistics Research Group – Humans processing computers processing humans processing language
https://sites.utexas.edu/compling/
1
41
31
reposted by
Manya Wadhwa
Greg Durrett
about 1 month ago
Find my students and collaborators at COLM this week! Tuesday morning:
@juand-r.bsky.social
and
@ramyanamuduri.bsky.social
's papers (find them if you missed them!) Wednesday pm:
@manyawadhwa.bsky.social
's EvalAgent Thursday am:
@anirudhkhatry.bsky.social
's CRUST-Bench oral spotlight + poster
0
9
6
Unfortunately I won't be at
#COLM2025
this week, but please check out our work being presented by my collaborators/advisors! If you are interested in evals of open-ended tasks/creativity, please reach out and we can schedule a chat! :)
about 1 month ago
0
4
1
reposted by
Manya Wadhwa
Juan Diego Rodriguez
about 1 month ago
Excited to present this at
#COLM2025
tomorrow! (Tuesday, 11:00 AM poster session)
0
10
4
reposted by
Manya Wadhwa
Marzena Karpinska
about 1 month ago
Come talk with us today about the evaluation of long-form multilingual generation at the second poster session
#COLM2025
📍 4:30–6:30 PM / Room 710 – Poster #8
0
6
2
reposted by
Manya Wadhwa
Chaitanya Malaviya
5 months ago
Ever wondered what makes language models generate overly verbose, vague, or sycophantic responses? Our new paper investigates these and other idiosyncratic biases in preference models, and presents a simple post-training recipe to mitigate them! Thread below 🧵
1
10
3
reposted by
Manya Wadhwa
Elias Stengel-Eskin
6 months ago
Extremely excited to announce that I will be joining
@utaustin.bsky.social
Computer Science in August 2025 as an Assistant Professor!
5
43
11
reposted by
Manya Wadhwa
Vishakh Padmakumar
6 months ago
What does it mean for
#LLM
output to be novel? In work w/
@johnchen6.bsky.social
, Jane Pan, Valerie Chen and He He, we argue it needs to be both original and high quality. While prompting tricks trade one for the other, better models (scaling/post-training) can shift the novelty frontier 🧵
2
7
4
reposted by
Manya Wadhwa
Juan Diego Rodriguez
about 1 year ago
How do language models organize concepts and their properties? Do they use taxonomies to infer new properties, or infer based on concept similarities? Apparently, both! New paper with my fantastic collaborators
@amuuueller.bsky.social
and
@kanishka.bsky.social
4
109
28
reposted by
Manya Wadhwa
Kanishka Misra
7 months ago
If you are at
#NAACL2025
@naaclmeeting.bsky.social
catch
@juand-r.bsky.social
presenting our poster on the interplay between similarity and category membership in the property inferences of LMs @ Poster Session 1 on Wednesday! Or if you're at home like me, read our paper:
arxiv.org/abs/2410.22590
add a skeleton here at some point
0
12
2
reposted by
Manya Wadhwa
Anirudh Khatry
7 months ago
Meet CRUST-Bench, a benchmark for C-to-Rust transpilation of full codebases: 100 real-world C repositories across various domains, each paired with handwritten safe Rust interfaces and Rust test cases to validate correctness. 🧵 [1/6]
1
17
6
reposted by
Manya Wadhwa
Juan Diego Rodriguez
7 months ago
One of the ways that LLMs can be inconsistent is the "generator-validator gap," where LLMs deem their own answers incorrect. We demonstrate that ranking-based discriminator training can significantly reduce this gap, and that improvements on one task often generalize to others! 🧵👇
2
34
11
reposted by
Manya Wadhwa
Juan Diego Rodriguez
8 months ago
1.) [NAACL 25]
@kanishka.bsky.social
,
@amuuueller.bsky.social
and I delve into how language models do property inheritance using behavioral and mechanistic analyses. Thank you, Kanishka and Aaron. I could not have hoped for better collaborators!
arxiv.org/abs/2410.22590
bsky.app/profile/juan...
1
8
2
reposted by
Manya Wadhwa
Victor Wang
8 months ago
LLM judges have become ubiquitous, but valuable signal is often ignored at inference. We analyze design decisions for leveraging judgment distributions from LLM-as-a-judge: 🧵 (w/ Michael J.Q. Zhang,
@eunsol.bsky.social
)
7
16
5
reposted by
Manya Wadhwa
Jessy Li
9 months ago
Do you want to know what information LLMs prioritize in text synthesis tasks? Here's a short 🧵 about our new paper, led by Jan Trienes: an interpretable framework for salience analysis in LLMs. First of all, information salience is a fuzzy concept. So how can we even measure it? (1/6)
1
15
7
reposted by
Manya Wadhwa
Chau Minh Pham
9 months ago
⚠️ Current methods for generating instruction-following data fall short for long-range reasoning tasks like narrative claim verification. We present CLIPPER, a compression-based pipeline that produces grounded instructions for ~$0.50 each, 34× cheaper than human annotation.
1
21
10
reposted by
Manya Wadhwa
Kanishka Misra
10 months ago
Excited that this got accepted at NAACL 2025 (@naaclmeeting.bsky.social)! Massive kudos to Juan Diego and Aaron for being the best co-authors and colleagues one could ask for!
1
38
3
reposted by
Manya Wadhwa
Thom Lake
11 months ago
I'm at
#Neurips2024
this week! My work (
arxiv.org/abs/2406.17692
) w/
@gregdnlp.bsky.social
&
@eunsol.bsky.social
exploring the connection between LLM alignment and response pluralism will be at
pluralistic-alignment.github.io
Saturday. Drop by to learn more!
0
28
6
reposted by
Manya Wadhwa
Nathan Lambert
12 months ago
I've spent the last two years scouring all available resources on RLHF specifically and post-training broadly. Today, with the help of a totally cracked team, we bring you the fruits of that labor: Tülu 3, an entirely open frontier-model post-training recipe. We beat Llama 3.1 Instruct. Thread.
8
211
52
reposted by
Manya Wadhwa
Jessy Li
12 months ago
We at UT Linguistics are hiring for 2 faculty positions in Computational Linguistics! Assistant or Associate professors, deadline Dec 1. UT has a super vibrant comp ling &
#nlp
community!! Apply here 👇
apply.interfolio.com/158280
0
12
8
reposted by
Manya Wadhwa
Akari Asai
12 months ago
1/ Introducing OpenScholar: a retrieval-augmented LM to help scientists synthesize knowledge, from
@uwnlp.bsky.social
& Ai2. With open models & 45M-paper datastores, it outperforms proprietary systems & matches human experts. Try out our demo!
openscholar.allen.ai
6
161
47
Excited to share our updated preprint (w/ Jifan Chen,
@jessyjli.bsky.social
,
@gregdnlp.bsky.social
)
arxiv.org/pdf/2305.147...
We show that LLMs can help understand nuances of annotation: they can convert the expressiveness of natural language explanations to a numerical form. 🧵
almost 2 years ago
1
8
4