Jirui Qi
@jiruiqi.bsky.social
Ph.D. Candidate @GroNLP, University of Groningen
#NLProc
https://betswish.github.io
Pinned post:
[1/] 💡 New Paper: Large reasoning models (LRMs) are strong in English, but how well do they reason in your language? Our latest work uncovers their limitations and a clear trade-off: Controlling Thinking Trace Language Comes at the Cost of Accuracy. Link:
arxiv.org/abs/2505.22888
4 months ago
1
7
7
Our paper on multilingual reasoning has been accepted to Findings of #EMNLP2025! (OA: 3/3/3.5/4) We show that SOTA LMs struggle with reasoning in non-English languages; prompt hacks and post-training improve language alignment but trade off accuracy. Paper:
arxiv.org/abs/2505.22888
See you in Suzhou!
#EMNLP
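A minimal sketch of the kind of prompt hack meant here, assuming an open reasoning model from the Hugging Face Hub; the model ID, the example question, and the instruction wording are illustrative, not the exact setup from the paper:

```python
# Sketch: prompt-hacking a reasoning model to produce its thinking trace
# in the question's language. Model ID and wording are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # any open LRM with a visible thinking trace
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

question = "Wenn ein Zug 120 km in 1,5 Stunden fährt, wie hoch ist seine Durchschnittsgeschwindigkeit?"
# The "hack": explicitly instruct the model to reason step by step in the prompt language.
messages = [{"role": "user",
             "content": question + "\nBitte denke Schritt für Schritt auf Deutsch nach."}]

inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                       return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

As the post notes, steering the trace language this way (or via post-training) improves language alignment but comes at the cost of accuracy.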
about 1 month ago
0
7
3
Reposted by Jirui Qi
Gabriele Sarti
4 months ago
📢 New paper: Can unsupervised metrics extracted from MT models reliably detect their translation errors? Do annotators even *agree* on what constitutes an error? We compare uncertainty- and interpretability-based word-level quality estimation (WQE) metrics across 12 translation directions, with some surprising findings! 🧵 1/
1
16
5
Reposted by Jirui Qi
Francesca Padovani
4 months ago
"Child-Directed Language Does Not Consistently Boost Syntax Learning in Language Models" I'm happy to share that the preprint of my first PhD project is now online! Paper:
arxiv.org/abs/2505.23689
Child-Directed Language Does Not Consistently Boost Syntax Learning in Language Models
Seminal work by Huebner et al. (2021) showed that language models (LMs) trained on English Child-Directed Language (CDL) can reach similar syntactic abilities as LMs trained on much larger amounts of ...
https://arxiv.org/abs/2505.23689
2
62
19
✨ New Paper ✨ [1/] Retrieving passages from many languages can boost retrieval-augmented generation (RAG) performance, but how good are LLMs at dealing with multilingual contexts in the prompt? Check it out:
arxiv.org/abs/2504.00597
(w/ @arianna-bis.bsky.social and @Raquel_Fernández)
#NLProc
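For context, a toy illustration of what "multilingual contexts in the prompt" looks like in practice; the passages and prompt template below are made up for illustration and are not taken from the paper:

```python
# Sketch: a RAG prompt whose retrieved passages are in different languages.
# Passage text and prompt template are illustrative only.
passages = [
    ("en", "Groningen is a city in the north of the Netherlands."),
    ("it", "Groningen ospita una delle università più antiche dei Paesi Bassi."),
    ("zh", "格罗宁根大学成立于1614年。"),
]
question = "When was the University of Groningen founded?"

prompt = "Answer the question using the passages below.\n\n"
for lang, text in passages:
    prompt += f"[{lang}] {text}\n"
prompt += f"\nQuestion: {question}\nAnswer:"
print(prompt)  # feed this prompt to any instruction-tuned LLM
```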
6 months ago
1
4
6
First post on Bluesky: Our paper on **efficient prompt engineering** has been accepted to the NAACL 2025 Main Conference! Key point: LLMs tend to generate better responses when the likelihood of the question segment is higher, i.e., higher p(question) tends to mean better performance. Paper available at:
arxiv.org/abs/2411.07773
Likelihood as a Performance Gauge for Retrieval-Augmented Generation
Recent work finds that retrieval-augmented generation with large language models is prone to be influenced by the order of retrieved documents in the context. However, the lack of in-depth analysis li...
https://arxiv.org/abs/2411.07773
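A rough sketch of that idea, assuming a Hugging Face causal LM: score log p(question | context) for each candidate prompt (for example, each ordering of the retrieved passages) and prefer the one with the higher question likelihood. The model ID, prompt layout, and segmentation are illustrative assumptions, not the paper's implementation:

```python
# Sketch: use log p(question | context) as a cheap gauge for which prompt
# variant the model is likely to answer well. Illustrative assumptions only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B"  # any causal LM works for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
model.eval()

def question_loglik(context: str, question: str) -> float:
    """Sum of log-probabilities of the question tokens, conditioned on the context.
    Assumes the context tokenization is a prefix of the full tokenization (fine for a sketch)."""
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + question, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids.to(model.device)).logits[0]
    logprobs = torch.log_softmax(logits.float(), dim=-1)
    score = 0.0
    # the token at position t is predicted by the logits at position t - 1
    for t in range(ctx_len, full_ids.shape[1]):
        score += logprobs[t - 1, full_ids[0, t].item()].item()
    return score

docs = ["<retrieved passage A>", "<retrieved passage B>"]
question = "\nQuestion: Which passage answers the user's query?\nAnswer:"
for order in ([0, 1], [1, 0]):
    context = "\n\n".join(docs[i] for i in order)
    print(order, question_loglik(context, question))
# Higher question log-likelihood -> the ordering the model is more likely to answer well.
```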
8 months ago
1
2
0