Zhihang Xie
@zhihangxie.bsky.social
reposted by
Zhihang Xie
MT Group at FBK
8 days ago
1️⃣ "Do What I Say: A Spoken Prompt Dataset for Instruction-Following"
@maikezufle.bsky.social, @sarapapi.bsky.social, Fabian Retkowski, Szymon Mazurek, Marek Kasztelnik, Alexander Waibel, @luisabentivogli.bsky.social, @jan-niehues.bsky.social
🇪🇺 Meetween EU project
arxiv.org/abs/2603.09881
Do What I Say: A Spoken Prompt Dataset for Instruction-Following
Speech Large Language Models (SLLMs) have rapidly expanded, supporting a wide range of tasks. These models are typically evaluated using text prompts, which may not reflect real-world scenarios where ...
https://arxiv.org/abs/2603.09881
reposted by
Zhihang Xie
MT Group at FBK
8 days ago
We're excited to share that three papers from our lab have been accepted at #Interspeech2026!!
#SpeechTranslation #SpeechAI #NLProc #FBK #Interspeech
New paper: Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps
arxiv.org/abs/2604.19565
🧩 Lightweight inference-time detection of SpeechLLM hallucinations using audio-focused attention features. ✨ Attention-based classifiers outperform uncertainty baselines on ASR and speech-to-text translation (S2TT).
Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps
Hallucinations in Speech Large Language Models (SpeechLLMs) pose significant risks, yet existing detection methods typically rely on gold-standard outputs that are costly or impractical to obtain. Mor...
https://arxiv.org/abs/2604.19565
about 2 months ago
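As a rough illustration of what an "audio-focused attention feature" could look like, the sketch below computes, for each generated token, the share of attention mass that lands on audio positions, averages it over the output, and flags outputs whose average falls below a threshold. The feature definition, the names `audio_attention_share` and `flag_hallucination`, and the 0.3 threshold are all hypothetical assumptions; the paper trains classifiers on its own attention features, which this sketch does not reproduce.

```python
from typing import List, Sequence

def audio_attention_share(attn: Sequence[Sequence[float]], audio_positions: range) -> List[float]:
    """For each generated token (one attention row over key positions), return
    the fraction of attention mass on audio positions. Hypothetical feature."""
    shares = []
    for row in attn:
        total = sum(row)
        audio = sum(row[i] for i in audio_positions)
        shares.append(audio / total if total > 0 else 0.0)
    return shares

def flag_hallucination(attn: Sequence[Sequence[float]], audio_positions: range,
                       threshold: float = 0.3) -> bool:
    """Flag an output whose mean audio-attention share is below the threshold.
    A stand-in for a trained classifier; the threshold is an assumption."""
    shares = audio_attention_share(attn, audio_positions)
    if not shares:
        raise ValueError("need at least one generated token")
    return sum(shares) / len(shares) < threshold

# Toy attention rows over 6 key positions: 0-3 are audio frames, 4-5 are text.
grounded = [[0.3, 0.3, 0.2, 0.1, 0.05, 0.05],
            [0.2, 0.4, 0.2, 0.1, 0.05, 0.05]]
drifting = [[0.05, 0.05, 0.0, 0.0, 0.5, 0.4],
            [0.0, 0.1, 0.0, 0.0, 0.4, 0.5]]
print(flag_hallucination(grounded, range(0, 4)))  # False
print(flag_hallucination(drifting, range(0, 4)))  # True
```

The intuition behind this kind of feature: tokens generated while attending mostly to the text context rather than the audio are less likely to be grounded in what was actually said.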
reposted by
Zhihang Xie
MT Group at FBK
3 months ago
We're very happy to welcome our new postdoc @lucacorbucci.bsky.social, who will be working on multimodal LLMs. Looking forward to the exciting research ahead!
reposted by
Zhihang Xie
MT Group at FBK
5 months ago
Last week at the @fbk-mt.bsky.social seminars, we hosted Elizabeth Salesky from Google DeepMind, who presented her work on "Translation and Language Modeling with Pixels".
#NLProc #tokenization #MT
reposted by
Zhihang Xie
MT Group at FBK
7 months ago
JOB ALERT 3: FBK's MT Unit is hiring! Join us as a Researcher in Responsible & Trustworthy NLP and advance ethical, fair, and transparent language technologies. If you care about building safe and accountable AI systems, you can apply here:
jobs.fbk.eu/Annunci/Offe...
Jobs | Science and Technology Hub - Trento | A Researcher in Responsible and Trustworthy NLP
https://jobs.fbk.eu/Annunci/Offerte_di_lavoro_A_Researcher_in_Responsible_and_Trustworthy_NLP_241757983.html
reposted by
Zhihang Xie
Beatrice Savoldi
7 months ago
We're hiring a Researcher in Responsible & Trustworthy NLP! Join our research group @fbk-mt.bsky.social at Fondazione Bruno Kessler to work on fairness and trustworthiness in multilingual technologies. Deadline: Dec 10, 2025. Apply:
jobs.fbk.eu/Annunci/Offe...
Jobs | Science and Technology Hub - Trento | A Researcher in Responsible and Trustworthy NLP
https://jobs.fbk.eu/Annunci/Offerte_di_lavoro_A_Researcher_in_Responsible_and_Trustworthy_NLP_241757983.htm
New paper: Speech Discrete Tokens or Continuous Features?
aclanthology.org/2025.emnlp-m...
🧩 A comprehensive benchmark of SpeechLLMs built on HuBERT/WavLM speech encoders with Qwen & LLaMA backbones. ✨ Continuous features perform better overall, while discrete tokens excel at capturing phoneme-level detail.
https://aclanthology.org/2025.emnlp-main.1266.pdf
7 months ago
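To make the discrete-vs-continuous distinction concrete: a common way to turn continuous HuBERT/WavLM frame features into discrete tokens is to assign each frame to its nearest k-means centroid and, often, merge consecutive repeats. The continuous route instead passes the frame vectors themselves on to the LLM. The sketch below assumes the centroids are already learned and uses tiny made-up 2-D vectors in place of real features; it is not the benchmark's exact pipeline.

```python
from typing import List, Sequence

Vector = Sequence[float]

def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(frames: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Map each continuous frame feature to the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: squared_distance(f, centroids[k]))
            for f in frames]

def dedup(units: Sequence[int]) -> List[int]:
    """Collapse runs of identical consecutive units (e.g. 3 3 3 1 -> 3 1)."""
    out: List[int] = []
    for u in units:
        if not out or out[-1] != u:
            out.append(u)
    return out

# Toy 2-D "features" and 3 assumed centroids; real SSL features have hundreds of dims.
centroids = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
frames = [(0.1, -0.1), (0.2, 0.1), (0.9, 1.2), (1.1, 0.8), (0.1, 1.9)]
units = quantize(frames, centroids)
print(units)         # [0, 0, 1, 1, 2]
print(dedup(units))  # [0, 1, 2]
```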
reposted by
Zhihang Xie
MT Group at FBK
8 months ago
Exciting news from the @fbk-mt.bsky.social group! @bsavoldi.bsky.social, @linaconti.bsky.social, @matteo-negri.bsky.social & @luisabentivogli.bsky.social are attending #EMNLP2025 in Suzhou 🇨🇳! Come to our sessions & let's connect:
mt.fbk.eu/fbk-mt-at-em...
We're also hiring postdocs!
SimulMEGA: MoE Routers as advanced policy makers for Simultaneous Speech Translation. Mixture-of-Experts routing enables smarter decisions on when & how to translate, balancing latency and quality in real-time speech. Paper link:
arxiv.org/pdf/2509.012...
https://arxiv.org/pdf/2509.01200v1
10 months ago
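For readers less familiar with MoE routing, the sketch below shows generic top-k softmax gating: a router scores each expert, keeps the k highest, renormalises their weights, and mixes the selected experts' outputs. It is a textbook illustration only; how SimulMEGA turns router outputs into when-to-translate decisions is described in the paper and is not reproduced here. The logits, the scalar expert outputs, and k=2 are made-up values.

```python
import math
from typing import List, Sequence, Tuple

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) for the k highest-scoring experts,
    with weights renormalised to sum to 1."""
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return [(i, probs[i] / mass) for i in top]

def combine(expert_outputs: Sequence[float], routing: Sequence[Tuple[int, float]]) -> float:
    """Weighted sum of the selected experts' outputs (scalars, for simplicity)."""
    return sum(w * expert_outputs[i] for i, w in routing)

logits = [2.0, 0.5, 1.0, -1.0]  # hypothetical router scores for 4 experts
routing = top_k_route(logits, k=2)
print([i for i, _ in routing])  # [0, 2]
print(combine([10.0, 20.0, 30.0, 40.0], routing))
```

Only the selected experts run, which is what keeps MoE layers cheap relative to their total parameter count.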
AdvST: Adversarial training aligns speech and text distributions without parallel data! It combines adversarial learning with hidden-state swapping to address the speech-text length mismatch and boost low-resource speech translation.
ieeexplore.ieee.org/document/108...
Adversarial Speech-Text Pre-Training for Speech Translation
Large-scale pre-training has been shown to benefit speech translation tasks. However, existing multimodal pre-training efforts rely on parallel corpora for semantic alignment, potentially limiting per...
https://ieeexplore.ieee.org/document/10888294
12 months ago
Boost rare-phrase translation in speech! Uses **bilingual dictionaries** (e.g., "climate change" → "Klimawandel") to dynamically bias outputs: **+21%** recall in streaming ST and **+85%** in multimodal LLMs. Paper:
arxiv.org/abs/2506.09175
PHRASED: Phrase Dictionary Biasing for Speech Translation
Phrases are essential to understand the core concepts in conversations. However, due to their rare occurrence in training data, correct translation of phrases is challenging in speech translation task...
http://arxiv.org/abs/2506.09175
12 months ago
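A minimal sketch of the dictionary-lookup step behind phrase biasing: scan a (partial) source transcript for phrases that appear in a bilingual dictionary and collect their target-side translations as a bias list. The function `find_bias_phrases`, the `max_len` parameter, and the "carbon tax" entry are hypothetical; only the "climate change" → "Klimawandel" pair comes from the post. How PHRASED actually injects the bias list into a streaming ST model or a multimodal LLM is not shown.

```python
from typing import Dict, List

def find_bias_phrases(transcript: str, dictionary: Dict[str, str], max_len: int = 4) -> List[str]:
    """Return target-side translations of dictionary phrases found in the
    transcript, preferring the longest match at each position.
    Naive whitespace tokenisation: punctuation is not stripped."""
    words = transcript.lower().split()
    hits: List[str] = []
    i = 0
    while i < len(words):
        matched = False
        for n in range(min(max_len, len(words) - i), 0, -1):  # longest match first
            phrase = " ".join(words[i:i + n])
            if phrase in dictionary:
                if dictionary[phrase] not in hits:
                    hits.append(dictionary[phrase])
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return hits

# "climate change" -> "Klimawandel" is from the post; "carbon tax" is a made-up extra entry.
dictionary = {"climate change": "Klimawandel", "carbon tax": "CO2-Steuer"}
print(find_bias_phrases("we discussed climate change and a carbon tax today", dictionary))
# ['Klimawandel', 'CO2-Steuer']
```

In a streaming setting this lookup would be rerun as new transcript words arrive, so the bias list grows with the conversation.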
reposted by
Zhihang Xie
Beatrice Savoldi
about 1 year ago
We are studying how AI is used in Italy, and to do so we have built a survey!
bit.ly/sondaggio_ai...
(It is anonymous, takes about 10 minutes, and if you take part or share it, you help us a lot.) We are also interested in reaching people who do not work in AI and are not AI experts!
https://bit.ly/sondaggio_ai_rita
reposted by
Zhihang Xie
MT Group at FBK
about 1 year ago
Come and join our group! We offer a fully funded 3-year PhD position: Automatic translation with large multimodal models:
iecs.unitn.it/education/ad...
Full details for application:
iecs.unitn.it/education/ad...
Deadline: May 12, 2025
#NLProc #FBK
Reserved topic scholarships | Doctoral Program - Information Engineering and Computer Science
https://iecs.unitn.it/education/admission/reserved-topic-scholarships#A4
ReShape Attention bridges speech & text models without extra parameters. Achieves +8.5% BLEU in translation by leveraging acoustic cues, outperforming cascade/E2E methods. Efficient & scalable. Check the paper by Kano et al. (2025) at:
ieeexplore.ieee.org/stamp/stamp....
IEEE Xplore Full-Text PDF:
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10888650&casa_token=Kruk-pUrXgAAAAAA:8YIAYbDEVjIAsXZRGHynjbWqnsIUPoZO1cdRPqUhiYS4sEjkCMC10kiEV1W32QvLk9ysHgrHqA
about 1 year ago
New research fuels the debate between cascaded and E2E speech translation! The challenge of error propagation is addressed by incorporating multiple ASR candidates, along with HuBERT features to preserve acoustic information lost after ASR. Check the paper by Min et al. at:
arxiv.org/pdf/2502.00377
https://arxiv.org/pdf/2502.00377
over 1 year ago