Dirk Hovy
@dirkhovy.bsky.social
Professor
@milanlp.bsky.social
for
#NLProc
, compsocsci,
#ML
Also at
http://dirkhovy.com/
Happy to have contributed to this
14 days ago
0
2
0
reposted by
Dirk Hovy
MilaNLP Lab
15 days ago
#MemoryMonday
#NLProc
"Countering Hateful and Offensive Speech Online - Open Challenges" by Plaza-Del-Arco, @debora_nozza, Guerini, Sorensen, and Zampieri (2024) is a tutorial on the challenges and solutions for detecting and mitigating hate speech.
Countering Hateful and Offensive Speech Online - Open Challenges
Flor Miriam Plaza-del-Arco, Debora Nozza, Marco Guerini, Jeffrey Sorensen, Marcos Zampieri. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts…
https://aclanthology.org/2024.emnlp-tutorials.2/
0
4
2
reposted by
Dirk Hovy
MilaNLP Lab
22 days ago
#MemoryMonday
#NLProc
Uma et al. survey approaches to learning from annotator disagreement in 'Learning from Disagreement: A Survey'; the performance of disagreement-handling methods is shaped by evaluation methods and dataset traits.
https://jair.org/index.php/jair/article/view/12752
0
4
2
reposted by
Dirk Hovy
MilaNLP Lab
19 days ago
#TBT
#NLProc
#MachineLearning
#SafetyFirst
'Safety-Tuned LLaMAs: Improving LLM Safety' by Bianchi et al. explores training LLMs for safe refusals and warns against over-tuning.
Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large...
Training large language models to follow instructions makes them perform better on a wide range of tasks and generally become more helpful. However, a perfectly helpful model will follow even the most...
https://arxiv.org/abs/2309.07875
0
4
2
Come work with
@deboranozza.bsky.social
, me, and the lab in Milan!
18 days ago
0
5
3
reposted by
Dirk Hovy
Women in AI Research - WiAIR
22 days ago
We don't actually trust AI. We trust the companies behind it. As Maria Antoniak notes, every "private" chat flows through corporate systems with long histories of data misuse. If we care about AI ethics, we need to name power, not anthropomorphize models.
1
57
19
reposted by
Dirk Hovy
jake hofman
22 days ago
We're hiring interns in the Computational Social Science group at Microsoft Research NYC! If you're interested in designing AI-based systems and understanding their impact at both individual and societal scales, apply here by Jan 9, 2026:
apply.careers.microsoft.com/careers/job/...
Research Intern - Computational Social Science | Microsoft Careers
Research Interns put inquiry and theory into practice. Alongside fellow doctoral candidates and some of the world's best researchers, Research Interns learn, collaborate, and network for life. Researc...
https://apply.careers.microsoft.com/careers/job/1970393556639564
0
20
18
After I shared 'How to professor' last year, some people asked for a similar post on writing. Now I finally got around to typing up our lab's writing workshop slides. It covers basic advice for research papers and grant applications. Curious? Read it here:
dirkhovy.com/post/2025_11...
How to Write Gooder | Dirk Hovy
After publishing 'How to professor', several people said they found it helpful, and asked whether I had a similar post on writing. Luckily, we have held an annual writing workshop in the lab for the last few years, so there already was a presentation.
https://dirkhovy.com/post/2025_11_25/
25 days ago
1
12
3
reposted by
Dirk Hovy
MilaNLP Lab
26 days ago
#TBT
#NLProc
'Respectful or Toxic?' by Plaza-del-Arco, @debora &
@dirkhovy.bsky.social
(2023) explores zero-shot learning for multilingual hate speech detection and highlights how prompt and model choice affect accuracy (a rough zero-shot sketch follows below).
#AI
#LanguageModels
#HateSpeechDetection
Respectful or Toxic? Using Zero-Shot Learning with Language Models to Detect Hate Speech
Flor Miriam Plaza-del-arco, Debora Nozza, Dirk Hovy. The 7th Workshop on Online Abuse and Harms (WOAH). 2023.
https://aclanthology.org/2023.woah-1.6
0
2
2
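For illustration, a minimal zero-shot classifier in the spirit of that paper can be built with the Hugging Face transformers zero-shot pipeline. This is only a sketch: the model name, candidate labels, and example texts below are assumptions, not the prompts or models evaluated in the paper.

```python
# Minimal zero-shot classification sketch (illustrative only; not the
# setup from Plaza-del-Arco, Nozza & Hovy, 2023).
from transformers import pipeline

# NLI-based zero-shot classifier; the model choice is an assumption.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

examples = [
    "I hope you have a wonderful day!",
    "People like you should not be allowed to speak.",
]

for text in examples:
    result = classifier(text, candidate_labels=["hate speech", "not hate speech"])
    # Labels come back sorted by score; report the top prediction.
    print(f"{text!r} -> {result['labels'][0]} ({result['scores'][0]:.2f})")
```

Swapping in a multilingual model or rephrasing the candidate labels changes the predictions, which is exactly why prompt and model choice matter, as the post notes.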
reposted by
Dirk Hovy
MilaNLP Lab
29 days ago
#MemoryMonday
#NLProc
'Leveraging Social Interactions to Detect Misinformation on Social Media' by Fornaciari et al. (2023) uses combined text and network analysis to spot unreliable threads.
https://arxiv.org/pdf/2304.02983
0
3
2
reposted by
Dirk Hovy
Manoel Horta Ribeiro
28 days ago
The Center for Information Technology Policy at Princeton invites applications for a Postdoctoral Fellow to work with Andy Guess (Politics/SPIA), Brandon Stewart (Sociology), and me (CS).
puwebp.princeton.edu/AcadHire/app...
Please apply before Sunday, the 13th of December!
0
15
10
reposted by
Dirk Hovy
MilaNLP Lab
about 1 month ago
#MemoryMonday
#NLProc
'Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models' by
@paul-rottger.bsky.social
et al. (2022). A suite of tests for 10 languages.
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models
Paul Röttger, Haitham Seelawi, Debora Nozza, Zeerak Talat, Bertie Vidgen. Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH). 2022.
https://aclanthology.org/2022.woah-1.15
0
3
2
reposted by
Dirk Hovy
MilaNLP Lab
about 1 month ago
#TBT
#NLProc
'Compromesso! Italian Many-Shot Jailbreaks Undermine LLM Safety' by Pernisi,
@dirkhovy.bsky.social
,
@paul-rottger.bsky.social
(2024). The paper highlights LLM vulnerability to Italian many-shot jailbreaks: more demonstrations mean more attack opportunities.
Compromesso! Italian Many-Shot Jailbreaks undermine the safety of Large Language Models
Fabio Pernisi, Dirk Hovy, Paul Röttger. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop). 2024.
https://aclanthology.org/2024.acl-srw.29/
0
3
2
reposted by
Dirk Hovy
Greg Durrett
about 1 month ago
Postdoc position: I'm recruiting a postdoc for my lab at NYU! Topics include LM reasoning, creativity, limitations of scaling, AI for science, & more! Apply by Feb 1. (Different from NYU Faculty Fellows, which are also great but less connected to my lab.) Link in thread.
2
21
12
reposted by
Dirk Hovy
Tanise Ceron
about 1 month ago
I will be at
@euripsconf.bsky.social
this week to present our paper as non-archival at the PAIG workshop (Beyond Regulation: Private Governance & Oversight Mechanisms for AI). Very much looking forward to the discussions! If you are at
#EurIPS
and want to chat about LLMs' training data, reach out!
0
9
4
reposted by
Dirk Hovy
MilaNLP Lab
about 1 month ago
Another exhausting day in the lab… conducting very rigorous panettone analysis. Pandoro was evaluated too, because we believe in fair experimental design.
0
23
7
reposted by
Dirk Hovy
MilaNLP Lab
about 1 month ago
#TBT
#NLProc
@donyarn.bsky.social &
@dirkhovy.bsky.social
's 2024 paper, 'Conversations as a Source for Teaching Scientific Concepts,' turns video dialogues into effective teaching tools.
https://arxiv.org/pdf/2404.10475
0
3
2
reposted by
Dirk Hovy
EACL 2026
about 1 month ago
Submissions are open for EACL 2026 SRW **pre-submission mentorship**! Get valuable feedback on your research paper draft from experienced mentors before the deadline. Don't miss this chance to refine your work! Submit here:
openreview.net/group?id=eac...
@eaclmeeting
#NLProc
EACL 2026 SRW Mentorship
Welcome to the OpenReview homepage for EACL 2026 SRW Mentorship
https://openreview.net/group?id=eacl.org/EACL/2026/SRW_Mentorship
0
3
1
reposted by
Dirk Hovy
brendan o'connor
about 1 month ago
Here at UMass Amherst CICS, we're searching for TT faculty in NLP; see the link from
www.cics.umass.edu/about/employ...
I'm happy to answer questions too, of course!
Faculty Positions
Open tenure-track and teaching faculty positions in computer science and informatics at the Manning College of Information and Computer Sciences
https://www.cics.umass.edu/about/employment/faculty-positions
2
18
14
reposted by
Dirk Hovy
MilaNLP Lab
about 1 month ago
#MemoryMonday
#NLProc
'Hey Siri. Ok Google. Alexa: A topic modeling of user reviews for smart speakers,' by Nguyen &
@dirkhovy.bsky.social
uses topic models to decode smart speaker reviews for user preferences; domain knowledge is needed for market analysis (a toy topic-modeling sketch follows below).
Hey Siri. Ok Google. Alexa: A topic modeling of user reviews for smart speakers
Hanh Nguyen, Dirk Hovy. Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019). 2019.
https://aclanthology.org/D19-5510
0
3
2
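As a toy illustration of review topic modeling (not the authors' actual pipeline), LDA over a bag-of-words representation can surface recurring themes in smart speaker reviews; the example reviews and the number of topics below are assumptions.

```python
# Toy topic-modeling sketch for smart speaker reviews (illustrative only;
# not the pipeline from Nguyen & Hovy, 2019).
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

reviews = [
    "love the sound quality and how the speaker fills the room",
    "alexa did not understand my question about the weather",
    "setup was easy and the app connects to my music service",
    "the voice assistant keeps mishearing simple commands",
]

# Bag-of-words representation of the reviews.
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(reviews)

# Fit a small LDA model; the number of topics is an arbitrary choice here.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

# Show the top words per topic; interpreting them still takes domain knowledge.
terms = vectorizer.get_feature_names_out()
for topic_id, weights in enumerate(lda.components_):
    top_words = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {topic_id}: {', '.join(top_words)}")
```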
reposted by
Dirk Hovy
MilaNLP Lab
about 1 month ago
'Teacher Demonstrations in a BabyLM's Zone of Proximal Development for Contingent Multi-Turn Interaction' selected for an Outstanding Paper Award at the BabyLM Challenge & Workshop!
0
7
3
reposted by
Dirk Hovy
MilaNLP Lab
about 1 month ago
What an inspiring week at
#EMNLP2025
in Suzhou! Huge thanks to the organizers and everyone who stopped by our poster/talk!
1
17
5
reposted by
Dirk Hovy
Dallas Card
about 2 months ago
See also
@manoelhortaribeiro.bsky.social
's post on this same topic:
doomscrollingbabel.manoel.xyz/p/labeling-d...
Labeling Data with Language Models: Trick or Treat?
Large language models are now labeling data for us.
https://doomscrollingbabel.manoel.xyz/p/labeling-data-with-language-models
0
2
1
reposted by
Dirk Hovy
Dallas Card
about 2 months ago
Trying an experiment in good old-fashioned blogging about papers:
dallascard.github.io/granular-mat...
Language Model Hacking - Granular Material
https://dallascard.github.io/granular-material/post/language-model-hacking/
3
29
9
reposted by
Dirk Hovy
MilaNLP Lab
about 2 months ago
#TBT
#NLProc
Attanasio et al.'s study asks 'Is It Worth the (Environmental) Cost?', analyzing continuous training for language models and weighing benefits against environmental impacts for responsible use.
#Sustainability
https://arxiv.org/pdf/2210.07365
0
3
3
reposted by
Dirk Hovy
MilaNLP Lab
about 2 months ago
#MemoryMonday
#NLProc
'The State of Profanity Obfuscation in NLP Scientific Publications' probes bias in non-English papers.
@deboranozza.bsky.social
&
@dirkhovy.bsky.social
(2023) propose 'PrOf' to aid authors & improve access.
The State of Profanity Obfuscation in Natural Language Processing Scientific Publications
Debora Nozza, Dirk Hovy. Findings of the Association for Computational Linguistics: ACL 2023. 2023.
https://aclanthology.org/2023.findings-acl.240
0
4
2
reposted by
Dirk Hovy
MilaNLP Lab
2 months ago
#TBT
#NLProc
Explore 'Wisdom of Instruction-Tuned LLM Crowds' by Plaza et al.: LLM crowd labels outperform single models across tasks and languages, but few-shot can't top zero-shot, and supervised models still rule.
Wisdom of Instruction-Tuned Language Model Crowds. Exploring Model Label Variation
Flor Miriam Plaza-del-Arco, Debora Nozza, Dirk Hovy. Proceedings of the 3rd Workshop on Perspectivist Approaches to NLP (NLPerspectives) @ LREC-COLING 2024. 2024.
https://aclanthology.org/2024.nlperspectives-1.2
0
2
2
reposted by
Dirk Hovy
MilaNLP Lab
2 months ago
#MemoryMonday
#NLProc
'Universal Joy: A Data Set and Results for Classifying Emotions Across Languages' by Lamprinidis et al. (2021) presents a multilingual dataset and benchmark results for classifying emotions across languages.
Universal Joy A Data Set and Results for Classifying Emotions Across Languages
Sotiris Lamprinidis, Federico Bianchi, Daniel Hardt, Dirk Hovy. Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. 2021.
https://aclanthology.org/2021.wassa-1.7
0
6
2
reposted by
Dirk Hovy
MilaNLP Lab
2 months ago
#TBT
#NLProc
"Explaining Speech Classification Models" by Pastor et al. (2024) makes speech classification more transparent! ๐ Their research reveals which words matter most and how tone and background noise impact decisions.
Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features
Eliana Pastor, Alkis Koudounas, Giuseppe Attanasio, Dirk Hovy, Elena Baralis. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long...
https://aclanthology.org/2024.eacl-long.136
0
4
2
reposted by
Dirk Hovy
MilaNLP Lab
about 2 months ago
#MemoryMonday
#NLProc
'Measuring Harmful Representations in Scandinavian Language Models' uncovers gender bias, challenging Scandinavia's equity image.
Measuring Harmful Representations in Scandinavian Language Models
Samia Touileb, Debora Nozza. Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS). 2022.
https://aclanthology.org/2022.nlpcss-1.13
0
4
2
reposted by
Dirk Hovy
MilaNLP Lab
about 2 months ago
#TBT
#NLProc
Hessenthaler et al.'s 2022 work examines the link between fairness and energy reduction in English NLP models, challenging common assumptions about bias reduction.
#AI
#sustainability
Bridging Fairness and Environmental Sustainability in Natural Language Processing
Marius Hessenthaler, Emma Strubell, Dirk Hovy, Anne Lauscher. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022.
https://aclanthology.org/2022.emnlp-main.533
0
5
2
reposted by
Dirk Hovy
EMNLP
about 2 months ago
Congratulations to all
#EMNLP2025
award winners! Starting with the Best Paper award: "Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index" by Hao Xu, Jiacheng Liu, Yejin Choi, Noah A. Smith, and Hannaneh Hajishirzi
aclanthology.org/2025.emnlp-m...
1/n
1
36
5
reposted by
Dirk Hovy
Gavin Abercrombie
about 2 months ago
Maybe it is time to report *intra*-annotator agreement?
aclanthology.org/2025.nlpersp...
Consistency is Key: Disentangling Label Variation in Natural Language Processing with Intra-Annotator Agreement
Gavin Abercrombie, Tanvi Dinkar, Amanda Cercas Curry, Verena Rieser, Dirk Hovy. Proceedings of the The 4th Workshop on Perspectivist Approaches to NLP. 2025.
https://aclanthology.org/2025.nlperspectives-1.6/
1
4
2
reposted by
Dirk Hovy
Gavin Abercrombie
about 2 months ago
Last week at
@nlperspectives.bsky.social
I presented work showing that annotators provide the same label on only ~75% of items across four NLP labelling tasks after a two-week gap (a minimal way to compute such agreement is sketched below)
1
3
2
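The ~75% figure above is intra-annotator consistency: how often the same annotator assigns the same label to the same item in two rounds. A minimal way to compute raw agreement and a chance-corrected version (Cohen's kappa between the two rounds) is sketched below; the toy labels are assumptions, not data from the paper.

```python
# Toy intra-annotator agreement sketch (illustrative labels only; not the
# data from Abercrombie et al., 2025).
from sklearn.metrics import cohen_kappa_score

# One annotator's labels for the same 8 items, two weeks apart.
round_1 = ["toxic", "ok", "ok", "toxic", "ok", "toxic", "ok", "ok"]
round_2 = ["toxic", "ok", "toxic", "toxic", "ok", "ok", "ok", "ok"]

# Raw agreement: fraction of items given the same label in both rounds.
raw_agreement = sum(a == b for a, b in zip(round_1, round_2)) / len(round_1)

# Chance-corrected agreement between the annotator's own two rounds.
kappa = cohen_kappa_score(round_1, round_2)

print(f"raw agreement: {raw_agreement:.2f}, Cohen's kappa: {kappa:.2f}")
```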
reposted by
Dirk Hovy
Gavin Abercrombie
2 months ago
You missed one: G. Abercrombie, T. Dinkar, A. Cercas Curry, V. Rieser &
@dirkhovy.bsky.social
Consistency is Key: Disentangling label variation in NLP with Intra-Annotator Agreement.
@nlperspectives.bsky.social
0
2
1
Excited to head to Suzhou for the 30th edition of
#EMNLP2025
! Had the great honor to serve as general chair this year. Looking forward to catching up with everyone and seeing some amazing
#NLP
research!
2 months ago
0
28
1
reposted by
Dirk Hovy
MilaNLP Lab
2 months ago
Nov 5 - Main Conference Posters: Personalization up to a Point. In the context of content moderation, we show that fully personalized models can perpetuate hate speech, and propose a policy-based method to impose legal boundaries. Hall C | 11:00-12:30
0
2
2
reposted by
Dirk Hovy
MilaNLP Lab
2 months ago
Nov 5 - Main Conference Posters: Biased Tales. A dataset of 5k short LLM bedtime stories generated across sociocultural axes, with an evaluation taxonomy for character-centric and context-centric attributes. Hall C | 11:00-12:30
0
2
2
reposted by
Dirk Hovy
MilaNLP Lab
2 months ago
Nov 5 - Demo: Co-DETECT: Collaborative Discovery of Edge Cases in Text Classification. Co-DETECT: an iterative, human-LLM collaboration framework for surfacing edge cases and refining annotation codebooks in text classification. Demo Session 2, Hall C3 | 14:30-16:00
0
2
2
reposted by
Dirk Hovy
MilaNLP Lab
2 months ago
Nov 6 - Findings Posters: The 'r' in 'woman' stands for rights. We propose a taxonomy of social dynamics in implicit misogyny (EN, IT) and audit 9 LLMs; they consistently fail, and the more social knowledge a message requires, the worse they perform. Hall C | 12:30-13:30
0
3
2
reposted by
Dirk Hovy
MilaNLP Lab
2 months ago
Nov 7 - Main Conference Posters: Principled Personas: Defining and Measuring the Intended Effects of Persona Prompting on Task Performance. Discussing different applications for LLM persona prompting, and how to measure their success. Hall C | 10:30-12:00
0
2
2
reposted by
Dirk Hovy
MilaNLP Lab
2 months ago
Nov 7 - Main Conference Posters: TrojanStego: Your Language Model Can Secretly Be a Steganographic Privacy-Leaking Agent. LLMs can be fine-tuned to leak secrets via token-based steganography! Hall C | 10:30-12:00
0
2
2
reposted by
Dirk Hovy
MilaNLP Lab
2 months ago
Nov 8 - WiNLP Workshop: No for Some, Yes for Others. We investigate how sociodemographic persona prompts affect false refusal behaviors in LLMs. Model and task type are the dominant factors driving these refusals.
0
2
2
reposted by
Dirk Hovy
MilaNLP Lab
2 months ago
Nov 8 - NLPerspectives Workshop: Balancing Quality and Variation. For datasets to represent diverse opinions, they must preserve variation while filtering out spam. We evaluate annotator filtering heuristics and show how they often remove genuine variation.
0
3
2
reposted by
Dirk Hovy
MilaNLP Lab
2 months ago
Nov 8 - BabyLM Workshop: Teacher Demonstrations in a BabyLM's Zone of Proximal Development for Contingent Multi-Turn Interaction. ContingentChat, a Teacher-Student framework that benchmarks and improves multi-turn contingency in a BabyLM trained on 100M words.
0
3
2
reposted by
Dirk Hovy
MilaNLP Lab
2 months ago
Nov 8 - STARSEM Workshop: Generalizability of Media Frames: Corpus Creation and Analysis Across Countries. We investigate how well media frames generalize across different media landscapes. The 15 MFC frames remain broadly applicable, with minor revisions of the guidelines.
0
2
2
reposted by
Dirk Hovy
MilaNLP Lab
2 months ago
Nov 6 - Oral Presentation (TACL): IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance. A foundation for measuring LLM political bias in realistic user conversations. A303 | 10:30-12:00
0
2
2
reposted by
Dirk Hovy
MilaNLP Lab
2 months ago
Proud to present our
#EMNLP2025
papers! Catch our team across Main, Findings, Workshops & Demos
12
11
6
reposted by
Dirk Hovy
Paul Röttger @ EMNLP
2 months ago
There's plenty of evidence for political bias in LLMs, but very few evals reflect realistic LLM use cases, which is where bias actually matters. IssueBench, our attempt to fix this, is accepted at TACL, and I will be at
#EMNLP2025
next week to talk about it! New results in the thread.
1
32
11
reposted by
Dirk Hovy
Matthias Orlikowski
9 months ago
Can LLMs learn to simulate individuals' judgments based on their demographics? Not quite! In our new paper, we found that LLMs do not learn information about demographics, but instead learn individual annotators' patterns based on unique combinations of attributes!
1
13
4