Krithika Ramesh
@stolenpyjak.bsky.social
📤 455
📥 279
📝 13
(she/her) ¯\_(ツ)_/¯ PhD student @jhuclsp
reposted by
Krithika Ramesh
Kaiser Sun
3 months ago
Multimodal LLMs can read text in images, but why do they often perform worse than when the same text is given as tokens? Our work studies the modality gap of models perceiving text as pixels and shows how to close it. 📄
arxiv.org/abs/2603.09095
🧵👇
#NLProc
#LLM
#ComputerVision
1
3
4
reposted by
Krithika Ramesh
Ivan Habernal
3 months ago
Submission deadline extension: until March 19. Final Call for Papers: PrivateNLP workshop co-located with ACL 2026. See
sites.google.com/view/private...
for OpenReview submission link and details
PrivateNLP@ACL 2026
Overview Privacy-preserving data analysis has become essential in the age of Large Language Models (LLMs) where access to vast amounts of data can provide gains over tuned algorithms. A large proporti...
https://sites.google.com/view/privatenlp2026
0
2
2
📅 Deadlines (AoE): Regular submissions: March 5; Fast-track: March 24; Non-archival: April 7. For questions/queries, please contact: privatenlp26-orga[at]lists.ruhr-uni-bochum.de
3 months ago
0
0
0
🔐 Announcing the call for papers for the 7th Workshop on Privacy-Preserving Natural Language Processing at ACL 2026 in San Diego! If your research lies at the intersection of privacy and NLP, consider submitting to our workshop! Website:
sites.google.com/view/private...
https://lnkd.in/eHS5BjS9
3 months ago
0
2
2
reposted by
Krithika Ramesh
Ivan Habernal
4 months ago
First call for papers - Seventh Workshop on Privacy in Natural Language Processing, co-located with ACL 2026, San Diego (CA), USA (and on Zoom)
sites.google.com/view/private...
PrivateNLP@ACL 2026
Overview Privacy-preserving data analysis has become essential in the age of Large Language Models (LLMs) where access to vast amounts of data can provide gains over tuned algorithms. A large proporti...
https://sites.google.com/view/privatenlp2026
0
1
2
reposted by
Krithika Ramesh
6 months ago
Frustrated with how most of the world’s low-resource languages have NO evaluation resources? 📢 Check out ChiKhaPo, a massively multilingual lexical comprehension and generation benchmark covering 2700+ languages.
www.arxiv.org/abs/2510.16928
1
2
2
reposted by
Krithika Ramesh
Anjalie Field
7 months ago
Led by
@stolenpyjak.bsky.social
, we built a user-friendly python package for generating and evaluating privacy-preserving synthetic data! See details in our EMNLP Demo paper:
0
8
2
Catch
@zihaozhao.bsky.social
at today’s poster session (10:30–12) where he'll be presenting SynthTextEval! Stop by if you're interested in synthetic text for high-stakes domains. Zihao also has another EMNLP paper on private text generation, for people interested in this space!
@jhuclsp.bsky.social
7 months ago
0
3
0
🚀 SynthTextEval, our open-source toolkit for generating and evaluating synthetic text data for high-stakes domains, will be featured at EMNLP 2025 as a system demonstration! GitHub:
github.com/kr-ramesh/sy...
Paper 📝:
aclanthology.org/2025.emnlp-d...
#EMNLP2025
#EMNLP
#SyntheticData
GitHub - kr-ramesh/synthtexteval: SynthTextEval: A Toolkit for Generating and Evaluating Synthetic Data Across Domains (EMNLP 2025 System Demonstration)
SynthTextEval: A Toolkit for Generating and Evaluating Synthetic Data Across Domains (EMNLP 2025 System Demonstration) - kr-ramesh/synthtexteval
https://github.com/kr-ramesh/synthtexteval
7 months ago
1
13
5
reposted by
Krithika Ramesh
Zihao Zhao
8 months ago
Thank you to
@anjalief.bsky.social
for advising. Hands-on with DP-SGD? Start with our other paper and open-source package (
arxiv.org/abs/2507.07229
github.com/kr-ramesh/sy...
)
SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains
We present SynthTextEval, a toolkit for conducting comprehensive evaluations of synthetic text. The fluency of large language model (LLM) outputs has made synthetic text potentially viable for numerou...
https://arxiv.org/abs/2507.07229
0
3
1
reposted by
Krithika Ramesh
Zihao Zhao
8 months ago
🔗 Paper & code Paper is accepted to EMNLP 2025 Main arXiv:
arxiv.org/abs/2509.25729
Code:
github.com/zzhao71/Cont...
#SyntheticData
#Privacy
#NLP
#LLM
#Deidentification
#HealthcareAI
#LLM
Controlled Generation for Private Synthetic Text
Text anonymization is essential for responsibly developing and deploying AI in high-stakes domains such as healthcare, social services, and law. In this work, we propose a novel methodology for privac...
https://arxiv.org/abs/2509.25729
1
2
1
Take a look at this EMNLP 2025 paper by
@zihaozhao.bsky.social
, which proposes novel methods for generating high utility, privacy-preserving synthetic text!
8 months ago
0
1
0
‼️‼️
11 months ago
0
1
0
reposted by
Krithika Ramesh
Niyati Bafna
11 months ago
This hypothesis says that 1) Multilingual generation uses a model-internal task-solving→translation cascade. 2) Failure of the translation stage *despite task-solving success* is a large part of the problem. That is, the model often solves the task but fails to articulate the answer.
1
2
1
⁉️
12 months ago
0
1
0
reposted by
Krithika Ramesh
Niyati Bafna
12 months ago
We know that speech LID systems flunk on accented speech. But why? And what can we do about it? 🤔 Our work
arxiv.org/abs/2506.00628
(Interspeech '25) finds that *accent-language confusion* is an important culprit, ties it to the length of feature that the model relies on, and proposes a fix.
1
6
3
reposted by
Krithika Ramesh
Leshem (Legend) Choshen @EMNLP
about 1 year ago
Go find new linguistic changes, compare corpora and invent
huggingface.co/Hplm
arxiv.org/abs/2504.05523
Hplm (Historical Perspectival LM)
Org profile for Historical Perspectival LM on Hugging Face, the AI community building the future.
https://huggingface.co/Hplm
0
18
4
reposted by
Krithika Ramesh
Leshem (Legend) Choshen @EMNLP
about 1 year ago
Historical analysis is a good example, as historical periods can get lost in blended information from different eras. Finetuning large models isn't enough: they “leak” future/modern concepts, making historical analysis impossible. Did you know cars existed in the 1800s? 🤦
1
12
1
reposted by
Krithika Ramesh
Leshem (Legend) Choshen @EMNLP
about 1 year ago
arxiv.org/abs/2504.05523
Typical Large Language Models (LLMs) are trained on massive, mixed datasets, so the model's behaviour can't be linked to a specific subset of the pretraining data. Or in our case, to time eras.
Pretraining Language Models for Diachronic Linguistic Change Discovery
Large language models (LLMs) have shown potential as tools for scientific discovery. This has engendered growing interest in their use in humanistic disciplines, such as historical linguistics and lit...
https://arxiv.org/abs/2504.05523
1
14
3
reposted by
Krithika Ramesh
Leshem (Legend) Choshen @EMNLP
about 1 year ago
How should the humanities leverage LLMs? ▶️Domain-specific pretraining! Pretraining models can be a research tool, it's cheaper than LoRA, and allows studying 💠grammatical change 💠emergent word senses 💠who knows what more… Train on your data with our pipeline or use ours!
#AI
#LLM
🤖📈
2
45
19
reposted by
Krithika Ramesh
Niyati Bafna
over 1 year ago
Dialects lie on continua of (structured) linguistic variation, right? And we can’t collect data for every point on the continuum...🤔 📢 Check out DialUp, a technique to make your MT model robust to the dialect continua of its training languages, including unseen dialects.
arxiv.org/abs/2501.16581
1
13
6
reposted by
Krithika Ramesh
over 1 year ago
Form here:
forms.gle/6DRkaP1CTMYk...
MASC 2025 Call for Locations
Are you able to host MASC this year, sometime in Spring 2025? Responsibilities include: Space for ~150 ish people Managing the review process (really just paper submissions) Organizing the event Choo...
https://forms.gle/6DRkaP1CTMYkPDHG6
0
1
1
reposted by
Krithika Ramesh
over 1 year ago
📢 Want to host MASC 2025? The 12th Mid-Atlantic Student Colloquium is a one day event bringing together students, faculty and researchers from universities and industry in the Mid-Atlantic. Please submit this very short form if you are interested in hosting! Deadline January 6th.
#MASC2025
1
10
7
reposted by
Krithika Ramesh
Mark Dredze
over 1 year ago
📢 It's PhD admissions season! 🎓 The PhD admissions process is stressful! 😅 Want a behind-the-scenes look at the process? 👀✨ You have questions, we have answers. 📝🤝 Watch my Admissions AMA for @jhuclsp.
https://youtu.be/YlwpIPFNXjo?si=O7n5QwGT5sQdpg7u
0
13
2
reposted by
Krithika Ramesh
Anjalie Field
over 1 year ago
I'm super excited about this program and happy to connect if you're interested in working with me through it!
0
25
11
reposted by
Krithika Ramesh
Kate Sanders
over 1 year ago
Putting together a JHU Center for Language and Speech Processing starter pack! Please reply or DM me if you're doing research at CLSP and would like to be added - I'm still trying to find out which of us are on here so far.
go.bsky.app/JtWKca2
CLSP
Join the conversation
https://go.bsky.app/JtWKca2
2
22
11