andrea wang (@andreawwenyi.bsky.social)

Scrambled text: fine-tuning language models for OCR error correction using synthetic data - International Journal on Document Analysis and Recognition (IJDAR)

OCR errors are common in digitised historical archives, significantly affecting their usability and value. Generative Language Models (LMs) have shown potential for correcting these errors using the co...

https://link.springer.com/article/10.1007/s10032-025-00522-0