A Dual-Task Large Language Model for Adding Diacritics and Translating Jordanian Arabic to Modern Standard Arabic
ID: 18 | View protection: Participant Only | Updated time: 2025-12-28 01:29:31 | Views: 190 | Online

Start Time: 2025-12-29 17:00

Duration: 15 min

Session: [S5] Track 5: Emerging Trends of AI/ML, [S5-1] Track 5: Emerging Trends of AI/ML

Abstract
Arabic presents unique challenges for natural language processing because of its complex grammar, diverse dialects, and frequent omission of diacritics. This paper proposes a unified token-free model based on ByT5 that jointly performs spelling correction (including translation from the Jordanian dialect to Modern Standard Arabic (MSA)) and diacritization. The approach uses task-specific prefixes (“correct:” for correction and “diacritize:” for combined correction and diacritization) to enable flexible multi-task learning. The model was fine-tuned on the JODA dataset (Jordanian dialect/MSA pairs) and on high-quality Tashkeela subsets (Clean-50 and Clean-400), with synthetic error injection to improve robustness. Automatic evaluation yielded an overall score of 78.06% on JODA and 92.45% on the combined JODA and Tashkeela test set. Manual evaluation of 200 JODA samples found a character error rate of 4.41% and a diacritic error rate of 1.32%, demonstrating practical efficacy in handling the complexities of Arabic.
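As a rough illustration of two ideas in the abstract, the sketch below shows how a task prefix might be prepended to an input and how a character error rate could be computed against a reference. It is not the authors' code: the helper names, the trailing space after each prefix, and the choice to compute the character error rate on undiacritized text are all assumptions made for this example.

```python
# Task prefixes quoted in the abstract; the trailing space is an assumption.
PREFIXES = {"correct": "correct: ", "diacritize": "diacritize: "}

# Arabic diacritics (tashkeel): fathatan U+064B through sukun U+0652.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def build_input(task: str, text: str) -> str:
    """Prepend the task prefix so a single model can serve both tasks."""
    return PREFIXES[task] + text


def strip_diacritics(text: str) -> str:
    """Remove tashkeel marks, leaving only the base characters."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over Unicode code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def char_error_rate(ref: str, hyp: str) -> float:
    """Edit distance on undiacritized text, normalized by reference length.

    Assumption: the paper's exact metric definition is not given in the
    abstract; this is one common formulation.
    """
    ref_base, hyp_base = strip_diacritics(ref), strip_diacritics(hyp)
    if not ref_base:
        raise ValueError("reference must contain at least one base character")
    return edit_distance(ref_base, hyp_base) / len(ref_base)
```

A prefixed string such as `build_input("diacritize", text)` would be passed to the fine-tuned model, and `char_error_rate` would then compare the model output with a human-verified reference.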
Keywords
Arabic NLP, Dialect Translation, Jordanian Dialect, Diacritization, Spelling Correction, ByT5, Transformer Models, Multi-Task Learning
Speaker
Rabie Otoum
University of Jordan

Important Dates
  • Conference date: 12-29 2025 - 12-31 2025
  • 12-30 2025: Presentation submission deadline
  • 02-10 2026: Draft paper submission deadline
  • 02-10 2026: Registration deadline

Sponsored By

United Societies of Science

Organized By

Zarqa University
