A Dual-Task Large Language Model for Adding Diacritics and Translating Jordanian Arabic to Modern Standard Arabic-2025 Asian Conference on Communication and Networks-UNITED SOCIETIES OF SCIENCE

Presentation details

A Dual-Task Large Language Model for Adding Diacritics and Translating Jordanian Arabic to Modern Standard Arabic

ID: 18, View protection: Participant Only, Updated time: 2025-12-28 01:29:31, Views: 191

Start Time: 2025-12-29 17:00

Duration: 15 min

Session: [S5] Track 5: Emerging Trends of AI/ML, [S5-1] Track 5: Emerging Trends of AI/ML

Abstract

The Arabic language presents unique challenges for natural language processing due to its complex grammar, diverse dialects, and frequent omission of diacritics. This paper proposes a unified token-free model based on ByT5 that performs both spelling correction (including translation from Jordanian dialect to Modern Standard Arabic (MSA)) and diacritization. Our approach uses task-specific prefixes (“correct:” for correction and “diacritize:” for combined correction and diacritization) to enable flexible multi-task learning. The model was fine-tuned on the JODA dataset (Jordanian dialect/MSA pairs) and on high-quality Tashkeela subsets (Clean-50 and Clean-400), with synthetic error injection to enhance robustness. Automatic evaluation yielded an overall score of 78.06% on JODA and 92.45% on the combined JODA and Tashkeela test set. Manual evaluation of 200 JODA samples revealed a character error rate of 4.41% and a diacritic error rate of 1.32%, demonstrating practical efficacy in handling Arabic’s complexities.
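The prefix scheme and the character error rate mentioned in the abstract can be illustrated with a minimal Python sketch. Only the two prefix strings come from the abstract. The helper names, the byte offset of 3 (the usual ByT5 convention of reserving the first three IDs for special tokens), and the plain Levenshtein-based error rate are assumptions made for illustration, not the authors' implementation or evaluation protocol.

```python
import unicodedata

# The two prefixes are quoted from the abstract; all names below are hypothetical.
PREFIXES = {"correct": "correct: ", "diacritize": "diacritize: "}


def build_input(task: str, text: str) -> str:
    """Prepend the task prefix so a single model can serve both tasks."""
    return PREFIXES[task] + text


def to_byte_ids(text: str, offset: int = 3) -> list[int]:
    """Token-free encoding: one ID per UTF-8 byte, shifted past special tokens.

    The offset of 3 is an assumption based on the common ByT5 convention.
    """
    return [b + offset for b in text.encode("utf-8")]


def strip_diacritics(text: str) -> str:
    """Drop combining marks (Arabic harakat have Unicode category Mn)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    return edit_distance(hypothesis, reference) / max(len(reference), 1)
```

Under these assumptions, `build_input("diacritize", sentence)` produces the string that would be byte-encoded and passed to the model, and `cer(strip_diacritics(output), strip_diacritics(reference))` gives one possible way to score the base characters separately from the diacritics.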

Keywords

Arabic NLP, Dialect Translation, Jordanian Dialect, Diacritization, Spelling Correction, ByT5, Transformer Models, Multi-Task Learning

Speaker

Rabie Otoum

University of Jordan

Important Dates

Conference date

12-29 2025 - 12-31 2025

12-30 2025

Presentation submission deadline
02-10 2026

Draft paper submission deadline
02-10 2026

Registration deadline

Organized By

Zarqa University

Contact info

Miss AsianComNet
[email protected]

Previous Conferences

2024 Asian Conference on Communication and Networks

2025 Asian Conference on Communication and Networks (ASIANComNet 2025)
