Building Open Source NLP Libraries and Tools for the Arabic Language
This special two-month Omdena Challenge is a first-of-its-kind community-driven project with 50 AI changemakers to build open-source Arabic NLP libraries. The solutions help to overcome present adoption challenges and increase accessibility of Arabic NLP applications.
The Problem
Arabic is one of the most widespread and widely used languages. Known as the "language of the Ḍād", it is distinguished by its rich stock of words and forms. It is also phonetically distinctive, containing all the sounds found in the other Semitic languages. It is flexible as well, accommodating all derived and synonymous words and offering a fitting expression for every occasion.
We recognize the importance of the Arabic language and its standing among the peoples of the Middle East and the world, and we aim to make Arabic one of the languages that can be readily used in artificial intelligence and natural language processing applications.
- Arabic is the 5th most spoken language in the world and the first language of the countries of the Arab world, making it extremely important worldwide.
- Arabic is grammatically complex and has relatively free word order, both of which pose significant challenges for Arabic NLP applications.
- Arabic has three main varieties: Classical Arabic, Modern Standard Arabic, and Dialectal Arabic.
- Tools built by big tech and accessible to the majority of the world are limited to translating only a few of the most popular languages.
The project outcomes
The envisioned deliverables can be broken down into two main areas:
- Build open-source Arabic NLP libraries for sentiment analysis, morphological modeling, dialect identification, and named entity recognition
- Build 5 to 8 core functions to support Arabic NLP (lemmatization, stop-word removal, tokenization, word embeddings, part-of-speech tagging, etc.), similar to NLTK but for Modern Standard Arabic.
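To make the second deliverable more concrete, here is a minimal sketch of what two of the core functions listed above (tokenization and stop-word removal) might look like. It uses only the Python standard library. The function names, the regular expressions, and the tiny stop-word list are illustrative assumptions, not the project's actual API or data.

```python
import re

# Arabic short-vowel marks (tashkeel, U+064B..U+0652) and the tatweel
# elongation character (U+0640). Stripping them is a common first step.
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")

# Assumption: a token is a run of basic Arabic letters (U+0621..U+064A).
# Punctuation, digits, and Latin text are simply skipped.
TOKEN = re.compile("[\u0621-\u064A]+")

# Tiny hypothetical stop-word list for illustration only;
# a real library would ship a much larger, curated list.
STOP_WORDS = {"في", "من", "على", "إلى", "عن", "و", "أن", "هذا", "هذه"}


def strip_diacritics(text):
    """Remove tashkeel and tatweel so that variants of a word compare equal."""
    return DIACRITICS.sub("", text)


def tokenize(text):
    """Split text into a list of Arabic word tokens after stripping diacritics."""
    return TOKEN.findall(strip_diacritics(text))


def remove_stop_words(tokens):
    """Drop tokens that appear in the (illustrative) stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    sentence = "ذَهَبَ الولدُ إلى المدرسةِ في الصباح"
    print(remove_stop_words(tokenize(sentence)))
```

This simple approach ignores clitics (for example the conjunction و attached to the following word), which is one reason the morphological modeling and lemmatization mentioned above are needed for real applications.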
A Community-Driven Initiative: Omdena Country Chapter Leads
This project is facilitated by Omdena's Country Chapter Leads in the following Arab countries. We welcome partnerships to spread this initiative to as many countries and communities as possible.