Organize all modern Arabic dialects and tie them to the ancient ones.

The ultimate goal of the Alankaa Project is to build a mega-dictionary of the Arabic language that covers all of its dialects, from ancient times to the present.
This goal can be achieved by realizing the following sub-goals:

1. Digitize all classical dictionaries (70%). More than 44 dictionaries have been digitized.
2. Digitize all etymological dictionaries (40%). More than 40 dictionaries have been digitized.
3. Make a conceptual dictionary (10%). Still in its early stage.
4. Build a historical dictionary spanning 3000 years (30%). It is a historical, geographical and tribal dictionary.
5. Digitize specialized dictionaries (1%), e.g. General Science, Chemistry, Mathematics, Physics, ...
6. Build an automatic verb conjugator (90%). It is very advanced now.
7. Build an automatic noun derivator (90%). It is very advanced now.
8. Build a dictionary of synonyms, antonyms and antagonyms (10%).
9. Build a dictionary of proverbs and reduplicatives (30%). Only the Reduplicatives dictionary is partially published.
10. Build a dictionary of all dialects (1%). This is the final goal.

At first glance, one may think it is just a matter of annotating the existing dictionaries.

That's indeed an important part of it. But let us illustrate some of the challenges:

  • Challenge 1:
    It's hard to directly match words like مدارس، مسألة، موبقا ...
    Therefore, an algorithm is needed to extract their roots: درس، سأل، وبق، ...
  • Challenge 2:
    One may understand that آس is a tree, فرانق is an animal, كوسج is a fish, and زمرد is a gemstone...
    Yet, without pictures one may not know what they really look like.
  • Challenge 3:
    Some explanations come without Tashkil (diacritics), which may make it difficult for some readers to understand the meaning.
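Challenge 1 calls for a root-extraction algorithm. The source does not say how Alankaa implements it, so the following is only a minimal sketch, assuming a naive affix-stripping heuristic. The names `strip_tashkil` and `extract_root` and the affix lists are hypothetical, and the heuristic only covers simple cases such as the three example words above. Stripping Tashkil first also relates to Challenge 3, since text with and without diacritics then compares equal.

```python
import re

# Tashkil marks (U+064B..U+0652), superscript alef (U+0670) and tatweel (U+0640).
_TASHKIL = re.compile("[\u064B-\u0652\u0670\u0640]")

# Hypothetical, deliberately tiny affix lists; a real system needs far more.
_SUFFIXES = ("ات", "ون", "ين", "ة", "ا")
_PREFIXES = ("ال", "م")
_WEAK_LETTERS = "اوي"


def strip_tashkil(word: str) -> str:
    """Remove diacritics so vocalized and bare spellings match."""
    return _TASHKIL.sub("", word)


def extract_root(word: str) -> str:
    """Guess a triliteral root with a naive heuristic (assumption, not Alankaa's method)."""
    w = strip_tashkil(word)

    # Strip at most one suffix, never going below three letters.
    for suffix in _SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            break

    # Strip at most one prefix, never going below three letters.
    for prefix in _PREFIXES:
        if w.startswith(prefix) and len(w) - len(prefix) >= 3:
            w = w[len(prefix):]
            break

    # Drop weak letters from the interior until three letters remain.
    chars = list(w)
    i = 1
    while len(chars) > 3 and i < len(chars) - 1:
        if chars[i] in _WEAK_LETTERS:
            del chars[i]
        else:
            i += 1
    return "".join(chars)


for sample in ("مدارس", "مسألة", "موبقا"):
    print(sample, "->", extract_root(sample))
```

On the three sample words this heuristic yields درس, سأل and وبق. It will fail on many real words (for example, roots that themselves begin with م), which is why a production system needs pattern tables or a lexicon rather than blind stripping.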

Alankaa includes the following:

System | Status | Deployment | Sources | Date & Info
Renaissance Dictionaries 6 Range: 1903 - 2017 AD.
Classical Dictionaries 29 Range: 718 - 1790 AD.
Conceptual Dictionary ✔* 4 Range: 941 - 1790 AD.
Synonyms Dictionary ✔* 4 Range: 882 - 1274 AD.
Antonyms Dictionary ✔* 1 Range: 884 - 940 AD.
Historical Dictionary 2000+ Range: 10 - 2017 AD.
Proverbs Dictionary ~40 Range: 718 - 2017 AD.
Dialects Dictionaries ✔* 3 Moroccan, Egyptian & Mixed.
Special Dictionaries 5 Geography, Chemistry, Religion...
Etymological Dictionary 40 See note *
Verbs Conjugator Unlimited *Conjugates any verb (real or virtual)
Words Derivator Unlimited *Derives nouns & plurals from any verb.
✔ : Done
⏳: On-going
✔*: Partially integrated
* : You will find more details in their tabs.
✖ : Completely or partially done but not yet deployed on the website.

Alankaa includes all the following features:

Feature | Used for | Status | Info
Auto-complete Whole system It predicts the rest of a word a user is typing.
Root auto-extraction Whole system It extracts the root of a word a user typed.
Did you mean? All dictionaries It offers suggested terms for queries with misspellings and typographical errors.
Related All dictionaries It offers related terms to the searched word.
Search by word Whole system Self-explanatory.
Search by Sentence (1) All dictionaries It searches the whole corpus.
Search by RegEx All dictionaries Users can use regular expressions to perform the search.
Search by Exact Matching All dictionaries Users can force the system to search for an exact match by enclosing the query in " ".
n-gram viewer Etymology It shows the usage frequency of a given word from ancient times to the present.
Word variations Etymology e.g. variations of إبراهيم are: أبْرَهَمْ، إبْرَاهَام، إِبراهوم ...
Etymological Types Etymology 4 types: معرب، دخيل، فصيح، مجهول
Etymological Subtypes Etymology 14 subtypes: ممات، مشتق، منحوت، مركب، ...
Loan-Word Languages Etymology e.g. فردوس from Latin, فيروز from Persian, ...
Quotes System (2) Etymology It shows the oldest possible quote for the searched word.
Image Search (3) All dictionaries If needed, it shows images of Animals, Insects, Fishes, Trees, Plants, Planets, ...
(1): β-version. It relies on TF-IDF & BM25+ Algorithms.
(2): α-version. It relies on TF Scores.
(3): α-version. It relies on regular expressions.
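Footnote (1) states that sentence search relies on TF-IDF and BM25+, but gives no implementation details. As a rough sketch of what BM25+ scoring looks like (the lower-bounded BM25 variant by Lv and Zhai), the following assumes whitespace tokenization and commonly used default parameters. The class name `BM25Plus`, the parameter values and the sample entries are illustrative assumptions, not Alankaa's actual code or data.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Whitespace tokenizer (assumption; real Arabic text needs normalization)."""
    return text.split()


class BM25Plus:
    """Minimal BM25+ ranker over a small in-memory corpus."""

    def __init__(self, docs, k1=1.2, b=0.75, delta=1.0):
        self.k1, self.b, self.delta = k1, b, delta
        self.docs = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(c.values()) for c in self.docs]
        self.avgdl = sum(self.lengths) / len(self.lengths)
        # Document frequency: in how many documents each term occurs.
        self.df = Counter()
        for counts in self.docs:
            self.df.update(counts.keys())

    def idf(self, term: str) -> float:
        # IDF form used in the BM25+ formulation: ln((N + 1) / df).
        return math.log((len(self.docs) + 1) / self.df[term])

    def score(self, query: str, index: int) -> float:
        counts = self.docs[index]
        norm = 1 - self.b + self.b * self.lengths[index] / self.avgdl
        total = 0.0
        for term in tokenize(query):
            f = counts.get(term, 0)
            if f == 0:
                continue  # only terms present in the document contribute
            tf_part = f * (self.k1 + 1) / (f + self.k1 * norm)
            total += self.idf(term) * (tf_part + self.delta)
        return total

    def rank(self, query: str) -> list[int]:
        """Return document indices, best match first."""
        return sorted(range(len(self.docs)),
                      key=lambda i: self.score(query, i), reverse=True)


# Made-up sample entries, for illustration only.
entries = ["الآس شجر طيب الرائحة", "الكوسج سمك في البحر", "الزمرد حجر كريم أخضر"]
ranker = BM25Plus(entries)
print(ranker.rank("شجر الآس"))
```

The `delta` term is what distinguishes BM25+ from plain BM25: it guarantees that a matching term always adds at least `idf * delta`, so very long entries are not penalized to the point of scoring like non-matching ones.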

In order to improve the user experience by making meanings easier to understand,
and to help NLP tools parse the text efficiently,
Alankaa enriches the texts with heavy annotation:

Derivatives: Possible derivatives of the entry word are highlighted inside the text.
Exact Matching: The exact match of the entry word is highlighted inside the text.
Rhymed Poetry: Classical Poetry segments (أبيات) are clearly identified in the text.
Prose Poetry: Prose Poetry segments (الرجز) are also recognized in the text.
Saj': Saj' (السجع) is a form of rhymed prose in Arabic. It is also identified in the text.
Grammar (1): Masculine, Feminine, Plural, ... are highlighted in different colors.
Pronunciation (1): e.g. الزَّبَرْجَدُ بوزن السفرجل
Honorifics: e.g. ﷺ، ﷻ، رضي الله عنه، ... are identified in the text.
Proverbs: Proverbs are identified in a different color in the text.
Reduplications: This is called إتباع in Arabic, as in حيص بيص; such pairs are also identified in the text.
Roots (2): In the Renaissance dictionaries, roots of the entry word are identified.
Etymological Types: The types of loan-words are clearly identified if they appear in the text.
Etymological Sources: Some loan-words came from Persian, Greek, Latin, Syriac, ...; this is also identified in the text.
Quran Verses: Quran verses are marked with a different color and encapsulated between {...}
Hadith: Hadith is also marked with a different color and encapsulated.
References Marking (3): References are marked within square brackets inside the text. E.g. [أ] [ب] [1] [2] ...
Remarks Marking (3): Remarks are marked within parentheses inside the text. E.g. (1) (2) (ح) ...
Alphabetical Lists: If the author gives different explanations in the form of an alphabetical list, Alankaa will recognize it.
Numerical Lists: If the author gives different explanations in the form of a numerical list, Alankaa will recognize it.
Named Entity Recognition: Names of Animals 🐹, Birds 🐤, Fishes 🦈, Trees 🌴, Plants 🌿, Planets 🌔, Food 🥙, Bacteria 🐛, Minerals 💎 and Medicine 💊 are marked with colors and emojis.
(1): Partially done.
(2): The annotation was partially done as Alankaa can extract them automatically.
(3): It is heavily done for the etymological system. Alankaa generates the list of references and remarks at the bottom of the page.
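Because Quran verses, references and remarks use fixed delimiters ({...}, [...] and (...)), an NLP tool can pull them out of an annotated entry with simple patterns. The sketch below is an assumption about how a consumer of the annotated text might do this; the function name `extract_marks`, the three-character limit on reference and remark labels, and the sample entry with a placeholder verse are all hypothetical.

```python
import re

# Verses: anything between curly braces.
VERSE = re.compile(r"\{([^{}]+)\}")
# References and remarks: short labels without spaces, e.g. [1], [ب], (2), (ح).
# The 1-3 character limit is an assumption to skip ordinary parenthesized prose.
REFERENCE = re.compile(r"\[([^\[\]\s]{1,3})\]")
REMARK = re.compile(r"\(([^()\s]{1,3})\)")


def extract_marks(text: str) -> dict[str, list[str]]:
    """Collect verse spans, reference labels and remark labels from annotated text."""
    return {
        "verses": [v.strip() for v in VERSE.findall(text)],
        "references": REFERENCE.findall(text),
        "remarks": REMARK.findall(text),
    }


# Placeholder entry: the braces hold a stand-in string, not a real verse.
sample = "الكلمة {نص الآية} [1] شرح [ب] ملاحظة (ح) ثم (2)"
marks = extract_marks(sample)
print(marks["verses"])
print(marks["references"])
print(marks["remarks"])
```

Collecting the labels in document order is what makes it straightforward to generate the list of references and remarks at the bottom of the page, as note (3) describes.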
