Organize all modern Arabic dialects and tie them to the ancient ones.
The ultimate goal of the Alankaa Project is to build a mega-dictionary of the Arabic language that covers all of its dialects, from ancient times to the present.
This goal can be achieved by realizing the following sub-goals:
Sub-goal | Progress | Remarks |
---|---|---|
1. Digitize all classical dictionaries. | 70% | More than 44 dictionaries have been digitized. |
2. Digitize all etymological dictionaries. | 40% | More than 40 dictionaries have been digitized. |
3. Make a conceptual dictionary. | 10% | Still in its early stage. |
4. Build a historical dictionary of 3000 years. | 30% | It is a historical, geographical and tribal dictionary. |
5. Digitize specialized dictionaries. | 1% | e.g., General Science, Chemistry, Mathematics, Physics, ... |
6. Build an automatic verb conjugator. | 90% | It is very advanced now. |
7. Build an automatic noun derivator. | 90% | It is very advanced now. |
8. Build a dictionary of synonyms, antonyms and antagonyms. | 10% | |
9. Build a dictionary of proverbs and reduplicatives. | 30% | Only the reduplicatives dictionary has been partially published. |
10. Build a dictionary of all dialects. | 1% | This is the final goal. |
At first glance, one may think it is just a matter of annotating the existing dictionaries.
That is indeed an important part of it. But let us illustrate some of the challenges:
Alankaa includes the following:
System | Status | Deployment | Sources | Date & Info |
---|---|---|---|---|
Renaissance Dictionaries | ✔ | ✔ | 6 | Range: 1903 - 2017 AD. |
Classical Dictionaries | ✔ | ✔ | 29 | Range: 718 - 1790 AD. |
Conceptual Dictionary | ⏳ | ✔* | 4 | Range: 941 - 1790 AD. |
Synonyms Dictionary | ✔ | ✔* | 4 | Range: 882 - 1274 AD. |
Antonyms Dictionary | ✔ | ✔* | 1 | Range: 884 - 940 AD. |
Historical Dictionary | ⏳ | ✖ | 2000+ | Range: 10 - 2017 AD. |
Proverbs Dictionary | ✔ | ✖ | ~40 | Range: 718 - 2017 AD. |
Dialects Dictionaries | ⏳ | ✔* | 3 | Moroccan, Egyptian & Mixed. |
Special Dictionaries | ⏳ | ✖ | 5 | Geography, Chemistry, Religion... |
Etymological Dictionary | ✔ | ✔ | 40 | See note * |
Verbs Conjugator | ✔ | ✔ | Unlimited * | Conjugates any verb (real or virtual). |
Words Derivator | ✔ | ✔ | Unlimited * | Derives nouns & plurals from any verb. |
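To give a feel for what "conjugates any verb (real or virtual)" can mean, here is a minimal sketch in Python. It is not Alankaa's implementation: the function name `conjugate_past`, the person labels and the suffix table are assumptions made for illustration. It only handles the past tense of a sound triliteral Form I root; weak, doubled and hamzated roots need extra rules that are not shown.

```python
# Arabic short-vowel marks (combining characters).
FATHA, DAMMA, KASRA, SUKUN = "\u064E", "\u064F", "\u0650", "\u0652"

# (person label, mark on the third radical, suffix) for the past tense.
# Hypothetical table covering seven common persons only.
PAST_ENDINGS = [
    ("3ms", FATHA, ""),                  # he
    ("3fs", FATHA, "ت" + SUKUN),         # she
    ("2ms", SUKUN, "ت" + FATHA),         # you (masc. sing.)
    ("2fs", SUKUN, "ت" + KASRA),         # you (fem. sing.)
    ("1s", SUKUN, "ت" + DAMMA),          # I
    ("1p", SUKUN, "ن" + FATHA + "ا"),    # we
    ("3mp", DAMMA, "وا"),                # they (masc.)
]


def conjugate_past(root, middle_vowel=FATHA):
    """Conjugate a sound triliteral Form I root in the past tense.

    `root` is three bare consonants; it does not have to be an attested
    verb, which is what makes "virtual" verbs possible.
    """
    if len(root) != 3:
        raise ValueError("this sketch only handles three-letter roots")
    first, second, third = root
    # The first radical always takes fatha; the middle vowel varies by verb.
    stem = first + FATHA + second + middle_vowel + third
    return {person: stem + mark + suffix for person, mark, suffix in PAST_ENDINGS}


if __name__ == "__main__":
    for person, form in conjugate_past("كتب").items():
        print(person, form)
```

Because the function never checks the root against a lexicon, any three consonants produce a full paradigm. A production conjugator would add the remaining persons, tenses, verb forms and the irregular root classes.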
Alankaa includes all the following features:
Feature | Used for | Status | Info |
---|---|---|---|
Auto-complete | Whole system | ✔ | It predicts the rest of a word a user is typing. |
Root auto-extraction | Whole system | ✔ | It extracts the root of a word a user typed. |
Did you mean? | All dictionaries | ✔ | It offers suggested terms for queries with misspellings and typographical errors. |
Related | All dictionaries | ✔ | It offers related terms to the searched word. |
Search by word | Whole system | ✔ | Self-explanatory. |
Search by Sentence (1) | All dictionaries | ✔ | It searches the whole corpus. |
Search by RegEx | All dictionaries | ✔ | Users can search with regular expressions. |
Search by Exact Matching | All dictionaries | ✔ | Users can force an exact match by enclosing the query in double quotes (" "). |
n-gram viewer | Etymology | ✔ | It shows the usage frequency of a given word from ancient times to the present. |
Word variations | Etymology | ✔ | e.g., variations of إبراهيم are: أبْرَهَمْ، إبْرَاهَام، إِبراهوم ... |
Etymological Types | Etymology | ✔ | 4 types: معرب، دخيل، فصيح، مجهول |
Etymological Subtypes | Etymology | ✔ | 14 subtypes: ممات، مشتق، منحوت، مركب، ... |
Loan-Word Languages | Etymology | ✔ | e.g., فردوس from Latin, فيروز from Persian, ... |
Quotes System (2) | Etymology | ✔ | It shows the oldest possible quote for the searched word. |
Images Search (3) | All dictionaries | ✔ | If needed, it shows the images for Animals, Insects, Fishes, Trees, Plants, Planets, ... |
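Several of the search features in the table can be sketched with the Python standard library. The block below is an illustration under stated assumptions, not Alankaa's code: the `HEADWORDS` list and all function names are hypothetical, and no normalization of diacritics or hamza variants is attempted.

```python
import re
from bisect import bisect_left
from difflib import get_close_matches

# Hypothetical headword list; a real index would be built from the dictionaries.
HEADWORDS = sorted(["قلم", "كاتب", "كتاب", "كتب", "مكتب"])


def autocomplete(prefix, words=HEADWORDS, limit=10):
    """Return up to `limit` headwords that start with `prefix`.

    `words` must be sorted so that all matches are contiguous.
    """
    start = bisect_left(words, prefix)
    matches = []
    for word in words[start:]:
        if not word.startswith(prefix) or len(matches) >= limit:
            break
        matches.append(word)
    return matches


def did_you_mean(query, words=HEADWORDS, limit=3):
    """Suggest headwords similar to a possibly misspelled query."""
    return get_close_matches(query, words, n=limit, cutoff=0.6)


def search(query, words=HEADWORDS):
    """Exact match when the query is wrapped in double quotes, else substring match."""
    if len(query) > 2 and query.startswith('"') and query.endswith('"'):
        term = query[1:-1]
        return [word for word in words if word == term]
    return [word for word in words if query in word]


def search_regex(pattern, words=HEADWORDS):
    """Return headwords in which the regular expression matches somewhere."""
    compiled = re.compile(pattern)
    return [word for word in words if compiled.search(word)]


if __name__ == "__main__":
    print(autocomplete("كت"))
    print(did_you_mean("كتابب"))
    print(search('"كتب"'), search("كتب"))
    print(search_regex("^ك.ب$"))
```

The similarity measure in `difflib` is a generic one; a spelling suggester tuned for Arabic would likely weigh common confusions (for example between hamza forms) differently.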
To help users understand the meanings more easily,
and to help NLP tools parse the text efficiently,
Alankaa enriches the texts with heavy annotation:
Type | Status | Info |
---|---|---|
Derivatives | ✔ | Possible derivatives of the entry word are highlighted inside the text. |
Exact Matching | ✔ | The exact matching of the entry word is highlighted inside the text. |
Rhymed Poetry | ✔ | Classical Poetry segments (أبيات) are clearly identified in the text. |
Prose Poetry | ✔ | Prose Poetry segments (الرجز) are also recognized in the text. |
Saj' | ✔ | Saj' (السجع) is a form of rhymed prose in Arabic. It is also identified in the text. |
Grammar (1) | ✔ | Masculine, Feminine, Plural, ... are highlighted in different colors. |
Pronunciation (1) | ✔ | e.g., الزَّبَرْجَدُ بوزن السفرجل |
Honorifics | ✔ | e.g.: ﷺ، ﷻ، رضي الله عنه، ... are identified in the text. |
Proverbs | ✔ | Proverbs are identified in a different color in the text. |
Reduplications | ✔ | Called إتباع in Arabic (e.g., حيص بيص); these are also identified in the text. |
Roots (2) | ✔ | In the Renaissance dictionaries, roots of the entry word are identified. |
Etymological Types | ✔ | The types of loan-words are clearly identified if they appear in the text. |
Etymological Sources | ✔ | Some loan-words came from Persian, Greek, Latin, Syriac, ...; the source language is also identified in the text. |
Quran Verses | ✔ | Quran verses are marked with a different color and encapsulated between {...} |
Hadith | ✔ | Hadith is also marked with a different color and encapsulated. |
References Marking (3) | ✔ | References are marked within square brackets inside the text. E.g. [أ] [ب] [1] [2] ... |
Remarks Marking (3) | ✔ | Remarks are marked within parentheses inside the text. E.g. (1) (2) (ح) ... |
Alphabetical Lists | ✔ | If the author gives different explanations in the form of an Alphabetical list, Alankaa will recognize it. |
Numerical Lists | ✔ | If the author gives different explanations in the form of a Numerical list, Alankaa will recognize it. |
Named Entity Recognition | ✔ | Names of Animals 🐹, Birds 🐤, Fishes 🦈, Trees 🌴, Plants 🌿, Planets 🌔, Food 🥙, Bacteria 🐛, Minerals 💎 and Medicine 💊 are marked with colors and emojis. |
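The bracket conventions in the table (Quran verses between {...}, references in square brackets, remarks in parentheses) are simple enough for an NLP tool to pick up with one regular expression. The following is a minimal sketch, not Alankaa's parser: the name `find_annotations`, the label names, and the assumption that reference and remark markers are at most three characters with no spaces (so that ordinary parenthetical asides are not tagged) are all mine.

```python
import re

# One alternative per marker type; the named group that matched gives the label.
MARKERS = re.compile(
    r"\{(?P<quran>[^{}]+)\}"                 # {verse text}
    r"|\[(?P<reference>[^\[\]\s]{1,3})\]"    # [1], [أ]
    r"|\((?P<remark>[^()\s]{1,3})\)"         # (2), (ح)
)


def find_annotations(text):
    """Return (label, content) pairs for every marker found, in text order."""
    found = []
    for match in MARKERS.finditer(text):
        label = match.lastgroup  # name of the alternative that matched
        found.append((label, match.group(label)))
    return found


if __name__ == "__main__":
    sample = "entry {verse text} gloss [1] note (2) (a longer aside)"
    for label, content in find_annotations(sample):
        print(label, content)
```

Hadith spans are left out because the table does not say which delimiters enclose them.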
Try it online: click on الخريطة.