Organize all modern Arabic dialects and make ties to the ancient ones.
The ultimate goal of the Alankaa Project is to build a mega-dictionary of the Arabic language that covers all of its dialects, from ancient times to the present.
This goal can be achieved by realizing the following sub-goals:
| Sub-goal | Progress | Remarks |
|---|---|---|
| 1. Digitize all classical dictionaries. | 70% | More than 44 dictionaries have been digitized. |
| 2. Digitize all etymological dictionaries. | 40% | More than 40 dictionaries have been digitized. |
| 3. Make a conceptual dictionary. | 10% | Still in its early stage. |
| 4. Build a historical dictionary of 3000 years. | 30% | It is a historical, geographical and tribal dictionary. |
| 5. Digitize specialized dictionaries. | 1% | e.g.: General Science, Chemistry, Mathematics, Physics, ... |
| 6. Build an automatic verb conjugator. | 90% | It is very advanced now. |
| 7. Build an automatic noun derivator. | 90% | It is very advanced now. |
| 8. Build a dictionary of synonyms, antonyms and antagonyms. | 10% | |
| 9. Build a dictionary of proverbs and reduplicatives. | 30% | Only the reduplicatives dictionary is partially published. |
| 10. Build a dictionary of all dialects. | 1% | This is the final goal. |
At first glance, one may think it is just a matter of annotating the existing dictionaries.
That's indeed an important part of it. But let us illustrate some of the challenges:
Alankaa includes the following systems:
| System | Status | Deployment | Sources | Date & Info |
|---|---|---|---|---|
| Renaissance Dictionaries | ✔ | ✔ | 6 | Range: 1903 - 2017 AD. |
| Classical Dictionaries | ✔ | ✔ | 29 | Range: 718 - 1790 AD. |
| Conceptual Dictionary | ⏳ | ✔* | 4 | Range: 941 - 1790 AD. |
| Synonyms Dictionary | ✔ | ✔* | 4 | Range: 882 - 1274 AD. |
| Antonyms Dictionary | ✔ | ✔* | 1 | Range: 884 - 940 AD. |
| Historical Dictionary | ⏳ | ✖ | 2000+ | Range: 10 - 2017 AD. |
| Proverbs Dictionary | ✔ | ✖ | ~40 | Range: 718 - 2017 AD. |
| Dialects Dictionaries | ⏳ | ✔* | 3 | Moroccan, Egyptian & Mixed. |
| Special Dictionaries | ⏳ | ✖ | 5 | Geography, Chemistry, Religion... |
| Etymological Dictionary | ✔ | ✔ | 40 | See note * |
| Verbs Conjugator | ✔ | ✔ | Unlimited * | Conjugates any verb (real or virtual). |
| Words Derivator | ✔ | ✔ | Unlimited * | Derives nouns & plurals from any verb. |
Alankaa includes all the following features:
| Feature | Used for | Status | Info |
|---|---|---|---|
| Auto-complete | Whole system | ✔ | It predicts the rest of a word a user is typing. |
| Root auto-extraction | Whole system | ✔ | It extracts the root of a word a user typed. |
| Did you mean? | All dictionaries | ✔ | It offers suggested terms for queries with misspellings and typographical errors. |
| Related | All dictionaries | ✔ | It offers related terms to the searched word. |
| Search by word | Whole system | ✔ | Self-explanatory. |
| Search by Sentence (1) | All dictionaries | ✔ | It searches the whole corpus. |
| Search by RegEx | All dictionaries | ✔ | The user can use regular expressions to perform the search. |
| Search by Exact Matching | All dictionaries | ✔ | The user can force the system to search for an exact match by enclosing the query in " ". |
| n-gram viewer | Etymology | ✔ | It shows the usage frequency of a given word from ancient time to now. |
| Word variations | Etymology | ✔ | e.g. variations of إبراهيم are: أبْرَهَمْ، إبْرَاهَام، إِبراهوم ... |
| Etymological Types | Etymology | ✔ | 4 types: معرب، دخيل، فصيح، مجهول |
| Etymological Subtypes | Etymology | ✔ | 14 subtypes: ممات، مشتق، منحوت، مركب، ... |
| Loan-Words Languages | Etymology | ✔ | e.g.: فردوس from Latin, فيروز from Persian, ... |
| Quotes System (2) | Etymology | ✔ | It shows the oldest possible quote for the searched word. |
| Images Search (3) | All dictionaries | ✔ | If needed, it shows the images for Animals, Insects, Fishes, Trees, Plants, Planets, ... |
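
As a rough illustration of two rows in the table above, "Search by Exact Matching" and "Did you mean?", the sketch below shows one way such behavior could work. It is not Alankaa's implementation: the function names `search` and `did_you_mean`, the whole-word reading of an "exact match", and the use of Python's `difflib` for suggestions are all assumptions made for this example.

```python
import difflib
import re


def search(raw_query: str, entries: list[str]) -> list[str]:
    """Return the entries that match the query.

    Assumption: a query wrapped in double quotes means "whole-word match";
    anything else is treated as a plain substring search.
    """
    query = raw_query.strip()
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        term = query[1:-1]
        # Reject matches that are glued to other word characters.
        pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)")
        return [entry for entry in entries if pattern.search(entry)]
    return [entry for entry in entries if query in entry]


def did_you_mean(word: str, vocabulary: list[str], limit: int = 3) -> list[str]:
    """Suggest close spellings from a known vocabulary, best match first."""
    return difflib.get_close_matches(word, vocabulary, n=limit, cutoff=0.6)


if __name__ == "__main__":
    entries = ["كتاب", "كتابة", "الكتاب كبير"]
    print(search('"كتاب"', entries))   # whole-word matches only
    print(search("كتاب", entries))     # substring matches
    print(did_you_mean("كتااب", ["كتاب", "كاتب", "مكتبة"]))
```

Note that `\w` in Python does not match Arabic diacritics, so a real system would probably strip or normalize vocalization before matching.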
To improve the user experience and make meanings easier to understand,
and to help NLP tools parse the text efficiently,
Alankaa enriches the texts with heavy annotation:
| Type | Status | Info |
|---|---|---|
| Derivatives | ✔ | Possible derivatives of the entry word are highlighted inside the text. |
| Exact Matching | ✔ | The exact matching of the entry word is highlighted inside the text. |
| Rhymed Poetry | ✔ | Classical Poetry segments (أبيات) are clearly identified in the text. |
| Prose Poetry | ✔ | Prose Poetry segments (الرجز) are also recognized in the text. |
| Saj' | ✔ | Saj' (السجع) is a form of Rhymed prose in Arabic. It is also identified in the text. |
| Grammar (1) | ✔ | Masculine, Feminine, Plural, ... are highlighted in different colors. |
| Pronunciation (1) | ✔ | e.g.: الزَّبَرْجَدُ بوزن السفرجل |
| Honorifics | ✔ | e.g.: ﷺ، ﷻ، رضي الله عنه، ... are identified in the text. |
| Proverbs | ✔ | Proverbs are identified in a different color in the text. |
| Reduplications | ✔ | Reduplicative pairs such as حيص بيص (called إتباع in Arabic) are also identified in the text. |
| Roots (2) | ✔ | In the Renaissance dictionaries, roots of the entry word are identified. |
| Etymological Types | ✔ | The types of loan-words are clearly identified if they appear in the text. |
| Etymological Sources | ✔ | Some loan-words came from Persian, Greek, Latin, Syriac, ... This is also identified in the text. |
| Quran Verses | ✔ | Quran verses are marked with a different color and encapsulated between {...} |
| Hadith | ✔ | Hadith is also marked with a different color and encapsulated. |
| References Marking (3) | ✔ | References are marked within square brackets inside the text. E.g. [أ] [ب] [1] [2] ... |
| Remarks Marking (3) | ✔ | Remarks are marked within parentheses inside the text. E.g. (1) (2) (ح) ... |
| Alphabetical Lists | ✔ | If the author gives different explanations in the form of an Alphabetical list, Alankaa will recognize it. |
| Numerical Lists | ✔ | If the author gives different explanations in the form of a Numerical list, Alankaa will recognize it. |
| Named Entity Recognition | ✔ | Names of Animals 🐹, Birds 🐤, Fishes 🦈, Trees 🌴, Plants 🌿, Planets 🌔, Food 🥙, Bacteria 🐛, Minerals 💎 and Medicine 💊 are marked with colors and emojis. |
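
The bracket conventions listed in the table (Quran verses inside `{...}`, references inside `[...]`, remarks inside `(...)`) lend themselves to simple pattern matching. The following is a minimal sketch of how a downstream NLP tool might pull these spans out of an annotated entry. The names `MARKERS` and `extract_annotations` are hypothetical, and the sample text is invented for the example; Alankaa's actual markup or export format may differ.

```python
import re

# Hypothetical mapping from annotation type to its delimiter pattern.
# Each pattern captures the text between one non-nested pair of brackets.
MARKERS = {
    "quran": re.compile(r"\{([^{}]+)\}"),
    "reference": re.compile(r"\[([^\[\]]+)\]"),
    "remark": re.compile(r"\(([^()]+)\)"),
}


def extract_annotations(text: str) -> dict[str, list[str]]:
    """Return the bracketed spans found in `text`, grouped by annotation type."""
    return {kind: pattern.findall(text) for kind, pattern in MARKERS.items()}


if __name__ == "__main__":
    sample = "قال تعالى {بسم الله} انظر [1] و (ح)"
    for kind, spans in extract_annotations(sample).items():
        print(kind, spans)
```

Ordinary parentheses in running text would also be captured as remarks by this sketch, so a real parser would need extra context to tell them apart.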
Try it online: click on الخريطة.