Advanced PII detection for Arabic text with full right-to-left support, Arabic name recognition, and regional identifier formats across MENA countries.
Everything you need for comprehensive data protection
Native right-to-left text processing with correct bidirectional handling for mixed Arabic-English content.
Detect Arabic names including full names, kunyas (كنية, teknonyms of the Abu/Umm form), patronymics, and honorifics with cultural context awareness.
Detect national IDs from Saudi Arabia, UAE, Egypt, and other MENA countries with format validation.
Parse addresses in Arabic script including street names, district names, and postal codes across regional formats.
Recognize phone numbers from all Arab countries with proper country code and carrier prefix detection.
Handle Modern Standard Arabic and regional dialects including Gulf, Egyptian, Levantine, and Maghrebi variations.
Arabic is spoken by over 400 million people across 22 countries, each with unique identifier formats and data protection requirements. RedactionAPI provides comprehensive Arabic language support with native RTL processing, regional identifier detection, and cultural awareness for accurate PII protection in Arabic text.
Arabic presents unique challenges for PII detection due to its right-to-left writing system, connected script, diacritical marks, and regional variations. Our Arabic NLP pipeline is specifically designed to handle these characteristics while maintaining high accuracy.
Arabic is written right-to-left, but numbers and embedded Latin text follow left-to-right. Our bidirectional processing ensures correct handling of mixed content.
Arabic letters connect to form words, with letter shapes changing based on position. Our tokenizer correctly segments words despite connected script.
Short vowels and pronunciation marks (tashkeel) may or may not be written. Our system handles text with and without diacritics.
Arabic text may use Western numerals (0-9) or Eastern Arabic-Indic numerals (٠-٩). We detect both in identifiers.
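The diacritics handling described above can be illustrated with a small normalization step. This is a minimal sketch, not RedactionAPI's actual pipeline: the function name `stripDiacritics` is hypothetical, and it assumes that removing tashkeel and tatweel before matching is acceptable for the use case.

```javascript
// Hypothetical helper: removes optional marks so that text written with and
// without diacritics compares equal.
// Covers tashkeel (U+064B to U+0652), superscript alef (U+0670), and tatweel (U+0640).
const TASHKEEL_AND_TATWEEL = /[\u064B-\u0652\u0670\u0640]/g;

function stripDiacritics(text) {
  return text.replace(TASHKEEL_AND_TATWEEL, "");
}

// A fully vocalized name and its bare form normalize to the same string.
const vocalized = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F"; // "Muhammad" with tashkeel
const bare = "\u0645\u062D\u0645\u062F"; // "Muhammad" without tashkeel
console.log(stripDiacritics(vocalized) === bare);
```

Matching on the stripped form while reporting offsets in the original text would need an index map between the two strings, which this sketch leaves out.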
Arabic names follow traditional patterns that differ from Western naming conventions. Our system understands these structures:
// Arabic text with names
Input: "العميل محمد بن عبدالله الصالح يرغب في فتح حساب جديد"
// "Customer Muhammad bin Abdullah Al-Saleh wants to open a new account"
Output: "العميل [الاسم] يرغب في فتح حساب جديد"
// "Customer [NAME] wants to open a new account"
// Detected entity
{
  "type": "name",
  "value": "محمد بن عبدالله الصالح",
  "transliteration": "Muhammad bin Abdullah Al-Saleh",
  "components": {
    "given_name": "محمد",
    "patronymic": "بن عبدالله",
    "family_name": "الصالح"
  },
  "confidence": 0.97
}
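The component structure in the entity above can be approximated with a simple token-based split. This is a rough sketch under stated assumptions, not the detection model itself: `splitArabicName` and `PATRONYMIC_MARKERS` are hypothetical names, the first token is assumed to be the given name, and compound names written with a space (for example عبد الله) are not handled.

```javascript
// Hypothetical marker list: "bin", "ibn", and "bint".
const PATRONYMIC_MARKERS = new Set(["بن", "ابن", "بنت"]);

function splitArabicName(fullName) {
  const tokens = fullName.trim().split(/\s+/);
  const patronymics = [];
  let i = 1; // tokens[0] is assumed to be the given name
  // Consume each "marker + name" pair that follows the given name.
  while (i + 1 < tokens.length && PATRONYMIC_MARKERS.has(tokens[i])) {
    patronymics.push(`${tokens[i]} ${tokens[i + 1]}`);
    i += 2;
  }
  return {
    given_name: tokens[0],
    patronymic: patronymics.join(" "),
    family_name: tokens.slice(i).join(" "),
  };
}

console.log(splitArabicName("محمد بن عبدالله الصالح"));
```

A rule-based split like this only works on a span that has already been identified as a name; finding that span in running text is the harder problem.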
Each Arab country has distinct national identifier formats. We detect and validate IDs from all major MENA countries:
| Country | ID Name | Format | Validation |
|---|---|---|---|
| Saudi Arabia | National ID / Iqama | 10 digits (1xxx or 2xxx) | Luhn check digit |
| UAE | Emirates ID | 784-YYYY-NNNNNNN-C | Check digit algorithm |
| Egypt | National ID | 14 digits (birth date encoded) | Date + governorate validation |
| Kuwait | Civil ID | 12 digits | Format validation |
| Qatar | QID | 11 digits | Check digit |
| Bahrain | CPR | 9 digits (YYMMNNNNC) | Birth year/month validation |
| Oman | Civil ID | 8 digits | Format validation |
| Jordan | National Number | 10 digits | Date encoding check |
// Saudi National ID validation
function validateSaudiID(id) {
  // Must be exactly 10 Western digits; the first digit indicates nationality:
  // 1 = Saudi citizen, 2 = Resident (Iqama)
  if (!/^[12]\d{9}$/.test(id)) return false;
  // Luhn algorithm check: double every second digit, starting with the first
  let sum = 0;
  for (let i = 0; i < 10; i++) {
    let digit = parseInt(id[i], 10);
    if (i % 2 === 0) {
      digit *= 2;
      if (digit > 9) digit -= 9;
    }
    sum += digit;
  }
  return sum % 10 === 0;
}
// Example: 1234567890 - National ID format (leading 1); placeholder digits, fails the Luhn check
// Example: 2098765432 - Iqama (resident ID) format (leading 2); placeholder digits, fails the Luhn check
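The "birth date encoded" rule listed for the Egyptian National ID in the table above can be sketched in the same style. This is an illustrative sketch, not the product's validator: `parseEgyptianId` is a hypothetical name, the layout assumed is century digit (2 = 1900s, 3 = 2000s), YYMMDD, a two-digit governorate code, then sequence and check digits, and the governorate code is extracted but not checked against a list.

```javascript
// Hypothetical helper: returns the decoded fields, or null if the ID is malformed.
function parseEgyptianId(id) {
  if (!/^[23]\d{13}$/.test(id)) return null;
  const century = id[0] === "2" ? 1900 : 2000;
  const year = century + Number(id.slice(1, 3));
  const month = Number(id.slice(3, 5));
  const day = Number(id.slice(5, 7));
  // Round-trip through Date to reject impossible dates such as month 13 or 30 February.
  const date = new Date(Date.UTC(year, month - 1, day));
  const isRealDate =
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day;
  if (!isRealDate) return null;
  return {
    birthDate: date.toISOString().slice(0, 10),
    governorateCode: id.slice(7, 9),
  };
}

// Placeholder digits, not a real ID.
console.log(parseEgyptianId("29001011234567"));
```

Rejecting implausible dates this way reduces false positives on arbitrary 14-digit numbers such as order or invoice IDs.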
We detect phone numbers from all Arab countries with proper country code and format recognition.
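Country code recognition can be sketched as a lookup on the international calling code. This is a minimal sketch covering only the eight countries from the identifier table: `phoneCountry` and `CALLING_CODES` are hypothetical names, only numbers written in international form (leading `+` or `00`) are handled, and carrier prefixes are not checked.

```javascript
// Calling codes for the countries in the identifier table (longest codes are tried first).
const CALLING_CODES = [
  ["966", "SA"], ["971", "AE"], ["965", "KW"], ["974", "QA"],
  ["973", "BH"], ["968", "OM"], ["962", "JO"], ["20", "EG"],
];

function phoneCountry(raw) {
  const digits = raw
    // Map Eastern Arabic-Indic digits (U+0660 to U+0669) to 0-9.
    .replace(/[\u0660-\u0669]/g, (d) => String(d.charCodeAt(0) - 0x0660))
    // Drop common separators.
    .replace(/[\s\-()]/g, "");
  const match = /^(?:\+|00)(\d{6,14})$/.exec(digits);
  if (!match) return null;
  for (const [code, country] of CALLING_CODES) {
    if (match[1].startsWith(code)) {
      return { country, national: match[1].slice(code.length) };
    }
  }
  return null;
}

console.log(phoneCountry("+966 512345678"));
```

Local-format numbers (for example a leading 05 in Saudi Arabia) would need per-country rules on top of this lookup.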
Arabic addresses often include landmarks and descriptive directions rather than structured street addresses. Our parser handles both formats:
// Structured address
Input: "شارع الملك فهد، حي العليا، الرياض ١٢٣٤٥"
Parsed:
{
  "street": "شارع الملك فهد",
  "district": "حي العليا",
  "city": "الرياض",
  "postal_code": "١٢٣٤٥", // Eastern Arabic numerals
  "country": "Saudi Arabia"
}
// Landmark-based address
Input: "بجوار مسجد الراشد، خلف بنك الراجحي، حي النخيل"
Parsed:
{
  "landmarks": ["مسجد الراشد", "بنك الراجحي"],
  "district": "حي النخيل",
  "type": "landmark_based"
}
Documents often contain Arabic and English mixed together. Our bidirectional processing handles this correctly:
// Mixed language input
Input: "Customer محمد الأحمد with email [email protected] called support"
// Both Arabic name and English email detected
Output: "Customer [NAME] with email [EMAIL] called support"
Entities:
[
  {"type": "name", "value": "محمد الأحمد", "script": "arabic"},
  {"type": "email", "value": "[email protected]", "script": "latin"}
]
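The `script` field in the entities above suggests a per-run script classification. One way to approximate it is with Unicode property escapes, as in the sketch below. The names `scriptOf` and `splitScriptRuns` are hypothetical, whitespace-separated words are assumed to be a good enough unit, and this does not implement the full Unicode Bidirectional Algorithm.

```javascript
// Classify a single character as Arabic, Latin, or neutral (digits, punctuation, marks).
function scriptOf(ch) {
  if (/\p{Script=Arabic}/u.test(ch)) return "arabic";
  if (/\p{Script=Latin}/u.test(ch)) return "latin";
  return "neutral";
}

// Group consecutive words that share a script into runs.
function splitScriptRuns(text) {
  const runs = [];
  for (const word of text.split(/\s+/).filter(Boolean)) {
    // A word takes the script of its first strongly typed character.
    const strong = [...word].find((ch) => scriptOf(ch) !== "neutral");
    const script = strong ? scriptOf(strong) : "neutral";
    const last = runs[runs.length - 1];
    if (last && last.script === script) {
      last.text += " " + word;
    } else {
      runs.push({ script, text: word });
    }
  }
  return runs;
}

console.log(splitScriptRuns("Customer محمد الأحمد called support"));
```

Each run can then be passed to the detector suited to its script, with results merged back by position in the original string.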
Arabic text may use either Western (0-9) or Eastern Arabic-Indic (٠-٩) numerals. We detect identifiers in both:
| Type | Western | Eastern Arabic |
|---|---|---|
| Phone Number | +966 512345678 | +٩٦٦ ٥١٢٣٤٥٦٧٨ |
| National ID | 1234567890 | ١٢٣٤٥٦٧٨٩٠ |
| Postal Code | 12345 | ١٢٣٤٥ |
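Detecting identifiers in both numeral systems is often done by normalizing digits before pattern matching. The sketch below shows that approach; `toWesternDigits` is a hypothetical helper name, and only the Arabic-Indic block (U+0660 to U+0669) is mapped, not the Extended Arabic-Indic digits used for Persian and Urdu.

```javascript
// Hypothetical helper: maps ٠-٩ to 0-9 and leaves every other character untouched.
function toWesternDigits(text) {
  return text.replace(/[\u0660-\u0669]/g, (d) => String(d.charCodeAt(0) - 0x0660));
}

// After normalization, a single Western-digit pattern covers both rows of the table.
const postalCode = /^\d{5}$/;
console.log(postalCode.test(toWesternDigits("١٢٣٤٥")));
console.log(postalCode.test(toWesternDigits("12345")));
```

Because the mapping is one character to one character, offsets found in the normalized text line up with the original, so the detected span can be reported in its original numerals.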
Arab countries are increasingly adopting data protection legislation. RedactionAPI supports compliance with:
Saudi Arabia's Personal Data Protection Law (2021) governs collection, processing, and transfer of personal data with requirements similar to GDPR.
Federal Decree-Law No. 45 of 2021 establishes comprehensive data protection rules for the UAE.
Egypt's Law No. 151 of 2020 regulates personal data processing with consent requirements and data subject rights.
Law No. 13 of 2016 establishes data protection principles for Qatar with sector-specific regulations.
POST /v1/redact
{
  "text": "العميل محمد الصالح، رقم الهوية ١٢٣٤٥٦٧٨٩٠، هاتف: +٩٦٦٥١٢٣٤٥٦٧٨",
  "language": "ar",
  "pii_types": ["name", "national_id", "phone"],
  "options": {
    "numeral_output": "arabic", // Use Eastern Arabic numerals in output
    "placeholder_language": "arabic" // [اسم] instead of [NAME]
  }
}
// Response
{
  "redacted_text": "العميل [اسم]، رقم الهوية [هوية]، هاتف: [هاتف]",
  "entities": [
    {
      "type": "name",
      "value": "محمد الصالح",
      "confidence": 0.96
    },
    {
      "type": "national_id",
      "value": "١٢٣٤٥٦٧٨٩٠",
      "country": "SA",
      "confidence": 0.99
    },
    {
      "type": "phone",
      "value": "+٩٦٦٥١٢٣٤٥٦٧٨",
      "country": "SA",
      "confidence": 0.98
    }
  ]
}
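The request above could be assembled from JavaScript roughly as follows. This is a sketch built on assumptions: the host `api.example.com`, the bearer-token `Authorization` header, and the helper name `buildRedactRequest` are placeholders that do not come from the API description. Only the path, field names, and option values are taken from the example.

```javascript
// Hypothetical helper: builds the URL and fetch options for POST /v1/redact.
function buildRedactRequest(text, apiKey) {
  return {
    url: "https://api.example.com/v1/redact", // placeholder host
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`, // auth scheme is an assumption
      },
      body: JSON.stringify({
        text,
        language: "ar",
        pii_types: ["name", "national_id", "phone"],
        options: { numeral_output: "arabic", placeholder_language: "arabic" },
      }),
    },
  };
}

// Usage (performs a network call, so it is left commented out):
// const { url, init } = buildRedactRequest("العميل محمد الصالح", "YOUR_API_KEY");
// const result = await (await fetch(url, init)).json();
// console.log(result.redacted_text);
```

Keeping request construction separate from the network call makes the payload easy to inspect and test without sending any personal data.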
RedactionAPI provides comprehensive Arabic language support with regional identifier detection across all MENA countries. Protect PII in Arabic text while meeting local data protection requirements.