Detect and redact PII from Portuguese text with support for Brazilian Portuguese (pt-BR) and European Portuguese (pt-PT). Native language processing for names, addresses, and regional identifiers.
Native Portuguese NLP
Detect names with compound surnames common in Portuguese-speaking cultures.
Support Brazilian Portuguese (pt-BR) and European Portuguese (pt-PT) differences.
Detect CPF, NIF, BI, and other country-specific identifiers.
Parse Brazilian and Portuguese address structures and postal codes.
Recognize phone formats across Portuguese-speaking countries.
Support LGPD (Brazil) and GDPR (Portugal) requirements.
Simple integration, powerful results
Send your documents, text, or files through our secure API endpoint or web interface.
Our AI analyzes content to identify all sensitive information types with 99.7% accuracy.
Sensitive data is automatically redacted based on your configured compliance rules.
Receive your redacted content with full audit trail and compliance documentation.
Get started with just a few lines of code
import requests

api_key = "your_api_key"
url = "https://api.redactionapi.net/v1/redact"
data = {
    "text": "John Smith's SSN is 123-45-6789",
    "redaction_types": ["ssn", "person_name"],
    "output_format": "redacted",
}
response = requests.post(
    url,
    headers={"Authorization": f"Bearer {api_key}"},
    json=data,
)
print(response.json())
# Output: {"redacted_text": "[PERSON_NAME]'s SSN is [SSN_REDACTED]"}
const axios = require('axios');

const apiKey = 'your_api_key';
const url = 'https://api.redactionapi.net/v1/redact';
const data = {
  text: "John Smith's SSN is 123-45-6789",
  redaction_types: ["ssn", "person_name"],
  output_format: "redacted"
};

axios.post(url, data, {
  headers: { 'Authorization': `Bearer ${apiKey}` }
})
  .then(response => {
    console.log(response.data);
    // Output: {"redacted_text": "[PERSON_NAME]'s SSN is [SSN_REDACTED]"}
  });
curl -X POST https://api.redactionapi.net/v1/redact \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "John Smith'\''s SSN is 123-45-6789",
    "redaction_types": ["ssn", "person_name"],
    "output_format": "redacted"
  }'
# The '\'' sequence escapes the apostrophe inside the single-quoted shell string.
# Response:
# {"redacted_text": "[PERSON_NAME]'s SSN is [SSN_REDACTED]"}
Portuguese is spoken by over 260 million people across four continents. Brazil has by far the largest Portuguese-speaking population, alongside Portugal, Angola, Mozambique, and other lusophone nations. While these countries share a common language, they differ significantly in vocabulary, spelling conventions, naming practices, identifier formats, and address structures. Effective Portuguese PII detection must account for these regional variations while maintaining consistent protection across all variants.
Our Portuguese language processing handles both major variants—Brazilian Portuguese (pt-BR) and European Portuguese (pt-PT)—with dedicated models for each. We recognize regional naming conventions, validate country-specific identifiers with appropriate algorithms, and parse address formats used in each country. This enables comprehensive PII protection for Portuguese-language documents regardless of their origin.
Key differences between Brazilian and European Portuguese:
Vocabulary Differences:
PII-related terms:
Identity document: carteira de identidade (BR) vs bilhete de identidade (PT)
Cell phone: celular (BR) vs telemóvel (PT)
Address: endereço (both, but formatting differs)
Personal data: dados pessoais (both)
Passport: passaporte (both)
Driver's license: carteira de motorista (BR) vs carta de condução (PT)
// Detection handles both vocabularies
"Telemóvel: 912 345 678" → [PHONE] (Portugal)
"Celular: (11) 99999-8888" → [PHONE] (Brazil)
Spelling Differences:
// Post-1990 agreement variations
ação (BR) vs acção (PT, older)
anônimo (BR) vs anónimo (PT)
fato (BR) vs facto (PT)
// Names may follow either convention
Both spellings recognized for matching
Portuguese naming conventions across regions:
Brazilian Names:
Structure: Given name(s) + Maternal surname + Paternal surname
Example: Maria Eduarda Silva Santos
Components:
- Given names: Often of religious, Portuguese, or Italian origin
- Multiple given names common
- Surnames: Usually maternal then paternal
- Sometimes only one surname
Common patterns:
João Pedro Oliveira Costa
Ana Carolina Ferreira Lima
José Carlos da Silva
Maria das Graças Santos
Portuguese (European) Names:
Structure: Same order as Brazilian names (given names, then maternal and paternal surnames), but with a different set of common names
Example: António Manuel Rodrigues Sousa
Components:
- Given names: Traditional Portuguese names
- Surnames: Often preceded by "de", "da", "dos"
- Nobility particles preserved
Common patterns:
António José Almeida
Maria João Carvalho
Pedro Miguel Teixeira da Costa
Ana Sofia Santos e Silva
Name Detection Approach:
// Comprehensive name databases
Brazilian first names: João, José, Maria, Ana, Pedro, Paulo...
Portuguese first names: António, Manuel, Joaquim, Rui...
Shared surnames: Silva, Santos, Oliveira, Souza/Sousa...
// Context indicators
Nome: [name]
Nome completo: [full name]
Cliente: [name]
Sr./Sra./Dr./Dra. [name]
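The context indicators above can be illustrated with a small regular expression. This is a minimal sketch, not our production detector: the pattern names (`PARTICLES`, `TOKEN`, `NAME`, `CONTEXT`) and the `find_names` helper are hypothetical, and the approach assumes names are written with capitalised tokens joined by lowercase particles such as "da", "dos", or "e".

```python
import re

# Lowercase connecting particles that can appear inside Portuguese surnames.
PARTICLES = r"(?:da|de|do|das|dos|e)"
# A capitalised name token, allowing accented Latin-1 letters (approximate range).
TOKEN = r"[A-ZÀ-Ý][a-zà-ÿ]+"
# One or more tokens on the same line, optionally joined by a particle.
NAME = rf"{TOKEN}(?:[ ]+(?:{PARTICLES}[ ]+)?{TOKEN})*"
# Context labels taken from the list above; the regex itself is an assumption.
CONTEXT = r"(?:Nome completo|Nome|Cliente)\s*:\s*|(?:Sr|Sra|Dr|Dra)\.\s+"

NAME_AFTER_CONTEXT = re.compile(rf"(?:{CONTEXT})({NAME})")


def find_names(text: str) -> list[str]:
    """Return name candidates that directly follow a known context label."""
    return NAME_AFTER_CONTEXT.findall(text)


if __name__ == "__main__":
    print(find_names("Nome completo: Maria das Graças Santos, nascida em 1980"))
    print(find_names("Sra. Ana Sofia Santos e Silva assinou o contrato."))
```

A pattern like this only covers labelled names. Free-text names without a label would still need the name databases described above.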
Brazilian Identifiers:
CPF (Cadastro de Pessoas Físicas):
Format: XXX.XXX.XXX-XX (11 digits)
Example: 123.456.789-09
Validation: Modulo 11 checksum
CNPJ (Cadastro Nacional da Pessoa Jurídica):
Format: XX.XXX.XXX/XXXX-XX (14 digits)
Example: 11.222.333/0001-81
Validation: Modulo 11 checksum
RG (Registro Geral):
Format: Varies by state
Example: 12.345.678-9 (SSP-SP)
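The modulo 11 checksums for CPF and CNPJ can be sketched as follows. This is an illustrative implementation of the widely documented algorithm, not necessarily the exact code our service runs. The function names are hypothetical, and the CNPJ check assumes the all-numeric 14-digit format shown above.

```python
import re


def _check_digit(digits: list[int], weights: list[int]) -> int:
    """Modulo 11 check digit: remainders below 2 map to 0, otherwise 11 - remainder."""
    remainder = sum(d * w for d, w in zip(digits, weights)) % 11
    return 0 if remainder < 2 else 11 - remainder


def is_valid_cpf(value: str) -> bool:
    digits = [int(c) for c in re.sub(r"\D", "", value)]
    # Reject wrong lengths and repeated-digit sequences such as 111.111.111-11,
    # which satisfy the arithmetic but are conventionally treated as invalid.
    if len(digits) != 11 or len(set(digits)) == 1:
        return False
    first = _check_digit(digits[:9], list(range(10, 1, -1)))    # weights 10..2
    second = _check_digit(digits[:10], list(range(11, 1, -1)))  # weights 11..2
    return digits[9] == first and digits[10] == second


def is_valid_cnpj(value: str) -> bool:
    digits = [int(c) for c in re.sub(r"\D", "", value)]
    if len(digits) != 14 or len(set(digits)) == 1:
        return False
    w1 = [5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2]
    w2 = [6] + w1
    return (digits[12] == _check_digit(digits[:12], w1)
            and digits[13] == _check_digit(digits[:13], w2))


if __name__ == "__main__":
    print(is_valid_cpf("123.456.789-09"))       # example CPF from above
    print(is_valid_cnpj("11.222.333/0001-81"))  # example CNPJ from above
```

Checksum validation helps separate real identifiers from arbitrary digit strings that merely share the format. RG numbers vary by state, so no single checksum applies to them.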
Portuguese Identifiers:
NIF (Número de Identificação Fiscal):
Format: XXXXXXXXX (9 digits)
Example: 123456789
First digit indicates entity type:
- 1, 2, 3: Individual
- 5: Corporate
- 6: Public administration
CC (Cartão de Cidadão):
Format: XXXXXXXX X XXX (12 characters)
Example: 12345678 9 ZZ4
Components: Base number (8 digits) + Check digit + Version (2 alphanumeric characters and a final check digit)
NISS (Número de Identificação de Segurança Social):
Format: XXXXXXXXXXX (11 digits)
Example: 12345678901
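The NIF also carries a check digit, computed with a modulo 11 scheme over the first eight digits. The sketch below shows that commonly documented calculation. The `is_valid_nif` name is hypothetical, and the function checks only the checksum, not whether the leading digit is an assigned entity type.

```python
def is_valid_nif(value: str) -> bool:
    """Validate the check digit of a 9-digit Portuguese NIF."""
    nif = value.replace(" ", "")
    if len(nif) != 9 or not (nif.isascii() and nif.isdigit()):
        return False
    # Weights 9 down to 2 applied to the first eight digits.
    total = sum(int(d) * w for d, w in zip(nif[:8], range(9, 1, -1)))
    check = 11 - total % 11
    if check >= 10:  # results of 10 or 11 map to a check digit of 0
        check = 0
    return int(nif[8]) == check


if __name__ == "__main__":
    print(is_valid_nif("123456789"))  # example NIF from above
```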
Other Lusophone Countries:
Angola:
- NIF: Tax identification
- BI: Bilhete de Identidade
Mozambique:
- NUIT: Tax identification
- BI: Identity document
Cape Verde:
- NIF: Tax number
- BI: Identity card
Brazilian Addresses:
Format:
[Street type] [Name], [Number] - [Complement]
[Neighborhood]
[City] - [State]
CEP: [Postal code]
Example:
Rua das Flores, 123 - Apto 45
Jardim Paulista
São Paulo - SP
CEP: 01310-100
CEP format: XXXXX-XXX (8 digits)
Regions indicated by first digits
Portuguese Addresses:
Format:
[Street type] [Name], [Number], [Floor]
[Postal code] [City]
Example:
Rua Augusta, 25, 3º Esq.
1100-053 Lisboa
Código Postal format: XXXX-XXX (7 digits)
First 4: Region, Last 3: Specific location
Street types: Rua, Avenida, Praça, Largo, Travessa
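The two postal code formats differ only in the length of the first group, so they can be told apart with anchored patterns. The following is a simplified sketch. `CEP_RE`, `CODIGO_POSTAL_RE`, and `find_postal_codes` are hypothetical names, and a real parser would also use the surrounding address lines as context.

```python
import re

# Brazilian CEP: five digits, hyphen, three digits (XXXXX-XXX).
CEP_RE = re.compile(r"\b\d{5}-\d{3}\b")
# Portuguese Código Postal: four digits, hyphen, three digits (XXXX-XXX).
CODIGO_POSTAL_RE = re.compile(r"\b\d{4}-\d{3}\b")


def find_postal_codes(text: str) -> list[tuple[str, str]]:
    """Return (region, code) pairs for every postal code found in the text."""
    found = [("BR", m.group()) for m in CEP_RE.finditer(text)]
    found += [("PT", m.group()) for m in CODIGO_POSTAL_RE.finditer(text)]
    return found


if __name__ == "__main__":
    print(find_postal_codes("São Paulo - SP\nCEP: 01310-100"))
    print(find_postal_codes("Rua Augusta, 25, 3º Esq.\n1100-053 Lisboa"))
```

The word boundaries keep the four-digit pattern from matching inside a CEP and keep both patterns from matching inside phone numbers such as 99999-8888.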
Brazilian Phone Numbers:
Mobile: (XX) 9XXXX-XXXX (11 digits with area code)
Landline: (XX) XXXX-XXXX (10 digits)
International: +55 XX XXXXX-XXXX
Examples:
(11) 99999-8888 // São Paulo mobile
(21) 2222-3333 // Rio de Janeiro landline
+55 11 99999-8888 // International format
Area codes: 11-99 across states
Portuguese Phone Numbers:
Mobile: 9XX XXX XXX (9 digits, starts with 9)
Landline: 2XX XXX XXX (9 digits, starts with 2)
International: +351 XXX XXX XXX
Examples:
912 345 678 // Mobile
213 456 789 // Lisbon landline
+351 912 345 678 // International
Mobile prefixes: 91, 92, 93, 96
Landline prefixes: 21 (Lisbon), 22 (Porto), etc.
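The phone formats above can be distinguished by digit count and leading digits once punctuation is stripped. This is a rough sketch under those assumptions. The `classify_phone` helper is hypothetical, and the `brazil_*` and `portugal_landline` labels are modelled on the `portugal_mobile` label in our response format rather than taken from it.

```python
import re
from typing import Optional


def classify_phone(value: str) -> Optional[str]:
    """Return a format label for a phone number, or None if no pattern fits."""
    text = value.strip()
    digits = re.sub(r"\D", "", text)
    country = None
    if text.startswith("+351"):
        country, digits = "PT", digits[3:]
    elif text.startswith("+55"):
        country, digits = "BR", digits[2:]

    # Portugal: 9 digits, mobiles start with 9 and landlines with 2.
    if country in (None, "PT") and len(digits) == 9 and digits[0] in "29":
        return "portugal_mobile" if digits[0] == "9" else "portugal_landline"

    # Brazil: two-digit area code (11-99) followed by 9 or 8 subscriber digits.
    if country in (None, "BR") and len(digits) in (10, 11) and int(digits[:2]) >= 11:
        if len(digits) == 11 and digits[2] == "9":
            return "brazil_mobile"
        if len(digits) == 10:
            return "brazil_landline"
    return None


if __name__ == "__main__":
    for sample in ["(11) 99999-8888", "(21) 2222-3333", "+351 912 345 678"]:
        print(sample, "->", classify_phone(sample))
```

A nine-digit number without a country code is ambiguous, since a Brazilian mobile written without its area code also has nine digits starting with 9. This sketch resolves that case as Portuguese, whereas a production system would use the detected region.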
Brazilian Banking:
Bank account format:
Banco: [3-digit bank code]
Agência: [4 digits]-[check digit]
Conta: [variable digits]-[check digit]
Example:
Banco: 001 (Banco do Brasil)
Agência: 1234-5
Conta: 123456-7
PIX keys: CPF, phone, email, or random key
Portuguese Banking:
NIB (Número de Identificação Bancária):
Format: XXXX.XXXX.XXXXXXXXXXX.XX (21 digits: bank, branch, account, check digits)
Example: 0035.0123.00001234567.89
IBAN format:
PTXX XXXX XXXX XXXX XXXX XXXX X (25 characters)
Example: PT50 0035 0123 0000 1234 5678 9
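A Portuguese IBAN is the 21-digit NIB prefixed with "PT" and two check digits, so it can be verified with the standard IBAN mod 97 procedure. The sketch below applies that procedure. The function names are hypothetical, and the sample value in the usage line is a commonly published sample Portuguese IBAN, not customer data.

```python
import re


def is_valid_pt_iban(value: str) -> bool:
    """Check the structure and mod 97 checksum of a Portuguese IBAN."""
    iban = re.sub(r"\s", "", value).upper()
    if not re.fullmatch(r"PT\d{23}", iban):
        return False
    # Move the country code and check digits to the end, then map letters
    # to numbers (A=10 ... Z=35) before taking the remainder.
    rearranged = iban[4:] + iban[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


def nib_to_iban(nib: str) -> str:
    """Build the IBAN for a 21-digit NIB by computing the two check digits."""
    digits = re.sub(r"\D", "", nib)
    if len(digits) != 21:
        raise ValueError("a NIB must contain exactly 21 digits")
    check = 98 - int(digits + "2529" + "00") % 97  # "PT" maps to 25 and 29
    return f"PT{check:02d}{digits}"


if __name__ == "__main__":
    print(is_valid_pt_iban("PT50 0002 0123 1234 5678 9015 4"))
    print(nib_to_iban("0002.0123.12345678901.54"))
```

The IBAN shown in the format example above illustrates the layout only and would not pass this checksum, which is one reason checksum validation reduces false positives on arbitrary digit strings.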
Brazil - LGPD:
Portugal - GDPR + National Law:
// Automatic variant detection
Input: "O cliente João Silva forneceu o seu telemóvel"
Detected: pt-PT (telemóvel indicates European Portuguese)
Input: "O cliente João Silva forneceu seu celular"
Detected: pt-BR (celular indicates Brazilian Portuguese)
// Processing adapts to variant
- Identifier formats appropriate to region
- Address parsing matching regional format
- Name patterns for detected region
// Can also specify explicitly
{
  "language": "pt",
  "region": "BR" // or "PT"
}
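Vocabulary-based variant detection can be approximated by counting region-specific terms. This is a deliberately small sketch: the marker sets reuse the vocabulary pairs listed earlier on this page, the `detect_region` helper is hypothetical, and our actual models consider far more signals than a word list.

```python
import re
from typing import Optional

# Region-specific vocabulary taken from the differences listed above.
PT_PT_MARKERS = {"telemóvel", "bilhete de identidade", "carta de condução"}
PT_BR_MARKERS = {"celular", "carteira de identidade", "carteira de motorista"}


def _count(markers: set[str], text: str) -> int:
    """Count whole-word occurrences of any marker in the text."""
    return sum(len(re.findall(rf"\b{re.escape(m)}\b", text)) for m in markers)


def detect_region(text: str) -> Optional[str]:
    """Return "PT", "BR", or None when the vocabulary gives no clear signal."""
    lowered = text.lower()
    pt = _count(PT_PT_MARKERS, lowered)
    br = _count(PT_BR_MARKERS, lowered)
    if pt == br:
        return None
    return "PT" if pt > br else "BR"


if __name__ == "__main__":
    print(detect_region("O cliente João Silva forneceu seu telemóvel"))
    print(detect_region("O cliente João Silva forneceu seu celular"))
```

When the function returns None, passing the region explicitly as shown above is the safer choice.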
POST /v1/redact
{
  "text": "Cliente: Maria Silva, NIF: 123456789, Telemóvel: 912 345 678",
  "language": "pt",
  "auto_detect_region": true,
  "redaction_types": ["name", "tax_id", "phone", "address"]
}
Response:
{
  "redacted_text": "Cliente: [NAME], NIF: [TAX_ID], Telemóvel: [PHONE]",
  "detected_region": "PT",
  "detections": [
    {
      "type": "name",
      "value": "Maria Silva",
      "confidence": 0.94
    },
    {
      "type": "tax_id",
      "value": "123456789",
      "format": "portugal_nif",
      "confidence": 0.97
    },
    {
      "type": "phone",
      "value": "912 345 678",
      "format": "portugal_mobile",
      "confidence": 0.98
    }
  ]
}
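Once a response like the one above is parsed, the `detections` array can be filtered by confidence, for example to route lower-confidence matches to manual review. This is a minimal sketch: the field names come from the sample response, while the `split_by_confidence` helper and the 0.95 threshold are assumptions chosen for illustration.

```python
# Sample response copied from the example above.
SAMPLE_RESPONSE = {
    "redacted_text": "Cliente: [NAME], NIF: [TAX_ID], Telemóvel: [PHONE]",
    "detected_region": "PT",
    "detections": [
        {"type": "name", "value": "Maria Silva", "confidence": 0.94},
        {"type": "tax_id", "value": "123456789",
         "format": "portugal_nif", "confidence": 0.97},
        {"type": "phone", "value": "912 345 678",
         "format": "portugal_mobile", "confidence": 0.98},
    ],
}


def split_by_confidence(response: dict, threshold: float = 0.95):
    """Split detections into (accepted, needs_review) lists by confidence."""
    accepted, needs_review = [], []
    for detection in response.get("detections", []):
        target = accepted if detection.get("confidence", 0.0) >= threshold else needs_review
        target.append(detection)
    return accepted, needs_review


if __name__ == "__main__":
    accepted, needs_review = split_by_confidence(SAMPLE_RESPONSE)
    print([d["type"] for d in accepted])
    print([d["type"] for d in needs_review])
```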
Beyond spelling differences (the 1990 Orthographic Agreement has been adopted to varying degrees), the two variants differ in vocabulary, syntax, and formal expressions. They also use different terms in common PII contexts, such as celular (BR) vs telemóvel (PT). We handle both variants with region-specific processing.
Portuguese names typically include multiple surnames (often maternal then paternal). We maintain databases of common Portuguese and Brazilian names and surnames, handling compound surnames and the various naming conventions across Portuguese-speaking regions.
For Brazil: CPF, CNPJ, RG, PIS. For Portugal: NIF, CC (Cartão de Cidadão), NISS. For other Portuguese-speaking countries: country-specific tax IDs and national identifiers. Each uses appropriate validation.
Yes, we support address formats for Brazil (with CEP), Portugal (with Código Postal), Angola, Mozambique, and other lusophone countries. Each has distinct formatting conventions for street addresses and postal codes.
Business documents often mix Portuguese and English. We detect PII in both languages within the same document, applying appropriate rules for each language section while maintaining document context.
Portuguese uses various diacritical marks (á, à, â, ã, ç, é, ê, í, ó, ô, õ, ú). Our processing correctly handles all diacritics, including matching names regardless of accent presence or accuracy in the source text.
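Accent-insensitive matching of the kind described above is usually built on Unicode normalization. The following sketch uses Python's standard `unicodedata` module. The `strip_accents` and `same_name` helpers are hypothetical names, and this is one common approach rather than a description of our internal pipeline.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks after NFD decomposition (ã -> a, ç -> c)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def same_name(a: str, b: str) -> bool:
    """Compare two names while ignoring accents and letter case."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()


if __name__ == "__main__":
    print(strip_accents("João Conceição"))
    print(same_name("António", "Antonio"))
```

Comparing folded forms lets "Antônio" (pt-BR), "António" (pt-PT), and an unaccented "Antonio" match the same entry while the original text stays untouched in the output.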