Detect and redact PII from Japanese text with native language understanding. Full support for Hiragana, Katakana, and Kanji with recognition of Japanese names, My Number, addresses, and other identifiers.
Native Japanese NLP
Detect names in Kanji, Hiragana, Katakana, and Romaji with proper reading recognition.
Recognize and validate 12-digit Individual Number (マイナンバー) with checksum.
Detect mobile and landline numbers in Japanese formatting styles.
Parse complex Japanese address formats with prefecture/city/ward hierarchy.
Handle mixed Hiragana, Katakana, Kanji, and Romaji in single documents.
Support Japan's Act on Protection of Personal Information requirements.
Simple integration, powerful results
Send your documents, text, or files through our secure API endpoint or web interface.
Our AI analyzes content to identify all sensitive information types with 99.7% accuracy.
Sensitive data is automatically redacted based on your configured compliance rules.
Receive your redacted content with full audit trail and compliance documentation.
Get started with just a few lines of code
import requests

api_key = "your_api_key"
url = "https://api.redactionapi.net/v1/redact"

data = {
    "text": "John Smith's SSN is 123-45-6789",
    "redaction_types": ["ssn", "person_name"],
    "output_format": "redacted",
}

response = requests.post(
    url,
    headers={"Authorization": f"Bearer {api_key}"},
    json=data,
)
print(response.json())
# Output: {"redacted_text": "[PERSON_NAME]'s SSN is [SSN_REDACTED]"}
const axios = require('axios');

const apiKey = 'your_api_key';
const url = 'https://api.redactionapi.net/v1/redact';

const data = {
  text: "John Smith's SSN is 123-45-6789",
  redaction_types: ["ssn", "person_name"],
  output_format: "redacted"
};

axios.post(url, data, {
  headers: { 'Authorization': `Bearer ${apiKey}` }
})
  .then(response => {
    console.log(response.data);
    // Output: {"redacted_text": "[PERSON_NAME]'s SSN is [SSN_REDACTED]"}
  });
curl -X POST https://api.redactionapi.net/v1/redact \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "John Smith'\''s SSN is 123-45-6789",
    "redaction_types": ["ssn", "person_name"],
    "output_format": "redacted"
  }'
# Response:
# {"redacted_text": "[PERSON_NAME]'s SSN is [SSN_REDACTED]"}
Japanese presents distinctive challenges for PII detection due to its complex writing system combining three scripts—Hiragana, Katakana, and Kanji—often mixed within single documents. Names can be written in any of these scripts or romanized, requiring detection across multiple representations. Japanese addresses follow a hierarchical structure from large to small geographical units, opposite to Western conventions. The My Number system adds a national identifier requiring careful validation to avoid false positives.
Our Japanese language processing employs specialized NLP models trained on extensive Japanese corpora. We accurately segment text without word boundaries (similar to Chinese), recognize names in all script variants, parse the unique Japanese address format, and validate Japanese-specific identifiers. This enables comprehensive PII protection for Japanese business documents, customer records, and communications.
Japanese names require multi-script detection capabilities:
Name Structure: Japanese names consist of a family name (姓/名字) followed by a given name (名前). Unlike Chinese names, which are written only in Chinese characters, Japanese names appear in multiple forms:
Name Variations:
Common patterns:
- Kanji surname + Kanji given name: 佐藤 花子
- Kanji with Hiragana reading: 田中(たなか)一郎(いちろう)
- Katakana for foreign names: マイケル・ジョンソン
- Mixed scripts: 鈴木 Mike
Spacing variations:
- With space: 山田 太郎
- Without space: 山田太郎
- Full-width space: 山田　太郎
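The spacing and reading variants above can be collapsed to one canonical form before names are compared. The following is a minimal sketch, not the service's implementation; `canonical_name` and both patterns are hypothetical helpers introduced here for illustration.

```python
import re

# Kana readings in ASCII or full-width parentheses, e.g. (たなか) or （タナカ）.
FURIGANA_RE = re.compile(r"[(（][\u3040-\u309F\u30A0-\u30FF]+[)）]")
# ASCII space, tab, and the ideographic (full-width) space U+3000.
SPACING_RE = re.compile(r"[ \t\u3000]+")


def canonical_name(raw: str) -> str:
    """Drop parenthesized kana readings and internal spacing from a name."""
    without_readings = FURIGANA_RE.sub("", raw)
    return SPACING_RE.sub("", without_readings)


print(canonical_name("山田\u3000太郎"))            # 山田太郎
print(canonical_name("田中(たなか)一郎(いちろう)"))  # 田中一郎
```

This suits Kanji and kana names only. A Romaji name such as "Yamada Taro" would lose its word break, so romanized names need a separate path.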
Surname Database: We maintain comprehensive Japanese surname data:
Japan's national identification number system requires careful handling:
Format and Validation:
Format: 12 digits (NNNN-NNNN-NNNN or NNNNNNNNNNNN)
Check digit: Last digit is checksum
Validation algorithm:
1. Multiply first 11 digits by weights [6,5,4,3,2,7,6,5,4,3,2]
2. Sum the products
3. Calculate: 11 - (sum mod 11)
4. If result is 10 or 11, check digit is 0
5. Otherwise, result equals check digit
Example validation:
Number: 123456789018
- Weighted sum: 1×6 + 2×5 + 3×4 + 4×3 + 5×2 + 6×7 + 7×6 + 8×5 + 9×4 + 0×3 + 1×2 = 212
- Modulo 11: 212 mod 11 = 3
- Check digit: 11 - 3 = 8, which matches the last digit
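The validation steps listed above can be sketched in a few lines of Python. The function name `is_valid_my_number` is a hypothetical helper, and accepting hyphens or spaces as separators is an assumption based on the two formats shown.

```python
import re

# Weights for the first 11 digits, read left to right.
WEIGHTS = (6, 5, 4, 3, 2, 7, 6, 5, 4, 3, 2)


def is_valid_my_number(candidate: str) -> bool:
    """Check the 12-digit format and the trailing check digit."""
    digits = re.sub(r"[-\s]", "", candidate)
    if not re.fullmatch(r"[0-9]{12}", digits):
        return False
    total = sum(int(d) * w for d, w in zip(digits[:11], WEIGHTS))
    result = 11 - (total % 11)
    expected = 0 if result >= 10 else result  # 10 and 11 map to 0
    return expected == int(digits[11])


print(is_valid_my_number("123456789018"))    # True
print(is_valid_my_number("1234-5678-9010"))  # False
```

Running the checksum before reporting a match is what keeps arbitrary 12-digit strings, such as order numbers, from being flagged.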
Detection Considerations:
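Corporate Number (法人番号):
Format: 13 digits (1 leading check digit + 12-digit base number)
- Prefix T marks the number as a qualified invoice issuer registration number (T + 13 digits)
- Check digit uses its own modulo 9 algorithm, distinct from the My Number checksum
Example: T1234567890123 (format illustration only)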
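A corporate number check could look like the sketch below. It assumes the publicly documented scheme, which this page does not spell out: the leading digit is the check digit, computed as 9 minus the weighted sum modulo 9, with weights alternating 1 and 2 from the rightmost base digit. The function names are hypothetical.

```python
import re


def corporate_check_digit(base12: str) -> int:
    """Check digit for a 12-digit base number (assumed mod 9 scheme)."""
    total = 0
    # n counts base digits from the right, starting at 1.
    for n, ch in enumerate(reversed(base12), start=1):
        weight = 1 if n % 2 == 1 else 2
        total += int(ch) * weight
    return 9 - (total % 9)


def is_valid_corporate_number(candidate: str) -> bool:
    """Accept 13 digits, with or without the T prefix."""
    digits = candidate.strip().upper()
    if digits.startswith("T"):
        digits = digits[1:]
    if not re.fullmatch(r"[0-9]{13}", digits):
        return False
    return corporate_check_digit(digits[1:]) == int(digits[0])


print(is_valid_corporate_number("7000012050002"))   # True (a publicly listed number)
print(is_valid_corporate_number("T1234567890123"))  # False (format example only)
```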
Japanese phone number formats:
Mobile Numbers:
Format: 0X0-XXXX-XXXX (11 digits starting with 070, 080, 090)
Carriers:
- 090: Original mobile prefix
- 080: Added mobile prefix
- 070: PHS and newer mobile
Examples:
090-1234-5678
080-9876-5432
070-1111-2222
Landline Numbers:
Format: 10 digits including the leading 0, e.g. (0X) XXXX-XXXX or 0XX-XXX-XXXX
Area codes vary by region:
- Tokyo: 03
- Osaka: 06
- Nagoya: 052
- Regional: 4-digit area codes
Examples:
(03) 1234-5678 (Tokyo)
06-9876-5432 (Osaka)
0123-45-6789 (Regional)
Special Numbers:
Toll-free: 0120-XXX-XXX, 0800-XXX-XXXX
IP phones: 050-XXXX-XXXX
Emergency: 110 (Police), 119 (Fire/Ambulance)
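The prefixes and lengths above are enough for a coarse classifier. This is an illustrative sketch only: `classify_phone` is a hypothetical helper, and it assumes the input has already been converted to half-width digits.

```python
import re
from typing import Optional


def classify_phone(number: str) -> Optional[str]:
    """Return a coarse category for a Japanese domestic number, or None."""
    digits = re.sub(r"[^0-9]", "", number)  # drop hyphens, spaces, parentheses
    # Toll-free is tested first: 0800 numbers would otherwise look like 080 mobiles.
    if re.fullmatch(r"0120[0-9]{6}|0800[0-9]{7}", digits):
        return "toll_free"
    if re.fullmatch(r"0[789]0[0-9]{8}", digits):
        return "mobile"
    if re.fullmatch(r"050[0-9]{8}", digits):
        return "ip_phone"
    if re.fullmatch(r"0[1-9][0-9]{8}", digits):
        return "landline"
    return None


print(classify_phone("090-1234-5678"))   # mobile
print(classify_phone("(03) 1234-5678"))  # landline
print(classify_phone("110"))             # None
```

Short service numbers such as 110 and 119 return None because they identify no individual.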
Japanese addresses follow a unique hierarchical structure:
Address Hierarchy:
〒 (Postal code)
都道府県 (Prefecture)
市区町村 (City/Ward/Town/Village)
町名 (District name)
丁目 (Block number)
番地 (Building number)
号 (Unit number)
建物名・部屋番号 (Building name/Room number)
Example:
〒100-0001
東京都千代田区千代田1-1-1
皇居ビル 501号室
Components:
- 〒100-0001: Postal code
- 東京都: Tokyo Metropolis
- 千代田区: Chiyoda Ward
- 千代田: District name
- 1-1-1: Block-Building-Unit
- 皇居ビル: Building name
- 501号室: Room 501
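The large-to-small ordering lets a single pattern pick out the leading components. The sketch below is deliberately naive and is not the parser described on this page. `parse_address` and the group names are hypothetical, and the municipality rule (shortest run ending in 市, 区, 町 or 村) breaks on names such as 四日市市.

```python
import re
from typing import Dict, Optional

ADDRESS_RE = re.compile(
    r"(?:〒\s*(?P<postal_code>[0-9]{3}-[0-9]{4})\s*)?"               # optional postal code
    r"(?P<prefecture>東京都|北海道|(?:京都|大阪)府|[^\s0-9]{2,3}県)"  # 都道府県
    r"(?P<municipality>[^\s0-9]+?[市区町村])"                         # 市区町村 (naive)
    r"(?P<district>[^\s0-9]+?)"                                       # 町名
    r"(?P<block>[0-9]+-[0-9]+-[0-9]+)"                                # abbreviated 1-1-1 form
)


def parse_address(text: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the matched components, or None when no address is found."""
    match = ADDRESS_RE.search(text)
    return match.groupdict() if match else None


parsed = parse_address("〒100-0001\n東京都千代田区千代田1-1-1\n皇居ビル 501号室")
print(parsed["prefecture"], parsed["municipality"], parsed["block"])
# 東京都 千代田区 1-1-1
```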
Prefecture Types:
Address Variations:
Full format:
東京都新宿区西新宿二丁目8番1号 都庁第一本庁舎
Abbreviated:
東京都新宿区西新宿2-8-1
With building:
〒163-8001 東京都新宿区西新宿2-8-1 都庁
Kanji vs Arabic numerals:
二丁目八番一号 = 2丁目8番1号 = 2-8-1
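The equivalence between the Kanji, mixed, and hyphenated forms can be illustrated by rewriting the 丁目/番/号 form into hyphen notation. This is a sketch under stated assumptions: `kanji_to_int` and `normalize_block` are hypothetical helpers, and only numerals below 100 are handled.

```python
import re

KANJI_DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
                "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}

# Arabic digits, a Kanji numeral using 十 (10 to 99), or positional Kanji digits.
NUM = (r"(?:[0-9]+|[一二三四五六七八九]?十[一二三四五六七八九]?"
       r"|[〇一二三四五六七八九]+)")
BLOCK_RE = re.compile(rf"({NUM})丁目({NUM})番地?({NUM})号")


def kanji_to_int(text: str) -> int:
    """Convert a numeral matched by NUM to an integer."""
    if re.fullmatch(r"[0-9]+", text):
        return int(text)
    if "十" in text:
        tens, _, ones = text.partition("十")
        return (KANJI_DIGITS[tens] if tens else 1) * 10 + (KANJI_DIGITS[ones] if ones else 0)
    value = 0
    for ch in text:  # positional form such as 二三 (23)
        value = value * 10 + KANJI_DIGITS[ch]
    return value


def normalize_block(text: str) -> str:
    """Rewrite every N丁目N番N号 run in the text as N-N-N."""
    return BLOCK_RE.sub(
        lambda m: "-".join(str(kanji_to_int(g)) for g in m.groups()), text
    )


print(normalize_block("二丁目八番一号"))  # 2-8-1
print(normalize_block("2丁目8番1号"))     # 2-8-1
```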
Understanding the three scripts is essential for Japanese PII detection:
Hiragana (ひらがな):
Katakana (カタカナ):
Kanji (漢字):
Japanese financial documents contain specific identifiers:
Bank Account Numbers:
Format: Bank Code (4) + Branch Code (3) + Account Number (7)
Example:
銀行コード: 0001 (みずほ銀行)
支店番号: 001
口座番号: 1234567
Major bank codes:
- 0001: Mizuho Bank
- 0005: Mitsubishi UFJ
- 0009: Sumitomo Mitsui
- 0010: Resona Bank
Insurance and Pension Numbers:
Health Insurance (健康保険証):
- Insurer number: 8 digits
- Symbol/Number: Varies by insurer
Pension Number (年金番号):
- Basic Pension Number: 10 digits (4-6 format)
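The bank and pension formats above map to simple patterns. The sketch below is illustrative only. It assumes the fields are labelled as in the bank example, and `extract_bank_account` and both pattern names are hypothetical.

```python
import re
from typing import Dict

# Labelled fields; the colon may be ASCII or full-width (U+FF1A).
BANK_FIELDS = {
    "bank_code": re.compile(r"銀行コード[:\uFF1A]\s*([0-9]{4})(?![0-9])"),
    "branch_code": re.compile(r"支店番号[:\uFF1A]\s*([0-9]{3})(?![0-9])"),
    "account_number": re.compile(r"口座番号[:\uFF1A]\s*([0-9]{7})(?![0-9])"),
}
# Basic Pension Number in 4-6 form, not embedded in a longer digit run.
PENSION_RE = re.compile(r"(?<![0-9])[0-9]{4}-[0-9]{6}(?![0-9])")


def extract_bank_account(text: str) -> Dict[str, str]:
    """Return whichever labelled bank fields appear in the text."""
    found = {}
    for name, pattern in BANK_FIELDS.items():
        match = pattern.search(text)
        if match:
            found[name] = match.group(1)
    return found


sample = "銀行コード: 0001 (みずほ銀行)\n支店番号: 001\n口座番号: 1234567"
print(extract_bank_account(sample))
# {'bank_code': '0001', 'branch_code': '001', 'account_number': '1234567'}
```

Matching on the label rather than on bare digits matters here, because a 7-digit account number has no checksum to separate it from other numeric fields.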
Japan's Act on Protection of Personal Information defines protected data:
Personal Information Categories:
Special Care-Required Personal Information:
Anonymization Requirements: APPI specifies requirements for anonymized data (匿名加工情報) including removal of identifying elements and prevention of re-identification.
Japanese business documents frequently mix languages:
Example mixed document:
お客様情報:
氏名: 山田 太郎 (Yamada Taro)
Email: [email protected]
Tel: 03-1234-5678
住所: 東京都港区六本木1-1-1 Roppongi Hills Tower 25F
Detected PII:
- Japanese name: 山田 太郎
- Romanized name: Yamada Taro
- Email: [email protected]
- Phone: 03-1234-5678
- Mixed address: 東京都港区六本木1-1-1 Roppongi Hills Tower 25F
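One way to route each field of a mixed document to the right detector is to check which scripts it contains. The sketch below uses Unicode code point ranges. `script_of` and `scripts_in` are hypothetical helpers, and the ranges cover only the common blocks.

```python
from typing import Optional, Set


def script_of(ch: str) -> Optional[str]:
    """Classify one character by Unicode block, or return None."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF or 0xFF66 <= code <= 0xFF9F:  # incl. half-width
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF or 0x3400 <= code <= 0x4DBF:
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return None  # digits, punctuation, whitespace


def scripts_in(text: str) -> Set[str]:
    """Return the set of scripts that occur anywhere in the text."""
    return {s for s in map(script_of, text) if s is not None}


print(sorted(scripts_in("山田 太郎 (Yamada Taro)")))  # ['kanji', 'latin']
print(sorted(scripts_in("お客様情報")))               # ['hiragana', 'kanji']
```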
Japanese text uses both full-width and half-width characters:
Full-width (全角):
Numbers: ０１２３４５６７８９
Letters: ＡＢＣＤＥＦＧＨＩＪ
Katakana: アイウエオ
Half-width (半角):
Numbers: 0123456789
Letters: ABCDEFGHIJ
Katakana: ｱｲｳｴｵ
Both forms are detected:
電話: ０３−１２３４−５６７８ = 03-1234-5678
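Width differences can be removed before pattern matching with Unicode NFKC normalization, which is part of Python's standard library. The sketch below also maps several dash look-alikes to an ASCII hyphen, since NFKC leaves characters such as the minus sign (U+2212) unchanged. `normalize_width` is a hypothetical helper and the dash list is an assumption about what appears in real input.

```python
import unicodedata

# Dash-like characters that NFKC does not fold to "-".
DASHES = {ord(c): "-" for c in "\u2010\u2011\u2012\u2013\u2014\u2015\u2212"}


def normalize_width(text: str) -> str:
    """Fold full-width ASCII and half-width Katakana, then unify dashes."""
    return unicodedata.normalize("NFKC", text).translate(DASHES)


print(normalize_width("０３\u2212１２３４\u2212５６７８"))  # 03-1234-5678
print(normalize_width("ｱｲｳｴｵ"))                            # アイウエオ
```

The Katakana long-vowel mark ー (U+30FC) is left out of the dash table on purpose, because replacing it would corrupt Katakana names.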
Configure Japanese language processing:
POST /v1/redact
{
"text": "お客様: 山田太郎様 マイナンバー: 123456789018",
"language": "ja",
"redaction_types": ["name", "my_number", "phone", "address"]
}
Response:
{
"redacted_text": "お客様: [NAME]様 マイナンバー: [MY_NUMBER]",
"detections": [
{
"type": "name",
"value": "山田太郎",
"script": "kanji",
"confidence": 0.97
},
{
"type": "my_number",
"value": "123456789018",
"valid_checksum": true,
"confidence": 0.99
}
]
}
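The request and response above can be handled from Python with the standard library alone. This is a hedged sketch: the URL and field names are taken from the examples on this page, while `build_redact_request`, `high_confidence`, and the 0.9 default threshold are hypothetical choices made for illustration.

```python
import json
import urllib.request
from typing import Any, Dict, Iterable, List

API_URL = "https://api.redactionapi.net/v1/redact"


def build_redact_request(text: str, api_key: str,
                         redaction_types: Iterable[str]) -> urllib.request.Request:
    """Prepare (but do not send) a Japanese-language redaction request."""
    payload = {
        "text": text,
        "language": "ja",
        "redaction_types": list(redaction_types),
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def high_confidence(response: Dict[str, Any], threshold: float = 0.9) -> List[Dict[str, Any]]:
    """Keep only detections at or above the confidence threshold."""
    return [d for d in response.get("detections", [])
            if d.get("confidence", 0) >= threshold]


request = build_redact_request(
    "お客様: 山田太郎様 マイナンバー: 123456789018",
    "your_api_key",
    ["name", "my_number", "phone", "address"],
)
# To send: json.load(urllib.request.urlopen(request))
```

Encoding the body as UTF-8 with `ensure_ascii=False` keeps the Japanese text readable in request logs. The default escaped form should be equally valid JSON.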
RedactionAPI has transformed our document processing workflow. We've reduced manual redaction time by 95% while achieving better accuracy than our previous manual process.
The API integration was seamless. Within a week, we had automated redaction running across all our customer support channels, ensuring GDPR compliance effortlessly.
We process over 50,000 legal documents monthly. RedactionAPI handles it all with incredible accuracy and speed. It's become an essential part of our legal tech stack.
The multi-language support is outstanding. We operate in 30 countries and RedactionAPI handles all our documents regardless of language with consistent accuracy.
Trusted by 500+ enterprises worldwide





Japanese names appear in multiple scripts—Kanji (山田太郎), Hiragana (やまだたろう), Katakana (ヤマダタロウ), or Romaji (Yamada Taro). We detect names across all scripts and recognize common surname/given name combinations from extensive Japanese name databases.
My Number (マイナンバー) is Japan's 12-digit Individual Number system for tax and social security. We detect these numbers using pattern matching and checksum validation to ensure accuracy while avoiding false positives from random number sequences.
Japanese addresses follow a specific hierarchy: Prefecture (都道府県) → City/Ward (市区町村) → District (町名) → Block (丁目) → Number (番地) → Building. Our parser recognizes this structure even with variations in formatting and abbreviations.
Yes, business documents often mix Japanese and English. We detect PII in both languages, handling English names and identifiers within Japanese text, and Japanese text romanized in English documents.
We detect both Kanji names and their furigana readings (hiragana/katakana pronunciation guides). When furigana appears alongside Kanji names, both are identified and can be redacted together or separately.
Yes, Japanese can be written vertically (tategaki) or horizontally (yokogaki). Our document processing handles both orientations, correctly detecting PII regardless of text direction.