Advanced PII detection for Hangul text with native Korean identifier support and Personal Information Protection Act (PIPA) compliance.
Everything you need for comprehensive data protection
Complete support for the Korean Hangul alphabet including all 11,172 modern syllable blocks with accurate character boundary detection.
Detect and validate Korean Resident Registration Numbers with checksum verification using the weighted modulo 11 algorithm.
Ensure compliance with Korea's Personal Information Protection Act through comprehensive identification and protection of regulated data.
Parse and redact Korean addresses in both road-name (도로명) and land-lot (지번) formats with proper district hierarchy recognition.
Identify Korean phone numbers in all formats including mobile (010), Seoul landlines (02), and regional codes with proper spacing variants.
Detect Korean names using comprehensive databases of family names (성) and given names with contextual analysis for accuracy.
Korean presents unique challenges for automated PII detection due to its Hangul writing system, complex honorific structures, and national identifier formats. RedactionAPI's Korean language support provides comprehensive detection capabilities that understand the nuances of Korean text while ensuring compliance with South Korea's Personal Information Protection Act (PIPA).
The Korean language uses Hangul, a featural alphabet invented by King Sejong in 1443. Chinese characters are logographic, and Japanese mixes kanji with syllabic kana. Hangul, by contrast, consists of jamo (letters) combined into syllable blocks. Unicode encodes 11,172 modern syllable blocks, formed from 19 initial consonants, 21 vowels, and 28 final-consonant options (27 consonants or none), so 19 × 21 × 28 = 11,172. These are in turn built from 24 basic jamo: 14 consonants and 10 vowels.
For PII detection, this structure creates both opportunities and challenges. Korean text has clear syllable boundaries and uses spaces between words, which makes segmentation more predictable than in Chinese or Japanese. However, compound nouns are often written without internal spaces, particles attach directly to names (for example, 김민준이 or 김민준에게), honorific suffixes and titles modify how names appear, and formal texts sometimes mix Hangul with Hanja (Chinese characters).
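Because Unicode lays the 11,172 syllables out arithmetically from U+AC00 (가) to U+D7A3 (힣), a detector can recover the jamo of any syllable with integer division instead of a table of every syllable. The sketch below shows that standard Unicode composition formula. `decomposeSyllable` and the constant names are hypothetical, chosen for illustration rather than taken from RedactionAPI.

```javascript
// Unicode composes each modern Hangul syllable arithmetically:
//   code = 0xAC00 + (initial * 21 + medial) * 28 + final
// so the jamo indices can be recovered with integer division.
const HANGUL_FIRST = 0xac00; // 가
const HANGUL_LAST = 0xd7a3;  // 힣
const MEDIAL_COUNT = 21;     // vowel slots
const FINAL_COUNT = 28;      // 27 final consonants + "no final"

// Compatibility jamo, listed in Unicode's initial / medial / final order
const INITIALS = 'ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ';
const MEDIALS = 'ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ';
const FINALS = ['', 'ㄱ', 'ㄲ', 'ㄳ', 'ㄴ', 'ㄵ', 'ㄶ', 'ㄷ', 'ㄹ', 'ㄺ', 'ㄻ', 'ㄼ', 'ㄽ',
  'ㄾ', 'ㄿ', 'ㅀ', 'ㅁ', 'ㅂ', 'ㅄ', 'ㅅ', 'ㅆ', 'ㅇ', 'ㅈ', 'ㅊ', 'ㅋ', 'ㅌ', 'ㅍ', 'ㅎ'];

// Returns the jamo of a precomposed syllable, or null for any other character
function decomposeSyllable(ch) {
  const code = ch.codePointAt(0);
  if (code === undefined || code < HANGUL_FIRST || code > HANGUL_LAST) return null;
  const offset = code - HANGUL_FIRST;
  const initial = Math.floor(offset / (MEDIAL_COUNT * FINAL_COUNT));
  const medial = Math.floor((offset % (MEDIAL_COUNT * FINAL_COUNT)) / FINAL_COUNT);
  const finalIdx = offset % FINAL_COUNT;
  return INITIALS[initial] + MEDIALS[medial] + FINALS[finalIdx];
}

console.log(decomposeSyllable('김')); // ㄱㅣㅁ
console.log(decomposeSyllable('한')); // ㅎㅏㄴ
```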
Korean names typically follow the pattern of a single-syllable family name followed by a two-syllable given name, though variations exist. Approximately 45% of Koreans share just three family names: Kim (김), Lee/Yi (이), and Park (박). Our detection system leverages this concentrated distribution while accounting for the remaining 283+ family names documented in Korean records.
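As a rough illustration of how the concentrated family-name distribution can seed detection, the sketch below flags a known family name followed by one or two given-name syllables and an honorific such as 님 or 씨. `FAMILY_NAMES` and `findNameCandidates` are hypothetical, the five-name list stands in for a real surname database, and this is not RedactionAPI's actual model.

```javascript
// Hypothetical sketch: FAMILY_NAMES is a tiny illustrative subset,
// not a real surname database.
const FAMILY_NAMES = ['김', '이', '박', '최', '정'];

function findNameCandidates(text) {
  const family = FAMILY_NAMES.join('');
  // family name + 1-2 given-name syllables, not preceded by another syllable,
  // followed by an optional space and the honorific 님 or 씨
  const pattern = new RegExp(
    `(?<![가-힣])([${family}][가-힣]{1,2})(?=\\s?(?:님|씨))`,
    'gu'
  );
  return [...text.matchAll(pattern)].map((m) => m[1]);
}

console.log(findNameCandidates('이서연 씨, 박지훈님')); // [ '이서연', '박지훈' ]
```

Title forms such as 박 부장님 would need a separate rule; a production system would also use context rather than honorifics alone.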
김민준 - Three syllables (family + given)
이서연 - Common structure
박지훈 - Family name Park
김민준 님 - With honorific suffix
이서연 씨 - Common address form
박 부장님 - Title with family name
The Resident Registration Number (주민등록번호) is Korea's primary national identifier, assigned to every citizen when their birth is registered. Foreign residents instead receive an Alien Registration Number (외국인등록번호) in the same 13-digit format. The number encodes personal information such as date of birth and sex, and it is heavily protected under Korean law.
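The RRN follows the format YYMMDD-GNNNNNN, where YYMMDD is the holder's date of birth, G encodes sex and century of birth (for citizens, 1 or 2 for people born in the 1900s and 3 or 4 for those born from 2000 onward, with odd digits for male and even digits for female), and the last of the remaining six digits is a check digit.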
Our validation uses the official weighted checksum algorithm mandated by the Korean government. Each of the first 12 digits is multiplied by a weight from the sequence [2,3,4,5,6,7,8,9,2,3,4,5], summed, and the check digit is calculated as (11 - (sum mod 11)) mod 10.
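// RRN validation: format check plus weighted modulo-11 checksum
function validateKoreanRRN(rrn) {
  // Accept exactly 13 digits, with an optional hyphen after the birth date
  if (typeof rrn !== 'string' || !/^\d{6}-?\d{7}$/.test(rrn)) return false;
  const digits = rrn.replace('-', '');
  // Weights applied to the first 12 digits
  const weights = [2, 3, 4, 5, 6, 7, 8, 9, 2, 3, 4, 5];
  let sum = 0;
  for (let i = 0; i < 12; i++) {
    sum += Number(digits[i]) * weights[i];
  }
  // Check digit = (11 - (sum mod 11)) mod 10
  const checkDigit = (11 - (sum % 11)) % 10;
  return checkDigit === Number(digits[12]);
}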
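The same 13 digits also expose the holder's birth date and sex, which is part of why the number is so sensitive. A minimal sketch of reading them back, assuming the commonly cited G-digit table (1 to 4 for citizens born in the 1900s and 2000s, 5 to 8 for foreign residents, 9 and 0 for the 1800s). `decodeRRN` and `CENTURY_BY_G` are hypothetical names, and the function does not check that the date actually exists.

```javascript
// Assumed G-digit table: century of birth for each possible G digit
const CENTURY_BY_G = {
  9: 1800, 0: 1800,
  1: 1900, 2: 1900, 5: 1900, 6: 1900,
  3: 2000, 4: 2000, 7: 2000, 8: 2000,
};

// Hypothetical helper: decode birth date, sex, and foreign-resident flag
function decodeRRN(rrn) {
  const match = /^(\d{2})(\d{2})(\d{2})-?(\d)\d{6}$/.exec(rrn);
  if (!match) return null;
  const [, yy, mm, dd, g] = match;
  const gDigit = Number(g);
  const year = CENTURY_BY_G[gDigit] + Number(yy);
  return {
    birthDate: `${year}-${mm}-${dd}`, // not checked against the calendar
    sex: gDigit % 2 === 1 ? 'male' : 'female', // odd = male, even (incl. 0) = female
    foreigner: gDigit >= 5 && gDigit <= 8, // 5-8 appear in alien registration numbers
  };
}

console.log(decodeRRN('850315-1234566'));
// { birthDate: '1985-03-15', sex: 'male', foreigner: false }
```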
South Korea uses two parallel address systems, and many documents and databases contain addresses in either or both formats. Our parser handles the complexity of Korean administrative divisions and building identification.
Made the sole official address system in 2014, road-name addresses follow a logical structure based on streets rather than land parcels:
서울특별시 강남구 테헤란로 152, 강남파이낸스센터 12층
The traditional system based on land parcel numbers remains in common use, particularly in older documents and rural areas:
서울특별시 강남구 역삼동 737
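A first-pass classifier can often tell the two formats apart from their key tokens: road-name addresses contain a road name ending in 로 or 길 followed by a building number, while land-lot addresses contain a neighborhood or village unit ending in 동, 리, or 가 followed by a lot number. The sketch below encodes that heuristic under stated assumptions. `classifyAddress` and both patterns are hypothetical and ignore many real-world variants, so treat them as a starting point rather than a parser.

```javascript
// Road-name: token ending in 로/길 + building number (optionally 12-3),
// followed by whitespace, a comma, a parenthesis, or end of string
const ROAD_NAME = /[가-힣0-9]+(?:로|길)\s*\d+(?:-\d+)?(?=[\s,(]|$)/u;
// Land-lot: token ending in 동/리/가 + lot number, with optional 산 prefix and 번지
const LAND_LOT = /[가-힣0-9]+(?:동|리|가)\s+(?:산\s*)?\d+(?:-\d+)?(?:번지)?/u;

function classifyAddress(addr) {
  if (ROAD_NAME.test(addr)) return 'road-name'; // 도로명
  if (LAND_LOT.test(addr)) return 'land-lot';   // 지번
  return 'unknown';
}

console.log(classifyAddress('서울특별시 강남구 테헤란로 152, 강남파이낸스센터 12층')); // road-name
console.log(classifyAddress('서울특별시 강남구 역삼동 737')); // land-lot
```

Road-name matching runs first because road-name addresses sometimes append the old 동 in parentheses.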
Korean phone numbers follow specific patterns that our system recognizes across multiple formatting variations:
010-1234-5678 - Standard mobile
010.1234.5678 - Dot separated
01012345678 - No separators
02-1234-5678 - Seoul
031-123-4567 - Gyeonggi
051-234-5678 - Busan
South Korea's Personal Information Protection Act (개인정보 보호법) is one of the world's most stringent data protection regulations. Enacted in 2011 and significantly amended in 2020, PIPA imposes strict requirements on organizations handling Korean personal data.
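The phone formats listed above can be approximated with one pattern: a mobile prefix (010, plus legacy 011 and 016 to 019), the Seoul code 02, or another 0XX area code, followed by two digit groups with an optional hyphen, dot, or space. This is a hypothetical sketch, not RedactionAPI's rule set; in particular `0[3-6][1-5]` is a deliberately loose approximation of the regional codes and admits a few codes that are not assigned.

```javascript
// Prefix, then 3-4 digits, then 4 digits; separators optional.
// (?<!\d) and (?!\d) stop matches inside longer digit runs such as RRNs.
const KOREAN_PHONE = /(?<!\d)(01[016789]|02|0[3-6][1-5])[-. ]?(\d{3,4})[-. ]?(\d{4})(?!\d)/g;

// Hypothetical helper: return every phone-shaped substring
function findPhoneNumbers(text) {
  return [...text.matchAll(KOREAN_PHONE)].map((m) => m[0]);
}

console.log(findPhoneNumbers('연락처: 010-9876-5432, 사무실 02.1234.5678'));
// [ '010-9876-5432', '02.1234.5678' ]
```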
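PIPA designates certain identifiers as "Unique Identifiers" (고유식별정보) requiring enhanced protection: resident registration numbers, passport numbers, driver's license numbers, and alien registration numbers. Collecting or processing these identifiers requires the data subject's separate, explicit consent or a specific legal basis.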
Since 2014, PIPA has prohibited the collection of RRNs except in cases specifically authorized by law. Organizations must implement technical measures to ensure RRNs are not collected without legal basis. RedactionAPI helps organizations identify and protect RRNs that may have been collected historically or appear in documents unexpectedly.
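One widely seen technical measure is to mask everything after the G digit wherever an RRN turns up, leaving only the birth-date prefix. The sketch below assumes that masking style (850315-1******) purely for illustration; `maskRRNs` is a hypothetical helper, and the masking style is an assumption rather than a PIPA requirement.

```javascript
// 6 digits, optional hyphen, G digit, 6 digits, not embedded in a longer number
const RRN_PATTERN = /(?<!\d)(\d{6})-?(\d)(\d{6})(?!\d)/g;

// Hypothetical helper: keep birth date and G digit, mask the last six digits
function maskRRNs(text) {
  return text.replace(RRN_PATTERN, (_match, front, g) => `${front}-${g}******`);
}

console.log(maskRRNs('주민등록번호 850315-1234566로 확인')); // 주민등록번호 850315-1******로 확인
```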
Processing Korean text through our API is straightforward. The system automatically detects Korean content and applies appropriate detection rules:
{
  "text": "고객님 김민준 님의 주민등록번호 850315-1234566로 본인확인이 완료되었습니다. 연락처: 010-9876-5432",
  "language": "ko",
  "pii_types": ["name", "rrn", "phone"],
  "redaction_style": "mask"
}
Response:
{
  "redacted_text": "고객님 [NAME] 님의 주민등록번호 [RRN]로 본인확인이 완료되었습니다. 연락처: [PHONE]",
  "entities": [
    {
      "type": "name",
      "value": "김민준",
      "position": {"start": 4, "end": 7},
      "confidence": 0.96
    },
    {
      "type": "rrn",
      "value": "850315-1234566",
      "position": {"start": 18, "end": 32},
      "confidence": 0.99,
      "validation": "checksum_valid"
    },
    {
      "type": "phone",
      "value": "010-9876-5432",
      "position": {"start": 54, "end": 67},
      "confidence": 0.98
    }
  ]
}
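The `position` offsets are what let a client reproduce or audit a redaction locally. A minimal sketch, assuming offsets are end-exclusive indices into the JavaScript string (UTF-16 code units) and that entities never overlap; `applyRedactions` is a hypothetical client-side helper, not part of the API.

```javascript
// Replace each entity span with a [TYPE] placeholder
function applyRedactions(text, entities) {
  // Work from the last entity backwards so earlier offsets stay valid
  const sorted = [...entities].sort((a, b) => b.position.start - a.position.start);
  let out = text;
  for (const { type, position } of sorted) {
    out = out.slice(0, position.start) + `[${type.toUpperCase()}]` + out.slice(position.end);
  }
  return out;
}

const text = '고객님 김민준 님의 주민등록번호 850315-1234566로 본인확인이 완료되었습니다. 연락처: 010-9876-5432';
const entities = [
  { type: 'name', position: { start: 4, end: 7 } },
  { type: 'rrn', position: { start: 18, end: 32 } },
  { type: 'phone', position: { start: 54, end: 67 } },
];
console.log(applyRedactions(text, entities));
// 고객님 [NAME] 님의 주민등록번호 [RRN]로 본인확인이 완료되었습니다. 연락처: [PHONE]
```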
Korean documents frequently contain mixed content, including English, Hanja (Chinese characters), and numbers, and our system maintains accuracy across these mixed contexts.
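One way to handle mixed text is to split it into runs by script first, so Hangul, Hanja, Latin letters, and digits can each be routed to an appropriate detector. The sketch below uses ES2018 Unicode property escapes (`\p{Script=Hangul}` and similar); `segmentByScript` and its category labels are hypothetical.

```javascript
// Classify a single character by script using Unicode property escapes
function scriptOf(ch) {
  if (/\p{Script=Hangul}/u.test(ch)) return 'hangul';
  if (/\p{Script=Han}/u.test(ch)) return 'hanja';
  if (/\p{Script=Latin}/u.test(ch)) return 'latin';
  if (/\p{Nd}/u.test(ch)) return 'digit';
  return 'other';
}

// Group consecutive characters of the same script into runs
function segmentByScript(text) {
  const runs = [];
  for (const ch of text) { // for...of iterates by code point
    const script = scriptOf(ch);
    const last = runs[runs.length - 1];
    if (last && last.script === script) last.text += ch;
    else runs.push({ script, text: ch });
  }
  return runs;
}

console.log(segmentByScript('김민준(金敏俊) Kim'));
```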
Korean driver's licenses use a 12-character format: regional code (2 digits, or the region name such as 서울 on older cards) + year (2 digits) + sequence (6 digits) + check digits (2 digits). Example: 서울-12-345678-01
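A pattern for this layout might accept either a two-digit region code or a two-syllable region name, with hyphens between groups. This hypothetical sketch checks shape only: it does not verify the check digits, and requiring hyphens is an assumption made to avoid matching arbitrary 12-digit runs. `findDriverLicenses` is an illustrative name.

```javascript
// Region (2 digits or 2 Hangul syllables) - year (2) - sequence (6) - check (2)
const DRIVER_LICENSE = /(?<![\d가-힣])(?:\d{2}|[가-힣]{2})-\d{2}-\d{6}-\d{2}(?!\d)/gu;

// Hypothetical helper: return every license-shaped substring
function findDriverLicenses(text) {
  return [...text.matchAll(DRIVER_LICENSE)].map((m) => m[0]);
}

console.log(findDriverLicenses('면허번호 서울-12-345678-01, 11-12-345678-01'));
// [ '서울-12-345678-01', '11-12-345678-01' ]
```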
Korean business registration numbers (사업자등록번호) follow a 10-digit format (XXX-XX-XXXXX): the first three digits identify the issuing tax office, and the final digit is a check digit.
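The check digit is commonly documented as follows: multiply the first nine digits by 1, 3, 7, 1, 3, 7, 1, 3, 5, add the tens digit of (ninth digit × 5), and choose a tenth digit that brings the total to a multiple of 10. The sketch below implements that published scheme; `validateBusinessRegNo` is a hypothetical name, and the algorithm should be confirmed against National Tax Service guidance before production use.

```javascript
// Hypothetical validator for the commonly published 사업자등록번호 checksum
function validateBusinessRegNo(brn) {
  if (typeof brn !== 'string' || !/^\d{3}-?\d{2}-?\d{5}$/.test(brn)) return false;
  const d = brn.replace(/-/g, '').split('').map(Number);
  const weights = [1, 3, 7, 1, 3, 7, 1, 3, 5];
  let sum = 0;
  for (let i = 0; i < 9; i++) sum += d[i] * weights[i];
  sum += Math.floor((d[8] * 5) / 10); // carry from the ninth digit's ×5 product
  const check = (10 - (sum % 10)) % 10; // makes sum + check a multiple of 10
  return check === d[9];
}

console.log(validateBusinessRegNo('124-81-00998')); // true
console.log(validateBusinessRegNo('124-81-00997')); // false
```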
Korean passports use a format starting with M or R followed by 8 digits, with check digits in the machine-readable zone validated according to ICAO Doc 9303.
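ICAO Doc 9303 defines a single check-digit algorithm for the machine-readable zone (MRZ): digits keep their value, letters A to Z map to 10 to 35, the filler `<` counts as 0, weights cycle through 7, 3, 1, and the check digit is the weighted sum mod 10. The sketch below implements that rule. `icaoCheckDigit` is a hypothetical name, and applying it to a Korean passport means running it over the document-number field as it appears in the MRZ.

```javascript
// ICAO 9303 check digit over one MRZ field
function icaoCheckDigit(field) {
  const weights = [7, 3, 1];
  let sum = 0;
  for (let i = 0; i < field.length; i++) {
    const ch = field[i];
    let value;
    if (ch >= '0' && ch <= '9') value = ch.charCodeAt(0) - 48;      // '0'..'9' -> 0..9
    else if (ch >= 'A' && ch <= 'Z') value = ch.charCodeAt(0) - 55; // 'A'..'Z' -> 10..35
    else if (ch === '<') value = 0;                                  // filler
    else throw new Error(`Invalid MRZ character: ${ch}`);
    sum += value * weights[i % 3];
  }
  return sum % 10;
}

// ICAO 9303 specimen document number L898902C3 carries check digit 6
console.log(icaoCheckDigit('L898902C3')); // 6
```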
Our Korean processing pipeline includes several language-specific optimizations.
Korean banks and fintech companies use our API to redact RRNs and account information in customer communications, audit logs, and internal documents while maintaining PIPA compliance.
Medical institutions process patient records through our system to remove identifying information while preserving clinical value for research and analytics.
Korean online retailers redact customer data in order histories, support tickets, and shipping records to minimize data exposure.
Mobile carriers process call records and customer data through our API to support regulatory compliance and internal data governance requirements.
RedactionAPI provides the most comprehensive Korean language PII detection available, with native support for Hangul, Korean identifiers, and PIPA compliance. Process thousands of documents per minute with enterprise-grade accuracy.