Korean Language Redaction
99.7% Accuracy
70+ Data Types

Advanced PII detection for Hangul text with native Korean identifier support and Personal Information Protection Act (PIPA) compliance.

  • 11,172 Hangul syllables
  • 99.6% RRN detection rate
  • 286 family names
  • <50 ms processing time

Powerful Redaction Features

Everything you need for comprehensive data protection

Hangul Script Support

Complete support for the Korean Hangul alphabet including all 11,172 modern syllable blocks with accurate character boundary detection.

RRN Detection

Detect and validate Korean Resident Registration Numbers with checksum verification using the weighted modulo 11 algorithm.

PIPA Compliance

Ensure compliance with Korea's Personal Information Protection Act through comprehensive identification and protection of regulated data.

Korean Address Parsing

Parse and redact Korean addresses in both road-name (도로명) and land-lot (지번) formats with proper district hierarchy recognition.

Korean Phone Numbers

Identify Korean phone numbers in all formats, including mobile (010), Seoul landlines (02), and regional codes, across common separator variants (hyphens, dots, spaces, or none).

Korean Name Recognition

Detect Korean names using comprehensive databases of family names (성) and given names with contextual analysis for accuracy.

Understanding Korean Language PII Detection

Korean presents unique challenges for automated PII detection due to its Hangul writing system, complex honorific structures, and national identifier formats. RedactionAPI's Korean language support provides comprehensive detection capabilities that understand the nuances of Korean text while ensuring compliance with South Korea's Personal Information Protection Act (PIPA).

The Complexity of Korean Text Processing

The Korean language uses Hangul, a featural alphabet created under King Sejong in 1443. Unlike Chinese characters, which are logographic, or Japanese, which mixes kanji with kana syllabaries, Hangul combines jamo (letters) into syllable blocks. It has 24 basic jamo (14 consonants and 10 vowels). Counting compound letters, 19 initial consonants, 21 vowels, and 28 final options (27 final consonants or none) yield 19 × 21 × 28 = 11,172 possible modern syllable blocks.
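The 11,172 blocks occupy one contiguous Unicode range (U+AC00 to U+D7A3), ordered by initial, then vowel, then final. Because of that layout, a syllable can be split back into its jamo with integer arithmetic. The minimal JavaScript sketch below applies the standard Unicode decomposition formula. `decomposeSyllable` is an illustrative name, and the function returns compatibility jamo for readability.

```javascript
// Illustrative sketch (hypothetical helper): decompose a precomposed Hangul syllable
// into jamo using the standard Unicode arithmetic (19 initials x 21 medials x 28 finals).
const SYLLABLE_BASE = 0xac00;
const MEDIAL_COUNT = 21;
const FINAL_COUNT = 28; // 27 final consonants plus "no final"
const SYLLABLE_COUNT = 19 * MEDIAL_COUNT * FINAL_COUNT; // 11,172

// Compatibility jamo for display; FINALS[0] means "no final consonant".
const INITIALS = 'ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ';
const MEDIALS = 'ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ';
const FINALS = ['', ...'ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ'];

function decomposeSyllable(ch) {
    const index = ch.codePointAt(0) - SYLLABLE_BASE;
    if (index < 0 || index >= SYLLABLE_COUNT) return null; // not a Hangul syllable block
    return {
        initial: INITIALS[Math.floor(index / (MEDIAL_COUNT * FINAL_COUNT))],
        medial: MEDIALS[Math.floor((index % (MEDIAL_COUNT * FINAL_COUNT)) / FINAL_COUNT)],
        final: FINALS[index % FINAL_COUNT],
    };
}
```

For example, `decomposeSyllable('김')` returns `{ initial: 'ㄱ', medial: 'ㅣ', final: 'ㅁ' }`, and non-Hangul input such as `'A'` returns `null`.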

For PII detection, this structure creates both opportunities and challenges. Korean separates word units (eojeol) with spaces, and syllable boundaries are explicit, so segmentation is more predictable than in Chinese or Japanese. However, grammatical particles attach directly to names (김민준은, 김민준이), compound nouns are often written without internal spaces, honorific suffixes and titles change how names appear, and formal documents often mix Hangul with Hanja (Chinese characters).

Korean Name Detection Strategies

Korean names typically consist of a one-syllable family name followed by a two-syllable given name, though two-syllable family names (such as 남궁 and 제갈) and one-syllable given names also occur. Approximately 45% of Koreans share just three family names: Kim (김), Lee/Yi (이), and Park (박). Our detection system leverages this concentrated distribution while accounting for the remaining 283 family names documented in Korean records.

Korean Name Pattern Examples

Standard Patterns
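  • 김민준 - Three syllables: family name Kim (김) + given name 민준
  • 이서연 - Family name Lee (이) + given name 서연
  • 박지훈 - Family name Park (박) + given name 지훈
With Honorifics
  • 김민준 님 - With the honorific suffix 님
  • 이서연 씨 - With 씨, a common polite form of address
  • 박 부장님 - Family name with a job title (부장, department head) and 님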
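The patterns above suggest a cheap first pass: strip a trailing honorific, then keep Hangul tokens of two to four syllables whose first syllable is a known family name. The sketch below is only illustrative. The five-name surname set, the suffix list, and the name `findNameCandidates` are hypothetical stand-ins for the full dictionaries, and the contextual analysis stage is not shown.

```javascript
// Hypothetical first-pass filter: Hangul tokens of 2-4 syllables whose first
// syllable is a known family name, after stripping an attached honorific.
const FAMILY_NAMES = new Set(['김', '이', '박', '최', '정']); // illustrative subset only
const HONORIFIC_SUFFIXES = ['님', '씨']; // suffixes that may be attached directly

function stripHonorific(token) {
    for (const suffix of HONORIFIC_SUFFIXES) {
        if (token.length > suffix.length && token.endsWith(suffix)) {
            return token.slice(0, -suffix.length);
        }
    }
    return token;
}

function findNameCandidates(text) {
    const candidates = [];
    for (const match of text.matchAll(/[가-힣]+/g)) {
        const token = stripHonorific(match[0]);
        if (token.length >= 2 && token.length <= 4 && FAMILY_NAMES.has(token[0])) {
            candidates.push({ value: token, start: match.index, end: match.index + token.length });
        }
    }
    return candidates;
}
```

A filter like this over-generates by design. The ordinary word 정보 ("information") starts with the family name 정 and would be flagged, which is why candidates still need contextual analysis.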

Korean Resident Registration Number (RRN) Detection

The Resident Registration Number (주민등록번호) is Korea's primary national identifier, assigned to every citizen when their birth is registered. Registered foreign residents receive an Alien Registration Number (외국인등록번호) with the same 13-digit structure. The number encodes personal information such as date of birth and gender and is heavily protected under Korean law.

RRN Structure and Validation

The RRN follows the format YYMMDD-GNNNNNC, where:

  • YYMMDD - Date of birth (year, month, day)
  • G - Gender and century indicator:
    • 1 = Male born 1900-1999
    • 2 = Female born 1900-1999
    • 3 = Male born 2000-2099
    • 4 = Female born 2000-2099
    • 5-8 = Foreign residents
  • NNNN - Registration office (location) code; numbers issued since October 2020 use randomly assigned digits here
  • N - Sequence number
  • C - Check digit calculated via weighted modulo 11
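The birth-date and gender fields can be decoded mechanically. The sketch below uses the hypothetical helper name `decodeRRN`. It maps the gender/century digit to a birth year and rejects calendar dates that do not exist. It also accepts 9 and 0, which denote people born 1800-1899, and it maps 5-8 to foreign males and females born in the 1900s and 2000s. It does not verify the check digit.

```javascript
// Hypothetical decoder for the RRN birth date and gender/century digit.
// Digit -> [century base, sex, foreign registration?]
const GENDER_DIGITS = {
    '9': [1800, 'male', false], '0': [1800, 'female', false],
    '1': [1900, 'male', false], '2': [1900, 'female', false],
    '3': [2000, 'male', false], '4': [2000, 'female', false],
    '5': [1900, 'male', true],  '6': [1900, 'female', true],
    '7': [2000, 'male', true],  '8': [2000, 'female', true],
};

function decodeRRN(rrn) {
    const m = /^(\d{2})(\d{2})(\d{2})-?(\d)\d{6}$/.exec(rrn);
    if (!m) return null;
    const [, yy, mm, dd, g] = m;
    const [centuryBase, sex, foreign] = GENDER_DIGITS[g];
    const year = centuryBase + Number(yy);
    const month = Number(mm);
    const day = Number(dd);
    // Date rolls impossible dates (e.g. Feb 31) over, so a mismatch means invalid
    const date = new Date(Date.UTC(year, month - 1, day));
    if (date.getUTCMonth() !== month - 1 || date.getUTCDate() !== day) return null;
    return { year, month, day, sex, foreign };
}
```

For example, `decodeRRN('850315-1234566')` yields a male citizen born 15 March 1985, and `decodeRRN('990231-2000000')` returns `null` because 31 February does not exist.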

RRN Checksum Algorithm

Our validation uses the official weighted checksum algorithm mandated by the Korean government. Each of the first 12 digits is multiplied by a weight from the sequence [2,3,4,5,6,7,8,9,2,3,4,5], summed, and the check digit is calculated as (11 - (sum mod 11)) mod 10.

// RRN validation: format check plus weighted modulo-11 checksum
function validateKoreanRRN(rrn) {
    // Accept 13 digits, with or without a hyphen after the birth date
    if (!/^\d{6}-?\d{7}$/.test(rrn)) return false;
    const digits = rrn.replace('-', '');

    const weights = [2, 3, 4, 5, 6, 7, 8, 9, 2, 3, 4, 5];
    let sum = 0;

    // Weighted sum of the first 12 digits
    for (let i = 0; i < 12; i++) {
        sum += Number(digits[i]) * weights[i];
    }

    // Check digit = (11 - (sum mod 11)) mod 10
    const checkDigit = (11 - (sum % 11)) % 10;
    return checkDigit === Number(digits[12]);
}
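As a worked example of the formula, take the first 12 digits of the illustrative (not real) number 850315-123456:

```latex
\begin{aligned}
S &= 8\cdot2 + 5\cdot3 + 0\cdot4 + 3\cdot5 + 1\cdot6 + 5\cdot7 + 1\cdot8 + 2\cdot9 + 3\cdot2 + 4\cdot3 + 5\cdot4 + 6\cdot5 \\
  &= 16 + 15 + 0 + 15 + 6 + 35 + 8 + 18 + 6 + 12 + 20 + 30 = 181 \\
181 \bmod 11 &= 5 \\
C &= (11 - 5) \bmod 10 = 6
\end{aligned}
```

So 850315-1234566 passes the check, and any other final digit, such as 850315-1234567, fails it.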

Korean Address Recognition

South Korea uses two parallel address systems, and many documents and databases contain addresses in either or both formats. Our parser handles the complexity of Korean administrative divisions and building identification.

Road Name Address System (도로명주소)

Official since 2014, road-name addresses follow a logical structure based on streets rather than land parcels:

서울특별시 강남구 테헤란로 152, 강남파이낸스센터 12층

  • 서울특별시 - Metropolitan city (Seoul Special City)
  • 강남구 - District (Gangnam-gu)
  • 테헤란로 152 - Road name and building number
  • 강남파이낸스센터 12층 - Building name and floor

Land Lot Address System (지번주소)

The traditional system based on land parcel numbers remains in common use, particularly in older documents and rural areas:

서울특별시 강남구 역삼동 737

  • 서울특별시 - Metropolitan city
  • 강남구 - District
  • 역삼동 - Neighborhood (dong)
  • 737 - Land lot number
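The two formats can usually be told apart by their last meaningful token. Road-name addresses put a building number after a token ending in 로 or 길, while land-lot addresses put a lot number after a token ending in 동 or 리. The sketch below relies on simplifying assumptions: the regular expressions and the name `classifyKoreanAddress` are hypothetical, and the code ignores many real variants, such as 가-suffixed neighborhoods and mountain lots prefixed with 산.

```javascript
// Hypothetical classifier for the two Korean address formats described above.
// Simplified patterns only; real addresses have many more variants.
const REGION = /^(\S+(?:시|도))\s+(\S+(?:구|군|시))/;          // e.g. 서울특별시 강남구
const ROAD = /([가-힣0-9]+(?:로|길))\s+(\d+(?:-\d+)?)(?!\d)/;   // e.g. 테헤란로 152
const LOT = /([가-힣0-9]+(?:동|리))\s+(\d+(?:-\d+)?)(?!\d)/;    // e.g. 역삼동 737

function classifyKoreanAddress(address) {
    const region = REGION.exec(address);
    const base = region ? { city: region[1], district: region[2] } : {};
    // Check the road-name form first: road addresses may also mention a dong in parentheses
    const road = ROAD.exec(address);
    if (road) return { ...base, system: 'road', road: road[1], number: road[2] };
    const lot = LOT.exec(address);
    if (lot) return { ...base, system: 'lot', neighborhood: lot[1], number: lot[2] };
    return null;
}
```

Applied to the two samples above, the sketch returns system `'road'` with road `'테헤란로'` and number `'152'`, and system `'lot'` with neighborhood `'역삼동'` and number `'737'`. Both results carry city `'서울특별시'` and district `'강남구'`.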

Korean Phone Number Detection

Korean phone numbers follow specific patterns that our system recognizes across multiple formatting variations:

Phone Number Formats

Mobile Numbers
  • 010-1234-5678 - Standard mobile
  • 010.1234.5678 - Dot separated
  • 01012345678 - No separators
Landline Numbers
  • 02-1234-5678 - Seoul
  • 031-123-4567 - Gyeonggi
  • 051-234-5678 - Busan
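One regular expression can cover the formats listed above. The sketch below uses the hypothetical name `findKoreanPhones`. It accepts 010 and the legacy 01x mobile prefixes, Seoul's 02, and three-digit regional codes. Hyphens, dots, spaces, or no separator are all allowed. The pattern is deliberately loose about which regional codes actually exist.

```javascript
// Hypothetical matcher: mobile (010 and legacy 011/016/017/018/019), Seoul (02),
// and three-digit regional codes (loosely 031-065), with -, ., space, or no separator.
const KOREAN_PHONE = /(?<!\d)(01[016789]|02|0[3-6][1-5])[-. ]?(\d{3,4})[-. ]?(\d{4})(?!\d)/g;

function findKoreanPhones(text) {
    return [...text.matchAll(KOREAN_PHONE)].map((m) => ({
        value: m[0],
        // Normalize to hyphenated form for comparison or display
        normalized: `${m[1]}-${m[2]}-${m[3]}`,
        start: m.index,
        end: m.index + m[0].length,
    }));
}
```

Because separators are optional, `01012345678` and `010.1234.5678` both normalize to `010-1234-5678`. The surrounding `(?<!\d)` and `(?!\d)` guards prevent matches inside longer digit runs.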

PIPA Compliance Requirements

South Korea's Personal Information Protection Act (개인정보 보호법) is one of the world's most stringent data protection regulations. Enacted in 2011 and significantly amended in 2020, PIPA imposes strict requirements on organizations handling Korean personal data.

Unique Identifiers Under PIPA

PIPA designates certain identifiers as "Unique Identifiers" (고유식별정보) requiring enhanced protection. Processing these identifiers requires the data subject's separate consent or a specific legal basis:

PIPA Unique Identifiers

  • Resident Registration Number (주민등록번호)
  • Passport Number (여권번호)
  • Driver's License Number (운전면허번호)
  • Alien Registration Number (외국인등록번호)

Sensitive Information

  • Health and medical information
  • Genetic and biometric data
  • Race and ethnicity
  • Political opinions and union membership

RRN Processing Restrictions

Since 2014, PIPA has prohibited the collection of RRNs except in cases specifically authorized by law. Organizations must implement technical measures to ensure RRNs are not collected without legal basis. RedactionAPI helps organizations identify and protect RRNs that may have been collected historically or appear in documents unexpectedly.
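One common technical measure is to mask the back part of an RRN so that only the birth date and gender digit remain visible, as in 850315-1******. The sketch below uses the hypothetical name `maskRRNs`. The choice of which digits stay visible is an assumption, so adjust it to your own policy. The sketch masks any 13-digit string in RRN shape without checking the checksum, erring toward over-redaction.

```javascript
// Hypothetical masking pass: keep the birth date and gender/century digit,
// replace the remaining six digits of every RRN-shaped string with '*'.
const RRN_PATTERN = /(?<!\d)(\d{6})-?(\d)(\d{6})(?!\d)/g;

function maskRRNs(text) {
    return text.replace(RRN_PATTERN, (_match, front, genderDigit) => `${front}-${genderDigit}******`);
}
```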

API Integration for Korean Text

Processing Korean text through our API is straightforward. The system automatically detects Korean content and applies appropriate detection rules:

{
    "text": "고객님 김민준 님의 주민등록번호 850315-1234566로 본인확인이 완료되었습니다. 연락처: 010-9876-5432",
    "language": "ko",
    "pii_types": ["name", "rrn", "phone"],
    "redaction_style": "mask"
}

Response:

{
    "redacted_text": "고객님 [NAME] 님의 주민등록번호 [RRN]로 본인확인이 완료되었습니다. 연락처: [PHONE]",
    "entities": [
        {
            "type": "name",
            "value": "김민준",
            "position": {"start": 4, "end": 7},
            "confidence": 0.96
        },
        {
            "type": "rrn",
            "value": "850315-1234566",
            "position": {"start": 18, "end": 32},
            "confidence": 0.99,
            "validation": "checksum_valid"
        },
        {
            "type": "phone",
            "value": "010-9876-5432",
            "position": {"start": 54, "end": 67},
            "confidence": 0.98
        }
    ]
}
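Each entity carries start and end offsets into the submitted text, so a client can rebuild or restyle the redaction locally. The minimal sketch below uses a hypothetical `applyRedactions` helper. It assumes the offsets are zero-based UTF-16 indices with an exclusive end, which matches how JavaScript strings index Hangul, and that spans do not overlap.

```javascript
// Hypothetical client-side helper: replace each entity span with a [TYPE] label.
// Assumes zero-based UTF-16 offsets, exclusive end, and non-overlapping spans.
function applyRedactions(text, entities) {
    // Replace from the end so earlier offsets stay valid after each substitution
    const sorted = [...entities].sort((a, b) => b.position.start - a.position.start);
    let result = text;
    for (const { type, position } of sorted) {
        const label = `[${type.toUpperCase()}]`;
        result = result.slice(0, position.start) + label + result.slice(position.end);
    }
    return result;
}
```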

Handling Mixed Script Content

Korean documents frequently contain mixed content including English, Hanja (Chinese characters), and numbers. Our system maintains accuracy across these mixed contexts:

Mixed Content Examples

  • Korean-English: "담당자: John Kim (김존)" - Detects both name representations
  • Hanja Names: "李成桂 (이성계)" - Recognizes Hanja with Hangul reading
  • Formal Documents: "株式會社" vs "주식회사" - Company designation variants
  • Dates: "2024년 3월 15일" vs "2024.03.15" - Multiple date formats
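Normalizing both date styles to one form makes dates comparable, for example when checking a birth date against an RRN's YYMMDD prefix. The sketch below uses the hypothetical name `normalizeKoreanDates` and covers only the two formats shown.

```javascript
// Hypothetical normalizer for "2024년 3월 15일" and "2024.03.15" style dates.
const DATE_PATTERNS = [
    /(\d{4})년\s*(\d{1,2})월\s*(\d{1,2})일/g,          // Korean form: 2024년 3월 15일
    /(?<!\d)(\d{4})\.(\d{1,2})\.(\d{1,2})(?!\d)/g,     // dotted form: 2024.03.15
];

function normalizeKoreanDates(text) {
    const found = [];
    for (const pattern of DATE_PATTERNS) {
        for (const m of text.matchAll(pattern)) {
            const iso = `${m[1]}-${m[2].padStart(2, '0')}-${m[3].padStart(2, '0')}`;
            found.push({ value: m[0], iso, start: m.index });
        }
    }
    // Report matches in document order
    return found.sort((a, b) => a.start - b.start);
}
```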

Additional Korean Identifiers

Korean Driver's License Numbers

Korean driver's license numbers have 12 digits: regional code (2 digits) + year (2 digits) + sequence (6 digits) + check digits (2 digits). Older licenses print the region as a Hangul name in place of the numeric code, for example: 서울-12-345678-01
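The 2 + 2 + 6 + 2 layout above can be matched whether the region appears as two digits or as a two-syllable Hangul name, as in the example. The pattern and the name `findDriverLicenses` are assumptions. The check digits are captured but not validated, because their algorithm is not specified here.

```javascript
// Hypothetical matcher for the 2 + 2 + 6 + 2 driver's license layout.
// The region may be two digits or a two-syllable Hangul name such as 서울.
const LICENSE = /(?<![\d가-힣])(\d{2}|[가-힣]{2})[- ]?(\d{2})-?(\d{6})-?(\d{2})(?!\d)/g;

function findDriverLicenses(text) {
    return [...text.matchAll(LICENSE)].map((m) => ({
        value: m[0],
        region: m[1],
        year: m[2],
        serial: m[3],
        checkDigits: m[4], // captured only; not validated
    }));
}
```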

Business Registration Numbers

Korean business registration numbers (사업자등록번호) follow a 10-digit format (XXX-XX-XXXXX). The first three digits identify the issuing tax office, and the final digit is a check digit.
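The check digit is commonly computed with the rule below. Multiply the first nine digits by the weights 1, 3, 7, 1, 3, 7, 1, 3, 5 and sum the products. Add the tens digit of (ninth digit × 5). The check digit is (10 − sum mod 10) mod 10. This is the widely published algorithm, not one specified on this page, and `validateBusinessRegNo` is a hypothetical name.

```javascript
// Sketch of the widely published 사업자등록번호 check-digit rule (hypothetical helper).
function validateBusinessRegNo(input) {
    if (!/^\d{3}-?\d{2}-?\d{5}$/.test(input)) return false;
    const digits = input.replace(/-/g, '').split('').map(Number);
    const weights = [1, 3, 7, 1, 3, 7, 1, 3, 5];

    let sum = 0;
    for (let i = 0; i < 9; i++) {
        sum += digits[i] * weights[i];
    }
    // The ninth digit contributes an extra floor(d9 * 5 / 10)
    sum += Math.floor((digits[8] * 5) / 10);

    const checkDigit = (10 - (sum % 10)) % 10;
    return checkDigit === digits[9];
}
```

For the illustrative number 123-45-67891, the weighted sum is 165, the extra term adds 4 for a total of 169, and (10 − 9) mod 10 = 1 matches the final digit.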

Korean Passport Numbers

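Korean passport numbers traditionally start with M or R followed by 8 digits. Passports issued since December 2021 use a letter, three digits, a letter, and four digits (for example, M123A4567). Validation relies on the ICAO 9303 check digit that follows the passport number in the machine-readable zone (MRZ).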
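ICAO Doc 9303 defines the MRZ check digit as follows. Digits keep their face value, letters A to Z map to 10 to 35, and the filler character `<` maps to 0. The values are weighted 7, 3, 1 repeating, and the check digit is the sum modulo 10. Below is a minimal sketch; `icaoCheckDigit` is an illustrative name.

```javascript
// ICAO 9303 check digit (illustrative helper): weights 7,3,1 repeating, sum mod 10.
function icaoCheckDigit(field) {
    const weights = [7, 3, 1];
    let sum = 0;
    for (let i = 0; i < field.length; i++) {
        const ch = field[i];
        let value;
        if (ch >= '0' && ch <= '9') value = ch.charCodeAt(0) - 48;      // digits
        else if (ch >= 'A' && ch <= 'Z') value = ch.charCodeAt(0) - 55; // A=10 ... Z=35
        else if (ch === '<') value = 0;                                  // filler
        else throw new Error(`Invalid MRZ character: ${ch}`);
        sum += value * weights[i % 3];
    }
    return sum % 10;
}
```

The ICAO specimen document number L898902C3 yields 6. An old-format number such as M12345678 yields 8, the digit the MRZ would carry after it.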

Performance Optimizations for Korean

Our Korean processing pipeline includes several optimizations:

  • Syllable Block Indexing: Pre-indexed lookup tables for common Hangul patterns enable sub-millisecond pattern matching
  • Jamo Decomposition: When needed for fuzzy matching, we decompose syllables into constituent jamo with minimal performance overhead
  • Name Database Optimization: Bloom filters provide rapid elimination of non-name patterns before detailed analysis
  • Regional Code Caching: Cached mappings for area codes, postal codes, and regional identifiers accelerate address parsing
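A Bloom filter answers either "definitely not in the set" or "possibly in the set" using a bit array and k hash functions. This lets most non-name tokens be rejected before a full dictionary or context check. The sketch below is a generic Bloom filter, not RedactionAPI's production structure. The bit-array size, the hash count, and the double hashing built on a seeded FNV-1a variant are all illustrative choices.

```javascript
// Generic Bloom filter sketch: k bit positions per item derived by double hashing.
class BloomFilter {
    constructor(sizeInBits = 1 << 16, hashCount = 4) {
        this.size = sizeInBits;
        this.hashCount = hashCount;
        this.bits = new Uint8Array(Math.ceil(sizeInBits / 8));
    }

    // Seeded variant of 32-bit FNV-1a over UTF-16 code units
    static fnv1a(str, seed) {
        let hash = (0x811c9dc5 ^ seed) >>> 0;
        for (let i = 0; i < str.length; i++) {
            hash ^= str.charCodeAt(i);
            hash = Math.imul(hash, 0x01000193) >>> 0;
        }
        return hash;
    }

    // Positions h1 + i*h2 (mod size); h2 forced odd and kept unsigned
    *positions(item) {
        const h1 = BloomFilter.fnv1a(item, 0);
        const h2 = (BloomFilter.fnv1a(item, 0x9e3779b9) | 1) >>> 0;
        for (let i = 0; i < this.hashCount; i++) {
            yield (h1 + i * h2) % this.size;
        }
    }

    add(item) {
        for (const p of this.positions(item)) this.bits[p >> 3] |= 1 << (p & 7);
    }

    // false => definitely absent; true => possibly present (false positives allowed)
    mightContain(item) {
        for (const p of this.positions(item)) {
            if ((this.bits[p >> 3] & (1 << (p & 7))) === 0) return false;
        }
        return true;
    }
}
```

Only a `true` result needs the slower follow-up check. A `false` result is always correct, so the filter can safely skip those tokens.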

Enterprise Use Cases

Financial Services

Korean banks and fintech companies use our API to redact RRNs and account information in customer communications, audit logs, and internal documents while maintaining PIPA compliance.

Healthcare

Medical institutions process patient records through our system to remove identifying information while preserving clinical value for research and analytics.

E-commerce

Korean online retailers redact customer data in order histories, support tickets, and shipping records to minimize data exposure.

Telecommunications

Mobile carriers process call records and customer data through our API to support regulatory compliance and internal data governance requirements.

Start Processing Korean Text Today

RedactionAPI provides the most comprehensive Korean language PII detection available, with native support for Hangul, Korean identifiers, and PIPA compliance. Process thousands of documents per minute with enterprise-grade accuracy.
