Comprehensive PHI detection and de-identification for healthcare data including patient identifiers, clinical notes, and medical terminology.
Everything you need for comprehensive data protection
Automatically identify and remove all 18 HIPAA identifiers to meet the Safe Harbor de-identification standard.
Detect MRNs, patient names, dates of birth, Social Security numbers, and other patient identifiers across clinical documents.
Parse unstructured clinical notes including progress notes, discharge summaries, and operative reports with medical NLP.
Understand medical context to distinguish clinical terms from PHI, preserving diagnostic and treatment information.
Process HL7 v2/v3, FHIR, CCD/C-CDA, and DICOM structured reports with format-aware redaction.
Detailed audit logs document the de-identification methodology, supporting compliance verification and audits.
Healthcare organizations handle vast amounts of Protected Health Information (PHI) that must be de-identified before it can be used for research, analytics, and other secondary purposes. RedactionAPI detects PHI across clinical documents, supporting HIPAA Safe Harbor compliance while preserving the clinical value of medical data.
HIPAA provides two methods for de-identifying protected health information: Safe Harbor and Expert Determination. RedactionAPI supports both approaches:
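Safe Harbor: Remove all 18 specified identifiers of the individual and of the individual's relatives, employers, and household members, with no actual knowledge that the remaining information could identify an individual.
Our default healthcare profile removes all 18 Safe Harbor identifiers.
Expert Determination: A qualified expert applies statistical or scientific methods and determines that the risk of re-identification is very small. This approach allows some identifiers to be retained.
Configurable profiles support the custom de-identification rules an expert determination may call for.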
Safe Harbor requires removal of these 18 types of identifiers:
| # | Identifier Type | Examples | Detection Method |
|---|---|---|---|
| 1 | Names | Patient, family members, providers | NLP + name database |
| 2 | Geographic data | Street addresses, cities, counties, ZIP codes (the first 3 digits may be kept if that area has more than 20,000 people) | Pattern + geo database |
| 3 | Dates | Birth, admission, discharge, death (the year may be kept); ages over 89 | Date pattern recognition |
| 4 | Phone numbers | Home, work, mobile | Pattern validation |
| 5 | Fax numbers | All fax numbers | Pattern + context |
| 6 | Email addresses | Patient email | Email pattern |
| 7 | Social Security numbers | SSN | Pattern + validation |
| 8 | Medical record numbers | MRN, patient ID | Context + pattern |
| 9 | Health plan numbers | Insurance member ID | Context + pattern |
| 10 | Account numbers | Billing, financial accounts | Context + pattern |
| 11 | Certificate/license numbers | Driver's license, professional license | Pattern + validation |
| 12 | Vehicle identifiers | License plates, VINs | Pattern recognition |
| 13 | Device identifiers | Medical device serial numbers | Context + pattern |
| 14 | Web URLs | Patient portal links, personal websites | URL pattern |
| 15 | IP addresses | Network identifiers | IP pattern |
| 16 | Biometric identifiers | Fingerprints, voiceprints | Context + keywords |
| 17 | Full-face photographs | Patient photographs and comparable images | Image detection |
| 18 | Any other unique identifying number, characteristic, or code | Tattoo descriptions, unique characteristics | Context + NLP |
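For the rows detected by "pattern" or "pattern + validation", the idea can be illustrated with regular expressions followed by a validation step. The sketch below is a minimal, hypothetical example (the `PATTERNS` table and `detect` function are illustrative, not RedactionAPI's production rules). The SSN expression encodes the ranges the SSA never assigns (area 000, 666, or 900-999; group 00; serial 0000), and IP matches are checked so each octet is at most 255. Names, free-text dates, and other context-dependent identifiers need NLP and are not covered here.

```python
import re

# Hypothetical detection rules for a few pattern-based Safe Harbor identifiers.
PATTERNS = {
    # Rejects SSA-invalid areas (000, 666, 900-999), group 00, and serial 0000.
    "SSN": re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b"),
    # US-style numbers such as (555) 123-4567, 555-123-4567, 555.123.4567.
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def detect(text: str) -> list[tuple[int, int, str, str]]:
    """Return (start, end, label, value) findings sorted by position."""
    findings = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            value = m.group()
            # Validation step: a dotted quad is only an IPv4 address if every octet <= 255.
            if label == "IP" and not all(int(octet) <= 255 for octet in value.split(".")):
                continue
            findings.append((m.start(), m.end(), label, value))
    return sorted(findings)


sample = "SSN 123-45-6789, phone (555) 123-4567, email jsmith@example.org, IP 10.0.0.1 and 999.1.1.1"
for start, end, label, value in detect(sample):
    print(label, value)
```

In this sample, `999.1.1.1` matches the IP pattern but is discarded by the validation step, which is the "pattern + validation" combination the table describes.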
Medical records include both structured data and unstructured clinical notes. Our medical NLP handles both:
Original clinical note:

```
PATIENT: John Smith MRN: 12345678
DOB: 03/15/1965 DOS: 01/15/2024
CHIEF COMPLAINT: 58-year-old male presents with chest pain x 2 days.
HISTORY OF PRESENT ILLNESS:
Mr. Smith is a 58-year-old male who presents to the ED with complaints
of substernal chest pain that began 2 days ago. Patient states pain
radiates to his left arm. He was previously seen at Memorial Hospital
on 01/13/2024. His wife Mary Smith drove him to the ED today.
Patient lives at 123 Oak Street, Springfield, IL 62701. He can be
reached at (555) 123-4567. His insurance ID is BCBS-987654321.
ASSESSMENT/PLAN:
1. Acute coronary syndrome - will admit to cardiology
2. Contact cardiologist Dr. Johnson at ext. 4321
```
Redacted output:

```
PATIENT: [NAME] MRN: [MRN]
DOB: [DATE] DOS: [DATE]
CHIEF COMPLAINT: [AGE]-year-old male presents with chest pain x 2 days.
HISTORY OF PRESENT ILLNESS:
[NAME] is a [AGE]-year-old male who presents to the ED with complaints
of substernal chest pain that began 2 days ago. Patient states pain
radiates to his left arm. He was previously seen at [FACILITY]
on [DATE]. His wife [NAME] drove him to the ED today.
Patient lives at [ADDRESS]. He can be
reached at [PHONE]. His insurance ID is [INSURANCE_ID].
ASSESSMENT/PLAN:
1. Acute coronary syndrome - will admit to cardiology
2. Contact cardiologist [PROVIDER] at ext. [PHONE_EXT]
```
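Once detection has produced character spans, turning the original note into the redacted one is a single substitution pass. A minimal sketch, assuming spans arrive as hypothetical `(start, end, label)` tuples; `apply_redactions` and `span_of` are illustrative names, not part of RedactionAPI:

```python
def apply_redactions(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Replace each (start, end, label) span with a [LABEL] placeholder."""
    pieces = []
    cursor = 0
    # Walk spans left to right, copying the unredacted text between them.
    for start, end, label in sorted(spans):
        if start < cursor:
            raise ValueError(f"overlapping span starting at {start}")
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def span_of(text: str, value: str, label: str) -> tuple[int, int, str]:
    """Example helper: locate the first occurrence of value in text."""
    start = text.index(value)
    return (start, start + len(value), label)


note = "PATIENT: John Smith MRN: 12345678"
spans = [span_of(note, "John Smith", "NAME"), span_of(note, "12345678", "MRN")]
print(apply_redactions(note, spans))  # PATIENT: [NAME] MRN: [MRN]
```

Building the output from slices, rather than calling `str.replace` per value, keeps offsets valid and avoids redacting an unrelated occurrence of the same string elsewhere in the note.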
MRNs are critical identifiers that must be removed for de-identification, but MRN formats vary by institution. Detection rules can be configured per format, optionally requiring a nearby context keyword:
```json
{
  "mrn_patterns": [
    {
      "name": "standard_numeric",
      "pattern": "\\b\\d{7,10}\\b",
      "context_required": ["MRN", "Medical Record", "Patient ID", "Chart"]
    },
    {
      "name": "prefixed",
      "pattern": "\\b(MRN|PT|PAT)[-:]?\\d{6,10}\\b",
      "context_required": false
    },
    {
      "name": "institution_specific",
      "pattern": "\\bABC-\\d{8}\\b",
      "context_required": false
    }
  ]
}
```
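One way to interpret this configuration: a bare 7-10 digit number is only an MRN when a keyword such as "MRN" or "Chart" appears shortly before it, while prefixed and institution-specific formats are distinctive enough to match alone. The sketch below assumes that reading; the 30-character look-back window and the `find_mrns` function are hypothetical choices, not documented RedactionAPI behavior.

```python
import re

# Mirrors the example configuration above.
MRN_PATTERNS = [
    {"name": "standard_numeric", "pattern": r"\b\d{7,10}\b",
     "context_required": ["MRN", "Medical Record", "Patient ID", "Chart"]},
    {"name": "prefixed", "pattern": r"\b(MRN|PT|PAT)[-:]?\d{6,10}\b",
     "context_required": False},
    {"name": "institution_specific", "pattern": r"\bABC-\d{8}\b",
     "context_required": False},
]
CONTEXT_WINDOW = 30  # assumption: characters inspected before each match


def find_mrns(text: str) -> list[tuple[str, str]]:
    """Return (rule name, matched text) for every accepted MRN candidate."""
    hits = []
    for rule in MRN_PATTERNS:
        for m in re.finditer(rule["pattern"], text):
            keywords = rule["context_required"]
            if keywords:
                # Only accept the match if a keyword appears in the preceding window.
                window = text[max(0, m.start() - CONTEXT_WINDOW):m.start()].lower()
                if not any(k.lower() in window for k in keywords):
                    continue
            hits.append((rule["name"], m.group()))
    return hits


print(find_mrns("MRN: 12345678"))               # [('standard_numeric', '12345678')]
print(find_mrns("Callback number 5551234567"))  # [] (no MRN context, likely a phone)
```

The context requirement is what keeps a 10-digit phone number from being reported as an MRN by the generic numeric rule.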
Dates in medical records require careful handling. We offer four approaches:
Full redaction: Replace every date with a placeholder. This is the safest option for de-identification but loses temporal information.
03/15/1965 → [DATE]
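Year only: Keep the year and remove the month and day. This meets Safe Harbor requirements except for patients over 89, whose birth year must also be removed and whose age is reported only as "90 or older".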
03/15/1965 → 1965
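Date shifting: Apply the same offset to every date for a given patient. This preserves time intervals for longitudinal research. Because shifted dates still carry a month and day, this method is generally used under Expert Determination rather than Safe Harbor.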
03/15/1965 → 07/22/1965 (shifted +129 days)
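A consistent per-patient offset can be derived deterministically, so every date for the same patient moves by the same amount across documents and runs. The sketch below assumes an HMAC-SHA256 of a hypothetical patient identifier under a secret key, mapped to a non-zero offset within plus or minus 365 days; the key, range, and function names are illustrative assumptions, not RedactionAPI's documented scheme. The key must be stored apart from the de-identified data, since anyone holding it can recompute the offsets.

```python
import hashlib
import hmac
from datetime import date, timedelta


def patient_offset(patient_id: str, secret: bytes, max_days: int = 365) -> int:
    """Derive a stable, non-zero day offset in [-max_days, max_days] for one patient."""
    digest = hmac.new(secret, patient_id.encode("utf-8"), hashlib.sha256).digest()
    magnitude = 1 + int.from_bytes(digest[:4], "big") % max_days  # 1..max_days, never 0
    sign = 1 if digest[4] & 1 else -1
    return sign * magnitude


def shift_date(d: date, offset_days: int) -> date:
    """Apply a patient's offset to one of that patient's dates."""
    return d + timedelta(days=offset_days)


print(shift_date(date(1965, 3, 15), 129))  # 1965-07-22, matching the example above

# Intervals survive shifting: admission-to-discharge stays 2 days.
offset = patient_offset("patient-123", b"keep-this-key-secret")
admit, discharge = date(2024, 1, 13), date(2024, 1, 15)
print((shift_date(discharge, offset) - shift_date(admit, offset)).days)  # 2
```

Excluding a zero offset avoids silently publishing a patient's real dates.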
Age conversion: Convert the birth date to the patient's age at the time of service. For ages over 89, use "90+".
DOB: 03/15/1965, DOS: 01/15/2024 → Age: 58 years
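Age at service is the difference in years, minus one if the birthday has not yet occurred in the service year; ages above 89 collapse into a single "90+" category, as Safe Harbor requires. A minimal sketch (function names are illustrative):

```python
from datetime import date


def age_at_service(dob: date, dos: date) -> int:
    """Whole years between date of birth and date of service."""
    # Tuple comparison is True (counts as 1) if the birthday hasn't happened yet that year.
    return dos.year - dob.year - ((dos.month, dos.day) < (dob.month, dob.day))


def safe_harbor_age(dob: date, dos: date) -> str:
    """Report exact ages up to 89 and aggregate everything older as '90+'."""
    age = age_at_service(dob, dos)
    return "90+" if age > 89 else str(age)


print(safe_harbor_age(date(1965, 3, 15), date(2024, 1, 15)))  # 58
print(safe_harbor_age(date(1930, 1, 1), date(2024, 1, 15)))   # 90+
```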
We provide specialized processing for standard healthcare data formats:
Original HL7 v2 message:

```
MSH|^~\&|HIS|Hospital|LAB|Lab|202401151430||ADT^A01|MSG001|P|2.4
PID|1|12345678|12345678^^^Hospital^MR||Smith^John^Q||19650315|M|||123 Oak St^^Springfield^IL^62701||5551234567
```

Redacted HL7 message:

```
MSH|^~\&|HIS|[FACILITY]|LAB|Lab|[DATETIME]||ADT^A01|[MSG_ID]|P|2.4
PID|1|[MRN]|[MRN]^^^[FACILITY]^MR||[NAME]||[DOB]|M|||[ADDRESS]||[PHONE]
```
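Format-aware redaction for HL7 v2 means working by field position rather than by free-text pattern: in a PID segment, PID-2/PID-3 carry identifiers, PID-5 the name, PID-7 the birth date, PID-11 the address, and PID-13 the home phone. The sketch below reproduces the PID line of the redacted message above. The rule tables and function names are hypothetical; it handles only PID (not MSH), and it ignores HL7 escape sequences and repetitions (`~`), which a real implementation would need.

```python
# Hypothetical rules: PID field number -> placeholder for the whole field.
PID_FIELD_RULES = {2: "[MRN]", 5: "[NAME]", 7: "[DOB]", 11: "[ADDRESS]", 13: "[PHONE]"}
# PID-3 keeps its component structure; only the ID (1) and assigning authority (4) change.
PID3_COMPONENT_RULES = {1: "[MRN]", 4: "[FACILITY]"}


def redact_pid(segment: str) -> str:
    """Redact one PID segment; other segments are returned unchanged."""
    fields = segment.split("|")
    if fields[0] != "PID":
        return segment
    # For PID, fields[n] is PID-n because fields[0] is the segment name.
    for idx, placeholder in PID_FIELD_RULES.items():
        if idx < len(fields) and fields[idx]:
            fields[idx] = placeholder
    if len(fields) > 3 and fields[3]:
        comps = fields[3].split("^")
        for idx, placeholder in PID3_COMPONENT_RULES.items():
            if idx <= len(comps) and comps[idx - 1]:
                comps[idx - 1] = placeholder
        fields[3] = "^".join(comps)
    return "|".join(fields)


def redact_message(message: str) -> str:
    """Split on segment terminators (\\r, \\n or \\r\\n) and rejoin with \\r."""
    return "\r".join(redact_pid(s) for s in message.splitlines())


pid = ("PID|1|12345678|12345678^^^Hospital^MR||Smith^John^Q||19650315|M|||"
       "123 Oak St^^Springfield^IL^62701||5551234567")
print(redact_pid(pid))
```

Note that MSH fields are numbered differently (MSH-1 is the field separator itself), which is one reason segment-specific rules are safer than a single generic field map.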
Original FHIR Patient resource:

```json
{
  "resourceType": "Patient",
  "id": "patient-123",
  "identifier": [{
    "system": "http://hospital.org/mrn",
    "value": "12345678"
  }],
  "name": [{
    "family": "Smith",
    "given": ["John", "Q"]
  }],
  "birthDate": "1965-03-15",
  "address": [{
    "line": ["123 Oak Street"],
    "city": "Springfield",
    "state": "IL",
    "postalCode": "62701"
  }]
}
```

De-identified FHIR resource:

```json
{
  "resourceType": "Patient",
  "id": "[RESOURCE_ID]",
  "identifier": [{
    "system": "[SYSTEM]",
    "value": "[MRN]"
  }],
  "name": [{
    "family": "[FAMILY_NAME]",
    "given": ["[GIVEN_NAME]"]
  }],
  "birthDate": "1965",
  "address": [{
    "line": ["[ADDRESS]"],
    "city": "[CITY]",
    "state": "IL",
    "postalCode": "627"
  }]
}
```
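The transformation above can be sketched as a walk over the Patient resource's fields. This is a minimal, hypothetical illustration (function names assumed), reproducing the example output: identifiers and names become placeholders, `birthDate` keeps only the year (FHIR's `date` type permits a bare `YYYY`), and ZIP codes keep their first three digits. Safe Harbor requires those three digits to be replaced with `000` where the area has 20,000 or fewer people. The `RESTRICTED_ZIP3` set is the list HHS guidance published from 2000 Census data and should be checked against current guidance. The sketch does not apply the over-89 rule to `birthDate`.

```python
import copy

# ZIP3 areas with 20,000 or fewer residents per HHS guidance (2000 Census); verify before use.
RESTRICTED_ZIP3 = {"036", "059", "063", "102", "203", "556", "692", "790", "821",
                   "823", "830", "831", "878", "879", "884", "890", "893"}


def zip3(postal_code: str) -> str:
    """Keep the first three ZIP digits, or '000' for sparsely populated areas."""
    prefix = postal_code[:3]
    return "000" if prefix in RESTRICTED_ZIP3 else prefix


def deidentify_patient(resource: dict) -> dict:
    """Return a de-identified copy of a FHIR Patient resource (input left untouched)."""
    out = copy.deepcopy(resource)
    out["id"] = "[RESOURCE_ID]"
    for ident in out.get("identifier", []):
        ident["system"] = "[SYSTEM]"
        ident["value"] = "[MRN]"
    for name in out.get("name", []):
        if "family" in name:
            name["family"] = "[FAMILY_NAME]"
        if "given" in name:
            name["given"] = ["[GIVEN_NAME]"]
    if "birthDate" in out:
        out["birthDate"] = out["birthDate"][:4]  # year only
    for addr in out.get("address", []):
        if "line" in addr:
            addr["line"] = ["[ADDRESS]"]
        if "city" in addr:
            addr["city"] = "[CITY]"
        if "postalCode" in addr:
            addr["postalCode"] = zip3(addr["postalCode"])
    return out


patient = {
    "resourceType": "Patient", "id": "patient-123",
    "identifier": [{"system": "http://hospital.org/mrn", "value": "12345678"}],
    "name": [{"family": "Smith", "given": ["John", "Q"]}],
    "birthDate": "1965-03-15",
    "address": [{"line": ["123 Oak Street"], "city": "Springfield",
                 "state": "IL", "postalCode": "62701"}],
}
print(deidentify_patient(patient)["address"][0]["postalCode"])  # 627
```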
Our medical NLP applies clinical context during detection to improve accuracy. A typical request using the Safe Harbor profile:
```http
POST /v1/redact

{
  "document": "PATIENT: John Smith MRN: 12345678...",
  "profile": "hipaa_safe_harbor",
  "options": {
    "date_handling": "year_only",
    "age_threshold": 89,
    "zip_code_handling": "three_digit",
    "preserve_providers": false,
    "preserve_facilities": false
  }
}
```
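The same request can be built from a script with only the Python standard library. In this sketch the host `https://api.redactionapi.com` is inferred from the `audit_log` URL in the example response, and the `Authorization: Bearer` header is an assumption; check the API reference for the actual base URL and authentication scheme. `build_redact_request` and `redact` are illustrative names.

```python
import json
import urllib.request

# Assumed values: host inferred from the audit_log link; auth scheme is hypothetical.
BASE_URL = "https://api.redactionapi.com"


def build_redact_request(document: str, api_key: str) -> urllib.request.Request:
    """Build a POST /v1/redact request using the Safe Harbor options shown above."""
    payload = {
        "document": document,
        "profile": "hipaa_safe_harbor",
        "options": {
            "date_handling": "year_only",
            "age_threshold": 89,
            "zip_code_handling": "three_digit",
            "preserve_providers": False,
            "preserve_facilities": False,
        },
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/redact",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def redact(document: str, api_key: str) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_redact_request(document, api_key), timeout=30) as resp:
        return json.load(resp)
```

Separating request construction from sending makes it possible to inspect or log the outgoing payload (without the document text) before any PHI leaves the network.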
Response:

```json
{
  "redacted_document": "PATIENT: [NAME] MRN: [MRN]...",
  "compliance": {
    "method": "safe_harbor",
    "identifiers_found": {
      "names": 3,
      "mrn": 1,
      "dates": 4,
      "addresses": 1,
      "phone_numbers": 1,
      "insurance_ids": 1
    },
    "all_identifiers_redacted": true
  },
  "audit_log": "https://api.redactionapi.com/audit/abc123"
}
```
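A client can act on the `compliance` block before releasing output downstream, for example by rejecting any response that does not report `all_identifiers_redacted`, and by running a cheap independent scan for obvious leftovers. The sketch below assumes the response shape shown above; `check_response` and its residual patterns are hypothetical, and the regex scan is only a coarse second check, not a substitute for the service's detection.

```python
import re

# Coarse residual scan for identifiers that should never survive redaction.
RESIDUAL_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b"),
}


def check_response(response: dict) -> int:
    """Validate a redaction response and return the total identifiers found."""
    compliance = response["compliance"]
    if not compliance.get("all_identifiers_redacted"):
        raise ValueError("service reported unredacted identifiers")
    text = response["redacted_document"]
    leftovers = [label for label, p in RESIDUAL_PATTERNS.items() if p.search(text)]
    if leftovers:
        raise ValueError(f"possible residual identifiers: {leftovers}")
    return sum(compliance["identifiers_found"].values())
```

For the example response above, the counts sum to 11 identifiers.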
De-identify patient records for retrospective studies, clinical trials, and outcomes research while preserving clinical value.
Prepare EHR data for population health analytics, quality metrics, and operational reporting without PHI exposure.
Create de-identified datasets for training clinical NLP models, diagnostic algorithms, and decision support systems.
Enable safe data sharing between organizations, with research networks, or for public health reporting.
RedactionAPI provides healthcare organizations with comprehensive PHI detection and HIPAA-compliant de-identification. Process clinical documents at scale while meeting regulatory requirements.