Schema-aware PII detection and redaction for XML documents with XPath targeting, namespace support, and guaranteed structural integrity.
Everything you need for comprehensive data protection
Parse and validate XML against XSD schemas, ensuring redacted output remains structurally valid and compliant.
Target specific elements and attributes using XPath expressions for precise control over what gets redacted.
Full XML namespace handling including default namespaces, prefixed namespaces, and namespace-aware XPath queries.
Apply redaction as part of XSLT transformation pipelines for seamless integration into existing XML workflows.
Redact PII in both element content and attribute values with configurable targeting rules.
Process large XML files efficiently using streaming parsers without loading entire documents into memory.
XML remains the backbone of enterprise data exchange, healthcare systems (HL7, CCD), financial services (SWIFT, FpML), and government communications. RedactionAPI provides schema-aware XML processing that detects and removes PII while guaranteeing structural validity, which is essential for systems where malformed XML causes failures.
XML's hierarchical structure, namespace complexity, and strict validation requirements make PII redaction more challenging than flat text processing. Simply replacing text can break schema validation, damage document structure, or corrupt cross-references.
XML schemas (XSD) define strict rules for element types, lengths, and patterns. Redaction must produce output that remains valid against these schemas.
Documents may use multiple namespaces with prefixes, default namespaces, and inheritance. XPath queries must be namespace-aware.
XML ID/IDREF attributes create document-internal references. Redacting an ID without updating references causes validation failures.
Enterprise XML files can be gigabytes in size. DOM-based processing loads the whole tree and can exhaust memory on files of that size; streaming parsers avoid this but require careful state management.
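The ID/IDREF problem described above can be sketched in a few lines of Python. This is a minimal illustration, not RedactionAPI's implementation: it assumes the identifying attributes are literally named `id` and `ref` (hypothetical names; a real schema declares which attributes are of type xs:ID and xs:IDREF), and the helper `pseudonymize_ids` is invented for this example.

```python
import xml.etree.ElementTree as ET

def pseudonymize_ids(root, id_attr="id", ref_attr="ref"):
    """Replace ID values and rewrite matching references consistently."""
    mapping = {}
    # First pass: assign a stable pseudonym to every ID value.
    for elem in root.iter():
        old = elem.get(id_attr)
        if old is not None:
            mapping[old] = f"id-{len(mapping) + 1}"
            elem.set(id_attr, mapping[old])
    # Second pass: point every reference at the new pseudonym.
    for elem in root.iter():
        old = elem.get(ref_attr)
        if old in mapping:
            elem.set(ref_attr, mapping[old])
    return mapping

doc = ET.fromstring(
    "<Records>"
    '<Person id="ssn-123-45-6789"/>'
    '<Claim ref="ssn-123-45-6789"/>'
    "</Records>"
)
mapping = pseudonymize_ids(doc)
redacted = ET.tostring(doc, encoding="unicode")
```

Because both passes share one mapping, the `Claim` still points at the `Person` after redaction, and the original value no longer appears anywhere in the output.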
When you provide an XSD schema, RedactionAPI validates input and ensures redacted output remains compliant:
{
  "document": "<?xml version=\"1.0\"?><Customer xmlns=\"http://example.com/customer\">...</Customer>",
  "document_type": "xml",
  "schema": {
    "type": "xsd",
    "content": "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">...</xs:schema>"
  },
  "validation": {
    "validate_input": true,
    "validate_output": true,
    "fail_on_invalid": true
  },
  "pii_types": ["name", "ssn", "email", "phone", "address"]
}
Our schema processor understands XSD data types and generates appropriate replacements:
| XSD Type | Original | Redacted | Strategy |
|---|---|---|---|
| xs:string | John Smith | [REDACTED] | Placeholder text |
| xs:date | 1985-03-15 | 1900-01-01 | Valid date placeholder |
| xs:integer | 123456789 | 000000000 | Zero-filled |
| Enumeration | Male | Unknown | Neutral enum value |
| Pattern (SSN) | 123-45-6789 | XXX-XX-XXXX | Pattern-preserving mask |
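The strategies in the table can be approximated with a small lookup. The sketch below is an assumption about how such a mapping might look, not the schema processor itself; the function name `placeholder_for` and the type label `"pattern"` are hypothetical. Enumerations are left out because choosing a neutral value requires the schema's own list of allowed values.

```python
import re

# Fixed placeholders per XSD type, mirroring the table above.
FIXED_PLACEHOLDERS = {
    "xs:string": "[REDACTED]",
    "xs:date": "1900-01-01",
}

def placeholder_for(xsd_type, original):
    """Return a replacement that keeps the lexical shape of the original value."""
    if xsd_type in FIXED_PLACEHOLDERS:
        return FIXED_PLACEHOLDERS[xsd_type]
    if xsd_type == "xs:integer":
        # Zero-fill while keeping the original digit count.
        return re.sub(r"\d", "0", original)
    if xsd_type == "pattern":
        # Mask letters and digits, keep separators such as hyphens.
        return re.sub(r"[A-Za-z0-9]", "X", original)
    raise ValueError(f"no replacement strategy for {xsd_type}")

print(placeholder_for("xs:integer", "123456789"))  # 000000000
print(placeholder_for("pattern", "123-45-6789"))   # XXX-XX-XXXX
```

One caveat worth checking against your own schema: a mask such as XXX-XX-XXXX only validates if the pattern facet admits the mask character. A pattern restricted to digits would need a digit-based placeholder instead.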
XPath expressions provide surgical precision in specifying which elements and attributes to redact:
{
  "document": "<Order>...</Order>",
  "document_type": "xml",
  "xpath_targets": [
    {
      "xpath": "//Customer/Name",
      "pii_type": "name",
      "action": "redact"
    },
    {
      "xpath": "//Customer/SSN",
      "pii_type": "ssn",
      "action": "mask"
    },
    {
      "xpath": "//BillingAddress/*",
      "pii_type": "address",
      "action": "redact"
    },
    {
      "xpath": "//@email",
      "pii_type": "email",
      "action": "tokenize"
    },
    {
      "xpath": "//Notes[contains(., 'CONFIDENTIAL')]",
      "action": "remove_element"
    }
  ]
}
| XPath | Targets |
|---|---|
| //SSN | All SSN elements anywhere in document |
| /Order/Customer/Name | Specific path from root |
| //Customer[@type='individual']/SSN | SSN only for individual customers |
| //@ssn | All attributes named "ssn" |
| //Person[Age < 18]/Name | Names of minors only |
| //ns:Patient/ns:MRN | Namespaced elements |
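To see how path-based targeting behaves, the following sketch applies a few targets locally with Python's standard library. It is an illustration only and does not call RedactionAPI. Note that `xml.etree.ElementTree` supports only a subset of XPath: paths must be relative (".//SSN" rather than "//SSN"), and attribute selection ("//@email") and numeric comparisons ("Age < 18") are not available, so attributes are handled by walking the tree. The sample document and the helper `apply_targets` are hypothetical.

```python
import xml.etree.ElementTree as ET

ORDER = """<Order>
  <Customer type="individual" email="jane@example.com">
    <Name>Jane Doe</Name>
    <SSN>123-45-6789</SSN>
  </Customer>
  <Customer type="business" email="ap@example.com">
    <Name>Acme Corp</Name>
  </Customer>
</Order>"""

# (path, replacement) pairs using ElementTree's XPath subset.
ELEMENT_TARGETS = [
    (".//Customer[@type='individual']/SSN", "[SSN]"),
    (".//Customer[@type='individual']/Name", "[NAME]"),
]

def apply_targets(root, element_targets, attribute_names=()):
    """Overwrite the text of matched elements and the values of named attributes."""
    for path, replacement in element_targets:
        for elem in root.findall(path):
            elem.text = replacement
    # ElementTree cannot select attributes by path, so visit every element.
    for elem in root.iter():
        for name in attribute_names:
            if name in elem.attrib:
                elem.set(name, "[" + name.upper() + "]")
    return root

root = apply_targets(ET.fromstring(ORDER), ELEMENT_TARGETS, attribute_names=("email",))
names = [e.text for e in root.findall(".//Name")]
print(names)  # ['[NAME]', 'Acme Corp']
```

The predicate limits redaction to individual customers, so the business name is left intact while every `email` attribute is replaced.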
XML namespaces require careful handling to correctly target elements:
{
  "document": "<Patient xmlns=\"urn:hl7-org:v3\" xmlns:ext=\"urn:example:extension\">...</Patient>",
  "document_type": "xml",
  "namespaces": {
    "hl7": "urn:hl7-org:v3",
    "ext": "urn:example:extension"
  },
  "xpath_targets": [
    {
      "xpath": "//hl7:Patient/hl7:name",
      "pii_type": "name"
    },
    {
      "xpath": "//ext:SSN",
      "pii_type": "ssn"
    }
  ]
}
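The reason a prefix mapping is needed, even for a document that uses a default namespace, can be shown with the standard library. This is a local sketch with a made-up patient document, not a call to the API: an unprefixed path matches nothing because every element in the document belongs to the `urn:hl7-org:v3` namespace.

```python
import xml.etree.ElementTree as ET

PATIENT = """<Patient xmlns="urn:hl7-org:v3" xmlns:ext="urn:example:extension">
  <name>John Smith</name>
  <ext:SSN>123-45-6789</ext:SSN>
</Patient>"""

# Prefixes used in queries; they need not match the prefixes in the document.
NAMESPACES = {"hl7": "urn:hl7-org:v3", "ext": "urn:example:extension"}

root = ET.fromstring(PATIENT)

# Unprefixed query: <name> is in the default namespace, so nothing matches.
unprefixed = root.findall(".//name")

# Namespace-aware queries find the elements.
prefixed = root.findall(".//hl7:name", NAMESPACES)
for elem in prefixed:
    elem.text = "[NAME]"
for elem in root.findall(".//ext:SSN", NAMESPACES):
    elem.text = "[SSN]"

print(len(unprefixed), len(prefixed))  # 0 1
```

The query prefix `hl7` is bound by the mapping, not by the document, which is why it works even though the document itself never declares an `hl7` prefix.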
For documents where namespaces vary or are unknown, use local-name() functions:
// Match any element named "SSN" regardless of namespace
//*[local-name()='SSN']
// Match SSN elements in any namespace that are direct children of any Patient element
//*[local-name()='Patient']/*[local-name()='SSN']
// Match any attribute named "email" regardless of element
//@*[local-name()='email']
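The same namespace-agnostic matching can be approximated in Python by comparing local names directly. This sketch assumes the standard-library `xml.etree.ElementTree`, which stores qualified tags as `{namespace-uri}local`; the helpers `local_name` and `find_by_local_name` are hypothetical names for this example.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip the '{namespace-uri}' part that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]

def find_by_local_name(root, name):
    """Return every element whose local name matches, whatever its namespace."""
    return [
        elem for elem in root.iter()
        if isinstance(elem.tag, str) and local_name(elem.tag) == name
    ]

doc = ET.fromstring(
    '<r xmlns:a="urn:a" xmlns:b="urn:b">'
    "<a:SSN>111-11-1111</a:SSN>"
    "<b:SSN>222-22-2222</b:SSN>"
    "<SSN>333-33-3333</SSN>"
    "</r>"
)
matches = find_by_local_name(doc, "SSN")
print(len(matches))  # 3
```

Matching on local names is convenient when namespaces are unpredictable, but it also matches unrelated elements that happen to share a name, so explicit namespace bindings remain the safer choice when the namespaces are known.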
<?xml version="1.0" encoding="UTF-8"?>
<Customer>
  <Name>John Smith</Name>
  <SSN>123-45-6789</SSN>
  <Email>john.smith@example.com</Email>
  <Phone>(555) 123-4567</Phone>
  <Address>
    <Street>123 Main Street</Street>
    <City>Springfield</City>
    <State>IL</State>
    <Zip>62701</Zip>
  </Address>
  <Notes>Customer mentioned SSN 987-65-4321 during call.</Notes>
</Customer>
<?xml version="1.0" encoding="UTF-8"?>
<Customer>
  <Name>[NAME]</Name>
  <SSN>[SSN]</SSN>
  <Email>[EMAIL]</Email>
  <Phone>[PHONE]</Phone>
  <Address>
    <Street>[ADDRESS]</Street>
    <City>[CITY]</City>
    <State>IL</State>
    <Zip>[ZIP]</Zip>
  </Address>
  <Notes>Customer mentioned SSN [SSN] during call.</Notes>
</Customer>
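The `Notes` element in this example shows that PII can also hide inside free text, where no element name flags it. A rough sketch of that kind of content scan follows; it uses a single regular expression for SSN-shaped strings, which is far simpler than a production detector and is only an assumption about the general approach.

```python
import re
import xml.etree.ElementTree as ET

# Matches the common NNN-NN-NNNN layout only; real detectors cover more variants.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_text_nodes(root):
    """Mask SSN-shaped substrings in every text and tail node of the tree."""
    for elem in root.iter():
        if elem.text:
            elem.text = SSN_PATTERN.sub("[SSN]", elem.text)
        if elem.tail:
            elem.tail = SSN_PATTERN.sub("[SSN]", elem.tail)
    return root

customer = ET.fromstring(
    "<Customer>"
    "<SSN>123-45-6789</SSN>"
    "<Notes>Customer mentioned SSN 987-65-4321 during call.</Notes>"
    "</Customer>"
)
redact_text_nodes(customer)
print(customer.findtext("Notes"))  # Customer mentioned SSN [SSN] during call.
```

Only the matched substring is replaced, so the surrounding sentence and the element structure stay as they were.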
<ClinicalDocument xmlns="urn:hl7-org:v3">
  <recordTarget>
    <patientRole>
      <id extension="12345678" root="2.16.840.1.113883.4.1"/>
      <patient>
        <name>
          <given>John</given>
          <family>Smith</family>
        </name>
        <birthTime value="19850315"/>
        <administrativeGenderCode code="M"/>
      </patient>
    </patientRole>
  </recordTarget>
  <component>
    <structuredBody>
      <!-- Clinical content -->
    </structuredBody>
  </component>
</ClinicalDocument>
For HL7/CCD documents, we provide pre-configured profiles that understand healthcare-specific PII locations. Setting "redaction_level" to "safe_harbor" applies HIPAA Safe Harbor de-identification:
{
  "document": "...",
  "document_type": "xml",
  "profile": "hl7_ccd",
  "redaction_level": "safe_harbor"
}
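As a rough picture of what Safe Harbor-style handling means for a CCD header, the sketch below processes a shortened version of the sample document locally. It is not the `hl7_ccd` profile and makes no claim about what that profile does internally; it only illustrates three Safe Harbor ideas: names are removed, identifiers are removed, and dates tied to an individual are reduced to the year. The function `deidentify_ccd` is hypothetical, and rules such as aggregating ages over 89 are not handled.

```python
import xml.etree.ElementTree as ET

HL7 = {"hl7": "urn:hl7-org:v3"}

CCD = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <recordTarget><patientRole>
    <id extension="12345678" root="2.16.840.1.113883.4.1"/>
    <patient>
      <name><given>John</given><family>Smith</family></name>
      <birthTime value="19850315"/>
    </patient>
  </patientRole></recordTarget>
</ClinicalDocument>"""

def deidentify_ccd(root):
    """Apply a simplified, illustrative subset of Safe Harbor rules in place."""
    # Replace every part of the patient's name.
    for part in root.findall(".//hl7:patient/hl7:name/*", HL7):
        part.text = "[NAME]"
    # Keep only the year of the birth date.
    for birth in root.findall(".//hl7:patient/hl7:birthTime", HL7):
        birth.set("value", birth.get("value", "")[:4])
    # Blank the patient identifier but keep the assigning authority (root OID).
    for ident in root.findall(".//hl7:patientRole/hl7:id", HL7):
        ident.set("extension", "[ID]")
    return root

ccd = deidentify_ccd(ET.fromstring(CCD))
print(ccd.find(".//hl7:birthTime", HL7).get("value"))  # 1985
```

Whether a year-only timestamp or a placeholder identifier is acceptable depends on the receiving system's validation rules, so treat the replacement values here as placeholders to adapt.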
For files too large for memory-based processing, use streaming mode:
# Using curl with streaming
curl -X POST https://api.redactionapi.com/v1/redact/stream \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/xml" \
  -H "X-Document-Type: xml" \
  -H "X-Streaming: true" \
  -H "X-PII-Types: name,ssn,email,phone,address" \
  --data-binary @large_file.xml \
  -o redacted_file.xml
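The general idea behind streaming, handling one record at a time instead of building the whole tree, can be sketched with the standard library's incremental parser. This illustrates the technique only and says nothing about how the streaming endpoint is implemented; the record tag, the SSN regex, and the function `stream_redact` are assumptions made for the example.

```python
import io
import re
import xml.etree.ElementTree as ET

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def stream_redact(source, record_tag):
    """Yield each redacted record as a string, processing one record at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        for node in elem.iter():
            if node.text:
                node.text = SSN.sub("[SSN]", node.text)
        yield ET.tostring(elem, encoding="unicode")
        # Drop the record's children and text so they can be garbage collected.
        elem.clear()

data = io.BytesIO(
    b"<Customers>"
    b"<Customer><SSN>123-45-6789</SSN></Customer>"
    b"<Customer><SSN>987-65-4321</SSN></Customer>"
    b"</Customers>"
)
records = list(stream_redact(data, "Customer"))
print(len(records))  # 2
```

In real use the generator would be consumed lazily and each record written straight to the output file. One caveat: cleared elements stay attached to the root as empty shells, so very long documents may also need those shells removed from the parent.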
Incorporate redaction into XSLT transformation pipelines:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:redact="http://redactionapi.com/xslt">
  <xsl:template match="Customer/Name">
    <Name><xsl:value-of select="redact:redact(., 'name')"/></Name>
  </xsl:template>
  <xsl:template match="Customer/SSN">
    <SSN><xsl:value-of select="redact:mask(., 'ssn')"/></SSN>
  </xsl:template>
  <xsl:template match="@email">
    <xsl:attribute name="email">
      <xsl:value-of select="redact:tokenize(., 'email')"/>
    </xsl:attribute>
  </xsl:template>
  <!-- Identity transform for everything else -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
We provide pre-built profiles for common XML standards, such as the "hl7_ccd" profile for HL7/CCD documents.
from redactionapi import RedactionClient

client = RedactionClient(api_key="your_api_key")

# Read the XML file
with open("customers.xml", "r", encoding="utf-8") as f:
    xml_content = f.read()

# Redact with XPath targeting
result = client.redact_xml(
    document=xml_content,
    xpath_targets=[
        {"xpath": "//Customer/SSN", "pii_type": "ssn"},
        {"xpath": "//Customer/Name", "pii_type": "name"},
        {"xpath": "//@email", "pii_type": "email"},
    ],
    preserve_formatting=True,
)

# Save the redacted output
with open("customers_redacted.xml", "w", encoding="utf-8") as f:
    f.write(result.redacted_document)
RedactionAPI provides enterprise-grade XML processing with schema validation, namespace support, and streaming capabilities. Protect PII in your XML data while maintaining structural integrity.