Detect and redact sensitive data in Microsoft Word documents. Process DOCX and legacy DOC files with full formatting preservation and handling of tracked changes, comments, and metadata.
Comprehensive DOCX/DOC support
Maintain fonts, styles, tables, and layouts. Redacted documents look professional and preserve formatting.
Scan and redact within comments, tracked changes, and revision history. Handle collaboration artifacts.
Process embedded images with OCR, handle charts and SmartArt, and redact within embedded objects.
Scan headers, footers, and page numbers. PII often hides in document periphery.
Strip document properties, author information, company names, and hidden metadata.
Support for DOCX (Office Open XML, Word 2007+) and legacy DOC (Word 97-2003) formats.
Simple integration, powerful results
Send your documents, text, or files through our secure API endpoint or web interface.
Our AI analyzes content to identify all sensitive information types with 99.7% accuracy.
Sensitive data is automatically redacted based on your configured compliance rules.
Receive your redacted content with full audit trail and compliance documentation.
Get started with just a few lines of code
import requests
api_key = "your_api_key"
url = "https://api.redactionapi.net/v1/redact"
data = {
    "text": "John Smith's SSN is 123-45-6789",
    "redaction_types": ["ssn", "person_name"],
    "output_format": "redacted"
}
response = requests.post(
    url,
    headers={"Authorization": f"Bearer {api_key}"},
    json=data
)
print(response.json())
# Output: {"redacted_text": "[PERSON_NAME]'s SSN is [SSN_REDACTED]"}
const axios = require('axios');
const apiKey = 'your_api_key';
const url = 'https://api.redactionapi.net/v1/redact';
const data = {
  text: "John Smith's SSN is 123-45-6789",
  redaction_types: ["ssn", "person_name"],
  output_format: "redacted"
};
axios.post(url, data, {
  headers: { 'Authorization': `Bearer ${apiKey}` }
})
  .then(response => {
    console.log(response.data);
    // Output: {"redacted_text": "[PERSON_NAME]'s SSN is [SSN_REDACTED]"}
  });
curl -X POST https://api.redactionapi.net/v1/redact \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "John Smith'\''s SSN is 123-45-6789",
    "redaction_types": ["ssn", "person_name"],
    "output_format": "redacted"
  }'
# Response:
# {"redacted_text": "[PERSON_NAME]'s SSN is [SSN_REDACTED]"}
Microsoft Word documents remain the standard for business correspondence, contracts, reports, and countless other document types. Their flexibility—supporting rich formatting, embedded objects, collaboration features, and complex layouts—makes them essential for business operations. But this same richness creates challenges for data protection. Sensitive information can hide in tracked changes, lurk in comments, embed within images, or persist in document metadata long after visible content has been modified.
Our Word document redaction addresses the full complexity of DOCX and DOC files. Beyond simple text search and replace, we process the complete document structure—analyzing tracked changes, scanning comments, extracting and OCR-processing embedded images, and removing hidden metadata. The result is a professionally formatted document with sensitive information properly removed, suitable for sharing, production, or archival.
Modern Word documents (DOCX) are actually ZIP archives containing XML files describing document content, formatting, and resources. This structure enables rich functionality but creates multiple locations where sensitive data may reside:
Main Document Content: The primary document text with all formatting, paragraphs, tables, and visible content. This is where most sensitive data appears and where redaction primarily operates.
Headers and Footers: Separate XML components for headers and footers, which may contain different content for first page, odd pages, and even pages. PII in letterheads, document IDs, or automatic fields often appears here.
Comments: Reviewer comments are stored separately and linked to text ranges. Comments may contain sensitive discussions, reviewer names, or referenced PII that requires protection.
Track Changes: Revision tracking stores both original and modified text, creating multiple versions of content within one document. Deleted text containing PII remains in the file, and recoverable by any recipient, until the tracked deletion is accepted.
Embedded Objects: Images, charts, SmartArt, and other objects are stored as separate resources. Images may contain visible text requiring OCR analysis.
Document Properties: Core properties (author, title, dates) and custom properties store metadata that may reveal sensitive information about document origin or handling.
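The locations listed above can be seen by opening a DOCX as a ZIP archive. The following is a minimal sketch using only the Python standard library, not a description of our internal pipeline. The function name and category labels are illustrative assumptions; the part-name prefixes reflect the layout Word typically writes. Note that tracked changes do not have their own part: they are stored inline in word/document.xml.

```python
import io
import zipfile

# Part-name prefixes follow the usual layout Word writes inside a DOCX package.
# The category labels are illustrative only and not part of any API.
PART_CATEGORIES = [
    ("word/document.xml", "main content"),
    ("word/header", "header"),
    ("word/footer", "footer"),
    ("word/comments", "comments"),
    ("word/media/", "embedded image"),
    ("word/embeddings/", "embedded object"),
    ("docProps/", "document properties"),
]


def classify_parts(docx_bytes: bytes) -> dict[str, list[str]]:
    """Group the parts of a DOCX package by where sensitive data may reside."""
    found: dict[str, list[str]] = {}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        for name in package.namelist():
            for prefix, category in PART_CATEGORIES:
                if name.startswith(prefix):
                    found.setdefault(category, []).append(name)
                    break  # first matching prefix wins
    return found
```

Calling classify_parts(open("contract.docx", "rb").read()) on a typical file would return a mapping such as "header" to ["word/header1.xml", ...], which gives a quick inventory of the parts a redaction pass has to visit.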
Professional documents require professional appearance after redaction. Our processing preserves all Word formatting elements:
Character Formatting: Fonts, sizes, colors, bold, italic, underline, and all character-level formatting remain exactly as specified. Redacted text can maintain the original formatting or use consistent redaction styling.
Paragraph Formatting: Alignment, spacing, indentation, bullets, numbering, and paragraph styles are preserved. Documents maintain their visual structure.
Tables: Table structures, cell formatting, merged cells, borders, and shading remain intact. Content within cells is redacted while table layout is preserved.
Sections and Layouts: Page size, margins, columns, section breaks, and page orientation settings are maintained. Multi-section documents process correctly.
Styles: Both built-in and custom styles are preserved, maintaining document consistency and enabling continued editing with consistent formatting.
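One way to see why formatting can survive redaction: in WordprocessingML, visible text lives in w:t elements, while formatting lives in sibling elements such as w:rPr. The sketch below is an illustration under simplifying assumptions, not our production code. It rewrites only the text of w:t nodes, uses a single SSN-shaped regular expression as the detector, and the function name and placeholder are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumption: a simple SSN-shaped pattern stands in for real PII detection.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_text_nodes(document_xml: str, placeholder: str = "[SSN]") -> str:
    """Replace SSN-like strings inside w:t elements, leaving other markup alone."""
    ET.register_namespace("w", W_NS)  # keep the conventional "w:" prefix on output
    root = ET.fromstring(document_xml)
    for text_node in root.iter(f"{{{W_NS}}}t"):
        if text_node.text:
            text_node.text = SSN_PATTERN.sub(placeholder, text_node.text)
    return ET.tostring(root, encoding="unicode")
```

Because only the text of each w:t element changes, run properties such as bold or font settings pass through untouched. A real implementation also has to handle values that Word has split across several runs, which this sketch does not attempt.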
Word's collaboration features create data protection challenges that require careful handling:
Revision History: When track changes is enabled, Word stores both original and modified text. Deleting PII with track changes on doesn't remove it—the deletion is tracked, and original text remains visible in revision history.
Multiple Revision Layers: Documents may have multiple rounds of tracked changes from different authors. Each revision layer may contain different sensitive content requiring analysis.
Comment Contents: Reviewer comments may quote or reference sensitive document content. Comments from multiple reviewers each need scanning.
Author Attribution: Track changes and comments record author names and timestamps. This metadata may itself require protection or removal.
Our processing offers multiple options for handling collaboration artifacts: redact within tracked changes/comments while preserving the collaboration structure, accept all changes and then redact, or completely remove all tracked changes and comments.
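To illustrate why tracked deletions need scanning, the sketch below lists tracked insertions and deletions found in the XML of a main document part. It relies on the WordprocessingML elements w:ins, w:del, and w:delText and the w:author attribute; the function name and the returned dictionary shape are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def tracked_revisions(document_xml: str) -> list[dict[str, str]]:
    """List tracked insertions and deletions with their author and text."""
    root = ET.fromstring(document_xml)
    revisions = []
    # Inserted text is stored in w:t, deleted text in w:delText.
    for kind, tag, text_tag in (("insertion", "ins", "t"),
                                ("deletion", "del", "delText")):
        for element in root.iter(W + tag):
            text = "".join(node.text or "" for node in element.iter(W + text_tag))
            revisions.append({
                "kind": kind,
                "author": element.get(W + "author", ""),
                "text": text,
            })
    return revisions
```

A deletion entry whose text contains a name or account number shows that the value is still present in the file even though it no longer appears in the final view of the document. The author field is the attribution metadata mentioned above and may itself need redaction.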
Word documents frequently contain embedded objects requiring specialized processing:
Images: Embedded images—screenshots, scanned documents, photos—may contain visible text with sensitive information. Our OCR extracts text from images, analyzes for PII, and applies visual redaction (blur, black box) to sensitive regions.
Charts and Graphs: Chart elements may include data labels, axis labels, or titles containing sensitive information. Text within charts is extracted and analyzed.
SmartArt: SmartArt diagrams with text content are analyzed, with text elements scanned for PII.
Embedded Objects: Embedded Excel charts, Visio diagrams, or other OLE objects are processed according to their type. Complex embedded objects can be flagged for manual review if automated processing isn't possible.
Word documents contain extensive hidden information beyond visible content:
Core Properties: Author name, last modified by, company name, creation date, modification date, and other standard document properties. These may reveal document origin or handling chain.
Custom Properties: User-defined properties that organizations may use for document management, which could contain sensitive classification or routing information.
Personal Information: Word's Document Inspector identifies personal information including author names, reviewer names, and email addresses embedded in the file.
Hidden Text: Word supports hidden text formatting that doesn't print but remains in the document. Hidden text may contain notes or information intended to be invisible.
Template Information: Documents created from templates may inherit metadata from the template, including author and company information.
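As a rough illustration of metadata stripping, the sketch below clears the author and last-modified-by fields in the XML of docProps/core.xml. The namespaces are the standard ones for package core properties and Dublin Core; the function name and the choice of which fields to clear are assumptions for this example.

```python
import xml.etree.ElementTree as ET

CP_NS = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Fields cleared by this sketch; which fields to clear is a policy choice.
IDENTIFYING_FIELDS = (
    f"{{{DC_NS}}}creator",         # author
    f"{{{CP_NS}}}lastModifiedBy",  # last editor
)


def strip_core_properties(core_xml: str) -> str:
    """Blank out identifying fields in docProps/core.xml, keeping the rest."""
    ET.register_namespace("cp", CP_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.fromstring(core_xml)
    for tag in IDENTIFYING_FIELDS:
        for element in root.iter(tag):
            element.text = ""
    return ET.tostring(root, encoding="unicode")
```

This covers only one part. In the Office Open XML package layout the company name is normally stored in docProps/app.xml and custom properties in docProps/custom.xml, so a complete pass would treat those parts in the same way.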
Our processing supports the full range of Word document formats:
DOCX (Word 2007+): Modern Office Open XML format used by Word 2007, 2010, 2013, 2016, 2019, 2021, and Microsoft 365. XML-based structure enables precise manipulation.
DOC (Word 97-2003): Legacy binary format still common in archives and older systems. Full support for DOC processing with option to output as DOCX for enhanced compatibility.
DOCM (Macro-Enabled): DOCX with macros enabled. Macros can be preserved, stripped, or flagged depending on security requirements.
DOT/DOTX (Templates): Word template files can be processed to remove PII that might propagate to documents created from the template.
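File extensions are not always reliable, so a processing step may want to check the content itself. The sketch below distinguishes the format families by file signature: legacy DOC/DOT files use the OLE compound file header, and the Open XML formats are ZIP archives. The function name and return labels are hypothetical, and the checks are heuristics: the OLE signature is shared with other legacy Office formats, and the sketch assumes the conventional part names word/document.xml and word/vbaProject.bin.

```python
import io
import zipfile

OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")  # compound file header (legacy DOC/DOT)
ZIP_SIGNATURE = b"PK\x03\x04"                      # local file header of a ZIP archive


def detect_word_format(data: bytes) -> str:
    """Guess the Word format family from the file's leading bytes and contents."""
    if data.startswith(OLE_SIGNATURE):
        # Also matches legacy XLS/PPT; a fuller check would inspect the streams.
        return "legacy binary (DOC/DOT)"
    if data.startswith(ZIP_SIGNATURE):
        with zipfile.ZipFile(io.BytesIO(data)) as package:
            names = set(package.namelist())
        if "word/document.xml" not in names:
            return "unknown"
        if "word/vbaProject.bin" in names:
            return "macro-enabled Open XML (DOCM/DOTM)"
        return "Open XML (DOCX/DOTX)"
    return "unknown"
```

A result of "macro-enabled" is the point at which the preserve, strip, or flag decision for macros described above would apply.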
Word document redaction serves critical needs across industries:
Legal: Contracts, agreements, correspondence, and litigation documents frequently require redaction for production, filing, or sharing. Track changes handling is particularly important for legal documents with revision history.
Healthcare: Patient letters, reports, and correspondence in Word format require HIPAA-compliant redaction before sharing or archival.
Human Resources: Employment documents, policies, and employee communications often contain personal information requiring protection when sharing or archiving.
Finance: Reports, analyses, and correspondence containing client financial information require redaction for compliance and privacy.
Government: FOIA responses, reports, and official documents frequently use Word format and require systematic redaction before public release.
Several options control how redacted content appears in the output:
Placeholder Text: Replace with markers like [REDACTED], [NAME], [SSN] indicating data type removed.
Black Box: Replace with solid black rectangles matching the size of the original text, in the traditional legal redaction style.
Highlighted Removal: Replace with highlighted blank space, showing where redaction occurred while maintaining document flow.
Complete Removal: Remove text entirely, adjusting document flow. Appropriate when redacted content shouldn't be evident.
Custom Styling: Apply specific fonts, colors, or formatting to redaction markers matching document style or organizational standards.
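The differences between these output styles can be approximated on plain text. The sketch below is illustrative only: the style names are hypothetical and are not the API's parameter values, an SSN-shaped pattern stands in for detection, and true black-box or highlighted output in Word is applied through formatting rather than characters.

```python
import re

# Assumption: a simple SSN-shaped pattern stands in for real PII detection.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Each style maps the matched text to its replacement.
STYLES = {
    "placeholder": lambda original: "[SSN]",
    "black_box": lambda original: "\u2588" * len(original),  # same width as the original
    "blank": lambda original: " " * len(original),           # keeps text positions
    "remove": lambda original: "",                           # text reflows
}


def redact(text: str, style: str = "placeholder") -> str:
    """Redact SSN-like strings using one of the illustrative output styles."""
    if style not in STYLES:
        raise ValueError(f"unknown redaction style: {style}")
    render = STYLES[style]
    return SSN_PATTERN.sub(lambda match: render(match.group(0)), text)
```

The black_box and blank styles preserve the length of the original value, which keeps layout stable but reveals how long the value was. The placeholder and remove styles avoid that at the cost of changing document flow.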
RedactionAPI has transformed our document processing workflow. We've reduced manual redaction time by 95% while achieving better accuracy than our previous manual process.
The API integration was seamless. Within a week, we had automated redaction running across all our customer support channels, ensuring GDPR compliance effortlessly.
We process over 50,000 legal documents monthly. RedactionAPI handles it all with incredible accuracy and speed. It's become an essential part of our legal tech stack.
The multi-language support is outstanding. We operate in 30 countries and RedactionAPI handles all our documents regardless of language with consistent accuracy.
Trusted by 500+ enterprises worldwide

We process Word documents at the XML level, modifying only the text content while preserving all formatting markup—fonts, styles, tables, paragraph settings, and document structure. The output document maintains identical appearance except for redacted content.
Yes, we scan all document layers, including tracked changes (insertions and deletions), comments, and revision history. Each layer is analyzed separately. You can redact sensitive data while preserving the tracked-change and comment structure, or remove all tracked changes and comments entirely.
Embedded images are extracted and processed with OCR to detect text content. Sensitive data in images is visually redacted (blur, black box) and the image is re-embedded. This includes screenshots, scanned documents, and photos within Word files.
Yes, Word documents contain extensive metadata—author name, company, creation date, last modified by, comments, revision history, and custom properties. We can remove all metadata or selectively strip specific fields.
Yes, we support both modern DOCX (Office 2007+) and legacy DOC (Office 97-2003) formats. Legacy DOC files are processed using appropriate libraries with full feature support. Output can be in the same format or converted to DOCX.
Tables, columns, and complex layouts are fully preserved. Text within table cells is analyzed and redacted like any other content. Table structure, cell formatting, merged cells, and borders all remain intact.