Permanently redact sensitive data from PDF documents. Comprehensive PDF protection for native text, scanned documents, forms, annotations, and embedded images.
Every PDF layer protected
Process text-based PDFs with perfect accuracy. Extract text layers, detect sensitive data, apply permanent redaction.
OCR technology extracts text from scanned documents with 99%+ accuracy. Visual redaction applied to images.
Detect and redact sensitive data in fillable form fields, including hidden and pre-filled values.
Process images within PDFs separately. OCR and visual redaction for embedded graphics.
True data removal, not just visual overlay. Redacted content cannot be recovered or revealed.
Maintain bookmarks, links, metadata, and document structure. Professional output quality.
Simple integration, powerful results
Send your documents, text, or files through our secure API endpoint or web interface.
Our AI analyzes content to identify all sensitive information types with 99.7% accuracy.
Sensitive data is automatically redacted based on your configured compliance rules.
Receive your redacted content with full audit trail and compliance documentation.
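The detect-and-redact steps above can be illustrated locally with a deliberately small sketch. It assumes a single regular expression for US SSNs written as NNN-NN-NNNN and reuses the `[SSN_REDACTED]` placeholder from the sample output; the function name `redact_ssns` is hypothetical, and this pattern-matching stand-in is not the service's own detection method, which this page describes as AI-based.

```python
import re

# Assumption: SSNs appear as NNN-NN-NNNN. Real detectors handle many more formats.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_ssns(text, placeholder="[SSN_REDACTED]"):
    """Return the redacted text plus the (start, end) spans that were replaced."""
    spans = [match.span() for match in SSN_PATTERN.finditer(text)]
    return SSN_PATTERN.sub(placeholder, text), spans

redacted, spans = redact_ssns("John Smith's SSN is 123-45-6789")
print(redacted)  # the SSN is replaced by the placeholder
print(spans)     # character offsets, usable as a simple audit trail
```

Returning the spans alongside the text keeps a record of what was removed and where, which is the kind of information an audit trail would need.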
Get started with just a few lines of code
import requests

api_key = "your_api_key"
url = "https://api.redactionapi.net/v1/redact"

# Text to scan and the sensitive-data types to redact
data = {
    "text": "John Smith's SSN is 123-45-6789",
    "redaction_types": ["ssn", "person_name"],
    "output_format": "redacted"
}

response = requests.post(
    url,
    headers={"Authorization": f"Bearer {api_key}"},
    json=data,
    timeout=30,
)
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json())
# Output: {"redacted_text": "[PERSON_NAME]'s SSN is [SSN_REDACTED]"}
const axios = require('axios');

const apiKey = 'your_api_key';
const url = 'https://api.redactionapi.net/v1/redact';

const data = {
  text: "John Smith's SSN is 123-45-6789",
  redaction_types: ["ssn", "person_name"],
  output_format: "redacted"
};

axios.post(url, data, {
  headers: { 'Authorization': `Bearer ${apiKey}` }
})
  .then(response => {
    console.log(response.data);
    // Output: {"redacted_text": "[PERSON_NAME]'s SSN is [SSN_REDACTED]"}
  })
  .catch(error => {
    // Network failures and non-2xx responses end up here
    console.error(error.message);
  });
curl -X POST https://api.redactionapi.net/v1/redact \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "John Smith'\''s SSN is 123-45-6789",
    "redaction_types": ["ssn", "person_name"],
    "output_format": "redacted"
  }'
# Response:
# {"redacted_text": "[PERSON_NAME]'s SSN is [SSN_REDACTED]"}
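Where the `requests` package is not available, the same call can be sketched with the Python standard library. The endpoint and field names are copied from the examples above; the helper name `build_redact_request` is hypothetical. Error codes and response fields other than `redacted_text` are not documented on this page, so the sketch only builds the request and leaves sending it to the caller.

```python
import json
import urllib.request

API_URL = "https://api.redactionapi.net/v1/redact"  # endpoint from the examples above

def build_redact_request(text, redaction_types, api_key):
    """Build (but do not send) a POST request matching the examples above."""
    payload = {
        "text": text,
        "redaction_types": list(redaction_types),
        "output_format": "redacted",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

request = build_redact_request(
    "John Smith's SSN is 123-45-6789", ["ssn", "person_name"], "your_api_key"
)
# To send it: urllib.request.urlopen(request, timeout=30), then json.load() the response.
```

Building the request separately from sending it makes the payload easy to inspect or log before any sensitive text leaves the machine.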
PDF (Portable Document Format) has become the standard for sharing documents while preserving formatting and layout. From legal contracts to medical records, from financial statements to government documents, PDFs contain vast amounts of sensitive information requiring protection. Effective PDF redaction demands understanding of the format's complexity and the difference between visual masking and true data removal.
Unlike simple image overlays, proper PDF redaction must permanently remove sensitive data from the document structure. A black box drawn over text might hide information visually, but the underlying text often remains accessible through copy/paste, search, or PDF editing tools. True redaction eliminates the data entirely, ensuring it cannot be recovered regardless of how the PDF is processed or examined.
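The difference can be made concrete with a toy content stream. This sketch assumes a simplified, uncompressed page stream that uses the standard PDF operators `Tj` (show text) and `re` / `f` (draw a filled rectangle). The helper names are hypothetical, and real PDFs need a proper parser because streams are usually compressed and text may be encoded or split across several operators.

```python
import re

# A simplified, uncompressed page content stream (toy data, not a complete PDF).
ORIGINAL = (
    "BT /F1 12 Tf 72 720 Td (Name: John Smith) Tj ET\n"
    "BT /F1 12 Tf 72 700 Td (SSN: 123-45-6789) Tj ET\n"
)

TEXT_OP = re.compile(r"\(([^)]*)\)\s*Tj")

def extract_text(stream):
    """Naive extraction: collect every string shown with the Tj operator."""
    return TEXT_OP.findall(stream)

def overlay_box(stream):
    """Visual masking only: append a black rectangle over the SSN line."""
    return stream + "0 0 0 rg 70 696 200 16 re f\n"

def remove_text(stream, needle):
    """True removal: drop the lines that show the sensitive value, then draw the box."""
    kept = [line for line in stream.splitlines() if needle not in line]
    return overlay_box("\n".join(kept) + "\n")

masked = overlay_box(ORIGINAL)
redacted = remove_text(ORIGINAL, "123-45-6789")
print(extract_text(masked))    # the SSN string is still in the stream under the box
print(extract_text(redacted))  # the SSN string no longer exists in the stream
```

Both outputs would look identical on screen, since each ends with the same black rectangle; only the second survives copy/paste or text extraction.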
PDFs can contain multiple types of content requiring different redaction approaches:
Text Layers: Native PDFs contain actual text data that can be searched, selected, and copied. This text must be removed from the document structure, not just covered visually.
Image Layers: Scanned documents store content as images without extractable text. OCR must first extract the text for detection; visual redaction is then applied to the image itself.
Form Fields: Fillable PDFs contain form fields that may hold sensitive data in their values, default text, or field names.
Annotations: Comments, highlights, sticky notes, and other annotations can contain sensitive information separate from the main content.
Metadata: PDFs contain metadata including author name, creation date, modification history, and custom properties that may reveal sensitive information.
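For the image layers described above, visual redaction means overwriting the pixels themselves. A minimal sketch, assuming a grayscale image held as a list of rows and a bounding box supplied by an OCR step (the OCR call is omitted; the function `black_out` and the box format are hypothetical):

```python
def black_out(pixels, box, fill=0):
    """Return a copy of `pixels` with the box filled.

    `box` is (left, top, right, bottom); right and bottom are exclusive.
    """
    left, top, right, bottom = box
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0
    # Clamp the box so out-of-range coordinates cannot raise or wrap around.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    redacted = [row[:] for row in pixels]
    for y in range(top, bottom):
        for x in range(left, right):
            redacted[y][x] = fill
    return redacted

page = [[255] * 6 for _ in range(4)]    # a 6x4 white image
result = black_out(page, (1, 1, 4, 3))  # box as an OCR step might report it
for row in result:
    print(row)
```

Because the original pixel values are overwritten rather than covered, nothing remains in the image to recover. The function works on a copy so the unredacted source can be discarded explicitly by the caller.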
The approach to redaction differs significantly based on PDF type:
Native PDFs: Created directly from digital sources (Word, web pages, applications), native PDFs contain actual text data. Redaction involves removing text from the document structure and optionally placing a visual marker (black box, replacement text) where the text appeared.
Scanned PDFs: Created by scanning paper documents, these PDFs contain images without extractable text. Processing requires OCR to extract text for sensitive data detection, then applying visual redaction to the image pixels. The resulting PDF may contain a new text layer for searchability if desired.
Hybrid PDFs: Some PDFs combine native text with embedded images or scanned pages. Each component requires appropriate processing: text extraction for native portions, OCR for image portions.
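The three categories above suggest a simple routing step before redaction. The sketch below is a heuristic under stated assumptions: it presumes an earlier inspection pass has already recorded, per page, whether extractable text and embedded images were found. `PageInfo`, `classify_page`, and `classify_document` are hypothetical names, not part of any documented API.

```python
from dataclasses import dataclass

@dataclass
class PageInfo:
    has_text_layer: bool  # extractable text was found on the page
    has_images: bool      # at least one embedded raster image was found

def classify_page(page):
    """Pick a processing route for a single page."""
    if page.has_text_layer and page.has_images:
        return "text+ocr"
    if page.has_text_layer:
        return "text"
    if page.has_images:
        return "ocr"
    return "skip"

def classify_document(pages):
    """Label the whole document as native, scanned, hybrid, or empty."""
    routes = {classify_page(page) for page in pages} - {"skip"}
    if routes == {"text"}:
        return "native"
    if routes == {"ocr"}:
        return "scanned"
    return "hybrid" if routes else "empty"

pages = [PageInfo(True, False), PageInfo(False, True)]
print(classify_document(pages))
```

A scanned page that already carries an OCR text layer would be routed as `text+ocr` here, which errs on the side of processing both the text and the pixels.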
True PDF redaction must satisfy several requirements to be legally defensible and practically effective:
Data Removal: The underlying text data must be removed from the PDF structure, not just visually obscured. This means eliminating the content from text streams, not adding an overlay.
Metadata Cleaning: Sensitive data may appear in document metadata, form field names, bookmark titles, or other structural elements. Comprehensive redaction addresses all locations.
Layer Flattening: Multiple PDF layers can hide content that becomes visible when layers are manipulated. Proper redaction flattens layers to prevent hidden content exposure.
Certification: For legal proceedings and compliance, redacted PDFs should be digitally certified to prove the document hasn't been modified since redaction.
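One way to check the data-removal and metadata-cleaning requirements above is a verification pass after redaction. The sketch assumes the redacted PDF has already been parsed into a plain dictionary of strings per location; that dictionary layout and the function `find_residual` are hypothetical, chosen only to show why every location needs scanning and not only the page text.

```python
LOCATIONS = ("text", "metadata", "form_fields", "bookmarks", "annotations")

def find_residual(document, sensitive_values):
    """Return (location, value) pairs where a sensitive value still appears."""
    hits = []
    for location in LOCATIONS:
        for entry in document.get(location, []):
            for value in sensitive_values:
                if value in entry:
                    hits.append((location, value))
    return hits

doc = {
    "text": ["Name: [PERSON_NAME]", "SSN: [SSN_REDACTED]"],
    "metadata": ["Author: John Smith"],  # missed by a text-only pass
    "form_fields": ["ssn=[SSN_REDACTED]"],
    "bookmarks": ["Section 1"],
}
print(find_residual(doc, ["John Smith", "123-45-6789"]))
```

An empty result is a necessary condition for a clean redaction, not a proof of one: exact string matching will not catch values that were reformatted, split, or encoded differently.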
Legal Discovery: E-discovery productions require redaction of privileged information and irrelevant PII before document production. Our batch processing handles large document sets efficiently.
FOIA Requests: Government agencies must redact exempted information from public records. Our FOIA-specific rules automate exemption identification and redaction.
Healthcare Records: Medical PDFs require HIPAA-compliant de-identification before sharing for research, audits, or legal proceedings.
Financial Documents: Bank statements, tax forms, and financial reports contain PII and account information requiring protection before sharing.
RedactionAPI has transformed our document processing workflow. We've reduced manual redaction time by 95% while achieving better accuracy than our previous manual process.
The API integration was seamless. Within a week, we had automated redaction running across all our customer support channels, ensuring GDPR compliance effortlessly.
We process over 50,000 legal documents monthly. RedactionAPI handles it all with incredible accuracy and speed. It's become an essential part of our legal tech stack.
The multi-language support is outstanding. We operate in 30 countries and RedactionAPI handles all our documents regardless of language with consistent accuracy.
Trusted by 500+ enterprises worldwide
Yes, our PDF redaction permanently removes the underlying data; it does not just cover it visually. The original text is completely eliminated from the PDF structure. Unlike simple black boxes that can be removed, our redaction cannot be reversed, and the original content cannot be recovered.
Yes, our OCR engine extracts text from scanned PDFs with 99%+ accuracy, even from poor-quality scans. We then apply visual redaction (black boxes, white-out, or blur) to the image pixels containing sensitive data. Both the extracted text data and the visual representation are protected.
We scan all form fields including visible entries, hidden fields, and form metadata. Sensitive data in any field type (text, dropdown, checkbox comments) is detected and redacted. The form structure remains intact; only the sensitive values are replaced.
Embedded images are processed separately from text layers. We extract each image, run OCR if it contains text, detect sensitive data, apply visual redaction, and re-embed the redacted image. This handles screenshots, scanned pages within larger PDFs, and embedded graphics.
Yes, we preserve document structure including bookmarks, internal links, external links, table of contents, and navigation elements. Only the sensitive content is redacted; document functionality remains intact.
Yes, if you provide the document password, we can process protected PDFs. We support both user passwords (for viewing) and owner passwords (for editing). Output PDFs can be re-encrypted with your specified password if needed.