Automatically detect and redact sensitive data in AWS S3 buckets. Event-driven processing with Lambda integration, serverless scaling, and comprehensive file format support.
Native AWS cloud integration
Direct integration with S3 via AWS SDK. Process objects without data egress to external systems.
Trigger redaction on S3 events—object creation, modification, or scheduled scans. Lambda-ready architecture.
Scale automatically with S3 event volume. No infrastructure to manage, pay only for processing.
Process all file types in S3—documents, images, CSVs, JSON, logs, and archives.
Redact objects in place or write to separate buckets. Preserve originals or replace with redacted versions.
Use IAM roles and policies for secure access. No credential storage required with assumed roles.
Simple integration, powerful results
Send your documents, text, or files through our secure API endpoint or web interface.
Our AI analyzes content to identify sensitive information across every supported type with 99.7% accuracy.
Sensitive data is automatically redacted based on your configured compliance rules.
Receive your redacted content with full audit trail and compliance documentation.
Get started with just a few lines of code
import requests

api_key = "your_api_key"
url = "https://api.redactionapi.net/v1/redact"

data = {
    "text": "John Smith's SSN is 123-45-6789",
    "redaction_types": ["ssn", "person_name"],
    "output_format": "redacted"
}

response = requests.post(
    url,
    headers={"Authorization": f"Bearer {api_key}"},
    json=data
)
print(response.json())
# Output: {"redacted_text": "[PERSON_NAME]'s SSN is [SSN_REDACTED]"}
const axios = require('axios');

const apiKey = 'your_api_key';
const url = 'https://api.redactionapi.net/v1/redact';

const data = {
  text: "John Smith's SSN is 123-45-6789",
  redaction_types: ["ssn", "person_name"],
  output_format: "redacted"
};

axios.post(url, data, {
  headers: { 'Authorization': `Bearer ${apiKey}` }
})
  .then(response => {
    console.log(response.data);
    // Output: {"redacted_text": "[PERSON_NAME]'s SSN is [SSN_REDACTED]"}
  });
curl -X POST https://api.redactionapi.net/v1/redact \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "John Smith'\''s SSN is 123-45-6789",
    "redaction_types": ["ssn", "person_name"],
    "output_format": "redacted"
  }'
# Response:
# {"redacted_text": "[PERSON_NAME]'s SSN is [SSN_REDACTED]"}
Amazon S3 has become the default storage layer for modern applications, data lakes, and enterprise file systems. From application logs to customer documents, from data warehouse exports to machine learning datasets, S3 buckets accumulate vast amounts of data—including sensitive personal information that requires protection. As organizations face increasing privacy regulations and security requirements, the need to identify and protect sensitive data in cloud storage has become critical.
Our AWS S3 integration enables automated, event-driven redaction that processes sensitive data as it flows into S3 or scans existing buckets for historical data. Built on native AWS services, the integration scales automatically, requires no infrastructure management, and maintains data within your AWS environment for maximum security.
The S3 integration uses a serverless, event-driven architecture that aligns with AWS best practices:
Event-Driven Processing: S3 event notifications trigger processing when objects are created, modified, or copied. EventBridge routes events to Lambda functions that orchestrate redaction. This architecture processes data in near real-time as it arrives, preventing sensitive data from accumulating in raw form.
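A minimal sketch of the Lambda side of this flow, assuming the quick-start endpoint shown above. The parsing of the S3 event notification payload is standard; the boto3 and API calls are left as comments so the sketch runs without AWS credentials, and the function names are illustrative rather than part of our SDK.

```python
import urllib.parse

# Hypothetical endpoint, mirroring the quick-start example on this page.
REDACT_URL = "https://api.redactionapi.net/v1/redact"


def parse_s3_event(event):
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def handler(event, context):
    """Lambda entry point: fetch each object, redact it, write it back."""
    for bucket, key in parse_s3_event(event):
        # s3 = boto3.client("s3")
        # body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        # redacted = requests.post(
        #     REDACT_URL,
        #     headers={"Authorization": f"Bearer {API_KEY}"},
        #     json={"text": body.decode()},
        # ).json()
        # s3.put_object(Bucket=bucket, Key=key,
        #               Body=redacted["redacted_text"].encode())
        print(f"would redact s3://{bucket}/{key}")
    return {"statusCode": 200}
```

The URL-decoding step matters in practice: keys containing spaces or special characters arrive encoded in the event record, and fetching the raw key from S3 would fail.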
Lambda-Based Processing: Lambda functions handle event processing, API communication, and S3 operations. For standard documents, processing completes within Lambda's execution limits. Our Lambda layers provide optimized client libraries for efficient API communication.
Step Functions for Complex Workflows: For large files or complex processing requiring extended time, Step Functions orchestrate multi-step workflows. This enables processing beyond Lambda's 15-minute limit while maintaining serverless benefits.
VPC Integration: For maximum security, processing can run within your VPC with private endpoints to S3 and our API. This keeps data off the public internet while maintaining full functionality.
Multiple event patterns support different use cases:
Object Creation: Process objects immediately when uploaded via s3:ObjectCreated events. Supports PUT, POST, COPY, and multipart upload completion. This enables real-time protection of newly ingested data.
Object Modification: Detect and process when objects are replaced or versioned. s3:ObjectCreated events trigger for replacements, while versioning enables processing of all versions.
Scheduled Scanning: CloudWatch Events or EventBridge schedules trigger periodic scans of existing objects. Useful for initial bucket processing or periodic re-scanning with updated rules.
Manual Triggers: API endpoints enable on-demand processing of specific objects or prefixes. Useful for ad-hoc processing needs or integration with other workflows.
S3 Batch Operations: For large-scale processing of existing buckets, S3 Batch Operations invoke Lambda functions for each object in an inventory report. This efficiently processes millions of objects with built-in retry and reporting.
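Wiring up the object-creation trigger comes down to an S3 notification configuration. The sketch below builds one for boto3's `put_bucket_notification_configuration`; note that S3 allows at most one prefix rule and one suffix rule per configuration entry, so one entry is emitted per file suffix. The ARN, prefix, and suffixes are placeholder values.

```python
def notification_config(lambda_arn, prefix="incoming/", suffixes=(".pdf", ".csv")):
    """Build an S3 NotificationConfiguration that invokes a Lambda function
    on object creation, filtered by key prefix and file suffix.

    S3 permits one prefix and one suffix FilterRule per entry, hence one
    configuration entry per suffix.
    """
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": prefix},
                            {"Name": "suffix", "Value": suffix},
                        ]
                    }
                },
            }
            for suffix in suffixes
        ]
    }


# Applied with (sketch, not executed here):
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="raw-data",
#     NotificationConfiguration=notification_config(lambda_arn))
```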
S3 buckets contain diverse file types, each requiring appropriate processing:
Documents: PDF, Word (DOCX), Excel (XLSX), PowerPoint (PPTX), and other Office formats. Text is extracted, analyzed, and redacted with format preservation.
Images: JPEG, PNG, TIFF, and other image formats. OCR extracts text for analysis, then visual redaction (blur, black box) obscures sensitive regions.
Data Files: CSV, JSON, XML, Parquet, and other structured formats. Schema-aware processing enables field-level redaction while maintaining data structure.
Log Files: Application logs, access logs, and audit trails. Pattern-based detection identifies PII in unstructured log entries.
Archives: ZIP, TAR, GZIP archives are extracted, contents processed, and repackaged with redacted files.
Text Files: Plain text, markdown, code files, and other text-based content with direct pattern matching and redaction.
Flexible output handling supports various operational models:
In-Place Replacement: Redacted objects replace originals at the same key. Optionally preserve originals in a separate "quarantine" bucket or with lifecycle policies for deletion.
Separate Bucket: Write redacted objects to a designated output bucket while preserving originals. Maintains audit trail and enables comparison.
Prefix-Based Routing: Organize redacted objects under a specific prefix (e.g., /redacted/) within the same bucket. Enables IAM policies restricting access to unredacted originals.
Metadata Annotation: Add object metadata tags indicating redaction status, types detected, and processing timestamp. Enables filtering and searching of processed objects.
Manifest Generation: Generate processing manifests listing objects processed, detections found, and actions taken. Supports compliance documentation and audit requirements.
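The routing and annotation modes above can be sketched as two small helpers: one that picks the destination for the redacted copy, and one that builds the metadata recorded on it. Mode names, the `redacted/` prefix, and the metadata keys are illustrative, not fixed API values.

```python
from datetime import datetime, timezone

REDACTED_PREFIX = "redacted/"  # illustrative prefix for prefix-based routing


def output_location(source_bucket, key, mode="prefix", output_bucket=None):
    """Choose where the redacted copy is written, per the routing modes above."""
    if mode == "in-place":
        return source_bucket, key                     # replace the original
    if mode == "separate-bucket":
        return output_bucket, key                     # preserve the original
    return source_bucket, REDACTED_PREFIX + key       # prefix-based routing


def redaction_metadata(types_found):
    """Object metadata recording redaction status, for filtering and search."""
    return {
        "redaction-status": "redacted",
        "redaction-types": ",".join(sorted(types_found)),
        "redaction-timestamp": datetime.now(timezone.utc).isoformat(),
    }


# The redacted object is then written with (sketch):
# s3.put_object(Bucket=out_bucket, Key=out_key, Body=redacted_bytes,
#               Metadata=redaction_metadata({"ssn", "person_name"}))
```

Keeping the destination logic in one place makes it easy to pair prefix-based routing with an IAM policy that denies read access to everything outside `redacted/`.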
Security is paramount for data protection processing:
IAM Roles: Processing uses IAM roles with least-privilege permissions. Roles are assumed by Lambda, eliminating credential management. Cross-account access uses role assumption for secure multi-account deployments.
Encryption: Support for S3 server-side encryption (SSE-S3, SSE-KMS, SSE-C) and client-side encryption. Objects remain encrypted at rest; processing works with encrypted data transparently.
VPC Endpoints: S3 and API communication via VPC endpoints keeps traffic off public internet. PrivateLink enables fully private connectivity.
KMS Integration: Use KMS keys for encryption key management. Our processing integrates with KMS for decryption during processing and re-encryption of output.
CloudTrail Logging: All S3 operations and API calls are logged to CloudTrail for security audit and compliance documentation.
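A least-privilege role for the processing Lambda might carry a policy like the sketch below: read from the raw prefix, write to the redacted prefix, nothing else. The statement IDs and prefixes are illustrative; an actual deployment would also grant KMS decrypt/encrypt on the relevant keys when SSE-KMS is in use.

```python
def processing_policy(bucket):
    """Minimal least-privilege IAM policy sketch for the redaction role:
    read raw objects, write redacted copies. Prefixes are illustrative."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadRawObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/incoming/*",
            },
            {
                "Sid": "WriteRedactedObjects",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/redacted/*",
            },
        ],
    }
```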
Comprehensive monitoring ensures reliable operation:
CloudWatch Metrics: Processing emits metrics for objects processed, detections found, errors encountered, and processing latency. Dashboards visualize processing volume and health.
CloudWatch Logs: Lambda execution logs capture processing details. Log Insights queries enable analysis of processing patterns and troubleshooting.
SNS Notifications: Configure notifications for processing completion, errors, or threshold alerts. Integration with PagerDuty, Slack, or email for operations alerting.
X-Ray Tracing: Distributed tracing through the processing pipeline enables performance analysis and bottleneck identification.
Processing Reports: Daily/weekly reports summarize processing activity, detection patterns, and compliance metrics. Delivered to S3 or email.
The S3 integration supports diverse data protection scenarios:
Data Lake Protection: As data flows into data lakes from various sources, redaction ensures sensitive information is protected before analysis. Enables analytics on de-identified data while maintaining privacy.
Log Sanitization: Application and access logs often contain IP addresses, user identifiers, and other PII. Automated redaction sanitizes logs before long-term retention or sharing with external parties.
Document Management: Customer documents, contracts, and correspondence accumulate in S3-backed document systems. Redaction protects sensitive content while maintaining document accessibility.
Data Export Preparation: Before exporting data for partners, researchers, or public release, redaction removes personal information. Automated processing ensures consistent protection across exports.
Backup Protection: Backups in S3 may contain sensitive data. Processing backup buckets ensures protection extends to backup copies, reducing breach impact if backups are compromised.
ML/AI Data Preparation: Machine learning datasets often derive from production data containing PII. Redaction creates training datasets with personal information removed, enabling compliant ML development.
Flexible deployment models suit different organizational needs:
CloudFormation Templates: Deploy the complete integration stack via CloudFormation. Templates create Lambda functions, event configurations, IAM roles, and supporting resources.
Terraform Modules: For infrastructure-as-code with Terraform, modules provide equivalent deployment capabilities with HCL configuration.
AWS SAM: Serverless Application Model templates enable familiar serverless deployment patterns with local testing capabilities.
CDK Constructs: Cloud Development Kit constructs for TypeScript, Python, or Java enable programmatic infrastructure definition.
Marketplace Deployment: One-click deployment through AWS Marketplace simplifies initial setup with pre-configured best practices.
The serverless architecture aligns costs with actual usage:
Pay-Per-Processing: Lambda charges only for execution time. No idle infrastructure costs during low-activity periods.
Reserved Capacity: For predictable high-volume processing, Lambda provisioned concurrency reduces cold starts and provides consistent performance.
Intelligent Tiering: Process objects regardless of S3 storage class. Intelligent Tiering optimizes storage costs while maintaining processing capability.
Selective Processing: Configure filters to process only relevant objects—specific prefixes, file types, or size ranges—avoiding unnecessary processing costs.
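Selective processing can be as simple as a predicate checked before each object is sent for redaction. The sketch below mirrors the prefix, file-type, and size filters described above; the default values are illustrative.

```python
def should_process(key, size_bytes,
                   prefixes=("logs/",),
                   suffixes=(".txt", ".csv", ".json"),
                   max_bytes=100 * 1024 * 1024):
    """Cheap client-side filter applied before invoking redaction:
    only matching prefixes, file types, and sizes incur processing cost."""
    return (
        any(key.startswith(p) for p in prefixes)
        and key.lower().endswith(suffixes)
        and size_bytes <= max_bytes
    )
```

Objects failing the predicate are skipped entirely, so filters directly bound the per-object processing spend.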
RedactionAPI has transformed our document processing workflow. We've reduced manual redaction time by 95% while achieving better accuracy than our previous manual process.
The API integration was seamless. Within a week, we had automated redaction running across all our customer support channels, ensuring GDPR compliance effortlessly.
We process over 50,000 legal documents monthly. RedactionAPI handles it all with incredible accuracy and speed. It's become an essential part of our legal tech stack.
The multi-language support is outstanding. We operate in 30 countries and RedactionAPI handles all our documents regardless of language with consistent accuracy.
Trusted by 500+ enterprises worldwide
Our integration uses S3 event notifications to trigger processing when objects are created or modified. A Lambda function receives the event, retrieves the object, sends it for redaction processing, and writes the redacted version back to S3. Processing can be in-place (replacing original) or to a separate bucket.
You can deploy our processing within your AWS account using our Lambda layers or containers. In this configuration, data stays within your AWS environment. Alternatively, our API processes data in transit over encrypted connections and doesn't retain content after processing.
For large files, we use S3's multipart capabilities for efficient transfer. Files are processed in streaming mode where possible, with chunked processing for very large documents. Lambda timeout limits are managed through Step Functions for extended processing.
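The chunked path reduces to iterating fixed-size blocks from a streaming read rather than loading the whole object. A minimal sketch, with the 5 MiB floor chosen because S3 multipart upload parts (other than the last) must be at least that size:

```python
import io

CHUNK = 5 * 1024 * 1024  # S3 multipart parts (except the last) must be >= 5 MiB


def iter_chunks(stream, size=CHUNK):
    """Yield fixed-size blocks from a file-like object, so a large S3
    object can be processed part-by-part instead of loaded whole."""
    while True:
        block = stream.read(size)
        if not block:
            return
        yield block
```

Each yielded block can be redacted and uploaded as one multipart part, keeping Lambda memory use flat regardless of object size.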
Yes, we support both event-driven processing for new objects and batch scanning of existing buckets. S3 Inventory integration enables efficient scanning of large buckets. You can process entire buckets, specific prefixes, or objects matching certain criteria.
Pricing is based on processing volume—the amount of data redacted. There are no infrastructure costs on our side as processing uses serverless architecture. You pay standard AWS costs for Lambda execution, S3 requests, and data transfer within AWS.
Beyond S3, we integrate with Lambda for processing, EventBridge for event routing, Step Functions for workflows, SNS/SQS for notifications, CloudWatch for monitoring, IAM for security, KMS for encryption, and CloudTrail for auditing.