File Processing Guide
Upload and process PDF, DOCX, and XLSX files with automatic PHI detection
Supported File Types
Safe AI Workbench supports the following file formats:
PDF Documents
Text extraction from PDF files
Max size: 10 MB
Word Documents
DOCX format (Office 2007+)
Max size: 10 MB
Excel Spreadsheets
XLSX format (Office 2007+)
Max size: 10 MB
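The format and size rules above can be checked on the client before any bytes are sent. The following is a minimal sketch, not part of the Safe AI Workbench API: `validateUpload`, `MAX_UPLOAD_BYTES`, and `SUPPORTED_EXTENSIONS` are hypothetical names, and reading "10 MB" as 10 * 1024 * 1024 bytes is an assumption, since this guide does not say how the server counts megabytes.

```javascript
// Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes. The guide does not
// say whether the server uses binary or decimal megabytes.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;
const SUPPORTED_EXTENSIONS = ['pdf', 'docx', 'xlsx'];

// Hypothetical helper: returns a list of problems, empty when the file
// looks acceptable by extension and size.
function validateUpload(fileName, sizeBytes) {
  const problems = [];
  const dot = fileName.lastIndexOf('.');
  const extension = dot === -1 ? '' : fileName.slice(dot + 1).toLowerCase();
  if (!SUPPORTED_EXTENSIONS.includes(extension)) {
    problems.push(`Unsupported file type ".${extension}"; convert to PDF, DOCX, or XLSX`);
  }
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    problems.push('File exceeds the 10 MB limit; compress or split it');
  }
  return problems;
}
```

In a browser this could be called as `validateUpload(file.name, file.size)` on a `File` object. The check only looks at the file name, so the server may still reject a file whose content does not match its extension.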
Uploading Files
Upload files using the /api/files/upload endpoint:
// JavaScript/TypeScript
const formData = new FormData();
formData.append('file', fileInput.files[0]);
const response = await fetch('/api/files/upload', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY'
    // Note: Do NOT set Content-Type header - let browser set it
  },
  body: formData
});
const result = await response.json();
console.log(result);
// {
//   "fileHash": "sha256_abc123...",
//   "fileName": "contract.pdf",
//   "fileSize": 245678,
//   "mimeType": "application/pdf",
//   "phiDetected": false,
//   "uploadedAt": "2024-01-15T10:30:00Z"
// }
💡 Important: Files are stored for 24 hours, then automatically deleted for data minimization and HIPAA compliance.
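Because stored files disappear after 24 hours, a client that keeps a `fileHash` around may want to know whether it is still usable. The sketch below is illustrative only: `getExpiry` and `isExpired` are hypothetical helpers, and the fallback of `uploadedAt` plus 24 hours is an assumption for responses that carry no `expiresAt` field.

```javascript
// 24 hours in milliseconds, matching the retention window described above.
const RETENTION_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: prefers expiresAt when the upload response includes it,
// otherwise assumes the file expires 24 hours after uploadedAt.
function getExpiry(upload) {
  if (upload.expiresAt) {
    return new Date(upload.expiresAt);
  }
  return new Date(new Date(upload.uploadedAt).getTime() + RETENTION_MS);
}

// Returns true once the retention window has passed, meaning the file
// should be re-uploaded before it is referenced again.
function isExpired(upload, now = new Date()) {
  return now.getTime() >= getExpiry(upload).getTime();
}
```

A caller could run `isExpired(savedUpload)` before reusing a stored hash and re-upload when it returns `true`. Client clocks can drift, so treating files as expired slightly early is the safer choice.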
Using Files with AI Tasks
After uploading, reference the file by its fileHash in AI chat requests:
// Step 1: Upload file (see above)
const { fileHash } = await uploadResponse.json();
// Step 2: Process file with AI task
const chatResponse = await fetch('/api/ai/chat', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    conversationId: 'new',
    taskId: 'contract-summarizer',
    fileHash: fileHash, // Reference the uploaded file
    // variables: {} can be omitted when using fileHash
  })
});
const result = await chatResponse.json();
console.log(result.completion);
PHI Detection in Files
All uploaded files are automatically scanned for PHI before processing:
File Upload
Text extracted from PDF/DOCX/XLSX using appropriate parser
PHI Scan
Presidio analyzes extracted text for PHI entities
Upload Response
Returns fileHash and phiDetected flag immediately
AI Processing
Policy enforcement applied before sending to AI model
⚠️ Policy Enforcement: If a "block" policy triggers on file content, the AI request will be rejected even if the file uploaded successfully.
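Since an upload can succeed while the later AI request is blocked, the chat response is worth inspecting separately from the upload response. This is a rough sketch under stated assumptions: `interpretChatResponse` is a hypothetical helper, a block is assumed to arrive as HTTP 403, and `policyViolations` is assumed to be an array in the response body (this guide names the field but does not document its shape).

```javascript
// Hypothetical helper: turns the status and parsed JSON body of an
// /api/ai/chat response into a small result object.
function interpretChatResponse(status, body) {
  if (status === 403) {
    // Assumption: a "block" policy surfaces as 403 with a policyViolations array.
    return { ok: false, reason: 'policy_block', violations: body.policyViolations ?? [] };
  }
  if (status < 200 || status >= 300) {
    return { ok: false, reason: 'http_error', status };
  }
  return { ok: true, completion: body.completion };
}
```

With the chat request shown earlier, this could be called as `interpretChatResponse(chatResponse.status, await chatResponse.json())`.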
Upload Response Example
Example of a successful upload response for a file in which PHI was detected:
{
  "fileHash": "sha256_f4d5e6a7b8c9d0e1f2a3b4c5d6e7f8a9",
  "fileName": "employee_data.xlsx",
  "fileSize": 156789,
  "mimeType": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
  "phiDetected": true,
  "phiEntities": [
    {
      "type": "PERSON",
      "text": "Jane Doe",
      "score": 0.92
    },
    {
      "type": "US_SSN",
      "text": "123-45-6789",
      "score": 1.0
    }
  ],
  "uploadedAt": "2024-01-15T10:30:00Z",
  "expiresAt": "2024-01-16T10:30:00Z" // 24 hours later
}
File Size Limits
| File Type | Max Size | Processing Time |
|---|---|---|
| PDF (text-based) | 10 MB | 2-5 seconds |
| PDF (scanned/image) | 10 MB* | OCR not supported |
| DOCX | 10 MB | 1-3 seconds |
| XLSX | 10 MB | 2-4 seconds |
* Scanned PDFs need a text layer, added with OCR software, before upload
Error Handling
Common file upload errors and solutions:
413 Payload Too Large
File exceeds 10 MB limit
Solution: Compress file or split into smaller parts
400 Unsupported File Type
File format not supported (e.g., .doc, .txt, .csv)
Solution: Convert to PDF, DOCX, or XLSX
400 Text Extraction Failed
Unable to extract text (corrupted or image-only PDF)
Solution: Re-save PDF with text layer or use OCR tool
403 Policy Violation (Block)
File content triggered a "block" policy during AI request
Solution: Review policyViolations in response, redact content, or contact admin
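The errors above can be folded into a small lookup so that a client shows the matching remedy. This is only a sketch: `REMEDIES` and `remediesFor` are hypothetical names, and because the error body format is not documented here, the lookup keys on the HTTP status alone. That is why 400 returns both of its possible remedies.

```javascript
// Status codes and remedies come from the list above. The error body shape
// is not documented in this guide, so only the HTTP status is used.
const REMEDIES = {
  413: ['Compress the file or split it into smaller parts (limit: 10 MB)'],
  400: [
    'If the type is unsupported, convert the file to PDF, DOCX, or XLSX',
    'If text extraction failed, re-save the PDF with a text layer or use an OCR tool',
  ],
  403: ['Review policyViolations in the response, redact the content, or contact an admin'],
};

// Hypothetical helper: returns the remedies for a status code, or a generic
// message when the status is not one of the documented errors.
function remediesFor(status) {
  return REMEDIES[status] ?? [`No documented remedy for HTTP ${status}`];
}
```

For example, `remediesFor(response.status)` could be called whenever `response.ok` is `false`.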
Best Practices
Keep files under 5 MB when possible
Faster upload and processing times
Use text-based PDFs, not scanned images
OCR is not supported - ensure PDFs have selectable text
Check phiDetected flag before AI processing
Review PHI entities in upload response to understand content
Don't rely on files being available after 24 hours
Files auto-delete for security - re-upload if needed
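To act on the `phiDetected` flag without echoing sensitive text into logs, the entities can be reduced to per-type counts. A minimal sketch, assuming `phiEntities` (when present) is an array of `{ type, text, score }` objects as in the upload response example in this guide; `summarizePhi` and its `minScore` parameter are hypothetical.

```javascript
// Hypothetical helper: counts detected PHI entities by type and never
// returns the matched text itself. Entities scoring below minScore are skipped.
function summarizePhi(upload, minScore = 0) {
  const counts = {};
  if (!upload.phiDetected) {
    return counts;
  }
  for (const entity of upload.phiEntities ?? []) {
    if (entity.score >= minScore) {
      counts[entity.type] = (counts[entity.type] ?? 0) + 1;
    }
  }
  return counts;
}
```

A caller might prompt the user for confirmation when the returned object is non-empty, before sending the `fileHash` to an AI task.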