Safe AI Workbench Developer Docs

File Processing Guide

Upload and process PDF, DOCX, and XLSX files with automatic PHI detection

Supported File Types

Safe AI Workbench supports the following file formats:

PDF Documents

Text extraction from PDF files

Max size: 10 MB

Word Documents

DOCX format (Office 2007+)

Max size: 10 MB

Excel Spreadsheets

XLSX format (Office 2007+)

Max size: 10 MB
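
A client-side pre-check can reject obviously invalid files before any bytes are sent. The sketch below is illustrative only: checkFile, MAX_FILE_BYTES, and SUPPORTED_EXTENSIONS are hypothetical names, and it assumes "10 MB" means 10 × 1024 × 1024 bytes, which this guide does not state. The server still makes the final decision.

```typescript
// Hypothetical pre-upload check; names and the exact byte limit are assumptions.
const MAX_FILE_BYTES = 10 * 1024 * 1024; // assumes 10 MB = 10 MiB
const SUPPORTED_EXTENSIONS = ['pdf', 'docx', 'xlsx'];

interface FileCheck {
  ok: boolean;
  reason?: string;
}

function checkFile(fileName: string, sizeBytes: number): FileCheck {
  // Take the text after the last dot, case-insensitively.
  const dot = fileName.lastIndexOf('.');
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : '';

  if (SUPPORTED_EXTENSIONS.indexOf(ext) === -1) {
    return { ok: false, reason: 'Unsupported file type: convert to PDF, DOCX, or XLSX' };
  }
  if (sizeBytes > MAX_FILE_BYTES) {
    return { ok: false, reason: 'File exceeds the 10 MB limit' };
  }
  return { ok: true };
}
```

In a browser this could be called as checkFile(file.name, file.size) on the selected File object. The check looks only at the extension, so a renamed file can still be rejected by the server.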

Uploading Files

Upload files by sending a multipart/form-data POST request to the /api/files/upload endpoint:

// JavaScript/TypeScript
const formData = new FormData();
formData.append('file', fileInput.files[0]);

const response = await fetch('/api/files/upload', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY'
    // Note: do NOT set the Content-Type header; the browser sets multipart/form-data with the correct boundary
  },
  body: formData
});

const result = await response.json();
console.log(result);
// {
//   "fileHash": "sha256_abc123...",
//   "fileName": "contract.pdf",
//   "fileSize": 245678,
//   "mimeType": "application/pdf",
//   "phiDetected": false,
//   "uploadedAt": "2024-01-15T10:30:00Z"
// }

💡 Important: Files are stored for 24 hours, then automatically deleted to support data minimization and HIPAA compliance.
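
Because of the retention window, a client that caches fileHash values may want to check whether a file is likely gone before reusing it. The following is a minimal sketch. It assumes deletion happens exactly 24 hours after the uploadedAt timestamp; isFileExpired and RETENTION_MS are hypothetical names, not part of the API.

```typescript
// Assumption: files are deleted exactly 24 hours after uploadedAt.
const RETENTION_MS = 24 * 60 * 60 * 1000;

// Returns true when a file uploaded at `uploadedAt` (ISO 8601) should be treated as deleted.
function isFileExpired(uploadedAt: string, now: Date): boolean {
  const uploadedMs = Date.parse(uploadedAt);
  if (isNaN(uploadedMs)) {
    return true; // unparseable timestamp: play it safe and re-upload
  }
  return now.getTime() >= uploadedMs + RETENTION_MS;
}
```

For example, isFileExpired('2024-01-15T10:30:00Z', new Date()) tells the caller whether to re-upload instead of reusing a stored fileHash.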

Using Files with AI Tasks

After uploading, reference the file by its fileHash in AI chat requests:

// Step 1: Upload file (see above); uploadResponse is the fetch() response from /api/files/upload
const { fileHash } = await uploadResponse.json();

// Step 2: Process file with AI task
const chatResponse = await fetch('/api/ai/chat', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    conversationId: 'new',
    taskId: 'contract-summarizer',
    fileHash: fileHash,  // Reference the uploaded file
    // variables: {} can be omitted when using fileHash
  })
});

const result = await chatResponse.json();
console.log(result.completion);

PHI Detection in Files

All uploaded files are automatically scanned for PHI before processing:

File Upload

Text extracted from PDF/DOCX/XLSX using appropriate parser

PHI Scan

Presidio analyzes extracted text for PHI entities

Upload Response

Returns fileHash and phiDetected flag immediately

AI Processing

Policy enforcement applied before sending to AI model

⚠️ Policy Enforcement: If a "block" policy triggers on file content, the AI request will be rejected even if the file uploaded successfully.

Upload Response Example

Successful upload response with PHI detection:

{
  "fileHash": "sha256_f4d5e6a7b8c9d0e1f2a3b4c5d6e7f8a9",
  "fileName": "employee_data.xlsx",
  "fileSize": 156789,
  "mimeType": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
  "phiDetected": true,
  "phiEntities": [
    {
      "type": "PERSON",
      "text": "Jane Doe",
      "score": 0.92
    },
    {
      "type": "US_SSN",
      "text": "123-45-6789",
      "score": 1.0
    }
  ],
  "uploadedAt": "2024-01-15T10:30:00Z",
  "expiresAt": "2024-01-16T10:30:00Z"
}
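
One way to use phiEntities without echoing sensitive values into logs is to reduce the list to per-type counts. The sketch below assumes score is a confidence between 0 and 1, as the example values suggest. The PhiEntity interface, countPhiByType, and the minScore threshold are hypothetical and not part of the API.

```typescript
// Shape inferred from the example response above; treat it as an assumption.
interface PhiEntity {
  type: string;
  text: string;
  score: number;
}

// Counts detected entities per type, keeping only those at or above minScore.
// The raw `text` values are deliberately left out of the result.
function countPhiByType(entities: PhiEntity[], minScore: number): { [type: string]: number } {
  const counts: { [type: string]: number } = {};
  for (const entity of entities) {
    if (entity.score >= minScore) {
      counts[entity.type] = (counts[entity.type] || 0) + 1;
    }
  }
  return counts;
}
```

Called with the phiEntities array from the response above and a minScore of 0.95, this returns a count of 1 for US_SSN only, since the PERSON entity scores 0.92.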

File Size Limits

File Type | Max Size | Processing Time
PDF (text-based) | 10 MB | 2-5 seconds
PDF (scanned/image) | 10 MB* | OCR not supported
DOCX | 10 MB | 1-3 seconds
XLSX | 10 MB | 2-4 seconds

* Scanned PDFs must have a text layer added by OCR software before upload

Error Handling

Common file upload errors and solutions:

413 Payload Too Large

File exceeds 10 MB limit

Solution: Compress file or split into smaller parts

400 Unsupported File Type

File format not supported (e.g., .doc, .txt, .csv)

Solution: Convert to PDF, DOCX, or XLSX

400 Text Extraction Failed

Unable to extract text (corrupted or image-only PDF)

Solution: Re-save PDF with text layer or use OCR tool

403 Policy Violation (Block)

File content triggered a "block" policy during AI request

Solution: Review policyViolations in response, redact content, or contact admin
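
The errors above can be turned into user-facing hints with a small lookup on the HTTP status. This is a sketch, not the API's own error handling. suggestFix is a hypothetical helper, and because this guide does not document the error body, both 400 cases share one message rather than being told apart.

```typescript
// Maps an HTTP status from the upload or chat endpoints to the remedy listed above.
// Keyed on status only: the error body format is not documented in this guide.
function suggestFix(httpStatus: number): string {
  switch (httpStatus) {
    case 413:
      return 'Compress the file or split it into smaller parts (limit: 10 MB).';
    case 400:
      // Covers both "Unsupported File Type" and "Text Extraction Failed".
      return 'Convert the file to PDF, DOCX, or XLSX; for PDFs, re-save with a text layer or run OCR first.';
    case 403:
      return 'Review policyViolations in the response, redact the content, or contact an admin.';
    default:
      return 'Unexpected error (HTTP ' + httpStatus + ').';
  }
}
```

A caller might check response.ok after fetch and, when it is false, show suggestFix(response.status) to the user.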

Best Practices

Keep files under 5 MB when possible

Faster upload and processing times

Use text-based PDFs, not scanned images

OCR is not supported - ensure PDFs have selectable text

Check phiDetected flag before AI processing

Review PHI entities in upload response to understand content

Don't rely on files being available after 24 hours

Files auto-delete for security - re-upload if needed
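
The last two practices can be combined into a single decision made before each AI request. The sketch below is one possible shape. UploadResult, NextStep, and planNextStep are hypothetical names, and expiresAt is treated as optional because only one of the example responses in this guide includes it.

```typescript
interface UploadResult {
  fileHash: string;
  phiDetected: boolean;
  expiresAt?: string; // ISO 8601; assumed optional, see note above
}

type NextStep = 'send-to-ai' | 'review-phi-first' | 're-upload';

// Decides what to do with a previously uploaded file at time `now`.
function planNextStep(upload: UploadResult, now: Date): NextStep {
  // An expired file cannot be processed, so this check comes first.
  // A missing or unparseable expiresAt is not treated as expired here.
  if (upload.expiresAt !== undefined && now.getTime() >= Date.parse(upload.expiresAt)) {
    return 're-upload';
  }
  return upload.phiDetected ? 'review-phi-first' : 'send-to-ai';
}
```

A 'send-to-ai' result does not guarantee that the chat request succeeds, since policy enforcement still runs on the server when the request is made.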