Safe AI Workbench Developer Docs

File Processing API

Upload PDF, DOCX, and XLSX files with automatic PHI detection and 24-hour retention

POST /api/files/upload

Upload a file for AI processing. Files are automatically scanned for PHI and retained for 24 hours before automatic deletion.

Supported Formats

  • PDF (text-based, not scanned images)
  • DOCX (Microsoft Word 2007+)
  • XLSX (Microsoft Excel 2007+)

Max file size: 10 MB
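A client can reject files that are certain to fail before it spends an upload and a slot in the 50 uploads/minute rate limit. This is a minimal sketch with two assumptions that the docs do not state: "10 MB" is taken as 10 * 1024 * 1024 bytes, and the format is inferred from the file extension. The server remains the authority. This check also cannot tell a scanned, image-only PDF from a text-based one.

```javascript
// Client-side pre-check before calling POST /api/files/upload (sketch).
// ASSUMPTION: 10 MB = 10 * 1024 * 1024 bytes. If the server uses decimal
// megabytes, lower this to 10_000_000.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;

// ASSUMPTION: format is inferred from the extension. A Map avoids
// accidental matches on inherited object keys such as "constructor".
const SUPPORTED_TYPES = new Map([
  ["pdf", "application/pdf"],
  ["docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document"],
  ["xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"],
]);

// Returns { ok: true, mimeType } or { ok: false, reason }.
function validateUploadCandidate(fileName, sizeBytes) {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  const mimeType = SUPPORTED_TYPES.get(ext);
  if (!mimeType) {
    return { ok: false, reason: "Unsupported file type. Use PDF, DOCX, or XLSX" };
  }
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: "File size exceeds 10 MB limit" };
  }
  return { ok: true, mimeType };
}
```

In a browser, pass a File object's name and size, for example validateUploadCandidate(file.name, file.size). The reason strings mirror the documented error messages, so the UI shows the same wording whether the client or the server rejects the file.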

Request Format

Use multipart/form-data encoding:

| Field | Type   | Required | Description                             |
|-------|--------|----------|-----------------------------------------|
| file  | binary | Yes      | The file to upload (PDF, DOCX, or XLSX) |

Example Request

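// JavaScript (browser)
const fileInput = document.querySelector('input[type="file"]');
const formData = new FormData();
formData.append('file', fileInput.files[0]);

const response = await fetch('/api/files/upload', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY'
    // Do NOT set Content-Type: the browser sets it, including the multipart boundary
  },
  body: formData
});

if (!response.ok) {
  const { error } = await response.json();
  throw new Error(`Upload failed (${response.status}): ${error}`);
}
const result = await response.json();

# cURL (curl needs an absolute URL; replace YOUR_HOST with your Workbench host)
curl -X POST https://YOUR_HOST/api/files/upload \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@/path/to/document.pdf"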

Response (200 OK)

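| Field       | Type    | Description                                                                  |
|-------------|---------|------------------------------------------------------------------------------|
| fileHash    | string  | SHA-256 hash of the file (use in /api/ai/chat requests)                      |
| fileName    | string  | Original filename                                                            |
| fileSize    | number  | File size in bytes                                                           |
| mimeType    | string  | MIME type (application/pdf, etc.)                                            |
| phiDetected | boolean | Whether PHI was detected in the file content                                 |
| phiEntities | array   | Detected PHI entities, each with type, matched text, and confidence score    |
| uploadedAt  | string  | ISO-8601 timestamp of upload                                                 |
| expiresAt   | string  | ISO-8601 timestamp when the file will be deleted (24 hours after uploadedAt) |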
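The fileHash field is described only as the SHA-256 hash of the file. Assuming it is the string sha256_ followed by the lowercase hex digest of the raw file bytes (this matches the shape of the example response but is not stated), a Node.js client could compute the same kind of identifier locally, for instance to recognize a file it already uploaded within the 24-hour window. This is a sketch under that assumption; always treat the value the server returns as authoritative.

```javascript
// Sketch: compute a fileHash-style identifier locally with Node's crypto module.
// ASSUMPTION: fileHash = "sha256_" + lowercase hex SHA-256 of the raw bytes.
const { createHash } = require("node:crypto");

function localFileHash(bytes) {
  const hex = createHash("sha256").update(bytes).digest("hex");
  return `sha256_${hex}`;
}

// Checks the shape seen in the example response:
// the "sha256_" prefix followed by exactly 64 lowercase hex characters.
function looksLikeFileHash(value) {
  return /^sha256_[0-9a-f]{64}$/.test(value);
}
```

For example, localFileHash(fs.readFileSync("contract.pdf")) produces a string that can be compared with the fileHash returned by the upload call.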

Example Response

{
  "fileHash": "sha256_f4d5e6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5",
  "fileName": "contract.pdf",
  "fileSize": 245678,
  "mimeType": "application/pdf",
  "phiDetected": false,
  "phiEntities": [],
  "uploadedAt": "2024-01-15T10:30:00Z",
  "expiresAt": "2024-01-16T10:30:00Z"
}

Response with PHI Detection

{
  "fileHash": "sha256_abc123def456...",
  "fileName": "employee_roster.xlsx",
  "fileSize": 89234,
  "mimeType": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
  "phiDetected": true,
  "phiEntities": [
    {
      "type": "PERSON",
      "text": "Jane Doe",
      "score": 0.95
    },
    {
      "type": "US_SSN",
      "text": "123-45-6789",
      "score": 1.0
    },
    {
      "type": "PHONE_NUMBER",
      "text": "(555) 123-4567",
      "score": 0.98
    }
  ],
  "uploadedAt": "2024-01-15T11:00:00Z",
  "expiresAt": "2024-01-16T11:00:00Z"
}

⚠️ Important: PHI detection during upload is informational only. Policy enforcement happens when the file is processed via /api/ai/chat.
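Because upload-time detection is informational, a client may want to warn the user before sending the file to /api/ai/chat. This minimal sketch summarizes phiEntities by type and deliberately does not echo the matched text, since that field contains the PHI itself. The minScore threshold is a hypothetical client-side choice, not an API parameter, and the summary does not replace the server's policy enforcement.

```javascript
// Sketch: summarize upload-time PHI findings for display.
// minScore is a HYPOTHETICAL client-side threshold, not an API parameter.
// Only entity types are counted; entity.text (the PHI itself) is never returned.
function summarizePhi(uploadResult, minScore = 0.9) {
  if (!uploadResult.phiDetected) {
    return { flagged: false, countsByType: {} };
  }
  const countsByType = {};
  for (const entity of uploadResult.phiEntities) {
    if (entity.score < minScore) continue; // skip low-confidence matches
    countsByType[entity.type] = (countsByType[entity.type] || 0) + 1;
  }
  return { flagged: Object.keys(countsByType).length > 0, countsByType };
}
```

Applied to the employee_roster.xlsx response, the default threshold flags one PERSON, one US_SSN, and one PHONE_NUMBER entity.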

Error Responses

400 Bad Request - Unsupported File Type

{ "error": "Unsupported file type. Use PDF, DOCX, or XLSX" }

400 Bad Request - Text Extraction Failed

{ "error": "Could not extract text from PDF. Ensure file is not image-only" }

413 Payload Too Large

{ "error": "File size exceeds 10 MB limit" }

401 Unauthorized

{ "error": "Authentication required" }

429 Too Many Requests

{ "error": "Rate limit exceeded. Max 50 uploads/minute" }
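The documented errors fall into three groups: fix the file (400, 413), fix the credentials (401), or slow down (429). The sketch below maps each status to a client action. The exponential backoff schedule for 429 is an assumption; the docs state only the 50 uploads/minute limit, not a retry policy. Undocumented statuses are reported without retrying.

```javascript
// Sketch: map documented upload error statuses to a client action.
function classifyUploadError(status) {
  switch (status) {
    case 400: // unsupported file type, or no extractable text (image-only PDF)
    case 413: // file larger than 10 MB
      return { action: "fix-file", retryable: false };
    case 401: // missing or invalid API key
      return { action: "reauthenticate", retryable: false };
    case 429: // more than 50 uploads/minute
      return { action: "backoff", retryable: true };
    default: // not documented for this endpoint
      return { action: "report", retryable: false };
  }
}

// ASSUMED backoff for 429: 1 s, 2 s, 4 s, ... capped at 60 s.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 60000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

On a 429, wait backoffDelayMs(attempt) before the next attempt; if the server sends a Retry-After header, prefer that value.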

GET /api/files/:fileHash

Retrieve metadata about a previously uploaded file.

GET /api/files/sha256_abc123...
Authorization: Bearer YOUR_API_KEY

Response (200 OK): same structure as the upload response
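A metadata lookup is a single authenticated GET. This sketch returns null on 404, which the retention policy says happens once a file has expired, and throws on other failures. The baseUrl option and the injectable fetchImpl parameter are illustrative assumptions (fetchImpl makes the function testable without a server); neither is part of the API.

```javascript
// Sketch: GET /api/files/:fileHash.
// baseUrl and fetchImpl are HYPOTHETICAL options; fetchImpl defaults to global fetch.
async function getFileMetadata(fileHash, apiKey, { baseUrl = "", fetchImpl = fetch } = {}) {
  const res = await fetchImpl(`${baseUrl}/api/files/${encodeURIComponent(fileHash)}`, {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (res.status === 404) return null; // expired, or never uploaded
  if (!res.ok) {
    throw new Error(`Metadata lookup failed with status ${res.status}`);
  }
  return res.json(); // same shape as the upload response
}
```

A null result means the file must be uploaded again before it can be used in /api/ai/chat.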

Using Files with AI Tasks

After uploading a file, use the fileHash in your AI chat request:

// Step 1: Upload file (formData built as in the upload example above)
const uploadRes = await fetch('/api/files/upload', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer YOUR_API_KEY' },
  body: formData
});
if (!uploadRes.ok) throw new Error(`Upload failed: ${uploadRes.status}`);
const { fileHash } = await uploadRes.json();

// Step 2: Process with AI
const chatRes = await fetch('/api/ai/chat', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    conversationId: 'new',
    taskId: 'contract-summarizer',
    fileHash
  })
});
if (!chatRes.ok) throw new Error(`Chat request failed: ${chatRes.status}`);

File Retention Policy

24-Hour Auto-Deletion

All uploaded files are automatically deleted 24 hours after upload for data minimization and HIPAA compliance. The expiresAt timestamp in the response indicates when deletion will occur.

  • Files are stored encrypted at rest in Azure Blob Storage
  • File content is never logged or retained beyond 24 hours
  • After expiration, fileHash lookups will return 404 Not Found
  • Re-upload the file if needed after expiration
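A client that caches upload results can use expiresAt to decide whether a fileHash is still usable. This is a minimal sketch; the five-minute safety margin is a hypothetical client-side choice, so that a file does not expire between this check and the /api/ai/chat call. A missing or unparseable expiresAt is treated as expired.

```javascript
// Sketch: decide whether a cached upload result must be re-uploaded.
// marginMs is a HYPOTHETICAL safety margin (default: 5 minutes).
function needsReupload(uploadResult, now = new Date(), marginMs = 5 * 60 * 1000) {
  const expiresAtMs = Date.parse(uploadResult.expiresAt);
  if (Number.isNaN(expiresAtMs)) return true; // unknown expiry: safest to re-upload
  return expiresAtMs - now.getTime() <= marginMs;
}
```

With the contract.pdf example (expiresAt 2024-01-16T10:30:00Z), the cached fileHash is reused until 10:25 UTC that day; after that, the client re-uploads and uses the new fileHash.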