Safe AI Workbench Developer Docs

PHI Detection Guide

Learn how Safe AI Workbench automatically detects and protects Protected Health Information (PHI).

What is PHI Detection?

PHI (Protected Health Information) detection is a safety feature that automatically scans content before AI processing to identify sensitive healthcare data including:

  • Patient Names - Full names of individuals
  • Medical Record Numbers (MRN) - Hospital/clinic patient identifiers
  • Social Security Numbers - SSNs in any format
  • Dates - Birthdates, admission dates, discharge dates
  • Addresses - Street addresses, cities, ZIP codes
  • Phone Numbers - Any phone number format
  • Email Addresses - Personal email addresses
  • Account Numbers - Insurance policy numbers, account IDs
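
These categories surface in API responses as Presidio-style entity type codes. As a sketch of a lookup table: only PERSON and US_SSN are confirmed by the example response in this guide; the remaining codes are the standard Presidio recognizer names, and MEDICAL_RECORD_NUMBER is an assumed name for a custom MRN recognizer.

```javascript
// Presidio-style type codes you may see in phiEntities.
// Only PERSON and US_SSN appear in this guide's example response;
// the rest are standard Presidio names, and MEDICAL_RECORD_NUMBER
// is an assumed custom recognizer name.
const PHI_ENTITY_LABELS = {
  PERSON: 'Patient name',
  MEDICAL_RECORD_NUMBER: 'Medical record number (assumed custom type)',
  US_SSN: 'Social Security Number',
  DATE_TIME: 'Date (birth, admission, discharge)',
  LOCATION: 'Address, city, or ZIP code',
  PHONE_NUMBER: 'Phone number',
  EMAIL_ADDRESS: 'Email address',
  US_BANK_NUMBER: 'Account number',
};

// Human-readable label for an entity type, falling back to the raw code.
function describeEntity(type) {
  return PHI_ENTITY_LABELS[type] || type;
}
```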

How It Works

Safe AI Workbench uses Microsoft Presidio, a context-aware, open-source PII detection engine:

1. Content Analysis: Presidio scans the input text using NLP and pattern recognition.

2. Entity Recognition: detected PHI entities are assigned confidence scores (0.0 to 1.0).

3. Regex Fallback: custom regex patterns catch additional PHI types (MRNs, etc.).

4. Policy Enforcement: detected PHI triggers warn, block, or redact actions based on policy.
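
Steps 3 and 4 can be sketched as follows. This is illustrative only: the real pipeline runs Presidio server-side, and the MRN pattern, entity shape, and policy shape here are assumptions.

```javascript
// Minimal sketch of the regex-fallback and policy-enforcement steps.
// The MRN pattern below is an assumed example, not the service's actual rule.
const MRN_PATTERN = /\bMRN[-:\s]?\d{6,10}\b/g;

function regexFallback(text) {
  // Step 3: catch PHI types the NLP recognizers may miss.
  const entities = [];
  for (const match of text.matchAll(MRN_PATTERN)) {
    entities.push({
      type: 'MEDICAL_RECORD_NUMBER', // assumed custom type name
      text: match[0],
      start: match.index,
      end: match.index + match[0].length,
      score: 1.0, // regex matches are treated as exact
    });
  }
  return entities;
}

function enforcePolicy(entities, policies) {
  // Step 4: map detected entity types to warn/block/redact actions.
  return policies
    .filter((p) => entities.some((e) => e.type === p.entityType))
    .map((p) => ({ policyName: p.name, action: p.action }));
}

const entities = regexFallback('Patient MRN-00123456 was admitted.');
const violations = enforcePolicy(entities, [
  { name: 'Block MRNs', entityType: 'MEDICAL_RECORD_NUMBER', action: 'block' },
]);
```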

API Response with PHI Detected

When PHI is detected, the response includes detailed information:

{
  "completion": "Summary: The patient discussed in the document...",
  "conversationId": "conv_abc123",
  "phiDetected": true,
  "phiEntities": [
    {
      "type": "PERSON",
      "text": "John Smith",
      "start": 15,
      "end": 25,
      "score": 0.95
    },
    {
      "type": "US_SSN",
      "text": "123-45-6789",
      "start": 50,
      "end": 61,
      "score": 1.0
    }
  ],
  "policyViolations": [
    {
      "policyName": "Block SSN in Content",
      "action": "block",
      "message": "Content blocked due to SSN detection"
    }
  ]
}

⚠️ Important: If a policy is set to "block", the AI completion will not be generated and the request will return the policy violation details.
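
A caller can branch on these fields like so. The field names follow the response shape shown above; the helper itself is a sketch, not part of the Workbench SDK.

```javascript
// Interpret a chat response using the phiDetected / phiEntities /
// policyViolations fields shown above.
function interpretPhiResult(result) {
  const violations = result.policyViolations || [];
  const blocked = violations.some((v) => v.action === 'block');
  return {
    blocked, // a "block" policy means no completion was generated
    blockMessages: violations
      .filter((v) => v.action === 'block')
      .map((v) => v.message),
    detectedTypes: (result.phiEntities || []).map((e) => e.type),
  };
}

// Applied to the example response above:
const outcome = interpretPhiResult({
  phiDetected: true,
  phiEntities: [
    { type: 'PERSON', text: 'John Smith', start: 15, end: 25, score: 0.95 },
    { type: 'US_SSN', text: '123-45-6789', start: 50, end: 61, score: 1.0 },
  ],
  policyViolations: [
    {
      policyName: 'Block SSN in Content',
      action: 'block',
      message: 'Content blocked due to SSN detection',
    },
  ],
});
```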

Working with File Uploads

PHI detection also works on uploaded files (PDF, DOCX, XLSX):

// 1. Upload file
const formData = new FormData();
formData.append('file', fileInput.files[0]);

const uploadResponse = await fetch('/api/files/upload', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer YOUR_API_KEY' },
  body: formData
});

const { fileHash } = await uploadResponse.json();

// 2. Process with AI (PHI detection runs automatically)
const chatResponse = await fetch('/api/ai/chat', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    conversationId: 'new',
    taskId: 'document-summarizer',
    fileHash: fileHash
  })
});

const result = await chatResponse.json();
// Check result.phiDetected and result.phiEntities
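
The start/end offsets in phiEntities can also drive client-side redaction of a local copy of the text. A sketch, assuming the offsets index into the original input exactly as in the response example:

```javascript
// Redact a local copy of the input using entity offsets from phiEntities.
// Assumes start/end index the original text, as in the response example.
function redactPhi(text, entities) {
  // Apply replacements right-to-left so earlier offsets stay valid.
  const sorted = [...entities].sort((a, b) => b.start - a.start);
  let out = text;
  for (const e of sorted) {
    out = out.slice(0, e.start) + `[${e.type}]` + out.slice(e.end);
  }
  return out;
}

const redacted = redactPhi('Call John Smith at 555-0199.', [
  { type: 'PERSON', start: 5, end: 15 },
  { type: 'PHONE_NUMBER', start: 19, end: 27 },
]);
// → 'Call [PERSON] at [PHONE_NUMBER].'
```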

Best Practices

  • Always check the phiDetected flag: review the response to understand what PHI was found.
  • Configure policies appropriately: set warn, block, or redact policies based on your compliance requirements.
  • Review audit logs: all PHI detections are logged for compliance auditing.
  • Don't disable PHI detection: it runs automatically on all requests and is a core safety feature.
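
The first and third practices can be combined in a small helper. A sketch only; it records entity positions and scores but deliberately never the matched text, so the local record stays PHI-free.

```javascript
// Inspect phiDetected and keep a local, PHI-free record of detections
// alongside the server's audit log. Sketch only, not part of any SDK.
function auditPhi(result, log = console) {
  if (!result.phiDetected) return false;
  for (const e of result.phiEntities || []) {
    // Log type, position, and confidence, but never the matched text itself.
    log.warn(`PHI ${e.type} at [${e.start}, ${e.end}) score=${e.score}`);
  }
  return true;
}
```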