Safe AI Workbench Developer Docs

PDF Workflow Processing

Automated document processing with PHI detection, smart redaction, and secure API transmission

Overview

PDF workflows enable automated processing of healthcare documents with built-in HIPAA compliance controls. These workflows handle PHI detection, redaction, AI processing, validation, and secure transmission to external systems.

Tier 1: De-identified Processing

PHI is detected and redacted before AI processing. Suitable for documents containing patient information where identifiable data must be removed.

Tier 2: BAA-Covered Processing

Identifiable PHI is preserved and transmitted under a Business Associate Agreement (BAA). Requires a valid BAA and customer acknowledgment.

Tier 1 Workflow: De-identified Processing

Tier 1 workflows remove all PHI from documents before AI processing, ensuring HIPAA compliance without requiring a Business Associate Agreement.

1. PHI Detection

AI scans the document text for PHI entities (names, SSNs, dates of birth, phone numbers, etc.).

2. Smart Redaction

PHI is replaced with tokens (TOKEN_001, TOKEN_002). Clinical terms are preserved. Original values are stored securely in Azure Key Vault.

3. AI Processing

The redacted document is processed by AI (extraction, summarization, etc.). The AI never sees original PHI values.

4. Output Validation

AI output is scanned for residual PHI. The workflow is blocked if unexpected PHI is detected.

5. API Transmission

Validated, de-identified data is transmitted securely to external API endpoints.

✅ HIPAA Compliant: Tier 1 workflows are suitable for general use without a BAA because all PHI is removed before external transmission.
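The five Tier 1 steps can be sketched as a single function. This is a minimal illustration, not the product's implementation: `detect_phi` is a hypothetical stand-in that only matches SSN-shaped strings (the real detector is AI-based), and `run_tier1`, `redact`, and `WorkflowBlocked` are names invented for this example.

```python
import re
from typing import Callable

# Hypothetical stand-in for the AI detector: matches only SSN-shaped strings.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class WorkflowBlocked(Exception):
    """Raised when output validation finds residual PHI."""


def detect_phi(text: str) -> list[str]:
    """Return every PHI value found in the text (SSNs only in this sketch)."""
    return SSN_PATTERN.findall(text)


def redact(text: str, entities: list[str]) -> str:
    """Replace each distinct PHI value with a numbered token."""
    for i, value in enumerate(dict.fromkeys(entities), start=1):
        text = text.replace(value, f"TOKEN_{i:03d}")
    return text


def run_tier1(
    document: str,
    ai_process: Callable[[str], str],
    transmit: Callable[[str], None],
) -> str:
    entities = detect_phi(document)        # 1. PHI detection
    redacted = redact(document, entities)  # 2. Smart redaction
    output = ai_process(redacted)          # 3. AI processing sees tokens only
    if detect_phi(output):                 # 4. Output validation
        raise WorkflowBlocked("residual PHI found in AI output")
    transmit(output)                       # 5. API transmission
    return output
```

Passing `ai_process` and `transmit` as callables keeps the ordering visible: transmission is only reachable after validation succeeds.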

Example Use Cases:

  • Prior authorization document summarization
  • Lab result extraction (de-identified)
  • Clinical note categorization
  • Medical record quality checks

Tier 2 Workflow: BAA-Covered Processing

Tier 2 workflows preserve identifiable PHI and transmit it to external systems under Business Associate Agreement coverage.

1. Compliance Gate

The workflow verifies that the BAA is signed and unexpired, that Tier 2 is enabled, and that the customer has acknowledged PHI processing.

2. PHI Detection

AI scans the document to identify PHI entities, but does NOT redact them.

3. AI Processing

The document is processed by AI with PHI intact. The AI can access patient identifiers as needed.

4. Output Validation

AI output is validated for data quality. PHI is allowed in designated fields only.

5. Secure API Transmission

Identifiable data is transmitted to the external API with TLS 1.2+, authentication, and audit logging.

⚠️ BAA Required: Tier 2 workflows require a valid Business Associate Agreement. See the BAA Management Guide for setup.
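The four Compliance Gate checks could look roughly like the sketch below. The `BaaStatus` record and its field names are hypothetical, and treating the BAA as still valid on its expiry date is an assumption; the product's actual data model is not described here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BaaStatus:
    # Hypothetical record; field names are illustrative only.
    signed: bool
    expires_on: date
    tier2_enabled: bool
    phi_acknowledged: bool


def compliance_gate(status: BaaStatus, today: date) -> list[str]:
    """Return the reasons the gate fails; an empty list means proceed."""
    failures = []
    if not status.signed:
        failures.append("BAA not signed")
    # Assumption: the BAA remains valid through its expiry date.
    if status.expires_on < today:
        failures.append("BAA expired")
    if not status.tier2_enabled:
        failures.append("Tier 2 not enabled")
    if not status.phi_acknowledged:
        failures.append("PHI processing not acknowledged")
    return failures
```

Returning every failed check, rather than stopping at the first, gives the audit log a complete picture of why a Tier 2 run was refused.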

Example Use Cases:

  • FHIR data transmission to EHR systems
  • Patient matching and deduplication
  • Insurance claim submission
  • Referral coordination with identifiable data

Smart Redaction Features

Smart redaction goes beyond simple text removal to preserve clinical context while protecting PHI.

Tokenization

PHI values are replaced with reversible tokens (TOKEN_001, TOKEN_002) instead of a generic [REDACTED] marker.

Original: "John Doe, DOB 01/15/1980"
Redacted: "TOKEN_001, DOB TOKEN_002"

Clinical Term Preservation

Medical terminology is retained for accurate AI processing.

Original: "Patient has diabetes"
Redacted: "TOKEN_001 has diabetes"
❌ NOT: "TOKEN_001 has TOKEN_002"
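A minimal sketch of tokenization with clinical term preservation follows. It assumes the detector hands back a list of candidate strings; `tokenize` and the shortened `PRESERVED_TERMS` set are illustrative names, not part of the product API.

```python
import re

# Illustrative subset of the preserved clinical vocabulary.
PRESERVED_TERMS = {"diabetes", "hypertension", "asthma", "metformin"}


def tokenize(text: str, candidates: list[str]) -> tuple[str, dict[str, str]]:
    """Replace PHI candidates with tokens, skipping preserved clinical terms.

    Returns the redacted text and a token-to-original mapping.
    """
    token_for: dict[str, str] = {}
    for value in candidates:
        if value.lower() in PRESERVED_TERMS or value in token_for:
            continue  # keep clinical terms; number each PHI value once
        token_for[value] = f"TOKEN_{len(token_for) + 1:03d}"
    if not token_for:
        return text, {}
    # Single-pass substitution, longest values first, so one replacement
    # can never rewrite part of another value or of an inserted token.
    ordered = sorted(token_for, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in ordered))
    redacted = pattern.sub(lambda m: token_for[m.group(0)], text)
    return redacted, {token: value for value, token in token_for.items()}
```

Because the mapping is returned separately from the redacted text, the caller decides where it is stored and the AI step never needs to see it.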

Date Granularity Options

Preserve only the year, or the month and year, for clinical context; the day is always redacted. Full redaction replaces the whole date with a token.

Original: "01/15/1980"
Year: "**/**/1980"
Month: "01/**/1980"
Full: "TOKEN_003"
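The three granularity options can be illustrated with a small helper. This sketch assumes MM/DD/YYYY input and masks each redacted component with `**`; `mask_date` is a hypothetical name, and the exact mask format the product emits may differ.

```python
def mask_date(value: str, granularity: str, token: str = "TOKEN_003") -> str:
    """Mask an MM/DD/YYYY date at the requested granularity.

    "year"  keeps only the year,
    "month" keeps the month and year,
    "full"  replaces the whole date with the supplied token.
    """
    month, day, year = value.split("/")
    if granularity == "year":
        return f"**/**/{year}"
    if granularity == "month":
        return f"{month}/**/{year}"
    if granularity == "full":
        return token
    raise ValueError(f"unknown granularity: {granularity}")
```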

Key Vault Storage

Token-to-PHI mappings are stored securely in Azure Key Vault, not in the application database.

Enables PHI reinsertion after AI processing if needed for downstream workflows.
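Reinsertion amounts to substituting each token with its stored value. In the sketch below a plain dictionary stands in for the mapping retrieved from Key Vault; `reinsert_phi` is a hypothetical helper, and no Key Vault calls are shown.

```python
import re

# Matches tokens in the TOKEN_001 style used by smart redaction.
TOKEN_PATTERN = re.compile(r"TOKEN_\d+")


def reinsert_phi(text: str, mapping: dict[str, str]) -> str:
    """Swap tokens back to original values; unknown tokens are left as-is."""

    def restore(match: re.Match) -> str:
        token = match.group(0)
        return mapping.get(token, token)

    return TOKEN_PATTERN.sub(restore, text)
```

Leaving unknown tokens untouched, rather than raising, is a design choice for this sketch; a stricter workflow might treat an unmapped token as an error.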

Preserved Clinical Terms:

diabetes, hypertension, asthma, COPD, cancer, stroke, depression, anxiety, arthritis, obesity, pneumonia, sepsis, metformin, lisinopril, atorvastatin, insulin, warfarin, prednisone, albuterol, levothyroxine

And many more medical terms...

Output Validation & PHI Leakage Prevention

Output validation is the critical safety net that prevents accidental PHI disclosure in API transmissions.

Residual PHI Scanning

AI output is re-scanned for PHI entities. If unexpected PHI is detected, the workflow is blocked immediately.

Allowed PHI Fields (Tier 2)

Specify which fields can contain PHI (e.g., patientName, patientId). PHI in other fields triggers blocking.

Schema Validation

Optionally validate output against a JSON schema to ensure the data structure is as expected.

Risk Assessment

Generates a risk level (safe/review/block) based on PHI detection confidence and location.
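One way to read "confidence and location" is sketched below: findings in allowed fields are ignored, and the rest are graded by detector confidence. The `Finding` type, the `assess_risk` name, and the 0.8 and 0.5 thresholds are all assumptions made for illustration; the product's actual scoring rules are not documented here.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    field: str         # output field where the PHI candidate was found
    confidence: float  # detector confidence between 0.0 and 1.0


def assess_risk(
    findings: list[Finding],
    allowed_fields: set[str],
    block_threshold: float = 0.8,   # assumed value
    review_threshold: float = 0.5,  # assumed value
) -> str:
    """Return "safe", "review", or "block" for a set of PHI findings."""
    level = "safe"
    for finding in findings:
        if finding.field in allowed_fields:
            continue  # PHI is expected in this location
        if finding.confidence >= block_threshold:
            return "block"
        if finding.confidence >= review_threshold:
            level = "review"
    return level
```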

Validation configuration example:
{
  "scanForResidualPHI": true,
  "allowedPHIFields": ["patientId", "patientName", "dateOfBirth"],
  "blockOnUnexpectedPHI": true,
  "schemaValidation": {
    "type": "object",
    "required": ["patientId", "diagnosis"],
    "properties": {
      "patientId": { "type": "string" },
      "diagnosis": { "type": "string" }
    }
  }
}

🛑 Blocking Behavior: When validation fails, the workflow stops immediately and no data is transmitted. Review audit logs to investigate the cause.
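To show how a configuration like the one above might drive the blocking decision, here is a hedged sketch. The SSN regex is a stand-in for the real PHI scanner, the schema check covers only `required` and string-typed `properties` rather than full JSON Schema, and `validate_output` and its result keys are hypothetical.

```python
import json
import re

CONFIG = json.loads("""
{
  "scanForResidualPHI": true,
  "allowedPHIFields": ["patientId", "patientName", "dateOfBirth"],
  "blockOnUnexpectedPHI": true,
  "schemaValidation": {
    "type": "object",
    "required": ["patientId", "diagnosis"],
    "properties": {
      "patientId": { "type": "string" },
      "diagnosis": { "type": "string" }
    }
  }
}
""")

# Stand-in detector: flags SSN-shaped strings only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def validate_output(output: dict, config: dict) -> dict:
    """Check an AI output against the config and decide whether to block."""
    schema_errors = []
    schema = config.get("schemaValidation") or {}
    for name in schema.get("required", []):
        if name not in output:
            schema_errors.append(f"missing required field: {name}")
    for name, rule in schema.get("properties", {}).items():
        is_string_rule = rule.get("type") == "string"
        if name in output and is_string_rule and not isinstance(output[name], str):
            schema_errors.append(f"field is not a string: {name}")

    unexpected_phi = []
    if config.get("scanForResidualPHI"):
        allowed = set(config.get("allowedPHIFields", []))
        for name, value in output.items():
            if name not in allowed and SSN_PATTERN.search(str(value)):
                unexpected_phi.append(name)

    blocked = bool(schema_errors) or (
        bool(config.get("blockOnUnexpectedPHI")) and bool(unexpected_phi)
    )
    return {
        "schemaErrors": schema_errors,
        "unexpectedPHI": unexpected_phi,
        "blocked": blocked,
    }
```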

API Endpoint Configuration

Configure external API endpoints for secure data transmission with authentication, TLS enforcement, and retry logic.

Security Features:

TLS 1.2+ Enforcement

All transmissions require TLS 1.2 or later; older protocol versions are rejected

Authentication Methods

Bearer token, API key, OAuth 2.0, or mTLS certificate

Credential Storage

Secrets are stored in Azure Key Vault, never in the application database

Exponential Backoff

Automatic retries with configurable delays (up to 10 attempts)
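Exponential backoff can be sketched as follows. The base delay, multiplier, and delay cap are assumed values, and this sketch counts the first try as one of the 10 attempts; `backoff_delays` and `send_with_retry` are illustrative names rather than product functions.

```python
import time
from typing import Callable


def backoff_delays(
    base_seconds: float = 1.0,  # assumed starting delay
    factor: float = 2.0,        # assumed multiplier
    max_delay: float = 60.0,    # assumed cap per wait
    max_attempts: int = 10,
) -> list[float]:
    """Waits between attempts: N attempts means N - 1 waits."""
    return [
        min(base_seconds * factor**i, max_delay)
        for i in range(max_attempts - 1)
    ]


def send_with_retry(
    send: Callable[[], object],
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    """Call send() until it succeeds or the attempts are exhausted."""
    delays = backoff_delays(max_attempts=max_attempts)
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt - 1])
```

Injecting `sleep` makes the retry schedule easy to test without real waiting.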

Audit & Compliance:

  • Every transmission logged with timestamp, status code, and payload hash
  • Response times and retry counts tracked for monitoring
  • Transmission IDs linkable to workflow execution for full audit trail
  • Failed transmissions logged with error details for troubleshooting
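An audit entry with a payload hash might be built like this. The field names, the choice of SHA-256, and the canonical JSON serialization are assumptions for illustration; the source only states that a timestamp, status code, and payload hash are logged.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(
    transmission_id: str,
    workflow_execution_id: str,
    payload: dict,
    status_code: int,
    retry_count: int,
    response_ms: float,
) -> dict:
    """Build an audit entry that stores a hash of the payload, not the payload."""
    # Canonical form (sorted keys, fixed separators) so that equal payloads
    # always produce the same hash regardless of key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return {
        "transmissionId": transmission_id,
        "workflowExecutionId": workflow_execution_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "statusCode": status_code,
        "retryCount": retry_count,
        "responseMs": response_ms,
        "payloadHash": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }
```

Hashing instead of storing the payload lets an auditor confirm what was sent without the log itself holding PHI.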

💡 Admin Configuration: API endpoints are managed by site admins at /system/api-endpoints. See the API Endpoint Configuration Guide.

Creating Workflows

Build custom PDF workflows using the visual workflow builder with drag-and-drop nodes.

Available PDF Workflow Nodes:

  • PHI Detection: Scan text for PHI entities with confidence scoring
  • Smart Redaction: Redact PHI with tokenization and clinical term preservation
  • Output Validation: Scan for residual PHI and validate data quality
  • API Post: Securely transmit data to external API endpoints
  • Compliance Gate: Check BAA status and Tier 2 eligibility

📖 Learn More: See the Workflow Builder Guide for step-by-step instructions on creating PDF workflows.