PDF Workflow Processing
Automated document processing with PHI detection, smart redaction, and secure API transmission
Overview
PDF workflows enable automated processing of healthcare documents with built-in HIPAA compliance controls. These workflows handle PHI detection, redaction, AI processing, validation, and secure transmission to external systems.
Tier 1: De-identified Processing
PHI is detected and redacted before AI processing. Suitable for documents containing patient information where identifiable data must be removed.
Tier 2: BAA-Covered Processing
Identifiable PHI is preserved and transmitted under a Business Associate Agreement (BAA). Requires a valid BAA and customer acknowledgment of PHI processing.
Tier 1 Workflow: De-identified Processing
Tier 1 workflows remove all PHI from documents before AI processing, ensuring HIPAA compliance without requiring a Business Associate Agreement.
PHI Detection
AI scans document text for PHI entities (names, SSNs, dates of birth, phone numbers, etc.)
Smart Redaction
PHI replaced with tokens (TOKEN_001, TOKEN_002). Clinical terms preserved. Original values stored securely in Azure Key Vault.
AI Processing
Redacted document processed by AI (extraction, summarization, etc.). AI never sees original PHI values.
Output Validation
AI output scanned for residual PHI. Workflow blocked if unexpected PHI detected.
API Transmission
Validated, de-identified data transmitted securely to external API endpoints.
HIPAA Compliant: Tier 1 workflows are suitable for general use without a BAA because all PHI is removed before AI processing and external transmission.
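The five Tier 1 stages run strictly in sequence, and a failed validation stops the run before anything is transmitted. The Python sketch below models that control flow with each stage passed in as a function. `run_tier1`, `WorkflowBlocked`, `toy_detect`, and `toy_redact` are hypothetical names for illustration, not the product's API, and the toy regex only stands in for the AI detector.

```python
import re
from typing import Callable


class WorkflowBlocked(Exception):
    """Raised when output validation finds PHI; nothing is transmitted."""


def run_tier1(text: str,
              detect: Callable[[str], list[str]],
              redact: Callable[[str, list[str]], str],
              ai_process: Callable[[str], str],
              transmit: Callable[[str], None]) -> str:
    found = detect(text)              # 1. PHI detection
    redacted = redact(text, found)    # 2. smart redaction
    output = ai_process(redacted)     # 3. AI sees only the redacted text
    residual = detect(output)         # 4. output validation: re-scan AI output
    if residual:
        raise WorkflowBlocked(f"{len(residual)} unexpected PHI value(s) in AI output")
    transmit(output)                  # 5. API transmission, reached only if clean
    return output


# Toy stages for illustration only: one hard-coded name plus MM/DD/YYYY dates,
# and token substitution without the Key Vault step.
def toy_detect(text: str) -> list[str]:
    return re.findall(r"John Doe|\b\d{2}/\d{2}/\d{4}\b", text)


def toy_redact(text: str, found: list[str]) -> str:
    for i, value in enumerate(dict.fromkeys(found), start=1):
        text = text.replace(value, f"TOKEN_{i:03d}")
    return text
```

Because `transmit` is called only after the residual scan passes, a blocked run leaves no partial transmission behind.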
Example Use Cases:
- Prior authorization document summarization
- Lab result extraction (de-identified)
- Clinical note categorization
- Medical record quality checks
Tier 2 Workflow: BAA-Covered Processing
Tier 2 workflows preserve identifiable PHI and transmit it to external systems under the coverage of a Business Associate Agreement (BAA).
Compliance Gate
Workflow verifies that the BAA is signed and not expired, that Tier 2 is enabled, and that the customer has acknowledged PHI processing.
PHI Detection
AI scans document to identify PHI entities, but does NOT redact them.
AI Processing
Document processed by AI with PHI intact. AI can access patient identifiers as needed.
Output Validation
AI output validated for data quality. PHI allowed in designated fields only.
Secure API Transmission
Identifiable data transmitted to external API with TLS 1.2+, authentication, and audit logging.
BAA Required: Tier 2 workflows require a valid Business Associate Agreement. See the BAA Management Guide for setup.
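A minimal Python sketch of the compliance gate, assuming the BAA state is available as a simple record. `BaaStatus` and `compliance_gate` are hypothetical names, and treating the expiry date as the last valid day is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BaaStatus:
    """Hypothetical snapshot of an organization's BAA settings."""
    signed: bool
    expires_on: date          # assumed to be the last valid day
    tier2_enabled: bool
    phi_acknowledged: bool


def compliance_gate(status: BaaStatus, today: date) -> list[str]:
    """Return every reason the Tier 2 run must stop; an empty list means pass."""
    failures = []
    if not status.signed:
        failures.append("BAA not signed")
    if today > status.expires_on:
        failures.append("BAA expired")
    if not status.tier2_enabled:
        failures.append("Tier 2 not enabled")
    if not status.phi_acknowledged:
        failures.append("PHI processing not acknowledged")
    return failures
```

Collecting every failure rather than stopping at the first one gives an administrator the full list of fixes in one pass.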
Example Use Cases:
- FHIR data transmission to EHR systems
- Patient matching and deduplication
- Insurance claim submission
- Referral coordination with identifiable data
Smart Redaction Features
Smart redaction goes beyond simple text removal to preserve clinical context while protecting PHI.
Tokenization
PHI values replaced with reversible tokens (TOKEN_001, TOKEN_002) instead of [REDACTED].
Original: "John Doe, DOB 01/15/1980"
Redacted: "TOKEN_001, DOB TOKEN_002"
Clinical Term Preservation
Medical terminology retained for accurate AI processing.
Original: "John Doe has diabetes"
Redacted: "TOKEN_001 has diabetes" (not "TOKEN_001 has TOKEN_002")
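Tokenization and clinical term preservation can be sketched together: each distinct detected value gets the next TOKEN_nnn label, repeated values reuse their token, and anything on the clinical-terms list is left as-is. This assumes the detection step hands over the flagged strings. `smart_redact` and the short `CLINICAL_TERMS` set are illustrative, and writing the token map to Azure Key Vault is not shown.

```python
import re

# Small illustrative subset of the preserved clinical terms (lowercase).
CLINICAL_TERMS = {"diabetes", "hypertension", "asthma", "copd", "metformin", "insulin"}


def smart_redact(text: str, detected: list[str]) -> tuple[str, dict[str, str]]:
    """Replace detected PHI with reversible TOKEN_nnn placeholders.

    `detected` is what the PHI detection step flagged (assumed input).
    Returns the redacted text and a token -> original value map.
    """
    value_to_token: dict[str, str] = {}
    for value in detected:
        if value.lower() in CLINICAL_TERMS or value in value_to_token:
            continue  # keep clinical terms; reuse the token for repeated values
        value_to_token[value] = f"TOKEN_{len(value_to_token) + 1:03d}"
    if not value_to_token:
        return text, {}
    # One regex pass, longest values first, so an inserted token is never re-redacted.
    pattern = "|".join(re.escape(v) for v in sorted(value_to_token, key=len, reverse=True))
    redacted = re.sub(pattern, lambda m: value_to_token[m.group(0)], text)
    token_map = {tok: val for val, tok in value_to_token.items()}
    return redacted, token_map
```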
Date Granularity Options
Preserve the year, or the month and year, for clinical context while redacting the rest of the date.
Original: "01/15/1980"
Year: "**/**/1980"
Month: "01/**/1980"
Full: "TOKEN_003"
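A sketch of the three date granularity options, assuming dates arrive in MM/DD/YYYY form and the masks keep that layout. `mask_date` is a hypothetical helper, not part of the product.

```python
import re


def mask_date(value: str, granularity: str, token: str) -> str:
    """Redact an MM/DD/YYYY date at the requested granularity (illustrative).

    "year"  -> keep only the year          01/15/1980 -> **/**/1980
    "month" -> keep month and year         01/15/1980 -> 01/**/1980
    "full"  -> replace the whole date with the given token
    """
    m = re.fullmatch(r"(\d{2})/(\d{2})/(\d{4})", value)
    if m is None:
        raise ValueError(f"not an MM/DD/YYYY date: {value!r}")
    month, _day, year = m.groups()
    if granularity == "year":
        return f"**/**/{year}"
    if granularity == "month":
        return f"{month}/**/{year}"
    if granularity == "full":
        return token
    raise ValueError(f"unknown granularity: {granularity!r}")
```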
Key Vault Storage
Token-to-PHI mappings stored securely in Azure Key Vault, not in the database.
Enables PHI reinsertion after AI processing if needed for downstream workflows.
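Reinsertion is the inverse of tokenization. This sketch assumes the token map has already been fetched from Key Vault into a plain dictionary (the retrieval call is not shown); `reinsert_phi` is a hypothetical name. Tokens missing from the map are left in place, so a partial map never silently drops text.

```python
import re
from typing import Mapping

TOKEN_RE = re.compile(r"TOKEN_\d+")


def reinsert_phi(text: str, token_map: Mapping[str, str]) -> str:
    """Swap TOKEN_nnn placeholders back to their original values."""
    return TOKEN_RE.sub(lambda m: token_map.get(m.group(0), m.group(0)), text)
```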
Preserved Clinical Terms:
diabetes, hypertension, asthma, COPD, cancer, stroke, depression, anxiety, arthritis, obesity, pneumonia, sepsis, metformin, lisinopril, atorvastatin, insulin, warfarin, prednisone, albuterol, levothyroxine, and many more medical terms.
Output Validation & PHI Leakage Prevention
Output validation is the critical safety net that prevents accidental PHI disclosure in API transmissions.
Residual PHI Scanning
AI output re-scanned for PHI entities. If unexpected PHI detected, workflow blocked immediately.
Allowed PHI Fields (Tier 2)
Specify which fields can contain PHI (e.g., patientName, patientId). PHI in other fields triggers blocking.
Schema Validation
Optionally validate output against JSON schema to ensure data structure compliance.
Risk Assessment
Generates risk level (safe/review/block) based on PHI detection confidence and location.
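Risk assessment can be read as a function of two inputs per detection: where the PHI-like value appeared and how confident the detector was. The thresholds, `Detection`, and `assess_risk` below are assumptions for illustration; the product's actual cut-offs are not documented here.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    field: str          # output field where the PHI-like value was found
    confidence: float   # detector confidence, 0.0 to 1.0


# Hypothetical thresholds.
BLOCK_AT = 0.85
REVIEW_AT = 0.50


def assess_risk(detections: list[Detection], allowed_fields: set[str]) -> str:
    """Return "safe", "review", or "block" for one AI output."""
    level = "safe"
    for d in detections:
        if d.field in allowed_fields:
            continue  # PHI is expected in this field (Tier 2 allowed field)
        if d.confidence >= BLOCK_AT:
            return "block"  # worst level; no need to look further
        if d.confidence >= REVIEW_AT:
            level = "review"
    return level
```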
Validation Configuration Example:
{
  "scanForResidualPHI": true,
  "allowedPHIFields": ["patientId", "patientName", "dateOfBirth"],
  "blockOnUnexpectedPHI": true,
  "schemaValidation": {
    "type": "object",
    "required": ["patientId", "diagnosis"],
    "properties": {
      "patientId": { "type": "string" },
      "diagnosis": { "type": "string" }
    }
  }
}
Blocking Behavior: When validation fails, the workflow stops immediately and no data is transmitted. Review the audit logs to investigate the cause.
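A validation configuration like the example above could be applied to an AI output roughly as follows. This is a sketch, assuming the output is a flat JSON object and a `detect` function returns the PHI strings found in a value. `validate_output` and the regex `toy_detect` are hypothetical, and only the `required` and string-`type` parts of JSON Schema are checked.

```python
import re
from typing import Callable


def validate_output(output: dict, config: dict,
                    detect: Callable[[str], list[str]]) -> list[str]:
    """Return blocking problems; a non-empty list means nothing is transmitted."""
    problems: list[str] = []
    schema = config.get("schemaValidation")
    if schema:
        # Minimal subset of JSON Schema: "required" plus string-typed properties.
        for name in schema.get("required", []):
            if name not in output:
                problems.append(f"missing required field: {name}")
        for name, spec in schema.get("properties", {}).items():
            if name in output and spec.get("type") == "string" and not isinstance(output[name], str):
                problems.append(f"field {name} is not a string")
    if config.get("scanForResidualPHI"):
        allowed = set(config.get("allowedPHIFields", []))
        for name, value in output.items():
            if name in allowed:
                continue  # PHI permitted in this field
            if detect(str(value)) and config.get("blockOnUnexpectedPHI", True):
                problems.append(f"unexpected PHI in field: {name}")
    return problems


CONFIG = {
    "scanForResidualPHI": True,
    "allowedPHIFields": ["patientId", "patientName", "dateOfBirth"],
    "blockOnUnexpectedPHI": True,
    "schemaValidation": {
        "type": "object",
        "required": ["patientId", "diagnosis"],
        "properties": {"patientId": {"type": "string"}, "diagnosis": {"type": "string"}},
    },
}


def toy_detect(value: str) -> list[str]:
    # Illustrative stand-in for the AI detector: SSNs and one hard-coded name.
    return re.findall(r"\b\d{3}-\d{2}-\d{4}\b|John Doe", value)
```

For full JSON Schema support, the `jsonschema` package's `validate(instance, schema)` could replace the hand-written schema check.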
API Endpoint Configuration
Configure external API endpoints for secure data transmission with authentication, TLS enforcement, and retry logic.
Security Features:
TLS 1.2+ Enforcement
All transmissions require TLS 1.2 or later
Authentication Methods
Bearer token, API key, OAuth 2.0, or mTLS certificate
Credential Storage
Secrets stored in Azure Key Vault, never in the database
Exponential Backoff
Automatic retries with configurable delays (up to 10 attempts)
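TLS enforcement and exponential backoff map onto standard-library pieces in Python. In this sketch `make_tls_context` uses the real `ssl` module, while `post_with_retries`, its `send` callable, and the choice to retry 429 and 5xx responses are assumptions; the 0.5 s base delay and 30 s cap are placeholders for the configurable delays.

```python
import ssl
import time
from typing import Callable


def make_tls_context() -> ssl.SSLContext:
    """Client context with certificate checks that refuses anything older than TLS 1.2."""
    ctx = ssl.create_default_context()  # certificate and hostname verification enabled
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def post_with_retries(send: Callable[[], int],
                      max_attempts: int = 10,
                      base_delay: float = 0.5,
                      max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send` (one POST returning an HTTP status) until a non-retryable status.

    429 and 5xx are retried with delays base, 2*base, 4*base, ... capped at max_delay.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status = send()
        if status != 429 and status < 500:
            return status  # success or a client error that retrying will not fix
        if attempt < max_attempts:
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    return status  # last retryable status after exhausting attempts
```

Production retry loops usually add random jitter to each delay and also retry on connection errors; both are omitted here to keep the behavior deterministic.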
Audit & Compliance:
- Every transmission logged with timestamp, status code, and payload hash
- Response times and retry counts tracked for monitoring
- Transmission IDs linkable to workflow execution for full audit trail
- Failed transmissions logged with error details for troubleshooting
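One way to shape a transmission log entry with the fields listed above, as a Python sketch. The field names and the `audit_record` helper are hypothetical. The payload is serialized with sorted keys before hashing, an assumption made so that the same data always yields the same SHA-256 hash, and the sketch stores that hash rather than the payload itself.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone


def audit_record(execution_id: str, payload: dict, status_code: int,
                 response_ms: float, retries: int) -> dict:
    """Build one transmission log entry linked to its workflow execution."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {
        "transmissionId": str(uuid.uuid4()),
        "workflowExecutionId": execution_id,  # link back to the workflow run
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "statusCode": status_code,
        "payloadSha256": hashlib.sha256(body).hexdigest(),
        "responseTimeMs": response_ms,
        "retryCount": retries,
    }
```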
Admin Configuration: API endpoints are managed by site admins at /system/api-endpoints. See the API Endpoint Configuration Guide.
Creating Workflows
Build custom PDF workflows using the visual workflow builder with drag-and-drop nodes.
Available PDF Workflow Nodes:
PHI Detection
Scan text for PHI entities with confidence scoring
Smart Redaction
Redact PHI with tokenization and clinical term preservation
Output Validation
Scan for residual PHI and validate data quality
API Post
Securely transmit data to external API endpoints
Compliance Gate
Check BAA status and Tier 2 eligibility
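Some node orderings follow directly from the tier descriptions: Output Validation comes before API Post, a Tier 1 run redacts after detecting, and a Tier 2 run opens with the Compliance Gate. The hypothetical `check_workflow` below expresses those three rules over a list of node names; it is an illustration, not the builder's actual validation logic.

```python
def check_workflow(nodes: list[str], tier: int) -> list[str]:
    """Flag node orderings that contradict the tier descriptions (illustrative only)."""
    issues = []

    def pos(name: str) -> int:
        return nodes.index(name) if name in nodes else -1

    if "API Post" in nodes and not (0 <= pos("Output Validation") < pos("API Post")):
        issues.append("Output Validation must run before API Post")
    if tier == 1 and not (0 <= pos("PHI Detection") < pos("Smart Redaction")):
        issues.append("Tier 1 needs PHI Detection followed by Smart Redaction")
    if tier == 2 and (not nodes or nodes[0] != "Compliance Gate"):
        issues.append("Tier 2 must start with Compliance Gate")
    return issues
```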
Learn More: See the Workflow Builder Guide for step-by-step instructions on creating PDF workflows.