Safe AI Workbench Developer Docs

Policy Management Guide

Configure custom policies to control how sensitive content is handled before AI processing

What are Policies?

Policies are rules that automatically enforce safety controls on AI requests. Each policy can:

  • Detect sensitive patterns - Use regex or keyword matching to find content
  • Apply an action - Warn, block, redact, or allow matching content, depending on how critical the match is
  • Customize by group - Apply a policy to a single team or department, or to all groups
  • Audit compliance - All policy triggers are logged for review

Policy Actions

Warn

Allow the request but flag the policy violation in the response. Use for monitoring without blocking.

Block

Reject the request immediately. AI processing does not occur. Use for critical violations.

Redact

Replace sensitive content with [REDACTED] before AI processing. Balances safety and functionality.

Allow

Explicitly permit content even if it matches other patterns. Use for exceptions and overrides.

Creating a Policy

Use the Admin Dashboard or API to create policies:

POST /api/admin/policies
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

{
  "name": "Block Social Security Numbers",
  "description": "Prevent SSNs from being sent to AI models",
  "pattern": "\\b\\d{3}-\\d{2}-\\d{4}\\b",
  "isRegex": true,
  "action": "block",
  "enabled": true,
  "groupId": null
}

JSON does not allow comments, so none appear in the body above. A null "groupId" applies the policy to all groups.
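For scripted setup, the same request can be built with Python's standard library. This is a minimal sketch, not an official client: BASE_URL and the helper build_create_policy_request are hypothetical, and only the endpoint path, headers, and field names come from the example above. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical values: replace with your deployment URL and API key.
BASE_URL = "https://workbench.example.com"
API_KEY = "YOUR_API_KEY"


def build_create_policy_request(policy: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /api/admin/policies request."""
    body = json.dumps(policy).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/admin/policies",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )


ssn_policy = {
    "name": "Block Social Security Numbers",
    "description": "Prevent SSNs from being sent to AI models",
    # A raw string needs single backslashes; json.dumps doubles them in the body.
    "pattern": r"\b\d{3}-\d{2}-\d{4}\b",
    "isRegex": True,
    "action": "block",
    "enabled": True,
    "groupId": None,  # serialized as null: applies to all groups
}

req = build_create_policy_request(ssn_policy)
# To send it: urllib.request.urlopen(req)
```

Writing the pattern as a Python raw string and letting json.dumps escape it avoids hand-doubling backslashes.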

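💡 Tip: Test regex patterns at regex101.com before deploying. Backslashes are doubled in the JSON examples on this page because JSON strings require escaping; paste patterns into a tester with single backslashes (for example, \b\d{3}-\d{2}-\d{4}\b).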
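Patterns can also be checked locally before deploying. The sketch below decodes the "pattern" value from the JSON body, which yields the single-backslash regex, and runs it against a few sample strings with Python's re module. It assumes the Workbench regex engine treats these basic constructs (\b, \d, {n}) the same way Python does; advanced syntax may differ between engines.

```python
import json
import re

# The policy JSON exactly as written in the request body. A raw string
# keeps the doubled backslashes, just like the JSON text itself.
policy_json = r'{"pattern": "\\b\\d{3}-\\d{2}-\\d{4}\\b", "isRegex": true}'

# Decoding the JSON turns each "\\" escape into one backslash.
pattern = json.loads(policy_json)["pattern"]
print(pattern)  # \b\d{3}-\d{2}-\d{4}\b

ssn = re.compile(pattern)
samples = [
    "My SSN is 123-45-6789.",
    "Order 1234-56-7890 shipped",  # four leading digits: \b prevents a match
    "Call 555-1234",
]
results = {text: ssn.search(text) is not None for text in samples}
print(results)
```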

Example Policies

Block: Credit Card Numbers

{
  "name": "Block Credit Cards",
  "pattern": "\\b(?:\\d{4}[- ]?){3}\\d{4}\\b",
  "isRegex": true,
  "action": "block"
}
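Before enabling this policy, it helps to see what the pattern does and does not catch. The sketch below runs it (JSON escaping removed) through Python's re module, assuming equivalent regex semantics in the Workbench engine. It accepts 16-digit numbers with spaces, dashes, or no separators, misses the 15-digit 4-6-5 grouping used by American Express, and also flags any other 16-digit sequence in groups of four.

```python
import re

# "Block Credit Cards" pattern with JSON escaping removed.
CARD = re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")

samples = [
    "4111 1111 1111 1111",          # space-separated
    "4111-1111-1111-1111",          # dash-separated
    "4111111111111111",             # no separators
    "3782 822463 10005",            # 15-digit 4-6-5 grouping
    "Invoice 2024 0101 0000 1234",  # 16 digits that are not a card number
]
matches = {s: CARD.search(s) is not None for s in samples}
for s, hit in matches.items():
    print(f"{hit!s:5}  {s}")
```

The last sample is a false positive, which is one reason to run a new pattern as a warn policy before switching it to block.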

Redact: Patient Names

{
  "name": "Redact Patient Names",
  "pattern": "patient\\s+(?:name|id):\\s*([A-Za-z ]+)",
  "isRegex": true,
  "action": "redact"
}
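A local preview shows what the redaction pattern actually covers. This sketch assumes the whole match (including the "patient name:" label) is replaced with [REDACTED], not only the capture group; that behavior is an assumption, not something this guide specifies. The hypothetical helper redact uses Python's re.sub.

```python
import re

# "Redact Patient Names" pattern with JSON escaping removed.
PATIENT = re.compile(r"patient\s+(?:name|id):\s*([A-Za-z ]+)")


def redact(text: str) -> str:
    # Assumption: the entire match is replaced, not just group 1.
    return PATIENT.sub("[REDACTED]", text)


print(redact("patient name: Jane Doe, DOB 1980"))  # [REDACTED], DOB 1980
print(redact("Patient Name: Jane Doe"))            # Patient Name: Jane Doe
print(redact("patient id: 48213"))                 # [REDACTED]48213
```

Two gaps show up: the pattern is case-sensitive, so "Patient Name:" passes through, and [A-Za-z ]+ cannot match digits, so numeric IDs are left in place. A character class such as [Pp]atient and a separate rule for numeric IDs are engine-neutral ways to close them.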

Warn: Profanity

{
  "name": "Warn on Profanity",
  "pattern": "\\b(damn|hell|crap)\\b",
  "isRegex": true,
  "action": "warn"
}
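The \b word boundaries keep this pattern from firing inside longer words. The sketch below checks that with Python's re module (assuming equivalent semantics in the Workbench engine); note that the pattern is also case-sensitive.

```python
import re

# "Warn on Profanity" pattern with JSON escaping removed.
PROFANITY = re.compile(r"\b(damn|hell|crap)\b")

texts = ["What the hell?", "hello there", "Shell script", "Damn it"]
hits = {t: PROFANITY.findall(t) for t in texts}
for t, found in hits.items():
    print(t, "->", found)
```

Only "What the hell?" triggers: "hello" and "Shell" are protected by the boundaries, and "Damn" is missed because of its capital letter.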

Allow: De-identified Data

{
  "name": "Allow De-identified IDs",
  "pattern": "PATIENT-[0-9]{6}",
  "isRegex": true,
  "action": "allow"
}

Policy Evaluation Order

Policies are evaluated in this order:

1. Allow policies first: if any allow policy matches, the remaining checks are skipped.
2. Block policies next: if any block policy matches, the request is rejected immediately.
3. Redact policies: all matching redaction rules are applied to the content.
4. Warn policies last: violations are flagged in the response, but processing continues.
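The evaluation order can be modeled in a few lines of Python. This is an illustrative sketch, not the Workbench implementation. The classes Policy and Outcome and the function evaluate are hypothetical. The sketch assumes that only enabled policies for all groups or for the caller's group apply, that keyword (non-regex) policies match literal text, and that warn patterns are checked against the original, unredacted text.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Policy:
    name: str
    pattern: str
    action: str                     # "allow", "block", "redact" or "warn"
    is_regex: bool = True
    enabled: bool = True
    group_id: Optional[str] = None  # None: applies to all groups

    def regex(self) -> "re.Pattern[str]":
        # Assumption: keyword (isRegex=false) policies match the literal text.
        return re.compile(self.pattern if self.is_regex else re.escape(self.pattern))

    def find(self, text: str) -> List[str]:
        # group(0) is the full match, even when the pattern has capture groups.
        return [m.group(0) for m in self.regex().finditer(text)]


@dataclass
class Outcome:
    text: str                       # content that would be sent to the model
    blocked: bool = False
    violations: List[dict] = field(default_factory=list)


def evaluate(text: str, policies: List[Policy], group_id: Optional[str] = None) -> Outcome:
    # Assumption: only enabled policies for all groups or this group apply.
    active = [p for p in policies if p.enabled and p.group_id in (None, group_id)]

    def with_action(action: str) -> List[Policy]:
        return [p for p in active if p.action == action]

    # 1. Allow: any match skips all remaining checks.
    if any(p.find(text) for p in with_action("allow")):
        return Outcome(text)

    # 2. Block: the first matching block policy rejects the request.
    for p in with_action("block"):
        found = p.find(text)
        if found:
            return Outcome(text, blocked=True, violations=[
                {"policyName": p.name, "action": "block", "matches": found}])

    outcome = Outcome(text)
    # 3. Redact: apply every matching redaction rule in turn.
    for p in with_action("redact"):
        found = p.find(outcome.text)
        if found:
            outcome.text = p.regex().sub("[REDACTED]", outcome.text)
            outcome.violations.append(
                {"policyName": p.name, "action": "redact", "matches": found})

    # 4. Warn: flag matches but let the request through.
    # Assumption: warn patterns are checked against the original text.
    for p in with_action("warn"):
        found = p.find(text)
        if found:
            outcome.violations.append(
                {"policyName": p.name, "action": "warn", "matches": found})
    return outcome


policies = [
    Policy("Allow De-identified IDs", r"PATIENT-[0-9]{6}", "allow"),
    Policy("Block Social Security Numbers", r"\b\d{3}-\d{2}-\d{4}\b", "block"),
    Policy("Redact Patient Names", r"patient\s+(?:name|id):\s*([A-Za-z ]+)", "redact"),
    Policy("Warn on Profanity", r"\b(damn|hell|crap)\b", "warn"),
]
print(evaluate("patient name: Jane Doe, what the hell", policies).text)
# [REDACTED], what the hell
```

Because allow policies run first and skip every remaining check, a request such as "Record PATIENT-123456, SSN 123-45-6789" passes through untouched in this model, SSN included. Keep allow patterns narrow.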

Response with Policy Violations

When policies are triggered, the response includes violation details. In this example the request was blocked, so "completion" is null:

{
  "completion": null,
  "policyViolations": [
    {
      "policyId": "pol_abc123",
      "policyName": "Block Social Security Numbers",
      "action": "block",
      "pattern": "\\b\\d{3}-\\d{2}-\\d{4}\\b",
      "matches": ["123-45-6789"],
      "message": "Request blocked due to SSN detection"
    }
  ],
  "conversationId": "conv_xyz789"
}
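Client code typically checks for a block before reading "completion". The sketch below parses the example response with Python's json module; the helper handle_response is hypothetical, and it assumes "policyViolations" may be missing or empty when no policy fired.

```python
import json

# The example response above, as a raw HTTP response body.
body = r'''{
  "completion": null,
  "policyViolations": [
    {
      "policyId": "pol_abc123",
      "policyName": "Block Social Security Numbers",
      "action": "block",
      "pattern": "\\b\\d{3}-\\d{2}-\\d{4}\\b",
      "matches": ["123-45-6789"],
      "message": "Request blocked due to SSN detection"
    }
  ],
  "conversationId": "conv_xyz789"
}'''


def handle_response(raw: str) -> dict:
    data = json.loads(raw)
    # Assumption: the list may be absent or empty when nothing triggered.
    violations = data.get("policyViolations") or []
    return {
        "blocked": any(v.get("action") == "block" for v in violations),
        "completion": data.get("completion"),
        "messages": [v.get("message", "") for v in violations],
        "conversationId": data.get("conversationId"),
    }


summary = handle_response(body)
if summary["blocked"]:
    print("Blocked:", "; ".join(summary["messages"]))
```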

Best Practices

Start with warn policies

Monitor patterns before blocking to avoid false positives

Use specific regex patterns

Overly broad patterns cause false positives and user frustration

Document policy intent

Clear descriptions help administrators understand and maintain policies

Review audit logs regularly

Analytics show which policies are triggering and how often