raipii API
raipii is a REST API that detects and sanitizes PII in text before it reaches your LLM. Two calls wrap your existing LLM workflow — one before (sanitize), one after (restore). No infrastructure changes required.
Base URL for all requests:
https://api.raipii.com
All requests use HTTPS. Request and response bodies are JSON. All endpoints require authentication.
Authentication
Pass your API key as a Bearer token in every request.
curl -X POST https://api.raipii.com/v1/sanitize \
  -H "Authorization: Bearer ps_live_..." \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello John Smith"}'
| Status | Meaning |
|---|---|
| 401 | Missing or invalid API key |
| 403 | Session belongs to a different account |
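When calling the HTTP API without an SDK, the same header can be attached using only the Python standard library. The sketch below is illustrative: `build_request` is a hypothetical helper name, and it only constructs the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://api.raipii.com"


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("/v1/sanitize", {"text": "Hello John Smith"}, "ps_live_...")
# To send it: urllib.request.urlopen(req, timeout=30). A 401 or 403 then
# surfaces as urllib.error.HTTPError, whose .code attribute holds the status.
```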
Quick start
Three calls in sequence: sanitize, your LLM, restore.
import openai
import raipii
ps = raipii.Raipii(api_key="ps_live_...")
oai = openai.OpenAI()
prompt = "Help John Smith (john@acme.com, SSN 392-45-7810) with his claim."
# 1. Sanitize: strip PII before sending to your LLM
result = ps.sanitize(prompt, mode="fake_substitute")
# → "Help Michael Torres (m.torres@email.net, SSN 847-23-1956)..."
# 2. Call your LLM with clean text
reply = oai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": result.sanitized_text}],
).choices[0].message.content
# 3. Restore original values in the response
final = ps.restore(reply, result.session_id)
print(final.restored_text)  # John Smith's real data is back
POST /v1/sanitize
Detects PII in text and replaces it according to the chosen mode. Returns a session_id for later restoration.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | required | The text to sanitize. Max 100,000 characters. |
| mode | string | optional | One of token (default), fake_substitute, or redact. |
| entities | string[] | optional | Limit detection to specific entity types. Detects all if omitted. |
| session_ttl | integer | optional | Session expiry in seconds. Default 3600 (1 hr). Maximum depends on tier: 3600 (Starter), 86400 (Growth), 604800 (Business). |
| conversation_id | string | optional | Link to a multi-turn conversation session for consistent substitutions. |
| confidence_threshold | float | optional | Override the detection confidence threshold (0.0–1.0). Default 0.85. |
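The documented limits can be checked client-side before a request is sent. This is a minimal sketch, not part of the raipii SDK: `build_sanitize_payload` is a hypothetical helper, and it assumes the 100,000-character limit is counted the same way as Python's `len()`.

```python
VALID_MODES = {"token", "fake_substitute", "redact"}
MAX_TEXT_CHARS = 100_000  # documented limit for the text field


def build_sanitize_payload(text, mode="token", entities=None, session_ttl=None,
                           conversation_id=None, confidence_threshold=None):
    """Check arguments against the documented limits and return the JSON body."""
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if confidence_threshold is not None and not 0.0 <= confidence_threshold <= 1.0:
        raise ValueError("confidence_threshold must be between 0.0 and 1.0")
    payload = {"text": text, "mode": mode}
    optional = {
        "entities": entities,
        "session_ttl": session_ttl,
        "conversation_id": conversation_id,
        "confidence_threshold": confidence_threshold,
    }
    # Omit unset fields so the server applies its own defaults.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


payload = build_sanitize_payload(
    "Call John Smith at john@acme.com",
    mode="fake_substitute",
    entities=["PERSON", "EMAIL"],
)
```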
Response
{
  "session_id": "ps_sess_abc123...",
  "conversation_id": null,
  "sanitized_text": "Call [PERSON_1] at [EMAIL_1]",
  "entities_found": [
    {
      "type": "PERSON",
      "original": "John Smith",
      "replacement": "[PERSON_1]",
      "position": [5, 15],
      "confidence": 0.99
    },
    {
      "type": "EMAIL",
      "original": "john@acme.com",
      "replacement": "[EMAIL_1]",
      "position": [19, 32],
      "confidence": 1.0
    }
  ],
  "char_count": 32,
  "usage": { "chars_billed": 32 }
}
Response fields
| Field | Type | Description |
|---|---|---|
| session_id | string | Pass to /v1/restore to reverse substitutions. |
| sanitized_text | string | Text with PII replaced. |
| entities_found | array | Each detected entity: type, original, replacement, position, confidence. |
| char_count | integer | Characters in the input text. |
| usage.chars_billed | integer | Characters billed against your monthly limit. |
| conversation_id | string or null | Echo of the conversation_id passed in, or null. |
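In the sample response, each position pair appears to be a zero-based, end-exclusive offset into the original input text. That reading is inferred from the sample values rather than stated by the API, so treat the check below as a sketch; `check_positions` is a hypothetical helper.

```python
original_text = "Call John Smith at john@acme.com"

# Trimmed copy of the sample /v1/sanitize response.
response = {
    "entities_found": [
        {"type": "PERSON", "original": "John Smith", "position": [5, 15]},
        {"type": "EMAIL", "original": "john@acme.com", "position": [19, 32]},
    ],
    "char_count": 32,
}


def check_positions(text, entities):
    """Return True when slicing text at each position yields the entity's original value."""
    for entity in entities:
        start, end = entity["position"]
        if text[start:end] != entity["original"]:
            return False
    return True


positions_ok = check_positions(original_text, response["entities_found"])
```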
POST /v1/restore
Reverses the substitutions made by /v1/sanitize, replacing tokens or synthetic values in the LLM response with their original PII.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | required | The LLM response text containing tokens or synthetic values. |
| session_id | string | required | The session_id returned by the corresponding sanitize call. |
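Conceptually, restoring is a reverse lookup from each replacement back to its original value. The service does this server-side from the stored session; the sketch below only models that behaviour locally. `restore_locally` is a hypothetical name, and counting every replaced occurrence is an assumption about how substitutions are tallied.

```python
def restore_locally(llm_text, entities):
    """Swap each replacement string back to its original and count the swaps."""
    restored = llm_text
    reversed_count = 0
    for entity in entities:
        occurrences = restored.count(entity["replacement"])
        if occurrences:
            restored = restored.replace(entity["replacement"], entity["original"])
            reversed_count += occurrences
    return restored, reversed_count


entities = [
    {"original": "John Smith", "replacement": "[PERSON_1]"},
    {"original": "john@acme.com", "replacement": "[EMAIL_1]"},
]
restored_text, count = restore_locally("Call [PERSON_1] at [EMAIL_1]", entities)
```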
Response
{
  "restored_text": "Call John Smith at john@acme.com",
  "substitutions_reversed": 2,
  "usage": { "chars_billed": 32 }
}
POST /v1/detect
Scans text for PII and returns detected entities with types, positions, and confidence scores. Does not modify the text. Useful for auditing and risk assessment.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | required | Text to scan for PII. |
| entities | string[] | optional | Limit detection to specific entity types. |
| confidence_threshold | float | optional | Override the detection confidence threshold. |
Response
{
  "entities_found": [
    {
      "type": "US_SSN",
      "value": "392-45-7810",
      "confidence": 1.0,
      "position": [10, 21],
      "detection_method": "structured"
    }
  ],
  "pii_detected": true,
  "risk_level": "HIGH",
  "usage": { "chars_billed": 21 }
}
Risk levels
| Level | Triggered by |
|---|---|
| HIGH | SSN, credit card, medical record number, bank account, tax ID |
| MEDIUM | Person name, email, date of birth, address |
| LOW | Any other detected entity type |
| NONE | No PII detected |
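The table can be read as a highest-level-wins rule. The sketch below assumes that reading and maps the table's descriptions onto the entity type identifiers used elsewhere in this reference; the mapping is an inference, and `risk_level` is a hypothetical helper, not an SDK function.

```python
HIGH_TYPES = {"US_SSN", "CREDIT_CARD", "MEDICAL_RECORD_NUMBER", "BANK_ACCOUNT", "TAX_ID"}
MEDIUM_TYPES = {"PERSON", "EMAIL", "DATE_OF_BIRTH", "ADDRESS"}


def risk_level(entity_types):
    """Return the highest risk level triggered by the detected entity types."""
    types = set(entity_types)
    if not types:
        return "NONE"
    if types & HIGH_TYPES:
        return "HIGH"
    if types & MEDIUM_TYPES:
        return "MEDIUM"
    return "LOW"
```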
POST /v1/conversations
Creates a multi-turn conversation session. Pass the returned conversation_id to /v1/sanitize calls so the same real entity always maps to the same synthetic value across all turns.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| ttl | integer | optional | Session lifetime in seconds. Default 86400 (24 hr). |
| metadata | object | optional | Arbitrary key-value pairs stored with the conversation. |
Response
{
  "conversation_id": "ps_conv_xyz...",
  "expires_at": "2026-04-11T12:00:00"
}
Sanitize modes
token (default)
Replaces each entity with a labelled placeholder. The LLM sees the type but not the value. Fully reversible.
Input
Call John Smith at john@acme.com
Output
Call [PERSON_1] at [EMAIL_1]
fake_substitute (best quality)
Replaces each entity with a realistic synthetic value. The LLM sees natural data and produces higher-quality output. Fully reversible.
Input
Call John Smith at john@acme.com
Output
Call Michael Torres at m.torres@email.net
redact (one-way)
Replaces each entity with [REDACTED]. No restore possible; use when the LLM response must never reference PII.
Input
Call John Smith at john@acme.com
Output
Call [REDACTED] at [REDACTED]
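The three modes differ only in what is written over each detected span. The sketch below is a local model of that difference, not raipii's implementation: `apply_mode` is a hypothetical function, entity spans are supplied by hand, and the synthetic values for fake_substitute come from a caller-provided mapping.

```python
def apply_mode(text, entities, mode, fakes=None):
    """Rewrite text for one mode; entities are (type, start, end) with end-exclusive offsets."""
    counters = {}  # per-type counter used to number tokens
    tokens = {}    # (type, value) -> token, so a repeated value reuses its token
    pieces = []
    cursor = 0
    for etype, start, end in sorted(entities, key=lambda item: item[1]):
        value = text[start:end]
        pieces.append(text[cursor:start])
        if mode == "token":
            if (etype, value) not in tokens:
                counters[etype] = counters.get(etype, 0) + 1
                tokens[(etype, value)] = f"[{etype}_{counters[etype]}]"
            pieces.append(tokens[(etype, value)])
        elif mode == "fake_substitute":
            pieces.append(fakes[value])
        elif mode == "redact":
            pieces.append("[REDACTED]")
        else:
            raise ValueError(f"unknown mode: {mode}")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Call John Smith at john@acme.com"
spans = [("PERSON", 5, 15), ("EMAIL", 19, 32)]
fakes = {"John Smith": "Michael Torres", "john@acme.com": "m.torres@email.net"}
```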
Label neutralization
After substituting values, raipii also rewrites sensitive context phrases to prevent safety refusals — even when the actual values have been replaced.
| Original phrase | Rewritten as |
|---|---|
| SSN / social security number | ID number |
| credit card number | account number |
| date of birth / DOB | date |
| bank account | account reference |
| passport number | document number |
| driver's license | document number |
| tax ID / EIN | reference number |
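A rewrite table like this one can be modelled as a single case-insensitive, whole-word substitution pass. The sketch below is an illustration under those assumptions (case-insensitive matching and word boundaries are not stated in the table); `neutralize_labels` is a hypothetical name.

```python
import re

# (phrase, rewritten form) pairs taken from the table above.
REWRITES = [
    ("social security number", "ID number"),
    ("SSN", "ID number"),
    ("credit card number", "account number"),
    ("date of birth", "date"),
    ("DOB", "date"),
    ("bank account", "account reference"),
    ("passport number", "document number"),
    ("driver's license", "document number"),
    ("tax ID", "reference number"),
    ("EIN", "reference number"),
]

_PATTERN = re.compile(
    "|".join(rf"\b{re.escape(phrase)}\b" for phrase, _ in REWRITES),
    re.IGNORECASE,
)
_LOOKUP = {phrase.lower(): rewritten for phrase, rewritten in REWRITES}


def neutralize_labels(text):
    """Replace each sensitive label with its neutral form, leaving values untouched."""
    return _PATTERN.sub(lambda match: _LOOKUP[match.group(0).lower()], text)


neutral = neutralize_labels("Help with SSN 847-23-1956 and date of birth 03/14/1985")
```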
Entity types
Supported entity types. Pass any of these in the entities array to limit detection.
| Type | Example | Tier |
|---|---|---|
| PERSON | John Smith | All tiers |
| EMAIL | john@acme.com | All tiers |
| PHONE | 555-867-5309 | All tiers |
| US_SSN | 392-45-7810 | All tiers |
| DATE_OF_BIRTH | 03/14/1985 | All tiers |
| ADDRESS | 742 Evergreen Terrace, Springfield IL | All tiers |
| IP_ADDRESS | 192.168.1.1 | All tiers |
| MEDICAL_RECORD_NUMBER | MRN 00123456 | All tiers |
| TAX_ID | 12-3456789 | All tiers |
| IBAN | GB29 NWBK 6016 1331 9268 19 | All tiers |
| JWT | eyJhbGci... | All tiers |
| AWS_KEY | AKIA... | All tiers |
| CREDIT_CARD | 4111 1111 1111 1111 | Growth+ |
| BANK_ACCOUNT | 123456789012 | Growth+ |
| PASSPORT | A12345678 | Growth+ |
| DRIVERS_LICENSE | A1234567 | Growth+ |
| NPI | NPI 1234567890 | Growth+ |
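Before passing an entities array, a client can check which requested types its tier covers. This sketch reads "Growth+" as the Growth and Business tiers, which is an assumption; `split_by_tier` is a hypothetical helper.

```python
ALL_TIER_TYPES = {
    "PERSON", "EMAIL", "PHONE", "US_SSN", "DATE_OF_BIRTH", "ADDRESS",
    "IP_ADDRESS", "MEDICAL_RECORD_NUMBER", "TAX_ID", "IBAN", "JWT", "AWS_KEY",
}
GROWTH_PLUS_TYPES = {"CREDIT_CARD", "BANK_ACCOUNT", "PASSPORT", "DRIVERS_LICENSE", "NPI"}
GROWTH_PLUS_TIERS = {"growth", "business"}  # assumed meaning of "Growth+"


def split_by_tier(requested, tier):
    """Return (available, needs_upgrade) lists for the requested entity types."""
    unknown = set(requested) - ALL_TIER_TYPES - GROWTH_PLUS_TYPES
    if unknown:
        raise ValueError(f"unknown entity types: {sorted(unknown)}")
    has_growth = tier.lower() in GROWTH_PLUS_TIERS
    available = [t for t in requested if t in ALL_TIER_TYPES or has_growth]
    needs_upgrade = [t for t in requested if t not in available]
    return available, needs_upgrade
```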
Multi-turn conversations
Without a conversation session, each /v1/sanitize call generates independent substitutions. The same real name may map to different synthetic values across turns.
Create a conversation session once and pass its ID to all sanitize calls. raipii ensures the same entity always maps to the same synthetic value for the lifetime of the conversation.
conv = ps.conversations.create(ttl=3600)
# Turn 1: "John Smith" → "Michael Torres"
turn1 = ps.sanitize(
    "My name is John Smith.",
    mode="fake_substitute",
    conversation_id=conv.conversation_id,
)
# Turn 2: "John Smith" → same "Michael Torres" from turn 1
turn2 = ps.sanitize(
    "Tell me more about John Smith.",
    mode="fake_substitute",
    conversation_id=conv.conversation_id,
)
HIPAA mode
HIPAA mode ensures no text is sent to any external service during detection. All analysis runs entirely within your AWS region using local engines only — no data ever leaves your region.
HIPAA mode is enabled by default on the Starter tier and available as a toggle for Business tier accounts. It reliably detects all structured PHI — SSNs, medical record numbers, dates of birth, addresses, contact information — as well as names and contextual entities via local NLP. No external cloud services are called.
Errors
All errors return a JSON body with an error field.
{ "error": "Monthly character limit exceeded" }
| Status | Meaning | How to handle |
|---|---|---|
| 400 | Bad request — missing or invalid field | Check request body |
| 401 | Invalid or missing API key | Check Authorization header |
| 402 | Monthly character limit exceeded | Upgrade plan at raipii.com |
| 403 | Feature not available on current tier | Upgrade to Growth or Business tier |
| 404 | Session not found or expired | Re-sanitize the original text |
| 429 | Too many requests | Back off and retry — SDKs do this automatically |
| 503 | Service temporarily unavailable | Retry after a short delay — SDKs do this automatically |
Retry & timeouts
Both SDKs automatically retry on 429 and 503 with exponential backoff. If calling the HTTP API directly, implement your own backoff.
| Attempt | Delay |
|---|---|
| 1st retry | 1 second |
| 2nd retry | 2 seconds |
| 3rd retry (final) | 4 seconds |
Default request timeout is 30 seconds. Configurable via SDK options.
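For direct HTTP callers, the schedule above can be reproduced with a small wrapper. This is a sketch rather than the SDKs' actual code: `ApiError` and `call_with_retry` are hypothetical names, and the caller's `send` function is assumed to raise `ApiError` with the HTTP status on failure.

```python
import time

RETRY_DELAYS = (1, 2, 4)  # seconds before the 1st, 2nd, and 3rd retry
RETRYABLE_STATUSES = {429, 503}


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_retry(send, sleep=time.sleep):
    """Call send(); on 429 or 503, wait and retry up to three times."""
    for delay in RETRY_DELAYS:
        try:
            return send()
        except ApiError as err:
            if err.status not in RETRYABLE_STATUSES:
                raise  # other statuses are not retried
            sleep(delay)
    return send()  # third and final retry; any error propagates to the caller
```

Passing `sleep` as a parameter keeps the wrapper testable without real delays.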
Caveats
Session expiry
Sessions expire after session_ttl seconds (default 3600 — 1 hr). Always call /v1/restore promptly after receiving the LLM response. Expired sessions return 404. Pass a larger session_ttl to extend — max 3600 (Starter), 86400 (Growth), 604800 (Business).
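A client can cap the requested lifetime at its tier's maximum before sending it. The sketch below assumes lowercase tier names as dictionary keys and clamps rather than rejects; how the API itself treats an over-limit value is not stated here. `clamp_session_ttl` is a hypothetical helper.

```python
MAX_SESSION_TTL = {"starter": 3600, "growth": 86400, "business": 604800}


def clamp_session_ttl(requested_seconds, tier):
    """Return the requested TTL, capped at the maximum for the given tier."""
    if requested_seconds <= 0:
        raise ValueError("session_ttl must be a positive number of seconds")
    return min(requested_seconds, MAX_SESSION_TTL[tier.lower()])
```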
redact mode has no restore
When using redact mode, original values are not stored. Calling /v1/restore on a redact session returns the text unchanged. Use token or fake_substitute if you need to restore.
Characters billed
The three text endpoints (/v1/sanitize, /v1/restore, /v1/detect) each bill by the character count of the input text. The monthly limit resets on the first of each calendar month. Free tier: 2M chars/month.
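Because both the sanitize call and the restore call bill their input text, a full round trip costs roughly the prompt length plus the LLM reply length. The estimate below assumes billed characters equal Python's `len()` of the input, which matches the sample responses (32 characters, 32 billed) but is not guaranteed; the function names are hypothetical.

```python
FREE_TIER_MONTHLY_CHARS = 2_000_000


def round_trip_chars(prompt, llm_reply):
    """Characters billed for one sanitize call plus one restore call."""
    return len(prompt) + len(llm_reply)


def round_trips_per_month(avg_prompt_chars, avg_reply_chars,
                          monthly_limit=FREE_TIER_MONTHLY_CHARS):
    """Whole round trips that fit in the monthly limit at the given average sizes."""
    return monthly_limit // (avg_prompt_chars + avg_reply_chars)
```

At an average of 1,500 prompt characters and 500 reply characters, the free tier covers about 1,000 round trips per month.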
Detection accuracy by tier
raipii runs a multi-layer detection pipeline on every request. Regex patterns fire first at 100% confidence for all structured PII. A local contextual NLP engine then runs fully within your region — offline and HIPAA-safe — to catch names, dates, and addresses in natural language. On Growth and Business tiers, an additional cloud NLP service provides the highest recall for unstructured free-form text.
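The first, regex-based layer can be pictured as follows. The two patterns are simplified illustrations, not raipii's actual rules, and `detect_structured` is a hypothetical function; the output shape mirrors the /v1/detect sample.

```python
import re

# Illustrative patterns only; production rules would be stricter.
STRUCTURED_PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def detect_structured(text):
    """Return regex matches as entities with confidence 1.0, ordered by position."""
    found = []
    for entity_type, pattern in STRUCTURED_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append({
                "type": entity_type,
                "value": match.group(0),
                "confidence": 1.0,
                "position": [match.start(), match.end()],
                "detection_method": "structured",
            })
    return sorted(found, key=lambda entity: entity["position"][0])


entities = detect_structured("My SSN is 392-45-7810")
```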
LLM Proxy
The raipii proxy lets you add PII protection to any LLM call with a single line change — swap base_url to point at raipii. No SDK required. Your existing OpenAI, Anthropic, or Gemini code works unchanged.
raipii intercepts the request, sanitizes all PII in the message body, forwards the clean request to the real LLM with your key, then restores original values in the response before returning it to your app.
Your LLM provider key is sent in the X-LLM-API-Key header and is never logged or stored.
Proxy quick start
OpenAI
from openai import OpenAI
client = OpenAI(
    base_url="https://api.raipii.com/v1/proxy/openai",
    api_key="ignored",  # raipii handles auth
    default_headers={
        "Authorization": "Bearer ps_live_...",  # your raipii key
        "X-LLM-API-Key": "sk-...",  # your OpenAI key
    },
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Help John Smith (SSN 392-45-7810)"}],
)
# PII was never sent to OpenAI; the response has original values restored
Anthropic
import anthropic
client = anthropic.Anthropic(
    base_url="https://api.raipii.com/v1/proxy/anthropic",
    api_key="ignored",
    default_headers={
        "Authorization": "Bearer ps_live_...",
        "X-LLM-API-Key": "sk-ant-...",
    },
)
message = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Help John Smith (SSN 392-45-7810)"}],
)
Node.js (OpenAI SDK)
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.raipii.com/v1/proxy/openai",
  apiKey: "ignored",
  defaultHeaders: {
    Authorization: "Bearer ps_live_...",
    "X-LLM-API-Key": "sk-...",
  },
});
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Help John Smith (SSN 392-45-7810)" }],
});
Supported providers
All providers are accessed via https://api.raipii.com/v1/proxy/{provider}/...
| Provider | Base URL | Compatible SDKs |
|---|---|---|
| openai | /v1/proxy/openai | openai-python, openai-node, LangChain |
| anthropic | /v1/proxy/anthropic | anthropic-python, anthropic-node |
| gemini | /v1/proxy/gemini | google-generativeai, LangChain |
| groq | /v1/proxy/groq | groq-python (OpenAI-compatible) |
| mistral | /v1/proxy/mistral | mistralai SDK (OpenAI-compatible) |
| deepseek | /v1/proxy/deepseek | openai SDK with DeepSeek base_url |
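Since every provider follows the same URL pattern and header pair, client configuration can be generated from the provider name. A minimal sketch, with `proxy_base_url` and `proxy_headers` as hypothetical helper names:

```python
PROXY_ROOT = "https://api.raipii.com/v1/proxy"
PROVIDERS = {"openai", "anthropic", "gemini", "groq", "mistral", "deepseek"}


def proxy_base_url(provider):
    """Return the raipii proxy base URL for a supported provider."""
    if provider not in PROVIDERS:
        raise ValueError(f"unsupported provider: {provider}")
    return f"{PROXY_ROOT}/{provider}"


def proxy_headers(raipii_key, llm_key):
    """Headers the proxy expects: the raipii key as Bearer token plus the provider key."""
    return {
        "Authorization": f"Bearer {raipii_key}",
        "X-LLM-API-Key": llm_key,
    }
```

The results can be passed as `base_url` and `default_headers` when constructing a provider SDK client, as in the quick start examples.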
Streaming (stream=True) is not yet supported on the proxy. Use the standard sanitize → LLM → restore flow for streaming use cases.
Python SDK
pip install raipii
import raipii
ps = raipii.Raipii(api_key="ps_live_...") # or RAIPII_API_KEY env var
result = ps.sanitize("John Smith, john@acme.com", mode="fake_substitute")
restored = ps.restore(llm_response, result.session_id)
detected = ps.detect("SSN 392-45-7810")
conv = ps.conversations.create(ttl=3600)
Full docs and options in the PyPI README.
Node.js / TypeScript SDK
npm install raipii
import { Raipii } from "raipii";
const ps = new Raipii({ apiKey: "ps_live_..." }); // or RAIPII_API_KEY env var
const result = await ps.sanitize("John Smith, john@acme.com", { mode: "fake_substitute" });
const restored = await ps.restore(llmResponse, result.sessionId);
const detected = await ps.detect("SSN 392-45-7810");
const conv = await ps.conversations.create({ ttl: 3600 });
Zero runtime dependencies. Ships ESM + CJS with full TypeScript types.
HTTP (curl)
# Sanitize
curl -X POST https://api.raipii.com/v1/sanitize \
  -H "Authorization: Bearer ps_live_..." \
  -H "Content-Type: application/json" \
  -d '{"text": "John Smith, john@acme.com", "mode": "fake_substitute"}'
# Restore
curl -X POST https://api.raipii.com/v1/restore \
  -H "Authorization: Bearer ps_live_..." \
  -H "Content-Type: application/json" \
  -d '{"text": "...", "session_id": "ps_sess_..."}'
# Detect
curl -X POST https://api.raipii.com/v1/detect \
  -H "Authorization: Bearer ps_live_..." \
  -H "Content-Type: application/json" \
  -d '{"text": "My SSN is 392-45-7810"}'
Ready to start?
Free tier — 2M characters/month, no credit card required.