raipii — PII sanitization for LLM pipelines

v1 · REST · HTTPS

raipii API

raipii is a REST API that detects and sanitizes PII in text before it reaches your LLM. Two calls wrap your existing LLM workflow — one before (sanitize), one after (restore). No infrastructure changes required.

Base URL for all requests:

text

https://api.raipii.com

All requests use HTTPS. Request and response bodies are JSON. All endpoints require authentication.

Quick start: make your first API call in 5 minutes
Sanitize: strip PII before sending to your LLM
Restore: put original values back in the response
Modes: token, fake_substitute, redact

Authentication

Pass your API key as a Bearer token in every request.

bash

curl -X POST https://api.raipii.com/v1/sanitize \
  -H "Authorization: Bearer ps_live_..." \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello John Smith"}'

ℹ Get a free API key at raipii.com: 2M characters/month, no credit card required. Keys are prefixed with ps_live_.

Status	Meaning
401	Missing or invalid API key
403	Session belongs to a different account
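
For callers who are not using an SDK, the Bearer header can be attached with the Python standard library alone. This is a minimal sketch: the helper name `build_request` is hypothetical, and falling back to the `RAIPII_API_KEY` environment variable is an assumption borrowed from the SDK examples. Nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.raipii.com"


def build_request(path, payload, api_key=None):
    """Build (but do not send) an authenticated JSON POST request."""
    # Assumption: fall back to the RAIPII_API_KEY environment variable.
    key = api_key or os.environ.get("RAIPII_API_KEY")
    if not key:
        raise ValueError("missing API key")
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("/v1/sanitize", {"text": "Hello John Smith"}, api_key="ps_live_example")
# Sending is left to the caller, e.g. urllib.request.urlopen(req, timeout=30)
```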

Quick start

Three API calls — sanitize, your LLM, restore.


python

import raipii, openai

ps = raipii.Raipii(api_key="ps_live_...")
oai = openai.OpenAI()

prompt = "Help John Smith (john@acme.com, SSN 392-45-7810) with his claim."

# 1. Sanitize — strip PII before sending to your LLM
result = ps.sanitize(prompt, mode="fake_substitute")
# → "Help Michael Torres (m.torres@email.net, SSN 847-23-1956)..."

# 2. Call your LLM with clean text
reply = oai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": result.sanitized_text}],
).choices[0].message.content

# 3. Restore original values in the response
final = ps.restore(reply, result.session_id)
print(final.restored_text)  # John Smith's real data is back

POST /v1/sanitize

Detects PII in text and replaces it according to the chosen mode. Returns a session_id for later restoration.

Request body

Field	Type	Required	Description
text	string	required	The text to sanitize. Max 100,000 characters.
mode	string	optional	token (default) | fake_substitute | redact
entities	string[]	optional	Limit detection to specific entity types. Detects all if omitted.
session_ttl	integer	optional	Session expiry in seconds. Default 3600 (1 hr). Max depends on tier: 3600 (Starter), 86400 (Growth), 604800 (Business).
conversation_id	string	optional	Link to a multi-turn conversation session for consistent substitutions.
confidence_threshold	float	optional	Override the detection confidence threshold (0.0–1.0). Default 0.85.
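
Inputs longer than the 100,000-character limit have to be split before they are sent. One possible approach is sketched below; the helper `split_for_sanitize` is hypothetical, and cutting at newlines is only a heuristic to reduce the chance of slicing a name or number in half.

```python
MAX_CHARS = 100_000  # per-request limit stated for the text field


def split_for_sanitize(text, limit=MAX_CHARS):
    """Split text into pieces of at most `limit` characters.

    Prefers to cut just after the last newline inside the window and
    falls back to a hard cut when the window contains no usable newline.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            cut = text.rfind("\n", start, end)
            if cut > start:
                end = cut + 1  # keep the newline with the earlier chunk
        chunks.append(text[start:end])
        start = end
    return chunks
```

Each chunk would be sanitized in its own call and receive its own session_id. Passing a shared conversation_id is the documented way to keep substitutions consistent across calls.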

Response

json

{
  "session_id": "ps_sess_abc123...",
  "conversation_id": null,
  "sanitized_text": "Call [PERSON_1] at [EMAIL_1]",
  "entities_found": [
    {
      "type": "PERSON",
      "original": "John Smith",
      "replacement": "[PERSON_1]",
      "position": [5, 15],
      "confidence": 0.99
    },
    {
      "type": "EMAIL",
      "original": "john@acme.com",
      "replacement": "[EMAIL_1]",
      "position": [19, 32],
      "confidence": 1.0
    }
  ],
  "char_count": 32,
  "usage": { "chars_billed": 32 }
}

Response fields

Field	Type	Description
session_id	string	Pass to /v1/restore to reverse substitutions.
sanitized_text	string	Text with PII replaced.
entities_found	array	Each detected entity: type, original, replacement, position, confidence.
char_count	integer	Characters in the input text.
usage.chars_billed	integer	Characters billed against your monthly limit.
conversation_id	string | null	Echo of the conversation_id passed in, or null.

⚠ Store the session_id immediately: sessions expire after session_ttl seconds (default 1 hr). Calling /v1/restore after expiry returns 404.
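
The position field in the example response is consistent with a half-open [start, end) character range over the input text. That reading is an assumption drawn from the sample values, not a documented guarantee. The hypothetical helper below checks it against the example.

```python
def entity_spans_match(original_text, entities):
    """Return True if every position slices back to the entity's original value.

    Assumption: position is [start, end) in characters of the input text.
    """
    return all(
        original_text[entity["position"][0]:entity["position"][1]] == entity["original"]
        for entity in entities
    )


text = "Call John Smith at john@acme.com"
entities = [
    {"type": "PERSON", "original": "John Smith", "position": [5, 15]},
    {"type": "EMAIL", "original": "john@acme.com", "position": [19, 32]},
]
spans_ok = entity_spans_match(text, entities)
```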

POST /v1/restore

Reverses the substitutions made by /v1/sanitize, replacing tokens or synthetic values in the LLM response with their original PII.

Request body

Field	Type	Required	Description
text	string	required	The LLM response text containing tokens or synthetic values.
session_id	string	required	The session_id returned by the corresponding sanitize call.

Response

json

{
  "restored_text": "Call John Smith at john@acme.com",
  "substitutions_reversed": 2,
  "usage": { "chars_billed": 32 }
}

ℹ redact mode sessions have nothing to restore because original values were discarded. Calling restore on a redact session returns the text unchanged.
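
Conceptually, restore is a reverse lookup from each replacement back to its original value. The service keeps that mapping server-side under the session_id. The sketch below is only a local illustration of the idea, built from the entities_found list of a sanitize response, and is not the service's implementation.

```python
def restore_locally(text, entities):
    """Swap each replacement string back to its original value.

    Returns the restored text and the number of substitutions reversed.
    """
    reversed_count = 0
    # Longest replacements first, so a longer synthetic value is not
    # partially rewritten by a shorter one it happens to contain.
    for entity in sorted(entities, key=lambda e: len(e["replacement"]), reverse=True):
        reversed_count += text.count(entity["replacement"])
        text = text.replace(entity["replacement"], entity["original"])
    return text, reversed_count


entities = [
    {"original": "John Smith", "replacement": "[PERSON_1]"},
    {"original": "john@acme.com", "replacement": "[EMAIL_1]"},
]
restored, count = restore_locally("Call [PERSON_1] at [EMAIL_1]", entities)
```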

POST /v1/detect

Scans text for PII and returns detected entities with types, positions, and confidence scores. Does not modify the text. Useful for auditing and risk assessment.

Request body

Field	Type	Required	Description
text	string	required	Text to scan for PII.
entities	string[]	optional	Limit detection to specific entity types.
confidence_threshold	float	optional	Override the detection confidence threshold.

Response

json

{
  "entities_found": [
    {
      "type": "US_SSN",
      "value": "392-45-7810",
      "confidence": 1.0,
      "position": [10, 21],
      "detection_method": "structured"
    }
  ],
  "pii_detected": true,
  "risk_level": "HIGH",
  "usage": { "chars_billed": 21 }
}

Risk levels

Level	Triggered by
HIGH	SSN, credit card, medical record number, bank account, tax ID
MEDIUM	Person name, email, date of birth, address
LOW	Any other detected entity type
NONE	No PII detected
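
The table can be read as a highest-match-wins rule. The sketch below reproduces that rule client-side, for example to gate a request before calling the LLM. Mapping the table's descriptions onto entity type names such as US_SSN and TAX_ID is an assumption based on the entity types list.

```python
# Assumed mapping from the risk table's descriptions to entity type names.
HIGH = {"US_SSN", "CREDIT_CARD", "MEDICAL_RECORD_NUMBER", "BANK_ACCOUNT", "TAX_ID"}
MEDIUM = {"PERSON", "EMAIL", "DATE_OF_BIRTH", "ADDRESS"}


def risk_level(entity_types):
    """Return HIGH, MEDIUM, LOW, or NONE for a collection of detected types."""
    types = set(entity_types)
    if not types:
        return "NONE"
    if types & HIGH:
        return "HIGH"
    if types & MEDIUM:
        return "MEDIUM"
    return "LOW"
```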

POST /v1/conversations

Creates a multi-turn conversation session. Pass the returned conversation_id to /v1/sanitize calls so the same real entity always maps to the same synthetic value across all turns.

Request body

Field	Type	Required	Description
ttl	integer	optional	Session lifetime in seconds. Default 86400 (24 hr).
metadata	object	optional	Arbitrary key-value pairs stored with the conversation.

Response

json

{
  "conversation_id": "ps_conv_xyz...",
  "expires_at": "2026-04-11T12:00:00"
}

Sanitize modes

token (default)

Replaces each entity with a labelled placeholder. The LLM sees the type but not the value. Fully reversible.

Input

Call John Smith at john@acme.com

Output

Call [PERSON_1] at [EMAIL_1]

fake_substitute (best quality)

Replaces each entity with a realistic synthetic value. The LLM sees natural data and produces higher-quality output. Fully reversible.

Input

Call John Smith at john@acme.com

Output

Call Michael Torres at m.torres@email.net

redact (one-way)

Replaces each entity with [REDACTED]. No restore possible — use when the LLM response must never reference PII.

Input

Call John Smith at john@acme.com

Output

Call [REDACTED] at [REDACTED]
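
To make the difference between the three modes concrete, here is a toy local re-implementation that takes already-detected spans and applies each replacement style. It is an illustration only: the function name, the `(type, start, end)` span format, and the caller-supplied `fakes` mapping are all assumptions, and the real service generates synthetic values itself.

```python
def apply_mode(text, spans, mode="token", fakes=None):
    """Replace non-overlapping (type, start, end) spans according to mode."""
    counters = {}
    out = []
    cursor = 0
    for entity_type, start, end in sorted(spans, key=lambda s: s[1]):
        out.append(text[cursor:start])
        if mode == "token":
            counters[entity_type] = counters.get(entity_type, 0) + 1
            out.append(f"[{entity_type}_{counters[entity_type]}]")
        elif mode == "redact":
            out.append("[REDACTED]")
        elif mode == "fake_substitute":
            if fakes is None:
                raise ValueError("fake_substitute needs a fakes mapping")
            out.append(fakes[text[start:end]])
        else:
            raise ValueError(f"unknown mode: {mode}")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


text = "Call John Smith at john@acme.com"
spans = [("PERSON", 5, 15), ("EMAIL", 19, 32)]
fakes = {"John Smith": "Michael Torres", "john@acme.com": "m.torres@email.net"}
```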

Label neutralization

After substituting values, raipii also rewrites sensitive context phrases to prevent safety refusals — even when the actual values have been replaced.

Original phrase	Rewritten as
SSN / social security number	ID number
credit card number	account number
date of birth / DOB	date
bank account	account reference
passport number	document number
driver's license	document number
tax ID / EIN	reference number
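
The rewrite table can be approximated with case-insensitive, whole-word regular expressions. This is a rough local sketch of the idea, not the service's actual rules; the matching details (word boundaries, case-insensitivity, rule order) are assumptions.

```python
import re

# (pattern, replacement) pairs taken from the table above.
REWRITES = [
    (r"social security number|SSN", "ID number"),
    (r"credit card number", "account number"),
    (r"date of birth|DOB", "date"),
    (r"bank account", "account reference"),
    (r"passport number", "document number"),
    (r"driver's license", "document number"),
    (r"tax ID|EIN", "reference number"),
]


def neutralize_labels(text):
    """Rewrite sensitive context phrases using the table's mapping."""
    for pattern, replacement in REWRITES:
        text = re.sub(rf"\b(?:{pattern})\b", replacement, text, flags=re.IGNORECASE)
    return text
```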

Entity types

Supported entity types. Pass any of these in the entities array to limit detection.

Type	Example	Tier
PERSON	John Smith	All tiers
EMAIL	john@acme.com	All tiers
PHONE	555-867-5309	All tiers
US_SSN	392-45-7810	All tiers
DATE_OF_BIRTH	03/14/1985	All tiers
ADDRESS	742 Evergreen Terrace, Springfield IL	All tiers
IP_ADDRESS	192.168.1.1	All tiers
MEDICAL_RECORD_NUMBER	MRN 00123456	All tiers
TAX_ID	12-3456789	All tiers
IBAN	GB29 NWBK 6016 1331 9268 19	All tiers
JWT	eyJhbGci...	All tiers
AWS_KEY	AKIA...	All tiers
CREDIT_CARD	4111 1111 1111 1111	Growth+
BANK_ACCOUNT	123456789012	Growth+
PASSPORT	A12345678	Growth+
DRIVERS_LICENSE	A1234567	Growth+
NPI	NPI 1234567890	Growth+

ℹ All tiers detect structured PII (SSNs, emails, phones, addresses, dates) using regex and local NLP, with no external API calls. Growth and Business tiers add AWS Comprehend for higher recall on names and unstructured free-form entities, plus extended financial and identity document types.
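
Before sending an entities filter, it can help to check the requested names against the table. The helper below is a hypothetical client-side check. How the API itself responds to unknown or tier-locked types is not specified here, so treat this only as a pre-flight sanity check.

```python
ALL_TIERS = {
    "PERSON", "EMAIL", "PHONE", "US_SSN", "DATE_OF_BIRTH", "ADDRESS",
    "IP_ADDRESS", "MEDICAL_RECORD_NUMBER", "TAX_ID", "IBAN", "JWT", "AWS_KEY",
}
GROWTH_PLUS = {"CREDIT_CARD", "BANK_ACCOUNT", "PASSPORT", "DRIVERS_LICENSE", "NPI"}


def check_entities(requested, growth_or_above=False):
    """Return (unknown, tier_locked) lists for a requested entities filter."""
    known = ALL_TIERS | GROWTH_PLUS
    unknown = [t for t in requested if t not in known]
    locked = [] if growth_or_above else [t for t in requested if t in GROWTH_PLUS]
    return unknown, locked
```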

Multi-turn conversations

Without a conversation session, each /v1/sanitize call generates independent substitutions. The same real name may map to different synthetic values across turns.

Create a conversation session once and pass its ID to all sanitize calls. raipii ensures the same entity always maps to the same synthetic value for the lifetime of the conversation.

python

conv = ps.conversations.create(ttl=3600)

# Turn 1 — "John Smith" → "Michael Torres"
turn1 = ps.sanitize(
    "My name is John Smith.",
    mode="fake_substitute",
    conversation_id=conv.conversation_id,
)

# Turn 2 — "John Smith" → same "Michael Torres" from turn 1
turn2 = ps.sanitize(
    "Tell me more about John Smith.",
    mode="fake_substitute",
    conversation_id=conv.conversation_id,
)

HIPAA mode

HIPAA mode ensures no text is sent to any external service during detection. All analysis runs entirely within your AWS region using local engines only — no data ever leaves your region.

HIPAA mode is enabled by default on the Starter tier and available as a toggle for Business tier accounts. It reliably detects all structured PHI — SSNs, medical record numbers, dates of birth, addresses, contact information — as well as names and contextual entities via local NLP. No external cloud services are called.

ℹ raipii is HIPAA-compliant and can provide a Business Associate Agreement (BAA). Review our BAA template or contact us to execute a signed copy.

Errors

All errors return a JSON body with an error field.

json

{ "error": "Monthly character limit exceeded" }

Status	Meaning	How to handle
400	Bad request — missing or invalid field	Check request body
401	Invalid or missing API key	Check Authorization header
402	Monthly character limit exceeded	Upgrade plan at raipii.com
403	Feature not available on current tier, or session belongs to a different account	Upgrade to Growth or Business tier, or use the API key that created the session
404	Session not found or expired	Re-sanitize the original text
429	Too many requests	Back off and retry — SDKs do this automatically
503	Service temporarily unavailable	Retry after a short delay — SDKs do this automatically
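
When calling the HTTP API directly, it may be convenient to turn an error response into a typed exception that also records whether a retry makes sense. The class and function names below are hypothetical. The only facts taken from this page are the `error` field and the 429/503 retry guidance.

```python
import json

RETRYABLE = {429, 503}  # statuses the table says to back off and retry


class RaipiiHTTPError(Exception):
    """Illustrative exception type; not part of any official SDK."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message
        self.retryable = status in RETRYABLE


def parse_error(status, body):
    """Build an exception from a status code and a raw response body."""
    try:
        message = json.loads(body).get("error", "")
    except (ValueError, AttributeError):
        # Body was not a JSON object; keep the raw text if there is any.
        message = body if isinstance(body, str) else ""
    return RaipiiHTTPError(status, message)
```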

Retry & timeouts

Both SDKs automatically retry on 429 and 503 with exponential backoff. If calling the HTTP API directly, implement your own backoff.

Attempt	Delay
1st retry	1 second
2nd retry	2 seconds
3rd retry (final)	4 seconds

Default request timeout is 30 seconds. Configurable via SDK options.
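
For direct HTTP callers, the schedule above could be reproduced along these lines. The sketch assumes a caller-supplied `send` callable that returns a `(status, body)` tuple. The function name and that interface are illustrative, not part of either SDK.

```python
import time

RETRY_DELAYS = (1, 2, 4)   # seconds, matching the table above
RETRYABLE = {429, 503}


def call_with_retry(send, sleep=time.sleep):
    """Call send() and retry up to three times on 429 or 503."""
    status, body = send()
    for delay in RETRY_DELAYS:
        if status not in RETRYABLE:
            break
        sleep(delay)
        status, body = send()
    return status, body
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting.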

Caveats

Session expiry

Sessions expire after session_ttl seconds (default 3600 — 1 hr). Always call /v1/restore promptly after receiving the LLM response. Expired sessions return 404. Pass a larger session_ttl to extend — max 3600 (Starter), 86400 (Growth), 604800 (Business).
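
A client can clamp the requested TTL to the tier maximum before sending it. The sketch below does that with a hypothetical helper. Whether the API rejects or silently caps an oversized session_ttl is not stated here, so clamping client-side is simply a cautious assumption.

```python
# Tier maximums as listed above (seconds).
MAX_SESSION_TTL = {"starter": 3600, "growth": 86400, "business": 604800}


def clamp_session_ttl(requested, tier):
    """Return a session_ttl no larger than the tier's documented maximum."""
    if requested <= 0:
        raise ValueError("session_ttl must be positive")
    return min(requested, MAX_SESSION_TTL[tier])
```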

redact mode has no restore

When using redact mode, original values are not stored. Calling /v1/restore on a redact session returns the text unchanged. Use token or fake_substitute if you need to restore.

Characters billed

The sanitize, restore, and detect endpoints each bill by the character count of their input text. The monthly limit resets on the first of each calendar month. Free tier: 2M chars/month.
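
Because sanitize and restore are both billed, a full round trip costs roughly the prompt length plus the LLM reply length. The estimate below assumes the service counts characters the same way Python's `len()` does, which may not hold exactly. The helper names are illustrative.

```python
FREE_TIER_CHARS = 2_000_000  # free-tier monthly allowance


def round_trip_chars(prompt, llm_reply):
    """Estimate characters billed for one sanitize + restore pair."""
    # sanitize bills the prompt; restore bills the LLM reply passed to it.
    return len(prompt) + len(llm_reply)


def round_trips_remaining(used, per_round_trip, limit=FREE_TIER_CHARS):
    """How many more round trips of a given size fit in the monthly limit."""
    return max(0, (limit - used) // per_round_trip)
```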

Detection accuracy by tier

raipii runs a multi-layer detection pipeline on every request. Regex patterns fire first at 100% confidence for all structured PII. A local contextual NLP engine then runs fully within your region — offline and HIPAA-safe — to catch names, dates, and addresses in natural language. On Growth and Business tiers, an additional cloud NLP service provides the highest recall for unstructured free-form text.
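
The first layer can be pictured as a set of regular expressions whose hits carry confidence 1.0, followed by a threshold filter over hits from every layer. The patterns below are simplified stand-ins written for illustration; they are not raipii's actual rules, and the output shape merely mirrors the /v1/detect example.

```python
import re

# Simplified illustrative patterns, not the service's real ones.
PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def regex_layer(text):
    """Return structured hits, each with confidence 1.0, ordered by position."""
    hits = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append({
                "type": entity_type,
                "value": match.group(),
                "confidence": 1.0,
                "position": [match.start(), match.end()],
                "detection_method": "structured",
            })
    return sorted(hits, key=lambda h: h["position"][0])


def apply_threshold(hits, threshold=0.85):
    """Drop hits below the confidence threshold (default 0.85)."""
    return [h for h in hits if h["confidence"] >= threshold]
```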

LLM Proxy

The raipii proxy lets you add PII protection to any LLM call by pointing base_url at raipii and adding two headers. No raipii SDK is required, and the rest of your existing OpenAI, Anthropic, or Gemini code stays the same.

raipii intercepts the request, sanitizes all PII in the message body, forwards the clean request to the real LLM with your key, then restores original values in the response before returning it to your app.

ℹ LLM Proxy requires Growth tier or above. Your LLM API key is passed in the X-LLM-API-Key header and is never logged or stored.

Supported providers: OpenAI, Anthropic, Gemini, Groq, Mistral, DeepSeek.

Proxy quick start

OpenAI

python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.raipii.com/v1/proxy/openai",
    api_key="ignored",          # raipii handles auth
    default_headers={
        "Authorization": "Bearer ps_live_...",   # your raipii key
        "X-LLM-API-Key": "sk-...",               # your OpenAI key
    },
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Help John Smith (SSN 392-45-7810)"}],
)
# PII was never sent to OpenAI — response has original values restored

Anthropic

python

import anthropic

client = anthropic.Anthropic(
    base_url="https://api.raipii.com/v1/proxy/anthropic",
    api_key="ignored",
    default_headers={
        "Authorization": "Bearer ps_live_...",
        "X-LLM-API-Key": "sk-ant-...",
    },
)

message = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Help John Smith (SSN 392-45-7810)"}],
)

Node.js (OpenAI SDK)

typescript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.raipii.com/v1/proxy/openai",
  apiKey: "ignored",
  defaultHeaders: {
    Authorization: "Bearer ps_live_...",
    "X-LLM-API-Key": "sk-...",
  },
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Help John Smith (SSN 392-45-7810)" }],
});

Supported providers

All providers are accessed via https://api.raipii.com/v1/proxy/{provider}/...

Provider	Base URL	Compatible SDKs
openai	/v1/proxy/openai	openai-python, openai-node, LangChain
anthropic	/v1/proxy/anthropic	anthropic-python, anthropic-node
gemini	/v1/proxy/gemini	google-generativeai, LangChain
groq	/v1/proxy/groq	groq-python (OpenAI-compatible)
mistral	/v1/proxy/mistral	mistralai SDK (OpenAI-compatible)
deepseek	/v1/proxy/deepseek	openai SDK with base_url set to the raipii DeepSeek proxy URL

⚠ Streaming (stream=True) is not yet supported on the proxy. Use the standard sanitize → LLM → restore flow for streaming use cases.
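
With the standard flow, one thing to watch is that a streamed chunk boundary can fall in the middle of a placeholder, so restoring chunk by chunk may miss it. A simple, if less interactive, approach is to buffer the stream and restore once. The sketch assumes a caller-supplied `restore` callable, for example a thin wrapper around `ps.restore` bound to a session_id. The function name is hypothetical.

```python
def stream_then_restore(chunks, restore):
    """Collect streamed text chunks, then restore the full text in one call.

    chunks:  iterable of str pieces from a streaming LLM response.
    restore: callable that takes the full text and returns restored text.
    """
    buffered = "".join(chunks)
    return restore(buffered)


# Stand-in for a real restore call, used only to show the boundary problem.
def fake_restore(text):
    return text.replace("[PERSON_1]", "John Smith")


chunks = ["Call [PER", "SON_1] now"]
whole = stream_then_restore(chunks, fake_restore)
piecewise = "".join(fake_restore(c) for c in chunks)  # placeholder is missed
```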

Python SDK

bash

pip install raipii

python

import raipii

ps = raipii.Raipii(api_key="ps_live_...")  # or RAIPII_API_KEY env var

result   = ps.sanitize("John Smith, john@acme.com", mode="fake_substitute")
restored = ps.restore(llm_response, result.session_id)
detected = ps.detect("SSN 392-45-7810")
conv     = ps.conversations.create(ttl=3600)

Full docs and options in the PyPI README.

Node.js / TypeScript SDK

bash

npm install raipii

typescript

import { Raipii } from "raipii";

const ps = new Raipii({ apiKey: "ps_live_..." }); // or RAIPII_API_KEY env var

const result   = await ps.sanitize("John Smith, john@acme.com", { mode: "fake_substitute" });
const restored = await ps.restore(llmResponse, result.sessionId);
const detected = await ps.detect("SSN 392-45-7810");
const conv     = await ps.conversations.create({ ttl: 3600 });

Zero runtime dependencies. Ships ESM + CJS with full TypeScript types.

HTTP (curl)

bash

# Sanitize
curl -X POST https://api.raipii.com/v1/sanitize \
  -H "Authorization: Bearer ps_live_..." \
  -H "Content-Type: application/json" \
  -d '{"text": "John Smith, john@acme.com", "mode": "fake_substitute"}'

# Restore
curl -X POST https://api.raipii.com/v1/restore \
  -H "Authorization: Bearer ps_live_..." \
  -H "Content-Type: application/json" \
  -d '{"text": "...", "session_id": "ps_sess_..."}'

# Detect
curl -X POST https://api.raipii.com/v1/detect \
  -H "Authorization: Bearer ps_live_..." \
  -H "Content-Type: application/json" \
  -d '{"text": "My SSN is 392-45-7810"}'
