Phemius API
v0.1.0
High-quality text-to-speech API with real-time WebSocket streaming, OpenAI drop-in compatibility, and voice cloning.
Base URL: https://api.phemius.dev
Authentication
All API requests require a valid API key passed via the Authorization header. Keys are prefixed with sk_test_ (test) or sk_live_ (production). Keys are hashed server-side and cannot be recovered — store them securely after creation.
Authorization: Bearer sk_test_abc123...
Quickstart
Get an API key
Sign up at phemius.dev and create an API key from the dashboard.
Make your first request
curl -X POST https://api.phemius.dev/v1/speech \
-H "Authorization: Bearer sk_test_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"text": "Integrate natural sounding speech into your application with just a few lines of code. Low latency, high fidelity, and built for scale.", "voice": "shimmer", "model": "phemius-fast"}' \
--output speech.pcm
Play the audio
Play the raw PCM file: ffplay -f s16le -ar 24000 -ch_layout mono speech.pcm
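Many players and audio libraries expect a container format rather than headerless PCM. The sketch below wraps the raw bytes in a WAV header using Python's standard `wave` module. It assumes the samples are little-endian, matching the `s16le` flag in the ffplay command above; the function name `pcm_to_wav` is illustrative and not part of the Phemius API.

```python
import io
import wave

SAMPLE_RATE_HZ = 24_000   # 24kHz, per the endpoint description
SAMPLE_WIDTH_BYTES = 2    # 16-bit signed samples
CHANNELS = 1              # mono


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container and return the WAV bytes."""
    if len(pcm) % SAMPLE_WIDTH_BYTES:
        raise ValueError("PCM length must be a whole number of 16-bit samples")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buf.getvalue()
```

To convert the Quickstart output, read `speech.pcm` in binary mode, pass the bytes to `pcm_to_wav`, and write the result to a `.wav` file.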
POST /v1/speech
Create Speech
Synthesize text to speech. Returns raw PCM audio (16-bit, 24kHz, mono). Characters are billed after successful synthesis.
Request Body
| Parameter | Type | Required / Default | Description |
|---|---|---|---|
| text | string | Required | The text to synthesize. Min: 1, Max: 5,000 characters. |
| voice | string | default | Voice name. Use a built-in name (alloy, echo, fable, onyx, nova, shimmer, aria) or a raw Kokoro voicepack name. |
| model | string (phemius-fast) | phemius-fast | TTS model to use. |
| voice_id | string or null | - | UUID of a custom cloned voice. Mutually exclusive with voice. When set, audio caching is disabled. |
| speed | number | 1 | Playback speed multiplier. Range: 0.25–4. |
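The constraints in the table can be checked on the client before a request is sent, which avoids a round trip that would end in a 422. The following is a minimal sketch: `build_speech_payload` is a hypothetical helper, and it assumes the server counts characters the same way Python's `len` does, which the source does not confirm.

```python
from typing import Optional

MAX_TEXT_CHARS = 5_000
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_payload(
    text: str,
    voice: Optional[str] = None,
    voice_id: Optional[str] = None,
    model: str = "phemius-fast",
    speed: float = 1.0,
) -> dict:
    """Validate the documented limits and return a JSON-serializable body."""
    if not 1 <= len(text) <= MAX_TEXT_CHARS:
        raise ValueError("text must be between 1 and 5,000 characters")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError("speed must be between 0.25 and 4")
    if voice is not None and voice_id is not None:
        raise ValueError("voice and voice_id are mutually exclusive")

    payload = {"text": text, "model": model, "speed": speed}
    if voice_id is not None:
        payload["voice_id"] = voice_id   # custom cloned voice
    elif voice is not None:
        payload["voice"] = voice         # built-in or raw voicepack name
    return payload
```

When neither `voice` nor `voice_id` is given, the field is omitted so the server-side default applies.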
Response
audio/pcm: Raw PCM audio bytes (16-bit signed, 24kHz, mono)
Response Headers
X-Chars-Billed: Number of characters billed for this request
Errors
| Status | Description |
|---|---|
| 401 | Invalid or missing API key |
| 422 | Validation error (text too long, invalid speed, etc.) |
| 429 | Rate limit or monthly character limit exceeded |
| 502 | Speech synthesis failed (upstream model error) |
Examples
curl -X POST https://api.phemius.dev/v1/speech \
-H "Authorization: Bearer sk_test_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"text": "Hello world", "voice": "nova"}' \
--output speech.pcm
/v1/speech/stream
Stream Speech (WebSocket)
Real-time streaming speech synthesis over WebSocket. Audio is delivered as binary frames as it's generated, enabling low-latency playback. Ideal for interactive applications.
Protocol Flow
Send authentication
{
  "api_key": "sk_test_YOUR_KEY"
}
Send synthesis request
{
  "text": "Hello world",
  "voice": "nova",
  "model": "phemius-fast",
  "speed": 1
}
Receive audio chunks as binary WebSocket frames. Each chunk is raw PCM (16-bit, 24kHz, mono). Chunks arrive as they're generated for low-latency streaming.
Final summary message
{
  "done": true,
  "chars_billed": 11,
  "ttfb_ms": 245,
  "cached": false
}
Request Fields
| Parameter | Type | Required / Default | Description |
|---|---|---|---|
| text | string | Required | Max: 5,000 |
| voice | string | default | |
| model | string | phemius-fast | |
| voice_id | string or null | - | |
| speed | number | 1 | Range: 0.25–4 |
Close Codes
| Code | Reason |
|---|---|
| 4001 | Missing or invalid API key |
| 4029 | Rate limit exceeded |
| 4400 | Invalid request (validation error) |
Error Messages
These errors are sent as JSON messages before the connection closes. They are not WebSocket close codes.
| Code | Description |
|---|---|
| 5000 | Internal server error, sent as {"error": "...", "code": 5000} |
| 5002 | Speech synthesis failed, sent as {"error": "...", "code": 5002} |
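A client has to tell three kinds of frame apart: binary audio, the final summary, and a JSON error. The sketch below shows one way to do that in Python. It assumes the WebSocket library delivers binary frames as `bytes` and text frames as `str`, which is common for Python clients; `StreamState` and `handle_frame` are illustrative names, not part of the Phemius API.

```python
import json
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class StreamState:
    """Accumulates everything received on one streaming connection."""
    audio: bytearray = field(default_factory=bytearray)
    summary: Optional[dict] = None   # the {"done": true, ...} message
    error: Optional[dict] = None     # an {"error": ..., "code": ...} message

    @property
    def finished(self) -> bool:
        return self.summary is not None or self.error is not None


def handle_frame(state: StreamState, frame: Union[bytes, str]) -> None:
    """Route one incoming frame into the state object."""
    if isinstance(frame, (bytes, bytearray)):
        state.audio.extend(frame)    # raw PCM chunk
        return
    msg = json.loads(frame)          # text frames carry JSON
    if msg.get("done"):
        state.summary = msg
    elif "error" in msg:
        state.error = msg
```

A receive loop would call `handle_frame` for each frame and stop once `state.finished` is true or the socket closes.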
Examples
const ws = new WebSocket('wss://api.phemius.dev/v1/speech/stream');
ws.onopen = () => {
  ws.send(JSON.stringify({ api_key: 'sk_test_YOUR_KEY' }));
  ws.send(JSON.stringify({
    text: 'Hello world',
    voice: 'nova',
    speed: 1.0,
  }));
};
const chunks = [];
ws.onmessage = (event) => {
  if (event.data instanceof Blob) {
    // Binary audio chunk: append to buffer or play immediately
    chunks.push(event.data);
  } else {
    const msg = JSON.parse(event.data);
    if (msg.done) {
      console.log(`Billed: ${msg.chars_billed} chars, TTFB: ${msg.ttfb_ms}ms`);
    } else if (msg.error) {
      console.error(`Error ${msg.code}: ${msg.error}`);
    }
  }
};
POST /v1/audio/speech
Create Speech (OpenAI Compatible)
Drop-in replacement for the OpenAI TTS API. Use your existing OpenAI SDK code — just change the base URL and API key. Always returns raw PCM audio.
Request Body
| Parameter | Type | Required / Default | Description |
|---|---|---|---|
| model | string | tts-1 | Must be tts-1. Other values return 400. |
| input | string | Required | The text to synthesize. Min: 1, Max: 10,000 characters. |
| voice | string | alloy | OpenAI voice name (alloy, echo, fable, onyx, nova, shimmer) or any Phemius voice name. |
| speed | number | 1 | Playback speed multiplier. Range: 0.25–4. |
Response
audio/pcm: Raw PCM audio bytes (16-bit signed, 24kHz, mono)
Response Headers
X-Chars-Billed: Number of characters billed
X-Request-Id: Unique request identifier
Errors
| Status | Description |
|---|---|
| 400 | Unsupported model (use tts-1) |
| 401 | Invalid or missing API key |
| 422 | Validation error |
| 429 | Rate limit exceeded |
| 502 | Speech synthesis failed |
Voice Mapping
OpenAI voice names are first mapped to Phemius voice names, which then resolve to Kokoro voicepacks. You can also pass Phemius voice names or raw voicepack names directly.
| OpenAI Voice | Phemius Voice | Kokoro Voicepack |
|---|---|---|
| alloy | aria | af_bella |
| echo | marcus | am_adam |
| fable | sophia | bf_emma |
| onyx | orion | bm_lewis |
| nova | nova | af_bella |
| shimmer | aria | af_bella |
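The two-stage lookup in the table can be mirrored on the client, for example to log which voicepack a request is expected to use. This is only an illustration built from the rows above: the dictionaries and `resolve_voicepack` are hypothetical names, and the server performs the real resolution.

```python
# Stage 1: OpenAI voice name -> Phemius voice name (from the table above).
OPENAI_TO_PHEMIUS = {
    "alloy": "aria",
    "echo": "marcus",
    "fable": "sophia",
    "onyx": "orion",
    "nova": "nova",
    "shimmer": "aria",
}

# Stage 2: Phemius voice name -> Kokoro voicepack (from the table above).
PHEMIUS_TO_VOICEPACK = {
    "aria": "af_bella",
    "marcus": "am_adam",
    "sophia": "bf_emma",
    "orion": "bm_lewis",
    "nova": "af_bella",
}


def resolve_voicepack(voice: str) -> str:
    """Resolve any accepted voice name to a voicepack name.

    Names missing from both tables are passed through unchanged, on the
    assumption that they are already raw voicepack names.
    """
    phemius_name = OPENAI_TO_PHEMIUS.get(voice, voice)
    return PHEMIUS_TO_VOICEPACK.get(phemius_name, phemius_name)
```

With these tables, `resolve_voicepack("alloy")` and `resolve_voicepack("aria")` both give `af_bella`, while a raw name such as `bm_lewis` is returned as-is.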
Examples
curl -X POST https://api.phemius.dev/v1/audio/speech \
-H "Authorization: Bearer sk_test_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "tts-1", "input": "Integrate natural sounding speech into your application with just a few lines of code. Low latency, high fidelity, and built for scale.", "voice": "shimmer"}' \
--output speech.pcm
GET /v1/voices
List Voices
List all custom voices belonging to the authenticated user.
Response
application/json
{
  "voices": [
    {
      "id": "uuid",
      "name": "string",
      "model": "phemius-fast",
      "created_at": "datetime"
    }
  ]
}
Errors
| Status | Description |
|---|---|
| 401 | Invalid or missing API key |
Examples
curl -H "Authorization: Bearer sk_test_YOUR_KEY" \
https://api.phemius.dev/v1/voices
POST /v1/voices
Create Voice
Create a new custom voice. Free plan is limited to 2 voices.
Request Body
| Parameter | Type | Required / Default | Description |
|---|---|---|---|
| name | string | Required | Display name for the voice. Min: 1, Max: 100 characters. |
| model | string (phemius-fast) | phemius-fast | TTS model for this voice. |
Response
application/json
{
  "id": "uuid",
  "name": "string",
  "model": "string",
  "created_at": "datetime"
}
Errors
| Status | Description |
|---|---|
| 401 | Invalid or missing API key |
| 403 | Free plan limited to 2 voices |
| 422 | Validation error (name too long, etc.) |
Examples
curl -X POST https://api.phemius.dev/v1/voices \
-H "Authorization: Bearer sk_test_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"name": "Alice"}'
DELETE /v1/voices/{voice_id}
Delete Voice
Delete a custom voice by ID. Only the owner can delete their voices.
Path Parameters
| Parameter | Type | Required / Default | Description |
|---|---|---|---|
| voice_id | string | Required | The voice ID to delete |
Response
Voice deleted successfully (no body)
Errors
| Status | Description |
|---|---|
| 401 | Invalid or missing API key |
| 404 | Voice not found |
Examples
curl -X DELETE https://api.phemius.dev/v1/voices/YOUR_VOICE_ID \
-H "Authorization: Bearer sk_test_YOUR_KEY"
GET /v1/jobs
List Jobs
List all async jobs (voice cloning, bulk synthesis) for the authenticated user, ordered by most recent first.
Response
application/json
{
  "jobs": [
    {
      "id": "uuid",
      "type": "string",
      "status": "queued",
      "progress": "number",
      "result": "object|null",
      "error": "string|null",
      "created_at": "datetime",
      "updated_at": "datetime"
    }
  ]
}
Errors
| Status | Description |
|---|---|
| 401 | Invalid or missing API key |
Examples
curl -H "Authorization: Bearer sk_test_YOUR_KEY" \
https://api.phemius.dev/v1/jobs
GET /v1/jobs/{job_id}
Get Job
Get the status and details of a specific async job.
Path Parameters
| Parameter | Type | Required / Default | Description |
|---|---|---|---|
| job_id | string | Required | The job ID to retrieve |
Response
application/json
{
  "id": "uuid",
  "type": "string",
  "status": "queued",
  "progress": "number",
  "result": "object|null",
  "error": "string|null",
  "created_at": "datetime",
  "updated_at": "datetime"
}
Errors
| Status | Description |
|---|---|
| 401 | Invalid or missing API key |
| 404 | Job not found |
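Because jobs are asynchronous, a client typically polls this endpoint until the job leaves its in-progress states. The sketch below shows the polling loop only. The reference lists "queued" as a status but does not name the terminal ones, so the values "completed" and "failed" used here are assumptions to replace with the real ones; `wait_for_job` and the `fetch_job` callable (which would wrap a GET to /v1/jobs/{job_id}) are hypothetical.

```python
import time
from typing import Callable, FrozenSet

# ASSUMPTION: the source does not list terminal status names.
ASSUMED_TERMINAL_STATUSES = frozenset({"completed", "failed"})


def wait_for_job(
    fetch_job: Callable[[], dict],
    terminal_statuses: FrozenSet[str] = ASSUMED_TERMINAL_STATUSES,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll fetch_job() until the job reaches a terminal status or time runs out."""
    deadline = clock() + timeout_s
    while True:
        job = fetch_job()
        if job["status"] in terminal_statuses:
            return job
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist so the loop can be exercised in tests without real waiting.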
Examples
curl -H "Authorization: Bearer sk_test_YOUR_KEY" \
https://api.phemius.dev/v1/jobs/YOUR_JOB_ID
OpenAI TTS Migration
Phemius is a drop-in replacement for the OpenAI TTS API. Change two lines — base URL and API key — and your existing code works immediately.
Before (OpenAI)
from openai import OpenAI
client = OpenAI(api_key="sk-...")
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello world",
)
response.stream_to_file("speech.pcm")
After (Phemius)
from openai import OpenAI
client = OpenAI(
    api_key="sk_test_YOUR_PHEMIUS_KEY",
    base_url="https://api.phemius.dev/v1",
)
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello world",
)
response.stream_to_file("speech.pcm")
Key Differences
- Only the tts-1 model is supported (tts-1-hd is not available)
- All six OpenAI voices are mapped to Kokoro equivalents
- Both endpoints return raw PCM audio, with no format conversion
- Same speed range: 0.25x to 4.0x
- Higher character limit per request: 10,000 chars (vs OpenAI's 4,096)
Audio Formats
Native Endpoint /v1/speech
The native /v1/speech endpoint always returns raw PCM audio (16-bit signed, 24kHz, mono). There is no response_format parameter — the output is always PCM. To play it:
ffplay -f s16le -ar 24000 -ch_layout mono speech.pcm
OpenAI-Compatible Endpoint /v1/audio/speech
The OpenAI-compatible /v1/audio/speech endpoint also always returns raw PCM audio. There is no response_format parameter.
WebSocket Streaming
The WebSocket endpoint at /v1/speech/stream enables real-time audio streaming with low time-to-first-byte. Audio chunks are sent as binary WebSocket frames as they're generated.
Audio Specification
Audio chunks are raw PCM (16-bit signed, 24kHz, mono).
Connection Flow
1. Connect to wss://api.phemius.dev/v1/speech/stream
2. Send JSON: {"api_key": "sk_test_..."}
3. Send JSON: {"text": "...", "voice": "...", ...}
4. Receive binary audio chunks
5. Receive JSON: {"done": true, "chars_billed": N, "ttfb_ms": N, "cached": bool}
Requests with voice_id (custom voices) bypass the cache.
Rate Limits
Rate limits are applied per API key. Request limits use a rolling one-minute window; character limits reset monthly.
| Plan | Requests/min | Monthly Characters | Note |
|---|---|---|---|
| Free | 5 | 10,000 | Hard capped — no overage |
| Developer | 20 | 500,000 included | $8/1M chars overage |
| Growth | 100 | 2,000,000 included | $6/1M chars overage |
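A client can stay under the per-minute limit by tracking its own request timestamps. The sketch below is a simple rolling-window limiter. It assumes the server counts requests over the trailing 60 seconds, which is one reading of "rolling window" and may not match the server exactly; `RollingWindowLimiter` is an illustrative name.

```python
import time
from collections import deque
from typing import Callable


class RollingWindowLimiter:
    """Tracks request times and reports how long to wait before the next one."""

    def __init__(
        self,
        max_requests: int,
        window_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_s = window_s
        self._clock = clock
        self._sent = deque()  # timestamps of requests inside the window

    def wait_time(self) -> float:
        """Seconds until another request fits in the window (0.0 if it fits now)."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_s:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_s - (now - self._sent[0])

    def record(self) -> None:
        """Call once for every request actually sent."""
        self._sent.append(self._clock())
```

On the Free plan, for example, a client would create `RollingWindowLimiter(5)`, sleep for `wait_time()` before each request, and call `record()` after sending it.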
HTTP requests return 429 and WebSocket connections close with code 4029 when rate limited.
Error Handling
HTTP Errors
| Status | Description |
|---|---|
| 400 | Bad request: invalid model or malformed request |
| 401 | Unauthorized: missing, invalid, or revoked API key |
| 403 | Forbidden: plan limit reached (e.g. max voices) |
| 404 | Not found: resource doesn't exist or doesn't belong to you |
| 422 | Validation error: request body failed schema validation |
| 429 | Rate limited: too many requests or monthly character cap exceeded |
| 502 | Bad gateway: upstream TTS model failed |
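Some of these statuses are worth retrying and others are not. The sketch below retries with exponential backoff, treating 429 and 502 as transient. That classification is an assumption rather than documented behavior, and a 429 caused by the monthly character cap will not clear on retry. `call_with_retries` and the `send` callable, which should return a `(status, body)` pair, are hypothetical.

```python
import random
import time
from typing import Callable, Tuple

# ASSUMPTION: only rate limiting (429) and upstream failures (502) are transient.
RETRYABLE_STATUSES = {429, 502}


def call_with_retries(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 4,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> Tuple[int, bytes]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        is_last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or is_last_attempt:
            return status, body
        # Delay doubles each attempt; jitter spreads out concurrent clients.
        sleep(base_delay_s * 2 ** attempt + jitter())
```

Statuses such as 400, 401, 403, 404, and 422 are returned immediately, since resending the same request would fail the same way.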
WebSocket Close Codes
| Code | Description |
|---|---|
| 4001 | Authentication failed |
| 4029 | Rate limit exceeded |
| 4400 | Invalid request body |
WebSocket Error Messages
These codes are sent in a JSON message before the connection closes. They are not WebSocket close frame codes.
| Code | Description |
|---|---|
| 5000 | Internal server error |
| 5002 | Speech synthesis failed |
Built-in Voices
The following voices are available with any speech endpoint. Pass the name as the voice parameter.
| Name | Voicepack | Gender | Description |
|---|---|---|---|
| default | af_heart | female | Default voice |
| alloy | af_heart | female | Warm and balanced |
| echo | am_adam | male | Clear and articulate |
| fable | bf_emma | female | Expressive storyteller |
| onyx | bm_lewis | male | Deep and resonant |
| nova | af_bella | female | Bright and energetic |
| shimmer | af_sky | female | Soft and gentle |
| aria | af_bella | female | Natural and expressive |
You can also pass a raw Kokoro voicepack name directly (e.g. af_heart, am_adam). If the voice name doesn't match a built-in alias, it's used as a raw voicepack name.
Plans & Limits
| Plan | Rate Limit | Monthly Chars | Max Voices | Billing |
|---|---|---|---|---|
| Free | 5 RPM | 10,000 | 2 | No billing — hard capped at 10,000 chars/month |
| Developer | 20 RPM | Unlimited | 10 | $10/mo + $8/1M chars overage |
| Growth | 100 RPM | Unlimited | Unlimited | $25/mo + $6/1M chars overage |
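The monthly bill for a given volume can be estimated from the figures in this document. The sketch below combines the base prices in this table with the included character counts from the Rate Limits table. It assumes overage is prorated per character rather than rounded up to a whole million, which the source does not state; `estimate_monthly_cost` is an illustrative helper.

```python
FREE_MONTHLY_CAP = 10_000  # hard cap, no overage

PAID_PLANS = {
    # base price and overage rate from Plans & Limits,
    # included characters from the Rate Limits table
    "developer": {"base_usd": 10.0, "included_chars": 500_000, "overage_usd_per_million": 8.0},
    "growth": {"base_usd": 25.0, "included_chars": 2_000_000, "overage_usd_per_million": 6.0},
}


def estimate_monthly_cost(plan: str, chars: int) -> float:
    """Estimate one month's cost in USD for a plan and character count."""
    if chars < 0:
        raise ValueError("chars must be non-negative")
    if plan == "free":
        if chars > FREE_MONTHLY_CAP:
            raise ValueError("Free plan is hard capped at 10,000 chars/month")
        return 0.0
    terms = PAID_PLANS[plan]
    overage_chars = max(0, chars - terms["included_chars"])
    # ASSUMPTION: overage is billed pro rata per character.
    overage_usd = overage_chars / 1_000_000 * terms["overage_usd_per_million"]
    return terms["base_usd"] + overage_usd
```

Under these assumptions, 1,500,000 characters on the Developer plan comes to 18.0 USD: the 10 USD base plus 1,000,000 overage characters at 8 USD per million.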