# Speech (Text-to-Speech) API Usage Guide

Generate lifelike speech audio from text using Radient's Speech API. Ideal for assistants, IVR systems, accessibility features, and any application that needs high-quality audio output on demand.

This guide shows how to call the endpoint, control the voice, format, and speed, and save the resulting audio.
## Core Concepts

The Speech API uses the `/v1/speech` endpoint and returns an audio file stream.

Key features:

- Multiple voices and quality levels (models `tts-1`, `tts-1-hd`, `gpt-4o-mini-tts`).
- Choice of audio format (`mp3`, `opus`, `aac`, `flac`, `wav`, `pcm`) and playback speed.
- Provider-aware (currently OpenAI-compatible).
## Text-to-Speech

Convert text to speech and save the resulting audio file locally.

Endpoint: `POST /v1/speech`

- Consumes: `application/json`
- Produces: `audio/mpeg` (when `response_format` is `mp3`; the content type varies with the chosen format)
Example Request (Python using `requests`):
```python
import requests

RADIENT_API_KEY = "YOUR_RADIENT_API_KEY"
RADIENT_BASE_URL = "https://api.radient.com/v1"  # Or your specific Radient API endpoint

headers = {
    "Authorization": f"Bearer {RADIENT_API_KEY}",
    "Content-Type": "application/json",
    # "Accept": "audio/mpeg"  # Optional; server will stream audio back
}

payload = {
    "input": "Hello! This is a test of Radient's text-to-speech service.",
    "model": "tts-1",          # or "tts-1-hd", "gpt-4o-mini-tts"
    "voice": "alloy",          # "alloy", "ash", "ballad", "coral", "echo", "fable", "onyx", "nova", "sage", "shimmer", "verse"
    "response_format": "mp3",  # "mp3", "opus", "aac", "flac", "wav", "pcm"
    "speed": 1.0,              # between 0.25 and 4.0
    "provider": "openai",      # optional; currently "openai"
}

response = requests.post(f"{RADIENT_BASE_URL}/speech", headers=headers, json=payload, stream=True)

if response.status_code == 200:
    # Save the streamed audio to a file
    output_filename = "speech_output.mp3"  # match response_format
    with open(output_filename, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            if chunk:
                f.write(chunk)
    print(f"Saved audio to {output_filename}")
else:
    # Non-200 responses return JSON error details
    try:
        print("Error:", response.status_code, response.json())
    except Exception:
        print("Error:", response.status_code, response.text)
```
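The example above hard-codes a `.mp3` filename. If you let callers pick a `response_format`, a small helper can keep the file extension in sync with it. This is a hypothetical convenience sketch, not part of the Radient API; the MIME types listed for formats other than `mp3` are conventional assumptions, not values confirmed by this guide.

```python
# Hypothetical helper (not part of the Radient API): map a response_format
# value to a file extension and the MIME type the stream is commonly served
# with. The MIME values other than audio/mpeg are assumptions based on
# common conventions.
FORMAT_INFO = {
    "mp3":  (".mp3",  "audio/mpeg"),
    "opus": (".opus", "audio/opus"),
    "aac":  (".aac",  "audio/aac"),
    "flac": (".flac", "audio/flac"),
    "wav":  (".wav",  "audio/wav"),
    "pcm":  (".pcm",  "application/octet-stream"),
}

def output_path(stem: str, response_format: str) -> str:
    """Build an output filename whose extension matches response_format."""
    ext, _mime = FORMAT_INFO[response_format]
    return stem + ext
```

With this in place, `output_path("speech_output", payload["response_format"])` replaces the hard-coded filename.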
### Request Body (JSON)

| Field | Type | Description | Required | Allowed / Range |
|---|---|---|---|---|
| `input` | string | The text to synthesize into speech. | Yes | 1–4096 chars |
| `model` | string | TTS model to use. | Yes | `tts-1`, `tts-1-hd`, `gpt-4o-mini-tts` |
| `voice` | string | Voice preset. | Yes | `alloy`, `ash`, `ballad`, `coral`, `echo`, `fable`, `onyx`, `nova`, `sage`, `shimmer`, `verse` |
| `response_format` | string | Output audio format. | No | `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm` |
| `speed` | number | Playback speed multiplier. | No | 0.25–4.0 |
| `provider` | string | Underlying provider. | No | `openai` |
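Catching constraint violations client-side avoids a round trip that would end in a 400. The following is a minimal validation sketch of the constraints in the table above; the server remains authoritative, and the function name is ours, not part of any SDK.

```python
# Hypothetical client-side validator for the /v1/speech request body,
# mirroring the constraints documented in the table above.
ALLOWED_MODELS = {"tts-1", "tts-1-hd", "gpt-4o-mini-tts"}
ALLOWED_VOICES = {"alloy", "ash", "ballad", "coral", "echo", "fable",
                  "onyx", "nova", "sage", "shimmer", "verse"}
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}

def validate_speech_payload(payload: dict) -> list:
    """Return a list of problems found; an empty list means the payload
    passes these client-side checks."""
    problems = []
    text = payload.get("input", "")
    if not (1 <= len(text) <= 4096):
        problems.append("input must be 1-4096 characters")
    if payload.get("model") not in ALLOWED_MODELS:
        problems.append("model must be one of: " + ", ".join(sorted(ALLOWED_MODELS)))
    if payload.get("voice") not in ALLOWED_VOICES:
        problems.append("unknown voice")
    if "response_format" in payload and payload["response_format"] not in ALLOWED_FORMATS:
        problems.append("unsupported response_format")
    speed = payload.get("speed", 1.0)
    if not (0.25 <= speed <= 4.0):
        problems.append("speed must be between 0.25 and 4.0")
    return problems
```

Call it before `requests.post` and surface any problems to the user instead of sending the request.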
### Responses

- 200 OK: Audio stream (file). The content type corresponds to the chosen `response_format` (e.g., `audio/mpeg` for `mp3`).
- 400 Bad Request: Invalid or missing parameters (JSON body with error details).
- 500 Internal Server Error: Unexpected server issue.
## Tips and Best Practices

- Use `tts-1-hd` for higher fidelity when latency is less critical; use `tts-1` for lower-latency needs.
- Select `mp3` for broad compatibility; consider `opus` for very low bitrates at high quality in supported clients.
- Keep `input` under ~4k characters per request; chunk longer text and combine the outputs client-side.
- Cache or reuse generated clips when the same text is requested frequently to reduce latency and cost.
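The chunking tip above can be sketched as follows. This is a minimal splitter of our own devising, not a Radient utility: it breaks at sentence boundaries so each chunk synthesizes naturally, and it assumes no single sentence exceeds the limit (real-world text may need a proper sentence segmenter).

```python
import re

def chunk_text(text: str, limit: int = 4096) -> list:
    """Split text into chunks of at most `limit` characters, breaking at
    sentence boundaries. Assumes no single sentence exceeds `limit`."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when appending would exceed the limit.
        if current and len(current) + 1 + len(sentence) > limit:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks
```

Send each chunk as a separate `/v1/speech` request, then join the resulting audio client-side (for example with an audio tool such as ffmpeg).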
For a full list of API endpoints and schemas, see the main API Reference.