ElevenLabs Audio
Generate multi-speaker voiceovers, single-speaker narration, and sound effects with ElevenLabs models through the VicSee API. Three models are available: V3 Text to Dialogue, TTS Turbo 2.5, and Sound Effect V2.
API-only models. ElevenLabs audio models are available exclusively through the VicSee API.
Pricing
| Model | Credits | Description |
|---|---|---|
| V3 Text to Dialogue | 15 | Multi-speaker voiceover (max 1K chars total) |
| TTS Turbo 2.5 | 8 | Single-speaker narration (max 1K chars) |
| Sound Effect V2 | 5 | Sound effects from text (max 22s) |
Credits are deducted only on successful generation. All models output MP3 audio.
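To budget a batch of generations, the per-model credit costs in the table can be summed up front. The sketch below is only an illustration: `estimate_credits` is a hypothetical helper, not part of the VicSee API, and it assumes every planned generation succeeds (failed generations are not charged, per the note above).

```python
# Credit cost per successful generation, copied from the pricing table.
# Keys are the model IDs used in the "model" request field.
CREDITS_PER_GENERATION = {
    "elevenlabs-text-to-dialogue-v3": 15,
    "elevenlabs-text-to-speech-turbo-2-5": 8,
    "elevenlabs-sound-effect-v2": 5,
}


def estimate_credits(planned: dict[str, int]) -> int:
    """Return total credits for a plan of {model ID: number of generations}."""
    total = 0
    for model, count in planned.items():
        if model not in CREDITS_PER_GENERATION:
            raise ValueError(f"unknown model: {model}")
        if count < 0:
            raise ValueError("generation count must be non-negative")
        total += CREDITS_PER_GENERATION[model] * count
    return total
```

For example, two dialogue generations plus three sound effects come to 2 × 15 + 3 × 5 = 45 credits.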
Endpoint
POST https://vicsee.com/api/v1/generate
See Authentication for API key setup.
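As a rough Python equivalent of the curl examples on this page, the sketch below builds the POST request with the standard library only. The `{"model": ..., "input": ...}` body shape and the Bearer header are taken from those examples; `build_generate_request` is a hypothetical helper name, and error handling is left out.

```python
import json
import urllib.request

GENERATE_URL = "https://vicsee.com/api/v1/generate"


def build_generate_request(api_key: str, model: str, input_params: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the generate endpoint."""
    body = json.dumps({"model": model, "input": input_params}).encode("utf-8")
    return urllib.request.Request(
        GENERATE_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it would look like `with urllib.request.urlopen(req) as resp: result = json.load(resp)`. Keeping construction separate from sending makes the payload easy to inspect before any credits are spent.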
V3 Text to Dialogue
Generate multi-speaker voiceovers for ads, podcasts, and video narration. Each speaker gets their own voice.
Parameters
model · string · required
Must be "elevenlabs-text-to-dialogue-v3".
dialogue · array · required
Array of dialogue entries. Each entry has text (what to say) and voice (which voice to use).
Constraints:
- Total characters across all text entries must not exceed 1,000
- Each entry must have both text and voice fields
Example:
[
  { "text": "Welcome to Agency HQ!", "voice": "Adam" },
  { "text": "Let's create something amazing.", "voice": "Sarah" }
]
stability · number · optional
Controls voice stability. This is an enum, not a continuous range.
Supported values: 0, 0.5, 1
Default: 0.5
language_code · string · optional
ISO 639-1 language code (e.g., "en", "es", "fr").
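The dialogue constraints and the stability enum can be checked on the client before a request is sent. This is a minimal sketch, not the API's own validation: `validate_dialogue_input` is a hypothetical helper, it assumes an empty dialogue array is invalid, and it counts characters with Python's `len`, which may not match how the API counts them.

```python
ALLOWED_STABILITY = (0, 0.5, 1)  # enum values listed for V3 Text to Dialogue
MAX_TOTAL_CHARS = 1000           # limit across all text entries combined


def validate_dialogue_input(dialogue: list[dict], stability: float = 0.5) -> dict:
    """Check the documented constraints and return the "input" object."""
    if not dialogue:
        # Assumption: the page does not say whether an empty array is allowed.
        raise ValueError("dialogue must contain at least one entry")
    for i, entry in enumerate(dialogue):
        if not entry.get("text") or not entry.get("voice"):
            raise ValueError(f"entry {i} needs both 'text' and 'voice'")
    total = sum(len(entry["text"]) for entry in dialogue)
    if total > MAX_TOTAL_CHARS:
        raise ValueError(f"dialogue has {total} characters, limit is {MAX_TOTAL_CHARS}")
    if stability not in ALLOWED_STABILITY:
        raise ValueError(f"stability must be one of {ALLOWED_STABILITY}")
    return {"dialogue": dialogue, "stability": stability}
```

Because stability is an enum here, a value such as 0.3 is rejected rather than rounded; rounding silently would hide a mistake carried over from the TTS Turbo 2.5 parameter of the same name.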
Example Request
curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "elevenlabs-text-to-dialogue-v3",
    "input": {
      "dialogue": [
        { "text": "Welcome to Agency HQ!", "voice": "Adam" },
        { "text": "Let us create something amazing.", "voice": "Sarah" }
      ],
      "stability": 0.5,
      "language_code": "en"
    }
  }'
TTS Turbo 2.5
Single-speaker text-to-speech. Fast, natural narration with fine-tuned voice controls.
Parameters
model · string · required
Must be "elevenlabs-text-to-speech-turbo-2-5".
text · string · required
The text to speak. Maximum 1,000 characters per generation.
voice · string · optional
Voice preset name or voice ID.
Example presets: "Rachel", "Adam", "Sarah", "Antoni", "Bella", "Domi", "Elli", "Josh"
ElevenLabs provides 100+ voices. Pass any valid voice name or ID.
stability · number · optional
Voice stability. 0-1 continuous range. Lower = more expressive, higher = more consistent.
Default: 0.5
similarity_boost · number · optional
Voice clarity and similarity to the original voice. 0-1 continuous range.
Default: 0.75
style · number · optional
Style exaggeration. 0-1 continuous range. Higher values amplify the style of the original voice.
Default: 0
speed · number · optional
Speech speed multiplier.
Range: 0.7 - 1.2
Default: 1.0
language_code · string · optional
ISO 639-1 language code to enforce a specific language (e.g., "en", "es", "fr"). If omitted, the model auto-detects language from the text.
Example Request
curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "elevenlabs-text-to-speech-turbo-2-5",
    "input": {
      "text": "Coca-Cola. Open Happiness.",
      "voice": "Rachel",
      "stability": 0.5,
      "similarity_boost": 0.75,
      "speed": 1.0
    }
  }'
Sound Effect V2
Generate sound effects from text descriptions. Ideal for background audio, ambient sounds, and foley effects.
Parameters
model · string · required
Must be "elevenlabs-sound-effect-v2".
text · string · required
Description of the sound effect to generate. Maximum 5,000 characters (model limit).
Good prompts are specific:
- "Busy city street with car horns honking and crowd chatter"
- "Thunder rumbling in the distance, followed by heavy rain"
- "Old wooden door creaking open slowly"
duration_seconds · number · optional
Duration of the generated sound effect in seconds.
Range: 0.5 - 22 (step 0.1)
If omitted, the model determines the optimal duration based on the prompt.
loop · boolean · optional
Whether to generate a seamless looping sound effect.
Default: false
prompt_influence · number · optional
How closely the output follows the prompt. Higher values produce less variation between generations.
Range: 0 - 1
Default: 0.3
Example Request
curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "elevenlabs-sound-effect-v2",
    "input": {
      "text": "Busy city street with car horns and crowd chatter",
      "duration_seconds": 10,
      "loop": false,
      "prompt_influence": 0.3
    }
  }'
Response
All three models return the same response format.
Success (200)
{
  "success": true,
  "data": {
    "id": "task_abc123xyz",
    "model": "elevenlabs-text-to-dialogue-v3",
    "status": "pending",
    "creditsUsed": 15,
    "creditsRemaining": 485,
    "createdAt": "2026-02-23T12:00:00.000Z"
  }
}
Poll for completion using the Tasks API.
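A polling loop might look like the sketch below. This page does not describe the Tasks API endpoint, so the lookup is left to a caller-supplied `fetch_task` function (hypothetical) that returns the `data` object for a task ID. Only the `"pending"` and `"completed"` statuses shown on this page are assumed; any failure statuses have to be passed in explicitly via `failure_statuses`.

```python
import time
from typing import Callable


def wait_for_task(fetch_task: Callable[[str], dict], task_id: str,
                  interval: float = 2.0, timeout: float = 120.0,
                  failure_statuses: tuple = ()) -> dict:
    """Poll until the task reports "completed", then return its data object."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task(task_id)
        status = task.get("status")
        if status == "completed":
            return task
        if status in failure_statuses:
            raise RuntimeError(f"task {task_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout}s")
        time.sleep(interval)
```

Once the task completes, the audio location can be read from `task["output"]["url"]`, matching the Task Complete response shape. The interval and timeout defaults are arbitrary starting points, not values recommended by the API.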
Task Complete
{
  "success": true,
  "data": {
    "id": "task_abc123xyz",
    "model": "elevenlabs-text-to-dialogue-v3",
    "status": "completed",
    "output": {
      "url": "https://cdn.vicsee.com/dew/abc123.mp3",
      "format": "mp3"
    },
    "createdAt": "2026-02-23T12:00:00.000Z",
    "completedAt": "2026-02-23T12:00:30.000Z"
  }
}
Related Models
Z Image
Fast photorealistic AI image generation with Alibaba Z Image through the VicSee API. Text-to-image with multiple aspect ratios, only 2 credits.
Image Upscale
Premium AI image upscaling with Topaz through the VicSee API. Upscale images up to 8x with automatic dimension detection and output validation. 20–80 credits per image.