ElevenLabs Audio API — Voiceover, TTS & Sound Effects

Generate multi-speaker voiceovers, single-speaker narration, and sound effects with ElevenLabs models through the VicSee API. Three models are available: V3 Text to Dialogue, TTS Turbo 2.5, and Sound Effect V2.

API-only models. ElevenLabs audio models are available exclusively through the VicSee API.

Pricing

Model	Credits per generation	Description
V3 Text to Dialogue	15	Multi-speaker voiceover (max 1K chars total)
TTS Turbo 2.5	8	Single-speaker narration (max 1K chars)
Sound Effect V2	5	Sound effects from text (max 22s)

Credits are deducted only on successful generation. All models output MP3 audio.
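To budget a batch before submitting it, you can encode the per-generation prices from the table in a small lookup. This is a sketch that assumes the prices above are current. `credits_for` and `batch_credits` are hypothetical helper names, not part of the VicSee API. Failed generations are not charged, so the computed total is an upper bound.

```sh
# Hypothetical helper: credits per successful generation, copied from the pricing table.
credits_for() {
  case "$1" in
    elevenlabs-text-to-dialogue-v3)      echo 15 ;;
    elevenlabs-text-to-speech-turbo-2-5) echo 8 ;;
    elevenlabs-sound-effect-v2)          echo 5 ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: upper-bound cost for a list of model IDs (one per request).
# Failed generations are not charged, so the real cost can only be lower.
batch_credits() {
  total=0
  for model in "$@"; do
    price=$(credits_for "$model") || return 1
    total=$((total + price))
  done
  echo "$total"
}
```

For example, `batch_credits elevenlabs-text-to-dialogue-v3 elevenlabs-sound-effect-v2 elevenlabs-sound-effect-v2` prints `25`.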

Endpoint

POST https://vicsee.com/api/v1/generate

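See Authentication for API key setup. Every request body has a top-level `model` field and an `input` object that holds all other parameters documented below.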

V3 Text to Dialogue

Generate multi-speaker voiceovers for ads, podcasts, and video narration. Each speaker gets their own voice.

Parameters

`model` · `string` · required

Must be "elevenlabs-text-to-dialogue-v3".

`dialogue` · `array` · required

Array of dialogue entries. Each entry has text (what to say) and voice (which voice to use).

Constraints:

Total characters across all text entries must not exceed 1,000
Each entry must have both text and voice fields
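A client-side pre-check can catch the 1,000-character limit before a request is sent. This sketch takes each entry's text as a separate argument and sums their lengths. `dialogue_chars_ok` is a hypothetical helper. It assumes the service counts characters the same way `${#var}` does in your locale, which may not hold for every script or emoji.

```sh
# Hypothetical helper: succeeds if the combined length of all dialogue texts
# is at most 1,000 characters (the V3 Text to Dialogue limit).
dialogue_chars_ok() {
  total=0
  for text in "$@"; do
    total=$((total + ${#text}))   # ${#text} counts characters in the current locale
  done
  echo "total characters: $total" >&2
  [ "$total" -le 1000 ]
}
```

Usage: `dialogue_chars_ok 'Welcome to Agency HQ!' "Let's create something amazing." && echo ok`.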

Example:

[
  { "text": "Welcome to Agency HQ!", "voice": "Adam" },
  { "text": "Let's create something amazing.", "voice": "Sarah" }
]

`stability` · `number` · optional

Controls voice stability. This is an enum, not a continuous range.

Supported values: 0, 0.5, 1

Default: 0.5
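V3 Text to Dialogue accepts only these three values. A value such as 0.7, which TTS Turbo 2.5 would accept, is therefore out of spec here. The following is a sketch of a guard that assumes you pass the literal value you intend to send. `dialogue_stability_ok` is a hypothetical helper.

```sh
# Hypothetical guard: V3 Text to Dialogue accepts only 0, 0.5 or 1 for stability.
# Compares numerically with awk, so "1.0" or "0.50" are also accepted;
# non-numeric input fails because awk then compares it as a string.
dialogue_stability_ok() {
  awk -v s="$1" 'BEGIN { exit !(s == 0 || s == 0.5 || s == 1) }'
}
```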

`language_code` · `string` · optional

ISO 639-1 language code (e.g., "en", "es", "fr").

Example Request

curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "elevenlabs-text-to-dialogue-v3",
    "input": {
      "dialogue": [
        { "text": "Welcome to Agency HQ!", "voice": "Adam" },
        { "text": "Let us create something amazing.", "voice": "Sarah" }
      ],
      "stability": 0.5,
      "language_code": "en"
    }
  }'
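The example request writes "Let us" instead of "Let's" because an apostrophe would end the single-quoted `-d` argument. One workaround in a POSIX shell is to keep the JSON in a quoted heredoc and have curl read the body from standard input with `--data @-`. `dialogue_payload`, `send_dialogue`, and the `VICSEE_API_KEY` environment variable are hypothetical names. The request fields are the same as in the example.

```sh
# Hypothetical helper: prints the request body. The quoted 'EOF' delimiter
# disables expansion, so the apostrophe in "Let's" needs no escaping.
dialogue_payload() {
  cat <<'EOF'
{
  "model": "elevenlabs-text-to-dialogue-v3",
  "input": {
    "dialogue": [
      { "text": "Welcome to Agency HQ!", "voice": "Adam" },
      { "text": "Let's create something amazing.", "voice": "Sarah" }
    ],
    "stability": 0.5,
    "language_code": "en"
  }
}
EOF
}

# Hypothetical sender: --data @- makes curl read the request body from stdin.
# VICSEE_API_KEY is an assumed environment variable holding your API key.
send_dialogue() {
  dialogue_payload | curl -X POST https://vicsee.com/api/v1/generate \
    -H "Authorization: Bearer $VICSEE_API_KEY" \
    -H "Content-Type: application/json" \
    --data @-
}
```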

TTS Turbo 2.5

Single-speaker text-to-speech: fast, natural narration with fine-grained voice controls.

Parameters

`model` · `string` · required

Must be "elevenlabs-text-to-speech-turbo-2-5".

`text` · `string` · required

The text to speak. Maximum 1,000 characters per generation.
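Text longer than 1,000 characters must be split across several generations. One simple approach is to break the text at spaces with `fold -s` so that no chunk exceeds the limit. Each output line then becomes the `text` of one request. This is an assumption on our part, not a VicSee feature. `fold` measures width in columns, not characters. Implementations that count each byte of non-ASCII text separately may therefore produce chunks shorter than 1,000 characters, which still stay within the limit. Existing line breaks in the input are kept, so each paragraph starts a new chunk. `chunk_text` is a hypothetical helper.

```sh
# Hypothetical helper: reads text on stdin and prints it as chunks of at most
# $1 columns (default 1000), breaking after spaces where possible (fold -s).
# Each output line is meant to be sent as the "text" of a separate request.
chunk_text() {
  fold -s -w "${1:-1000}"
}
```

Splitting at sentence boundaries instead of arbitrary spaces usually gives more natural pacing across chunks.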

`voice` · `string` · optional

Voice preset name or voice ID.

Example presets: "Rachel", "Adam", "Sarah", "Antoni", "Bella", "Domi", "Elli", "Josh"

ElevenLabs provides 100+ voices. Pass any valid voice name or ID.

`stability` · `number` · optional

Voice stability. 0-1 continuous range. Lower = more expressive, higher = more consistent.

Default: 0.5

`similarity_boost` · `number` · optional

Voice clarity and similarity to the original voice. 0-1 continuous range.

Default: 0.75

`style` · `number` · optional

Style exaggeration. 0-1 continuous range. Higher values amplify the style of the original voice.

Default: 0

`speed` · `number` · optional

Speech speed multiplier.

Range: 0.7 - 1.2

Default: 1.0
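The numeric TTS Turbo 2.5 settings have documented ranges. `stability`, `similarity_boost`, and `style` accept 0 to 1, and `speed` accepts 0.7 to 1.2. A range check run before sending keeps out-of-range values from reaching the API. `in_range` and `tts_settings_ok` are hypothetical helpers. They compare numbers with awk because POSIX sh has no floating-point arithmetic.

```sh
# Hypothetical helper: succeeds if lo <= value <= hi, compared numerically via awk.
# "v + 0 == v" rejects non-numeric input.
in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN { exit !(v + 0 == v && v >= lo && v <= hi) }'
}

# Hypothetical wrapper: checks each TTS Turbo 2.5 setting against its documented range.
# Arguments: stability similarity_boost style speed
tts_settings_ok() {
  in_range "$1" 0 1     || { echo "stability out of range: $1" >&2; return 1; }
  in_range "$2" 0 1     || { echo "similarity_boost out of range: $2" >&2; return 1; }
  in_range "$3" 0 1     || { echo "style out of range: $3" >&2; return 1; }
  in_range "$4" 0.7 1.2 || { echo "speed out of range: $4" >&2; return 1; }
}
```

The defaults pass: `tts_settings_ok 0.5 0.75 0 1.0`.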

`language_code` · `string` · optional

ISO 639-1 language code to enforce a specific language (e.g., "en", "es", "fr"). If omitted, the model auto-detects language from the text.

Example Request

curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "elevenlabs-text-to-speech-turbo-2-5",
    "input": {
      "text": "Coca-Cola. Open Happiness.",
      "voice": "Rachel",
      "stability": 0.5,
      "similarity_boost": 0.75,
      "speed": 1.0
    }
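  }'

Sound Effect V2

Generate sound effects from a text description, up to 22 seconds long.

Parameters

`model` · `string` · required

Must be "elevenlabs-sound-effect-v2".

`text` · `string` · required

A description of the sound to generate. Example prompts:

"Busy city street with car horns honking and crowd chatter"
"Thunder rumbling in the distance, followed by heavy rain"
"Old wooden door creaking open slowly"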

`duration_seconds` · `number` · optional

Duration of the generated sound effect in seconds.

Range: 0.5 - 22 (step 0.1)

If omitted, the model determines the optimal duration based on the prompt.

`loop` · `boolean` · optional

Whether to generate a seamless looping sound effect.

Default: false

`prompt_influence` · `number` · optional

How closely the output follows the prompt. Higher values produce less variation between generations.

Range: 0 - 1

Default: 0.3
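The Sound Effect V2 limits can be checked locally before a request is sent. `duration_seconds` must fall between 0.5 and 22 in 0.1-second steps, and `prompt_influence` must fall between 0 and 1. This is a sketch, and `sfx_duration_ok` and `sfx_prompt_influence_ok` are hypothetical helpers. The step check allows a tiny tolerance because 0.1 has no exact binary floating-point representation.

```sh
# Hypothetical guard: duration_seconds must be 0.5 to 22 in 0.1 s steps.
sfx_duration_ok() {
  awk -v d="$1" 'BEGIN {
    if (!(d + 0 == d && d >= 0.5 && d <= 22)) exit 1
    tenths = d * 10
    nearest = int(tenths + 0.5)
    # Accept only values within floating-point noise of a whole number of tenths.
    exit !(tenths - nearest < 1e-9 && nearest - tenths < 1e-9)
  }'
}

# Hypothetical guard: prompt_influence must be between 0 and 1.
sfx_prompt_influence_ok() {
  awk -v p="$1" 'BEGIN { exit !(p + 0 == p && p >= 0 && p <= 1) }'
}
```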

Example Request

curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "elevenlabs-sound-effect-v2",
    "input": {
      "text": "Busy city street with car horns and crowd chatter",
      "duration_seconds": 10,
      "loop": false,
      "prompt_influence": 0.3
    }
  }'

Response

All three models return the same response format.

Success (200)

{
  "success": true,
  "data": {
    "id": "task_abc123xyz",
    "model": "elevenlabs-text-to-dialogue-v3",
    "status": "pending",
    "creditsUsed": 15,
    "creditsRemaining": 485,
    "createdAt": "2026-02-23T12:00:00.000Z"
  }
}

Poll for completion using the Tasks API.
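The task ID in `data.id` identifies the job to poll. The Tasks API endpoint is documented separately and not repeated on this page, so the sketch below takes the full task URL as an argument rather than guessing the path. It reads `status` from each response with `sed`, a rough, dependency-free extraction; a JSON tool such as jq is more robust. It stops once the status is `completed`. Only `pending` and `completed` appear on this page, so any other status, including a possible failure state, simply keeps the loop polling until the attempt limit is reached. `json_status`, `poll_task`, and the `VICSEE_API_KEY` environment variable are hypothetical names.

```sh
# Hypothetical helper: prints the first "status" string value found in JSON on stdin.
# A rough sed extraction, adequate for the flat responses shown on this page.
json_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical poller. $1: task URL (see the Tasks API docs; not given on this page),
# $2: seconds between polls (default 5), $3: maximum attempts (default 60).
poll_task() {
  attempt=0
  while [ "$attempt" -lt "${3:-60}" ]; do
    body=$(curl -s -H "Authorization: Bearer $VICSEE_API_KEY" "$1") || return 1
    task_status=$(printf '%s\n' "$body" | json_status)
    if [ "$task_status" = "completed" ]; then
      printf '%s\n' "$body"   # full completed response, including output.url
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "${2:-5}"
  done
  echo "gave up after $attempt attempts (last status: ${task_status:-unknown})" >&2
  return 1
}
```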

Task Complete

{
  "success": true,
  "data": {
    "id": "task_abc123xyz",
    "model": "elevenlabs-text-to-dialogue-v3",
    "status": "completed",
    "output": {
      "url": "https://cdn.vicsee.com/dew/abc123.mp3",
      "format": "mp3"
    },
    "createdAt": "2026-02-23T12:00:00.000Z",
    "completedAt": "2026-02-23T12:00:30.000Z"
  }
}
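Once a task is `completed`, the audio file is available at `data.output.url`. The sketch below extracts that URL from a completed-task response with `sed` and downloads the file. `json_output_url` and `download_output` are hypothetical helpers. The sed match assumes that the first `"url"` key in the response is the output URL, as in the example response.

```sh
# Hypothetical helper: prints the first "url" string value in JSON on stdin
# (data.output.url in the completed-task response).
json_output_url() {
  sed -n 's/.*"url"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: reads a completed-task response on stdin and saves the MP3 to $1.
download_output() {
  url=$(json_output_url)
  [ -n "$url" ] || { echo "no output url found" >&2; return 1; }
  curl -sSfL -o "$1" "$url"   # -f fails on HTTP errors, -L follows redirects
}
```

Usage: `download_output voiceover.mp3 < completed.json`, where `completed.json` holds a saved completed-task response.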
