Seedance 2.0 API — Multimodal Reference Video with Native Audio

Generate 4-15 second videos from multimodal references using Seedance 2.0 through the VicSee API. Inputs can be text, images, video, and audio. Output resolutions are 480p, 720p, 1080p, and 4K, with native audio included. Standard pricing ranges from 100 to 4190 credits per generation.

Try it now: Use the Seedance 2.0 Generator to create videos with multimodal references.

Pricing

Text to Video / Image to Video

Duration	480p Credits	720p Credits	1080p Credits	4K Credits
4s	100	220	550	1120
5s	120	280	690	1400
6s	150	330	820	1680
7s	170	390	960	1960
8s	200	440	1100	2240
9s	220	500	1230	2520
10s	250	550	1370	2800
11s	270	610	1510	3080
12s	300	660	1650	3350
13s	320	720	1780	3630
14s	350	770	1920	3910
15s	370	830	2060	4190
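The table above can be turned into a simple lookup for estimating cost before submitting a request. This is a minimal sketch: the helper name `standard_credits` is hypothetical and not part of the VicSee API, and the values are copied from the table rather than derived from a formula, because the 4K column does not follow an exact per-second multiple.

```python
# Credits copied from the standard pricing table; index 0 corresponds to 4s, index 11 to 15s.
STANDARD_CREDITS = {
    "480p": [100, 120, 150, 170, 200, 220, 250, 270, 300, 320, 350, 370],
    "720p": [220, 280, 330, 390, 440, 500, 550, 610, 660, 720, 770, 830],
    "1080p": [550, 690, 820, 960, 1100, 1230, 1370, 1510, 1650, 1780, 1920, 2060],
    "4k": [1120, 1400, 1680, 1960, 2240, 2520, 2800, 3080, 3350, 3630, 3910, 4190],
}
MIN_DURATION = 4
MAX_DURATION = 15


def standard_credits(duration: int, resolution: str = "720p") -> int:
    """Hypothetical helper: look up the flat-rate cost for a given duration and resolution."""
    if resolution not in STANDARD_CREDITS:
        raise ValueError(f"unknown resolution: {resolution!r}")
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise ValueError("duration must be an integer number of seconds")
    if not MIN_DURATION <= duration <= MAX_DURATION:
        raise ValueError("duration must be between 4 and 15 seconds")
    return STANDARD_CREDITS[resolution][duration - MIN_DURATION]
```

For example, `standard_credits(8, "720p")` returns the 440 credits listed in the 8s row.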

Reference to Video

Pricing depends on whether a video reference is included:

Image and/or audio references only (no video reference): billed at the flat rate in the standard pricing table above, by output duration and resolution (e.g. 4s/480p = 100 credits). This is the most common case.
With a video reference (reference_video_urls): billed per second of total video — per-second rate x (sum of input video durations + output duration).

Resolution	Credits per second (when a video reference is included)
480p	15/s
720p	35/s
1080p	83/s
4K	172/s

Example: 10s reference video + 8s output at 720p = (10 + 8) x 35 = 630 credits
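The per-second billing rule can be sketched as a short function. The function name `video_reference_credits` is hypothetical, the rates are taken from the table above, and the sketch assumes whole-second durations, since the source does not say how fractional seconds are rounded.

```python
# Per-second rates from the table above, keyed by the API's resolution values.
VIDEO_REFERENCE_RATES = {"480p": 15, "720p": 35, "1080p": 83, "4k": 172}


def video_reference_credits(reference_durations, output_duration, resolution="720p"):
    """Hypothetical helper: rate x (sum of input video durations + output duration)."""
    if resolution not in VIDEO_REFERENCE_RATES:
        raise ValueError(f"unknown resolution: {resolution!r}")
    if not reference_durations:
        raise ValueError("this rate applies only when a video reference is included")
    total_seconds = sum(reference_durations) + output_duration
    return VIDEO_REFERENCE_RATES[resolution] * total_seconds
```

Calling `video_reference_credits([10], 8, "720p")` reproduces the worked example: (10 + 8) x 35 = 630 credits.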

Note: Audio generation is included at no extra cost. Credits are deducted only on successful generation. 1080p and 4K are available on Standard only — see Seedance 2.0 Fast for 480p/720p drafting at lower cost.

Credit range: 100-4190 credits (standard pricing); variable when a video reference is included.

Endpoint

POST https://vicsee.com/api/v1/generate

See Authentication for API key setup.

Text to Video

Generate videos from text descriptions with native audio.

Request Parameters

Parameter	Type	Required	Description
model	string	Yes	`seedance-2-0-text-to-video`
input.prompt	string	Yes	Video description (max 20,000 chars)
input.duration	number	No	4-15 seconds, any integer (default: 8)
input.resolution	string	No	`480p`, `720p`, `1080p`, `4k` (default: 720p)
input.aspect_ratio	string	No	`16:9`, `9:16`, `1:1`, `4:3`, `3:4`, `21:9`, `adaptive` (default: 16:9)
input.audio	boolean	No	Enable native audio (default: true)
input.web_search	boolean	No	Enhance prompt with web search (default: false)
input.nsfw_checker	boolean	No	NSFW content filter (default: true). VicSee always sends `true` unless explicitly overridden.

Example Request

curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seedance-2-0-text-to-video",
    "input": {
      "prompt": "A martial arts master demonstrates fluid spear techniques in a sunlit courtyard",
      "duration": 8,
      "resolution": "720p",
      "aspect_ratio": "16:9",
      "audio": true
    }
  }'
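The same request can be sketched in Python using only the standard library. The helper names `build_text_to_video_payload` and `submit` are hypothetical, and the client-side validation simply mirrors the limits in the parameter table; the API's own error responses are not documented here and are not modeled.

```python
import json
import urllib.request

API_URL = "https://vicsee.com/api/v1/generate"
RESOLUTIONS = {"480p", "720p", "1080p", "4k"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9", "adaptive"}
MAX_PROMPT_CHARS = 20_000


def build_text_to_video_payload(prompt, duration=8, resolution="720p",
                                aspect_ratio="16:9", audio=True):
    """Hypothetical helper: validate inputs against the parameter table and build the body."""
    if not prompt or len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt is required and limited to 20,000 characters")
    if isinstance(duration, bool) or not isinstance(duration, int) or not 4 <= duration <= 15:
        raise ValueError("duration must be an integer from 4 to 15")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio!r}")
    return {
        "model": "seedance-2-0-text-to-video",
        "input": {
            "prompt": prompt,
            "duration": duration,
            "resolution": resolution,
            "aspect_ratio": aspect_ratio,
            "audio": audio,
        },
    }


def submit(payload, api_key):
    """Hypothetical helper: POST the payload and return the decoded JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the payload separately from sending it keeps validation testable without spending credits; `submit(build_text_to_video_payload("..."), "YOUR_API_KEY")` would perform the actual call.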

Image to Video

Animate images into video using a first frame and, optionally, a last frame.

Request Parameters

Parameter	Type	Required	Description
model	string	Yes	`seedance-2-0-image-to-video`
input.prompt	string	Yes	Description of the animation (max 20,000 chars)
input.image_urls	string[]	Yes	Array of 1-2 images (first frame, optional last frame)
input.duration	number	No	4-15 seconds, any integer (default: 8)
input.resolution	string	No	`480p`, `720p`, `1080p`, `4k` (default: 720p)
input.aspect_ratio	string	No	`16:9`, `9:16`, `1:1`, `4:3`, `3:4`, `21:9`, `adaptive` (default: 16:9)
input.audio	boolean	No	Enable native audio (default: true)
input.nsfw_checker	boolean	No	NSFW content filter (default: true).

Example Request

curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seedance-2-0-image-to-video",
    "input": {
      "prompt": "The woman turns and smiles warmly, the wind catching her hair",
      "image_urls": ["https://example.com/start-frame.jpg"],
      "duration": 6,
      "resolution": "720p",
      "audio": true
    }
  }'

Two images: Pass a second URL for last frame control. The model generates a smooth transition between the two frames.

"image_urls": [
  "https://example.com/start-frame.jpg",
  "https://example.com/end-frame.jpg"
]
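As a rough illustration of the one-or-two-image rule, the request body can be assembled as follows. The helper name `build_image_to_video_payload` is hypothetical; the sketch only checks the image count and prompt length from the parameter table and does not fetch or inspect the images.

```python
def build_image_to_video_payload(prompt, image_urls, duration=8,
                                 resolution="720p", audio=True):
    """Hypothetical helper: the first URL is the first frame, an optional second is the last frame."""
    urls = list(image_urls)
    if not 1 <= len(urls) <= 2:
        raise ValueError("image_urls must contain 1 or 2 URLs")
    if not prompt or len(prompt) > 20_000:
        raise ValueError("prompt is required and limited to 20,000 characters")
    return {
        "model": "seedance-2-0-image-to-video",
        "input": {
            "prompt": prompt,
            "image_urls": urls,
            "duration": duration,
            "resolution": resolution,
            "audio": audio,
        },
    }


# Usage: first and last frame together.
payload = build_image_to_video_payload(
    "The woman turns and smiles warmly, the wind catching her hair",
    ["https://example.com/start-frame.jpg", "https://example.com/end-frame.jpg"],
    duration=6,
)
```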

Reference to Video

Generate videos using multimodal references: images, videos, and audio. Reference mode is what sets Seedance 2.0 apart; it supports character consistency, motion reference, and audio-driven generation.

Request Parameters

Parameter	Type	Required	Description
model	string	Yes	`seedance-2-0-reference-to-video`
input.prompt	string	No	Text description to guide generation (max 20,000 chars)
input.reference_image_urls	string[]	No	Up to 7 reference images (character refs, scene refs)
input.reference_video_urls	string[]	No	Up to 3 reference videos (motion, style)
input.reference_audio_urls	string[]	No	Up to 3 reference audio files (voice, music, SFX)
input.duration	number	No	4-15 seconds, any integer (default: 8)
input.resolution	string	No	`480p`, `720p`, `1080p`, `4k` (default: 720p)
input.aspect_ratio	string	No	`16:9`, `9:16`, `1:1`, `4:3`, `3:4`, `21:9`, `adaptive` (default: 16:9)
input.audio	boolean	No	Enable native audio (default: true)
input.nsfw_checker	boolean	No	NSFW content filter (default: true).

At least one reference type must be provided. You can combine all three in a single request.

Reference Constraints

Type	Max Count	Max Size	Duration Limit	Formats
Images	7	30MB each	—	jpeg, png, webp, bmp, tiff, gif
Videos	3	50MB each	2-15s each, total max 15s	mp4, mov
Audio	3	15MB each	2-15s each, total max 15s	mp3, wav

Example Request

curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seedance-2-0-reference-to-video",
    "input": {
      "prompt": "The character walks through a garden, looking around with curiosity",
      "reference_image_urls": [
        "https://example.com/character-ref.jpg",
        "https://example.com/scene-ref.jpg"
      ],
      "reference_video_urls": [
        "https://example.com/motion-reference.mp4"
      ],
      "duration": 10,
      "resolution": "720p"
    }
  }'

Response

Success (200)

{
  "success": true,
  "data": {
    "id": "task_abc123xyz",
    "model": "seedance-2-0-text-to-video",
    "status": "pending",
    "creditsUsed": 440,
    "creditsRemaining": 560,
    "createdAt": "2026-04-04T12:00:00Z"
  }
}

Poll for completion using the Tasks API.

Task Complete

{
  "taskId": "task_abc123xyz",
  "status": "completed",
  "output": {
    "url": "https://cdn.vicsee.com/outputs/video_xyz.mp4",
    "duration": 8,
    "format": "mp4",
    "hasAudio": true
  }
}
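A polling loop over these responses might look like the sketch below. The Tasks API endpoint is not described on this page, so the loop takes a caller-supplied `fetch_task` function (hypothetical) that returns the task JSON as a dict. The `pending` and `completed` statuses come from the examples above; the `failed` status, the 5-second interval, and the 10-minute timeout are assumptions.

```python
import time


def wait_for_task(fetch_task, task_id, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Hypothetical helper: poll until the task completes, then return its output dict."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task(task_id)
        status = task.get("status")
        if status == "completed":
            return task["output"]
        if status == "failed":  # assumed failure status, not confirmed by the source
            raise RuntimeError(f"task {task_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not complete within {timeout}s")
        sleep(interval)


# Usage with a stand-in fetcher that reports "pending" twice before completing.
_responses = iter([
    {"taskId": "task_abc123xyz", "status": "pending"},
    {"taskId": "task_abc123xyz", "status": "pending"},
    {"taskId": "task_abc123xyz", "status": "completed",
     "output": {"url": "https://cdn.vicsee.com/outputs/video_xyz.mp4",
                "duration": 8, "format": "mp4", "hasAudio": True}},
])
output = wait_for_task(lambda task_id: next(_responses), "task_abc123xyz",
                       sleep=lambda seconds: None)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays; in production the default `time.sleep` applies.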

Related Models

Seedance 2.0 Fast - Same capabilities, 19-20% cheaper, faster generation
Seedance 1.5 Pro - Up to 1080p, multilingual audio, no reference mode
Kling 3.0 - Multi-shot control, start and end frame references
