Seedance 2.0 Prompt Guide
Comprehensive prompt structure, camera vocabulary, multi-shot strategy, and 30-second chaining technique for Seedance 2.0 on the VicSee API.
For: Customers writing Seedance 2.0 prompts via the VicSee API. Pairs with the Seedance 2.0 API reference.
This guide covers prompt structure, camera vocabulary, common use cases, and the chaining technique to produce videos longer than the 15-second native limit.
TL;DR — Prompt Formula
[Subject/Character] + [Scene/Environment] + [Action/Motion] +
[Camera Movement] + [Timing Breakdown] + [Audio/Sound Design] + [Style/Mood]
Three rules that matter more than anything else:
- The first 20–30 words carry the most weight. Pin your subject immediately or the model can hallucinate new characters at transitions.
- For clips of 10 seconds or longer, use the time-segmented format (shown below). Without explicit timing, the model improvises and your prompt becomes a suggestion rather than a script.
- Keep it to 2–3 distinct shots per generation. Past 5 shots, subject consistency frays and visuals become chaotic.
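One way to keep the formula's ordering consistent is to assemble prompts programmatically. The sketch below is an illustration only: `PromptParts` and `build_prompt` are hypothetical helper names, not part of the VicSee API, and the layout (subject block first, then timing lines, a 【Sound】 line, and a style line) is an assumption that simply follows the order of the formula.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    """Hypothetical container for the seven formula components."""
    subject: str
    scene: str = ""
    action: str = ""
    camera: str = ""
    timing: str = ""  # pre-formatted lines such as "0-3s: ..."
    sound: str = ""
    style: str = ""


def build_prompt(p: PromptParts) -> str:
    """Join the components in formula order, subject first."""
    if not p.subject.strip():
        raise ValueError("subject is required and must come first")
    blocks = [
        # Subject leads so it lands in the opening words of the prompt.
        " ".join(s for s in (p.subject, p.scene, p.action, p.camera) if s),
        p.timing,
        f"【Sound】{p.sound}" if p.sound else "",
        p.style,
    ]
    # Skip any component that was left empty.
    return "\n".join(b for b in blocks if b)
```

Making the subject a required field is a small guard for the first rule: the prompt cannot be built without pinning the subject up front.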
Comprehensive Prompt Structure
1. Subject / Character Setup
State who or what is in the scene first. Be specific. The first 20–30 words determine what the model treats as "the subject."
Weak: "A person walking"
Strong: "A woman in her 30s wearing a charcoal blazer, dark hair pulled back, walking confidently"
For multi-character scenes, name and describe each upfront:
"Two characters in frame: ALEX, a 40s tech executive in glasses and a navy suit, and SAM, a 20s designer in a white shirt with rolled sleeves."
2. Scene / Environment
Describe location, time of day, lighting, and atmosphere. The model uses this to set color grade and mood.
"Modern open-plan office, late afternoon, golden hour light streaming through floor-to-ceiling windows, glass partitions, polished concrete floor."
3. Action / Motion
What happens. Describe motion verbs concretely.
Weak: "She does something with the product"
Strong: "She picks up the bottle, turns it 90 degrees toward camera, then sets it back down with a soft tap"
4. Camera Movement
The single biggest lever for cinematic quality. Use specific terms (full vocabulary below).
"Camera slowly orbits around the subject from left to right while gradually pushing in"
5. Timing Breakdown (required for 10s+)
0-3s: [Camera movement], [scene setup], [subject action], [expression]
3-6s: [Camera movement], [development], [shift in tone]
6-10s: [Camera movement], [climax], [emotional or visual peak]
Without this breakdown, the model picks its own pacing and your prompt becomes advisory.
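If you generate timing breakdowns from structured data, a small formatter can catch gaps or overlaps before the prompt is sent. This is a minimal sketch: `format_timing` and the `(start, end, description)` tuple layout are assumptions made for illustration, not an API feature.

```python
from typing import List, Tuple

Segment = Tuple[int, int, str]  # (start_second, end_second, description)


def format_timing(segments: List[Segment], duration: int) -> str:
    """Render segments as 'start-ends: description' lines.

    Raises ValueError if the segments overlap, leave gaps, or do not
    cover exactly 0..duration seconds.
    """
    expected_start = 0
    lines = []
    for start, end, description in segments:
        if start != expected_start or end <= start:
            raise ValueError(f"segment {start}-{end}s is not contiguous")
        lines.append(f"{start}-{end}s: {description}")
        expected_start = end
    if expected_start != duration:
        raise ValueError(
            f"segments end at {expected_start}s, expected {duration}s"
        )
    return "\n".join(lines)
```

Calling `format_timing([(0, 3, "Wide shot"), (3, 6, "Medium shot"), (6, 10, "Close-up")], 10)` yields three lines in the `0-3s:` / `3-6s:` / `6-10s:` layout shown above.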
6. Audio / Sound Design
Put audio cues after the timing breakdown, introduced by the 【Sound】 marker.
【Sound】Cinematic orchestral build, soft ambient room tone, footsteps on concrete
7. Style / Mood
Anchor genre, color grade, and reference style. Helps lock the visual treatment.
"Cinematic commercial style, shallow depth of field, warm color grade, premium product film aesthetic"
Full Example (10s Standard, 720p)
A woman in her 30s wearing a charcoal blazer, dark hair pulled back, in a
modern open-plan office at golden hour. Floor-to-ceiling windows, glass
partitions, polished concrete.
0-3s: Wide shot, camera slowly pushes in. She walks toward camera,
confident pace, looking just past the lens.
3-6s: Medium shot, camera tracks alongside her. She turns her head,
half-smile forming, eyes catching the light.
6-10s: Close-up on her face, camera continues subtle push in. The
half-smile widens slightly. She holds the gaze.
【Sound】Cinematic orchestral build, soft ambient room tone, low piano
Cinematic commercial style, shallow depth of field, warm color grade,
premium product film aesthetic
Camera Vocabulary
Use these exact terms — they map cleanly to the model's training distribution.
Shot Types
| Term | When to Use |
|---|---|
| Wide shot | Establish location, show scale |
| Medium shot | Body language, conversation, interaction |
| Close-up | Facial expressions, emotion, detail |
| Macro shot | Tiny details (hands, objects, textures) |
| Bird's-eye view | God perspective, show patterns from above |
| Low-angle shot | Make subject feel powerful or threatening |
| Over-the-shoulder | Conversation, POV connection |
| First-person perspective | Immersion, urgency, subjective view |
Camera Movements
| Term | Effect |
|---|---|
| Camera slowly zooming in | Build tension, draw attention |
| Camera panning left to right | Reveal, survey scene |
| Camera orbiting around subject | Drama, showcase, hero shot |
| Camera following behind subject | Journey, pursuit |
| Drone shot flying above | Scale, establishing |
| Handheld camera with slight shake | Documentary feel, urgency, chaos |
| Fixed camera position | Stability, stillness, aftermath |
| Medium tracking shot | Follow action, natural movement |
| Quick push in | Sudden focus, confrontation |
| Orbit shot | Dynamic showcase, fight scene |
Speed Modifiers
| Term | Effect |
|---|---|
| Slow-motion | Emphasize impact, beauty, horror |
| High-speed camera style | Cinematic slow-mo, premium feel |
| Fast-paced camera following | Urgency, chase, chaos |
| Smooth steady cam | Professional, calm, controlled |
| Dynamic action camera | Intense, fast, disorienting |
Subject Detail — Show, Don't Tell
The model reads visual descriptors better than emotional labels.
| Avoid | Use Instead |
|---|---|
| "sad face" | "eyes lowered, brow furrowed, mouth slightly downturned" |
| "happy expression" | "wide genuine smile, eyes crinkled at the corners, head slightly tilted" |
| "scared look" | "eyes wide, lips parted, shoulders pulled inward" |
| "angry" | "jaw clenched, brows drawn together, nostrils flared slightly" |
| "confident" | "shoulders squared, chin lifted, steady gaze, half-smile" |
The same principle applies to body language, gesture, and posture. Describe what the camera would see.
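The table above can double as a lookup when reviewing drafts. The following sketch flags emotional labels and suggests the visual descriptor from the table rather than substituting it automatically, since a blind replacement would usually break the sentence. `find_emotional_labels` is a hypothetical helper, and the label list covers only the five rows shown here.

```python
import re
from typing import List, Tuple

# Labels and descriptors copied from the table above.
DESCRIPTORS = {
    "sad face": "eyes lowered, brow furrowed, mouth slightly downturned",
    "happy expression": "wide genuine smile, eyes crinkled at the corners, head slightly tilted",
    "scared look": "eyes wide, lips parted, shoulders pulled inward",
    "angry": "jaw clenched, brows drawn together, nostrils flared slightly",
    "confident": "shoulders squared, chin lifted, steady gaze, half-smile",
}


def find_emotional_labels(prompt: str) -> List[Tuple[str, str]]:
    """Return (label, suggested_descriptor) pairs for labels found in prompt."""
    hits = []
    for label, descriptor in DESCRIPTORS.items():
        # Whole-word match, so "confidently" does not trigger "confident".
        if re.search(rf"\b{re.escape(label)}\b", prompt, flags=re.IGNORECASE):
            hits.append((label, descriptor))
    return hits
```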
Multi-Shot Strategy
Seedance 2.0 handles 2–3 distinct shots per generation cleanly. Past that, consistency degrades — the same character can shift age, hair, or wardrobe between shots.
Recommended structure for 8–10s:
- 2 shots, ~4–5s each
- One subject, varying camera angle and distance
Recommended structure for 10–15s:
- 3 shots maximum
- Use the time-segmented format to make transitions explicit
Avoid:
- 5+ shots in one generation
- Switching subjects mid-clip without a clear cue
- Drastic location changes (use chaining instead — see below)
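The shot recommendations above can be turned into segment boundaries for the time-segmented format. This is a rough sketch under stated assumptions: `plan_shots` is a hypothetical helper, it treats a duration of exactly 10s as a 2-shot clip, it always uses the 3-shot maximum above 10s, and it only covers the 8-15s range discussed in this section.

```python
from typing import List, Tuple


def plan_shots(duration: int) -> List[Tuple[int, int]]:
    """Split a clip into 2 shots (8-10s) or 3 shots (11-15s)."""
    if not 8 <= duration <= 15:
        raise ValueError("this sketch only covers durations of 8-15 seconds")
    shots = 2 if duration <= 10 else 3  # assumption: 10s counts as 2 shots
    base, extra = divmod(duration, shots)
    boundaries = []
    start = 0
    for i in range(shots):
        # Hand any leftover seconds to the earliest shots.
        end = start + base + (1 if i < extra else 0)
        boundaries.append((start, end))
        start = end
    return boundaries
```

For a 15s clip this returns three 5-second shots; for 10s, two 5-second shots. Uneven splits (for example 13s) give the extra second to the first shot, which is an arbitrary choice you may want to change.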
Common Use Cases
Commercial Spots (15s, 30s, 60s)
15s spot — single generation: product hero shot, clear CTA visual.
30s spot — chained generation (see Chaining section below): two 15s clips with first-last frame chaining for seamless continuity.
60s spot — chained: four 15s clips chained, or two 30s segments with a hard cut between.
Pattern that works:
- 0-3s: hook (visual surprise, motion, curiosity gap)
- 3-10s: product reveal + use context
- 10-15s: emotional payoff or CTA-friendly composition
Product Demos
Use macro shots and slow camera moves. Pair with reference_image_urls to lock product appearance:
{
  "model": "seedance-2-0-reference-to-video",
  "input": {
    "prompt": "Macro shot of the product, camera slowly orbits 360 degrees around it...",
    "reference_image_urls": ["https://your-cdn.com/product-hero.jpg"],
    "duration": 8,
    "resolution": "1080p"
  }
}
Character-Led Storytelling
Pin the character with reference images. Use reference_image_urls (up to 7) to lock face, wardrobe, and key props:
{
  "reference_image_urls": [
    "https://your-cdn.com/character-face.jpg",
    "https://your-cdn.com/character-outfit.jpg",
    "https://your-cdn.com/scene-setting.jpg"
  ]
}
Social Content (9:16, vertical)
Set aspect_ratio: "9:16". Lead with subject in the upper third (mobile UI usually covers the bottom). Keep the first 1–2 seconds visually arresting — the scroll-stop window.
Cinematic / Trailer Style
Use aspect_ratio: "21:9" for letterboxed feel. Pair with slow push-ins, low-angle hero shots, and high-contrast color grade in the style description.
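The request fields mentioned across these use cases can be assembled with a small helper that also enforces the 7-image reference limit. This is a sketch, not an official client: `reference_payload` is a hypothetical name, and placing `aspect_ratio` inside `input` alongside the other fields is an assumption, so confirm the placement against the Seedance 2.0 API reference.

```python
from typing import Any, Dict, List, Optional

MAX_REFERENCE_IMAGES = 7  # "up to 7" per the Character-Led Storytelling note


def reference_payload(
    prompt: str,
    reference_image_urls: List[str],
    duration: int,
    resolution: str = "1080p",
    aspect_ratio: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a reference-to-video request body as a plain dict."""
    if len(reference_image_urls) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"at most {MAX_REFERENCE_IMAGES} reference images are supported"
        )
    body: Dict[str, Any] = {
        "prompt": prompt,
        "reference_image_urls": list(reference_image_urls),
        "duration": duration,
        "resolution": resolution,
    }
    if aspect_ratio is not None:
        # ASSUMPTION: aspect_ratio sits inside "input" with the other fields.
        body["aspect_ratio"] = aspect_ratio
    return {"model": "seedance-2-0-reference-to-video", "input": body}
```

For vertical social content you would pass `aspect_ratio="9:16"`; for the letterboxed trailer look, `aspect_ratio="21:9"`.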
Going Beyond 15 Seconds — The Chaining Technique
Seedance 2.0's native maximum is 15 seconds per generation (per ByteDance's official launch: "15-second high-quality multi-shot audio-video output"). The model itself has explicit support for what ByteDance calls "video extension functionality that can generate continuous shots based on user prompts" — meaning chaining is an officially supported workflow, not a hack.
To produce 30s, 45s, or 60s output, chain multiple 15s generations using first-last frame control.
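Working out how many generations a target length needs is simple ceiling arithmetic. The sketch below uses a hypothetical helper, `plan_chain`, and assumes the final clip may be shorter than 15s; this guide does not state a minimum clip duration, so check the API reference before relying on a short remainder.

```python
from typing import List

MAX_CLIP_SECONDS = 15  # native per-generation limit stated above


def plan_chain(total_seconds: int, clip_seconds: int = MAX_CLIP_SECONDS) -> List[int]:
    """Return per-clip durations that add up to total_seconds."""
    if total_seconds <= 0 or clip_seconds <= 0:
        raise ValueError("durations must be positive")
    full_clips, remainder = divmod(total_seconds, clip_seconds)
    durations = [clip_seconds] * full_clips
    if remainder:
        # ASSUMPTION: a shorter final clip is acceptable to the API.
        durations.append(remainder)
    return durations
```

A 30s spot maps to `[15, 15]`, 45s to three clips, and 60s to four, matching the use cases described earlier.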
Method 1 — First-Last Frame Chaining (Recommended)
This produces a seamless transition between clips because the second clip starts on the exact last frame of the first.
Step 1: Generate the first 15s clip in image-to-video mode, supplying your starting image as the first frame (passed in image_urls in the request below) and steering the prompt toward a clear end state.
{
  "model": "seedance-2-0-image-to-video",
  "input": {
    "prompt": "0-3s: ...; 3-10s: ...; 10-15s: subject ends in this exact pose, looking off-frame to the right",
    "image_urls": ["https://your-cdn.com/scene-start.jpg"],
    "duration": 15,
    "resolution": "1080p"
  }
}
Step 2: Extract the last frame from the v1 output video. (Use ffmpeg -sseof -0.1 -i v1.mp4 -vframes 1 last.jpg, your video editor's "export frame" feature, or any frame-grab tool.)
Step 3: Generate the second 15s clip in image-to-video mode, using the extracted last frame as the new first frame (again passed in image_urls). Continue the action in the prompt.
{
  "model": "seedance-2-0-image-to-video",
  "input": {
    "prompt": "Continuing from previous shot. 0-3s: subject turns back toward camera; 3-10s: ...; 10-15s: ...",
    "image_urls": ["https://your-cdn.com/v1-last-frame.jpg"],
    "duration": 15,
    "resolution": "1080p"
  }
}
Step 4: Stitch v1 and v2 in any video editor. The transition is invisible because v2's first frame is identical to v1's last frame.
Result: 30 seconds of seamless video. Repeat the process for 45s, 60s, or longer; iterating this way effectively removes the duration ceiling (see References for the underlying technique).
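Steps 2 and 4 can be scripted if you prefer the command line to a video editor. The sketch below only builds the ffmpeg argument lists: the frame-grab command is the one quoted in Step 2, while stitching with ffmpeg's concat demuxer is a swapped-in alternative to "any video editor", not something this guide prescribes. The helper names and file names are hypothetical.

```python
from typing import List


def last_frame_cmd(video: str, image: str) -> List[str]:
    """ffmpeg arguments that grab one frame from the final 0.1s of video."""
    return ["ffmpeg", "-sseof", "-0.1", "-i", video, "-vframes", "1", image]


def concat_list(clips: List[str]) -> str:
    """Contents of the text file that ffmpeg's concat demuxer reads."""
    return "".join(f"file '{clip}'\n" for clip in clips)


def concat_cmd(list_file: str, output: str) -> List[str]:
    """ffmpeg arguments that join the listed clips without re-encoding."""
    return [
        "ffmpeg", "-f", "concat", "-safe", "0",
        "-i", list_file, "-c", "copy", output,
    ]
```

To run a command, pass the list to `subprocess.run(cmd, check=True)`. Two caveats: `-sseof -0.1` seeks to 0.1 seconds before the end and `-vframes 1` keeps the first frame decoded from there, so the grabbed frame may sit a frame or two before the true final frame; and stream copy (`-c copy`) only works when the clips share the same codec settings, which should hold for clips generated with identical parameters but is worth verifying.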
Best practices for long chains (3+ extensions, 45s+)
- Re-supply the same reference_image_urls (character/scene reference) on every extension to prevent drift. After 30+ seconds of chained generation, character consistency degrades unless the reference is re-anchored. (Reference: glbgpt, "users should re-upload the original Reference Image (the character sheet) in the extension prompt settings".)
- Keep the core character description identical in every extension prompt. Don't paraphrase the subject between clips.
- Shorter extensions (5–10s) preserve continuity better than 15s extensions when chaining 4+ clips. Trade-off: more API calls but tighter visual coherence.
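One way to guarantee the subject description never gets paraphrased is to generate every extension prompt from a single string. This is a minimal sketch: `extension_prompts` is a hypothetical helper, and putting the subject before the "Continuing from previous shot." cue is an assumption that follows the advice to pin the subject in the opening words.

```python
from typing import List


def extension_prompts(subject: str, beats: List[str]) -> List[str]:
    """Return one prompt per clip, each opening with the identical subject text."""
    prompts = []
    for index, beat in enumerate(beats):
        # Only clips after the first carry the continuation cue.
        lead = "" if index == 0 else "Continuing from previous shot. "
        prompts.append(f"{subject} {lead}{beat}")
    return prompts
```

Because the subject is a single variable, an edit to the character description propagates to every clip in the chain instead of drifting between them.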
Method 2 — Video Reference Continuation
Less seamless than Method 1 but lighter setup. Feed the v1 output URL as reference_video_urls for v2 with a continuation prompt:
{
  "model": "seedance-2-0-reference-to-video",
  "input": {
    "prompt": "Continuing the action from the reference video. Camera stays in the same style. 0-5s: ...; 5-15s: ...",
    "reference_video_urls": ["https://your-cdn.com/v1-output.mp4"],
    "duration": 15
  }
}
The model picks up motion style and approximate visual continuity, but the cut between clips will be more visible than Method 1.
Method 3 — Hard-Cut Stitching
For multi-scene narratives where you don't need seamless flow (e.g., a 30s spot with three distinct beats), generate independent clips and stitch with editor cuts. This is the easiest workflow for storytelling-style content.
Cost Consideration
Two 15s/1080p Standard clips = 2 × 2060 credits = 4120 credits for 30 seconds. Chaining adds no surcharge: you pay per clip, so a 30-second spot costs exactly two 15-second generations.
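The same arithmetic extends to longer chains. The sketch below hard-codes the 2060-credit figure quoted above for a 15s/1080p Standard clip; `chain_cost` is a hypothetical helper, and prices for other durations or resolutions are not covered here, so pass your own per-clip figure for those.

```python
CREDITS_PER_15S_1080P_STANDARD = 2060  # figure quoted in the text above


def chain_cost(clips: int, credits_per_clip: int = CREDITS_PER_15S_1080P_STANDARD) -> int:
    """Total credits for a chain of identically priced clips."""
    if clips < 0:
        raise ValueError("clips must be non-negative")
    return clips * credits_per_clip


print(chain_cost(2))  # 30s spot -> 4120
print(chain_cost(3))  # 45s spot -> 6180
print(chain_cost(4))  # 60s spot -> 8240
```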
Common Mistakes
| Mistake | Why It Fails | Fix |
|---|---|---|
| Short prompts (< 50 words) | Model improvises everything | Use the full formula, even for simple shots |
| No timing breakdown on 10s+ clips | Pacing is unpredictable | Always use 0-3s / 3-6s / 6-10s structure |
| Emotional labels ("sad", "happy") | Model doesn't translate labels reliably | Describe what the camera sees |
| 5+ distinct shots per generation | Subject consistency breaks | Cap at 2–3 shots; use chaining for more |
| Location changes mid-clip | Visual coherence fails | Use chaining with new first_frame_url |
| Ambiguous subject ("someone", "a person") | Model invents details | Specify age, clothing, posture upfront |
| Mixing first-person and third-person POV | Camera perspective wobbles | Pick one POV per generation |
| Asking for text overlays in the video | Model is unreliable at rendering text | Add text in post-production |
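A few of the mistakes in the table are mechanical enough to check before submitting a prompt. The sketch below is a rough pre-flight lint: `lint_prompt` is a hypothetical helper, it covers only three rows (short prompts, missing timing on 10s+ clips, ambiguous subjects), and the regular expression for timing lines assumes the `0-3s:` format used in this guide.

```python
import re
from typing import List

AMBIGUOUS_SUBJECTS = ("someone", "a person")
TIMING_PATTERN = re.compile(r"\b\d+-\d+s:")  # matches "0-3s:", "6-10s:", ...


def lint_prompt(prompt: str, duration: int) -> List[str]:
    """Return human-readable warnings; an empty list means no issues found."""
    warnings = []
    if len(prompt.split()) < 50:
        warnings.append("short prompt: fewer than 50 words")
    if duration >= 10 and not TIMING_PATTERN.search(prompt):
        warnings.append("no timing breakdown on a 10s+ clip")
    lowered = prompt.lower()
    for phrase in AMBIGUOUS_SUBJECTS:
        if re.search(rf"\b{re.escape(phrase)}\b", lowered):
            warnings.append(f"ambiguous subject: '{phrase}'")
    return warnings
```

`lint_prompt("A person walking", 10)` reports all three problems. Checks such as shot count or mixed POV depend on how the prompt reads and are better left to human review.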
Related
- Seedance 2.0 API Reference
- Seedance 2.0 Fast API Reference — drafting tier, 480p/720p only
- ElevenLabs Audio API — pair with TTS for narrated video spots
References
The factual claims about Seedance 2.0 capabilities (15-second native cap, video extension support, chaining workflow) in this guide are sourced from:
- ByteDance Seed (model maker, official) — Seedance 2.0 Official Launch. Confirms the 15-second multi-shot output and explicitly cites "video extension functionality that can generate continuous shots based on user prompts" as a built-in capability.
- glbgpt — Seedance 2.0 Maximum Video Length. Documents the iterative chaining technique and best practices around character consistency on long chains.
The prompt structure recommendations (formula, time-segmented format, camera vocabulary, show-don't-tell, multi-shot strategy) are based on internal VicSee production experience generating Seedance 2.0 content at scale. Use cases and ad-spot pacing recommendations are illustrative; adapt to your specific creative brief.