Seedance 2.0 Prompt Guide

Comprehensive prompt structure, camera vocabulary, multi-shot strategy, and 30-second chaining technique for Seedance 2.0 on the VicSee API.

For: Customers writing Seedance 2.0 prompts via the VicSee API. Pairs with the Seedance 2.0 API reference.

This guide covers prompt structure, camera vocabulary, common use cases, and the chaining technique to produce videos longer than the 15-second native limit.


TL;DR — Prompt Formula

[Subject/Character] + [Scene/Environment] + [Action/Motion] + 
[Camera Movement] + [Timing Breakdown] + [Audio/Sound Design] + [Style/Mood]

Three rules that matter more than anything else:

  1. The first 20–30 words carry the most weight. Pin your subject immediately or the model can hallucinate new characters at transitions.
  2. For clips over 10 seconds, use the time-segmented format (shown below). Without explicit timing, the model improvises and your prompt becomes a suggestion rather than a script.
  3. Keep it to 2–3 distinct shots per generation. Past 5 shots, subject consistency frays and visuals become chaotic.

Comprehensive Prompt Structure

1. Subject / Character Setup

State who or what is in the scene first. Be specific. The first 20–30 words determine what the model treats as "the subject."

Weak: "A person walking"
Strong: "A woman in her 30s in a charcoal blazer, dark hair pulled back, walking confidently"

For multi-character scenes, name and describe each upfront:

"Two characters in frame: ALEX, a 40s tech executive in glasses and a navy suit, and SAM, a 20s designer in a white shirt with rolled sleeves."

2. Scene / Environment

Describe location, time of day, lighting, and atmosphere. The model uses this to set color grade and mood.

"Modern open-plan office, late afternoon, golden hour light streaming through floor-to-ceiling windows, glass partitions, polished concrete floor."

3. Action / Motion

What happens. Describe motion verbs concretely.

Weak: "She does something with the product"
Strong: "She picks up the bottle, turns it 90 degrees toward camera, then sets it back down with a soft tap"

4. Camera Movement

The single biggest lever for cinematic quality. Use specific terms (full vocabulary below).

"Camera slowly orbits around the subject from left to right while gradually pushing in"

5. Timing Breakdown (required for 10s+)

0-3s: [Camera movement], [scene setup], [subject action], [expression]
3-6s: [Camera movement], [development], [shift in tone]
6-10s: [Camera movement], [climax], [emotional or visual peak]

Without this breakdown, the model picks its own pacing and your prompt becomes advisory.
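
As a rough sketch of the time-segmented format above, the helper below turns a list of (seconds, description) pairs into "0-3s: ..." lines and checks that the segments add up to the clip length. The function name and its arguments are illustrative, not part of the VicSee API.

```python
def format_segments(segments, total_seconds):
    """Render (seconds, description) pairs as '0-3s: ...' lines."""
    lines = []
    start = 0
    for seconds, description in segments:
        if seconds <= 0:
            raise ValueError("each segment needs a positive length")
        end = start + seconds
        lines.append(f"{start}-{end}s: {description}")
        start = end
    # Refuse plans that leave part of the clip unscripted or overrun it.
    if start != total_seconds:
        raise ValueError(f"segments cover {start}s, clip is {total_seconds}s")
    return "\n".join(lines)


timing = format_segments(
    [
        (3, "Wide shot, camera slowly pushes in. She walks toward camera."),
        (3, "Medium shot, camera tracks alongside her. She turns her head."),
        (4, "Close-up on her face, camera continues subtle push in."),
    ],
    total_seconds=10,
)
print(timing)
```

Failing loudly when the segments do not cover the whole clip is a deliberate choice here: any unscripted seconds are exactly where the model would pick its own pacing.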

6. Audio / Sound Design

Place audio cues after the timed segments, on their own line, introduced by the 【Sound】 marker.

【Sound】Cinematic orchestral build, soft ambient room tone, footsteps on concrete

7. Style / Mood

Anchor genre, color grade, and reference style. Helps lock the visual treatment.

"Cinematic commercial style, shallow depth of field, warm color grade, premium product film aesthetic"

Full Example (10s Standard, 720p)

A woman in her 30s in a charcoal blazer, dark hair pulled back, in a 
modern open-plan office at golden hour. Floor-to-ceiling windows, glass 
partitions, polished concrete.

0-3s: Wide shot, camera slowly pushes in. She walks toward camera, 
confident pace, looking just past the lens.

3-6s: Medium shot, camera tracks alongside her. She turns her head, 
half-smile forming, eyes catching the light.

6-10s: Close-up on her face, camera continues subtle push in. The 
half-smile widens slightly. She holds the gaze.

【Sound】Cinematic orchestral build, soft ambient room tone, low piano

Cinematic commercial style, shallow depth of field, warm color grade, 
premium product film aesthetic
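
The full example follows a fixed order: subject and scene, timed segments, the 【Sound】 line, then style. A minimal sketch of assembling a prompt in that order is shown below; build_prompt and its parameter names are hypothetical helpers, not anything the API provides.

```python
def build_prompt(subject, scene, timed_lines, sound, style):
    """Join the prompt parts in the order used by the full example."""
    opening = f"{subject} {scene}".strip()
    sound_line = f"【Sound】{sound}" if sound else ""
    parts = [opening, *timed_lines, sound_line, style]
    # Blank lines between parts mirror the layout of the example prompt.
    return "\n\n".join(part for part in parts if part)


prompt = build_prompt(
    subject="A woman in her 30s in a charcoal blazer, dark hair pulled back,",
    scene="in a modern open-plan office at golden hour.",
    timed_lines=[
        "0-3s: Wide shot, camera slowly pushes in. She walks toward camera.",
        "3-6s: Medium shot, camera tracks alongside her. She turns her head.",
        "6-10s: Close-up on her face, camera continues subtle push in.",
    ],
    sound="Cinematic orchestral build, soft ambient room tone, low piano",
    style="Cinematic commercial style, shallow depth of field, warm color grade",
)
print(prompt)
```

Keeping the subject as the first argument makes it hard to bury the character description below the first 20–30 words.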

Camera Vocabulary

Use these exact terms; in VicSee's production experience the model responds to them reliably.

Shot Types

Term | When to Use
Wide shot | Establish location, show scale
Medium shot | Body language, conversation, interaction
Close-up | Facial expressions, emotion, detail
Macro shot | Tiny details (hands, objects, textures)
Bird's-eye view | God's-eye perspective, show patterns from above
Low-angle shot | Make subject feel powerful or threatening
Over-the-shoulder | Conversation, POV connection
First-person perspective | Immersion, urgency, subjective view

Camera Movements

Term | Effect
Camera slowly zooming in | Build tension, draw attention
Camera panning left to right | Reveal, survey scene
Camera orbiting around subject | Drama, showcase, hero shot
Camera following behind subject | Journey, pursuit
Drone shot flying above | Scale, establishing
Handheld camera with slight shake | Documentary feel, urgency, chaos
Fixed camera position | Stability, stillness, aftermath
Medium tracking shot | Follow action, natural movement
Quick push in | Sudden focus, confrontation
Orbit shot | Dynamic showcase, fight scene

Speed Modifiers

Term | Effect
Slow-motion | Emphasize impact, beauty, horror
High-speed camera style | Cinematic slow-mo, premium feel
Fast-paced camera following | Urgency, chase, chaos
Smooth steady cam | Professional, calm, controlled
Dynamic action camera | Intense, fast, disorienting

Subject Detail — Show, Don't Tell

The model reads visual descriptors better than emotional labels.

Avoid | Use Instead
"sad face" | "eyes lowered, brow furrowed, mouth slightly downturned"
"happy expression" | "wide genuine smile, eyes crinkled at the corners, head slightly tilted"
"scared look" | "eyes wide, lips parted, shoulders pulled inward"
"angry" | "jaw clenched, brows drawn together, nostrils flared slightly"
"confident" | "shoulders squared, chin lifted, steady gaze, half-smile"

The same principle applies to body language, gesture, and posture. Describe what the camera would see.
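
One way to apply the table above before submitting a prompt is a small check that flags emotional labels and suggests the visual description to use instead. This is only a sketch: the word list is limited to the five labels in the table, and find_emotional_labels is a hypothetical helper.

```python
import re

# Label -> visual description, copied from the table above.
VISUAL_SWAPS = {
    "sad": "eyes lowered, brow furrowed, mouth slightly downturned",
    "happy": "wide genuine smile, eyes crinkled at the corners, head slightly tilted",
    "scared": "eyes wide, lips parted, shoulders pulled inward",
    "angry": "jaw clenched, brows drawn together, nostrils flared slightly",
    "confident": "shoulders squared, chin lifted, steady gaze, half-smile",
}


def find_emotional_labels(prompt):
    """Return (label, suggestion) pairs for labels found as whole words."""
    hits = []
    for label, description in VISUAL_SWAPS.items():
        if re.search(rf"\b{label}\b", prompt, flags=re.IGNORECASE):
            hits.append((label, description))
    return hits


for label, suggestion in find_emotional_labels("A woman with a sad face, looking angry"):
    print(f"replace '{label}' with: {suggestion}")
```

Matching whole words only means an adverb such as "confidently" is not flagged, which keeps motion descriptions like "walking confidently" usable.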


Multi-Shot Strategy

Seedance 2.0 handles 2–3 distinct shots per generation cleanly. Past that, consistency degrades — the same character can shift age, hair, or wardrobe between shots.

Recommended structure for 8–10s:

  • 2 shots, ~4–5s each
  • One subject, varying camera angle and distance

Recommended structure for 10–15s:

  • 3 shots maximum
  • Use the time-segmented format to make transitions explicit

Avoid:

  • 5+ shots in one generation
  • Switching subjects mid-clip without a clear cue
  • Drastic location changes (use chaining instead — see below)
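
The shot-count guidance above can be expressed as a quick pre-flight check. The thresholds below are taken from this section; the function itself and the warning wording are illustrative assumptions.

```python
def check_shot_plan(duration_seconds, shot_count):
    """Return warnings for a planned clip, following the guidance above."""
    warnings = []
    if duration_seconds > 15:
        warnings.append("over the 15s native limit: use chaining")
    if shot_count >= 5:
        warnings.append("5+ shots: expect subject consistency to break")
    elif shot_count > 3:
        warnings.append("more than 3 shots: consistency may degrade")
    elif shot_count == 3 and duration_seconds < 10:
        warnings.append("3 shots fit better in 10-15s clips; try 2 shots")
    return warnings


print(check_shot_plan(10, 3))
print(check_shot_plan(15, 6))
```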

Common Use Cases

Commercial Spots (15s, 30s, 60s)

15s spot — single generation: product hero shot, clear CTA visual.

30s spot — chained generation (see Chaining section below): two 15s clips with first-last frame chaining for seamless continuity.

60s spot — chained: four 15s clips chained, or two 30s segments with a hard cut between.

Pattern that works:

  • 0-3s: hook (visual surprise, motion, curiosity gap)
  • 3-10s: product reveal + use context
  • 10-15s: emotional payoff or CTA-friendly composition

Product Demos

Use macro shots and slow camera moves. Pair with reference_image_urls to lock product appearance:

{
  "model": "seedance-2-0-reference-to-video",
  "input": {
    "prompt": "Macro shot of the product, camera slowly orbits 360 degrees around it...",
    "reference_image_urls": ["https://your-cdn.com/product-hero.jpg"],
    "duration": 8,
    "resolution": "1080p"
  }
}

Character-Led Storytelling

Pin the character with reference images. Use reference_image_urls (up to 7) to lock face, wardrobe, and key props:

{
  "reference_image_urls": [
    "https://your-cdn.com/character-face.jpg",
    "https://your-cdn.com/character-outfit.jpg",
    "https://your-cdn.com/scene-setting.jpg"
  ]
}

Social Content (9:16, vertical)

Set aspect_ratio: "9:16". Lead with subject in the upper third (mobile UI usually covers the bottom). Keep the first 1–2 seconds visually arresting — the scroll-stop window.

Cinematic / Trailer Style

Use aspect_ratio: "21:9" for letterboxed feel. Pair with slow push-ins, low-angle hero shots, and high-contrast color grade in the style description.


Going Beyond 15 Seconds — The Chaining Technique

Seedance 2.0's native maximum is 15 seconds per generation (per ByteDance's official launch: "15-second high-quality multi-shot audio-video output"). The model itself has explicit support for what ByteDance calls "video extension functionality that can generate continuous shots based on user prompts" — meaning chaining is an officially supported workflow, not a hack.

To produce 30s, 45s, or 60s output, chain multiple 15s generations. Three methods follow, from most to least seamless.

Method 1 — First-Last Frame Chaining

The second clip starts on the exact last frame of the first, which produces a seamless transition between clips.

Step 1: Generate the first 15s clip in image-to-video mode, supplying your starting image as the first frame (image_urls in the request below) and steering the prompt toward a clear end state.

{
  "model": "seedance-2-0-image-to-video",
  "input": {
    "prompt": "0-3s: ...; 3-10s: ...; 10-15s: subject ends in this exact pose, looking off-frame to the right",
    "image_urls": ["https://your-cdn.com/scene-start.jpg"],
    "duration": 15,
    "resolution": "1080p"
  }
}

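Step 2: Extract the last frame from the v1 output video, for example with ffmpeg -sseof -1 -i v1.mp4 -update 1 -q:v 2 last.jpg (the -update 1 option keeps overwriting the image, so the file ends up holding the final frame), your video editor's "export frame" feature, or any frame-grab tool. Upload the frame so it is reachable by URL.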

Step 3: Generate the second 15s clip in image-to-video mode, using the extracted last frame as the new first frame (again via image_urls). Continue the action in the prompt.

{
  "model": "seedance-2-0-image-to-video",
  "input": {
    "prompt": "Continuing from previous shot. 0-3s: subject turns back toward camera; 3-10s: ...; 10-15s: ...",
    "image_urls": ["https://your-cdn.com/v1-last-frame.jpg"],
    "duration": 15,
    "resolution": "1080p"
  }
}

Step 4: Stitch v1 and v2 in any video editor. The transition is invisible because v2's first frame is identical to v1's last frame.

Result: 30 seconds of seamless video. Repeat the process for 45s, 60s, or longer; iterating this way effectively removes the duration ceiling (see References for the source of the technique).
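
Steps 2 and 4 can also be scripted with ffmpeg. The sketch below only builds the command lines. It reads the final second of the clip and lets -update 1 overwrite the image with each frame, so the file ends up holding the last one. It then joins clips with ffmpeg's concat demuxer. The file names (clips.txt, spot-30s.mp4) are placeholders, and -c copy assumes both clips share the same codec, resolution, and frame rate.

```python
import shlex


def last_frame_command(video_path, image_path):
    """ffmpeg call that leaves the final frame of video_path in image_path."""
    return [
        "ffmpeg", "-y",
        "-sseof", "-1",   # start reading 1 second before the end of the input
        "-i", video_path,
        "-update", "1",   # keep overwriting one image, so the last frame wins
        "-q:v", "2",      # high JPEG quality
        image_path,
    ]


def concat_list(clip_paths):
    """Contents of the list file read by ffmpeg's concat demuxer."""
    # Assumes the paths contain no single quotes.
    return "".join(f"file '{path}'\n" for path in clip_paths)


def stitch_command(list_path, output_path):
    """ffmpeg call that joins the listed clips without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",
        "-i", list_path,
        "-c", "copy",
        output_path,
    ]


print(shlex.join(last_frame_command("v1.mp4", "last.jpg")))
print(concat_list(["v1.mp4", "v2.mp4"]), end="")
print(shlex.join(stitch_command("clips.txt", "spot-30s.mp4")))
```

To execute a command, pass the list to subprocess.run(cmd, check=True) after writing the concat list to the list file.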

Best practices for long chains (3+ extensions, 45s+)

  • Re-supply the same reference_image_urls (character/scene reference) on every extension to prevent drift. After 30+ seconds of chained generation, character consistency degrades unless the reference is re-anchored. (Reference: glbgpt — "users should re-upload the original Reference Image (the character sheet) in the extension prompt settings".)
  • Keep core character description identical in every extension prompt. Don't paraphrase the subject between clips.
  • Shorter extensions (5–10s) preserve continuity better than 15s extensions when chaining 4+ clips. Trade off: more API calls but tighter visual coherence.
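
A simple way to keep the character description identical across extensions is to store it once and reuse the same string in every prompt. The sketch below assumes that approach; extension_prompts is a hypothetical helper, and the beat text is placeholder content.

```python
SUBJECT = "A woman in her 30s in a charcoal blazer, dark hair pulled back."


def extension_prompts(subject, beats_per_clip):
    """Build one prompt per clip, repeating the subject text verbatim."""
    prompts = []
    for index, beats in enumerate(beats_per_clip):
        if index == 0:
            lead = subject
        else:
            # Continuation cue taken from the Step 3 example prompt.
            lead = f"Continuing from previous shot. {subject}"
        prompts.append(f"{lead}\n\n{beats}")
    return prompts


prompts = extension_prompts(
    SUBJECT,
    [
        "0-5s: She walks toward camera. 5-10s: She stops and looks right.",
        "0-5s: She turns back toward camera. 5-10s: She smiles.",
        "0-5s: She picks up a folder. 5-10s: She walks out of frame.",
    ],
)
for text in prompts:
    print(text, end="\n\n")
```

Because the subject is a single constant, it cannot be accidentally paraphrased between clips.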

Method 2 — Video Reference Continuation

Less seamless than Method 1 but lighter setup. Feed the v1 output URL as reference_video_urls for v2 with a continuation prompt:

{
  "model": "seedance-2-0-reference-to-video",
  "input": {
    "prompt": "Continuing the action from the reference video. Camera stays in the same style. 0-5s: ...; 5-15s: ...",
    "reference_video_urls": ["https://your-cdn.com/v1-output.mp4"],
    "duration": 15
  }
}

The model picks up motion style and approximate visual continuity, but the cut between clips will be more visible than Method 1.

Method 3 — Hard-Cut Stitching

For multi-scene narratives where you don't need seamless flow (e.g., a 30s spot with three distinct beats), generate independent clips and stitch with editor cuts. This is the easiest workflow for storytelling-style content.

Cost Consideration

Two 15s/1080p Standard clips = 2 × 2060 credits = 4120 credits for 30 seconds. Chaining adds no surcharge: you pay per clip, so a 30s spot costs the same as its two 15s generations.
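
The same arithmetic extends to longer chains. The sketch below assumes every clip in the chain is a full 15s/1080p Standard generation at the 2060-credit figure quoted above; pricing for other durations or resolutions is not covered in this guide, so treat the default as a placeholder.

```python
import math

CREDITS_PER_15S_1080P_STANDARD = 2060  # figure quoted above


def chain_cost(total_seconds, clip_seconds=15,
               credits_per_clip=CREDITS_PER_15S_1080P_STANDARD):
    """Return (number of clips, total credits) for a chained video."""
    clips = math.ceil(total_seconds / clip_seconds)
    return clips, clips * credits_per_clip


print(chain_cost(30))  # (2, 4120)
print(chain_cost(60))  # (4, 8240)
```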


Common Mistakes

Mistake | Why It Fails | Fix
Short prompts (< 50 words) | Model improvises everything | Use the full formula, even for simple shots
No timing breakdown on 10s+ clips | Pacing is unpredictable | Always use 0-3s / 3-6s / 6-10s structure
Emotional labels ("sad", "happy") | Model doesn't translate labels reliably | Describe what the camera sees
5+ distinct shots per generation | Subject consistency breaks | Cap at 2–3 shots; use chaining for more
Location changes mid-clip | Visual coherence fails | Use chaining with a new first frame
Ambiguous subject ("someone", "a person") | Model invents details | Specify age, clothing, posture upfront
Mixing first-person and third-person POV | Camera perspective wobbles | Pick one POV per generation
Asking for text overlays in the video | Model is unreliable at rendering text | Add text in post-production
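
A few of the mistakes above are mechanical enough to catch before a request is sent. The sketch below checks three of them: prompt length, a missing timing breakdown on 10s+ clips, and ambiguous subjects. The function name, the regular expression, and the warning text are assumptions made for illustration.

```python
import re

VAGUE_SUBJECTS = ("someone", "a person")
TIMING_PATTERN = re.compile(r"\b\d+-\d+s:")  # matches markers such as "0-3s:"


def lint_prompt(prompt, duration_seconds):
    """Return a list of warnings drawn from the mistakes listed above."""
    warnings = []
    if len(prompt.split()) < 50:
        warnings.append("under 50 words: the model will improvise")
    if duration_seconds >= 10 and not TIMING_PATTERN.search(prompt):
        warnings.append("no timing breakdown for a 10s+ clip")
    lowered = prompt.lower()
    for phrase in VAGUE_SUBJECTS:
        if re.search(rf"\b{re.escape(phrase)}\b", lowered):
            warnings.append(f"ambiguous subject: '{phrase}'")
    return warnings


for warning in lint_prompt("A person walking", duration_seconds=10):
    print(warning)
```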


References

The factual claims about Seedance 2.0 capabilities (15-second native cap, video extension support, chaining workflow) in this guide are sourced from:

  • ByteDance Seed (model maker, official): Seedance 2.0 Official Launch. Confirms the 15-second multi-shot output and explicitly cites "video extension functionality that can generate continuous shots based on user prompts" as a built-in capability.
  • glbgpt: Seedance 2.0 Maximum Video Length. Documents the iterative chaining technique and best practices around character consistency on long chains.

The prompt structure recommendations (formula, time-segmented format, camera vocabulary, show-don't-tell, multi-shot strategy) are based on internal VicSee production experience generating Seedance 2.0 content at scale. Use cases and ad-spot pacing recommendations are illustrative; adapt to your specific creative brief.