Last tested: February 2026 | All videos generated on VicSee
Kling 3.0 is the most capable video model Kuaishou has released. But the difference between a generic AI clip and something that looks like real footage comes down to two things: the starting image and how you describe the motion.
Every prompt below uses image-to-video — I generated a base image first, then animated it with Kling 3.0. This gives you control over composition, lighting, and framing before the model ever touches it. The results look like footage, not AI demos.
If a prompt didn't produce something worth showing, I cut it. What you see are the actual outputs.
Prompt 1: Dragon Tower Launch
Base image (Nano Banana Pro):
A massive dragon with iridescent green-gold scales perched on a medieval stone watchtower at dusk. Wings folded against its body. Tail wrapped around the tower base. A walled city visible below with torchlight in the streets. Storm clouds gathering overhead with distant lightning. The dragon's amber eye catching the last light of the setting sun.
Video prompt:
The dragon beats its wings with thunderous force and launches off the tower, stone crumbling from the edge. It rockets upward, the downdraft scattering torchlight in the streets below. It banks hard left at speed, jaws opening, a stream of fire ripping across the storm sky. Lightning illuminates its silhouette.
Settings: Professional | 8s | Audio on | 16:9
Why this works: The medieval city grounds the scene in reality. The dragon is the only impossible element, and it interacts with real things — stone crumbles, torches flicker, fire lights up clouds. These interactions prevent it from looking like a cartoon pasted onto a background. The lightning silhouette at the end is a classic fantasy money shot.
Prompt 2: Supercar in the Rain
Base image (Nano Banana Pro):
A sleek black Lamborghini parked on a rain-slicked Tokyo street at night, low three-quarter angle. Neon signs reflecting off the wet asphalt in streaks of pink and blue. Puddles around the tires. Volumetric fog from a steam vent. Lens flare from a distant streetlight. Cinematic, photorealistic.
Video prompt:
The headlights blaze on and the engine roars to life. The rear tires spin, spraying water across the wet asphalt, and the car launches forward. The camera tracks alongside at speed as neon reflections streak past on the wet ground.
Settings: Professional | 8s | Audio on | 16:9
Why this works: Wet surfaces are your best friend in AI video. Rain creates reflections, spray creates particles, and neon creates color contrast — all of which give the model rich motion data. The low camera angle makes the car feel powerful. The tracking shot creates the sensation of speed without needing a second vehicle.
Tip: I trimmed the first 2 seconds in post — the model spends them on the engine startup before the car moves. When your prompt starts with a static setup ("lights turn on", "engine starts"), expect the model to spend real time on it. If you want instant motion, lead with the action verb: "The car rockets forward..."
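If you'd rather script the trim than open an editor, a quick ffmpeg call handles it. A minimal sketch, assuming ffmpeg is installed — the filenames are placeholders for your own clip:

```python
import subprocess

# Skip the first 2 seconds of the generated clip.
# -ss before -i seeks the input; -c copy avoids re-encoding but snaps the cut
# to the nearest keyframe. Drop "-c", "copy" if you need a frame-accurate trim.
subprocess.run(
    [
        "ffmpeg",
        "-ss", "2",                 # start 2 seconds in
        "-i", "kling_output.mp4",   # placeholder input filename
        "-c", "copy",               # stream copy: fast, no quality loss
        "kling_trimmed.mp4",        # placeholder output filename
    ],
    check=True,
)
```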
Prompt 3: Cafe Window in the Rain
Base image (Nano Banana Pro):
A woman in a cream wool sweater sitting at a rain-streaked cafe window in Paris. One hand wrapped around a ceramic latte cup, the other resting near a small spoon on the saucer. Warm amber interior lighting contrasting with cold blue-grey street outside. Reflections of neon brasserie signs in the wet glass. A half-read paperback on the marble table. Shallow depth of field, cinematic.
Video prompt:
She lowers her cup to the saucer, gazing out at the rain. Lightning cracks outside, illuminating the wet street for an instant — she turns toward the camera with a faint smile. Rain intensifies on the glass, streaks racing down, neon reflections rippling on the cobblestones. Slow dolly in toward her face.
Settings: Professional | 8s | Audio on | 16:9
Why this works: The warm/cold color contrast between interior and exterior does the emotional work. You don't need to tell the model "she feels nostalgic" — the lighting already says it. The model spent its motion budget on the dolly-in and the head turn rather than the lightning flash I asked for — that's a pattern with Kling 3.0. It prioritizes character motion and camera work over environmental events. The rain on glass gives constant subtle motion, keeping the frame alive during the quiet character moment. The result reads like a film cutaway, not an AI demo.
Try these yourself: VicSee Kling 3.0 → Free credits on signup.
Prompt 4: Samurai Sword Draw
Base image (Nano Banana Pro):
A lone samurai in weathered dark armor standing in a misty bamboo forest at dawn. Katana sheathed at his left hip, right hand gripping the handle, poised to draw. Golden shafts of light cutting through fog. Ground-level camera angle looking up. Dewdrops visible on bamboo stalks. Shallow depth of field, background dissolving into mist.
Video prompt:
The samurai raises the katana and slashes in one explosive arc, the blade catching golden light as it cuts through the air. Mist swirls violently around the slash. Two bamboo stalks behind him slide and topple, cut clean. The camera pushes in slowly on his face, eyes locked forward.
Settings: Professional | 8s | Audio on | 16:9
Why this works: The base image holds tension — the samurai is poised, hand already on the hilt. The slash releases it. The model nailed the explosive arc and the camera push-in to a tight portrait, which gives the clip a three-act structure: stance, strike, stare. One thing it didn't do: the two bamboo stalks I asked to topple never fell. Kling 3.0 prioritized the character action and camera work over environmental consequences. That's a pattern you'll notice — the model spends its budget on whatever has the most motion energy, and a slow bamboo fall loses to a fast sword slash every time.
Prompt 5: Eagle River Strike
Base image (Nano Banana Pro):
A bald eagle in mid-flight over a mountain river, wings fully spread, talons extended downward toward the water surface. Crystal clear river with visible rocks beneath. Forested mountains in the background. Sharp morning light creating a rim light on the eagle's white head feathers. Water surface showing a faint reflection of the eagle above.
Video prompt:
The eagle's talons slam into the river with an explosive splash, water erupting around its legs. Wings beat down hard and it launches back into the air, climbing fast, water streaming off its talons in silver trails. The camera tilts up to follow as it rises against the mountain backdrop, morning light catching the spray.
Settings: Professional | 8s | Audio on | 16:9
Why this works: The strike-and-climb arc gives the model a complete two-act motion sequence. Kling 3.0 rendered the approach in nature-documentary slow motion — the talons reach the water over 4 seconds, then the ascent fills the second half with powerful wingbeats and water droplets streaming off the talons. This is a pattern: the model treats dramatic wildlife moments as cinematic slow-mo regardless of speed verbs. The result looks like real footage from a nature documentary, which is arguably better than the "fast strike" I prompted for.
Prompt 6: Sneaker Levitation
Base image (Nano Banana Pro):
A pristine white sneaker suspended in midair against a pure black background. Dramatic side lighting from the right revealing mesh texture and stitching detail. A faint reflection on the glossy black surface below. Fine particles of gold dust frozen in the air around it. Clean, minimal, luxury product photography.
Video prompt:
The sneaker rotates in midair, the side lighting catching different textures as it turns. Gold dust particles drift across the frame. The shoe tilts slightly, hovering, as if weightless. The camera orbits around it in a half circle.
Settings: Professional | 6s | Audio off | 1:1
Why this works: Black background with single light source is luxury visual language. The floating effect elevates the product literally and figuratively. Gold particles add movement without distracting from the subject. 1:1 ratio for Instagram feed ads. 6 seconds is the sweet spot for social auto-play. Audio off — add your own brand soundtrack.
Use these for your brand: Generate on VicSee →
Prompt 7: Rally Car Corner Exit
Base image (Nano Banana Pro):
A red rally car mid-corner on a dusty mountain road, seen from outside the turn. Gravel and dust kicked up behind the rear wheels. Pine forest in the background. Late afternoon golden light cutting across the road. Spectators visible behind a barrier in the distance. Photorealistic motorsport photography.
Video prompt:
The car powers out of the corner, rear end sliding wide, gravel and dirt erupting from the tires in a rooster tail. It straightens, accelerates hard down the mountain road, kicking up a dust cloud that hangs in the golden afternoon light. The camera pans to follow as it disappears around the next bend.
Settings: Professional | 8s | Audio on | 16:9
Why this works — and what to watch for: The dust, gravel spray, and golden volumetric light are all here. But notice the first 5 seconds feel like slow motion — the car "powers" rather than rockets. That's Kling 3.0's default behavior: it treats dramatic action as cinematic slow-mo unless you explicitly override it. The fix: swap "powers out" for "rockets out at full speed", add "violently" to the gravel eruption, and change "camera pans" to "fast tracking shot keeping pace at road level." Speed verbs and camera speed language are the difference between a contemplative drift and an adrenaline clip.
Prompt 8: Night Market Wok Fire
Base image (Nano Banana Pro):
Close-up of a Thai street food chef behind a flaming wok at a Bangkok night market. Fire erupting from the wok, illuminating his focused expression from below. Steam and smoke swirling. Colorful lanterns and string lights bokeh in the background. Shot from across the counter at customer eye level. Cinematic food photography.
Video prompt:
The chef flips the wok hard, launching noodles into the air as a fireball erupts upward. He catches them clean, tilts the wok and slides the dish onto a plate with his spatula, then pushes the plate across the worn counter toward the camera with a grin. Steam billows upward through the lantern light.
Settings: Professional | 8s | Audio on | 16:9
Why this works: Fire is the ultimate AI video cheat code — it creates unpredictable, dynamic light that changes every frame, which is what makes the output look like real footage instead of a rendered scene. The wok toss gives the model a clear action arc, and the slide toward camera breaks the fourth wall, putting the viewer in the scene as the customer. One limitation to note: the model executes "cook noodles" and "serve dish" as two separate actions rather than a connected sequence — the plate he slides over isn't the food from the wok. At normal viewing speed you won't notice, but it reveals a pattern: Kling 3.0 handles individual actions well but struggles to chain cause-and-effect across a full sequence.
Bonus: What Happens When Your Prompt Isn't Specific Enough
The prompts above are the final versions. Here's what the first draft of Prompt 8 looked like — and why it failed.
Original prompt (first attempt):
The chef tosses noodles high from the flaming wok, fire roaring up in a burst. He catches them with a rapid flip, plates the dish in one swift motion, and slides it across the worn counter toward the camera with a grin. Steam billows upward through the lantern light.
What went wrong: Watch carefully at the 3-second mark. The chef reaches into the flaming wok and pulls out the noodles with his bare hands. The video quality is excellent — the fire, the lighting, the camera angle are all cinematic. But the physics are impossible. No human touches food in a wok that's actively on fire.
The fix:
The chef flips the wok hard, launching noodles into the air as a fireball erupts upward. He catches them clean, tilts the wok and slides the dish onto a plate with his spatula...
Two changes: (1) specified the technique (wok flip, not hand grab), and (2) specified the tool (spatula, not bare hands). The model doesn't know cooking physics — it knows what "catches them" looks like visually, and sometimes that means hands. When physical realism matters, name the specific action and the specific tool.
Takeaway: AI video models execute what looks right, not what is right. If your prompt says "catches them," the model picks whatever catch motion it's seen most often in training data. For actions involving heat, sharp objects, heavy machinery, or any physical danger — always specify the method.
How to Write Your Own Image-to-Video Prompts
Every prompt above follows the same two-step workflow: generate a base image, then animate it.
Why image-to-video? Text-to-video asks the model to design the scene, set the lighting, compose the frame, AND animate it. That's too many decisions, and the model compromises on all of them. Image-to-video splits the work — the image model handles visual quality, the video model handles motion. You get both at their peak.
| Step | Tool | You control |
|---|---|---|
| 1. Design the frame | Nano Banana Pro | Composition, lighting, color, detail |
| 2. Animate the frame | Kling 3.0 | Motion, camera movement, action |
The base image prompt describes a frozen moment — the first frame of your film. Be specific about camera angle, lighting direction, color palette, and subject position.
The video prompt describes what happens next. Don't re-describe the scene — the model can already see the image. Just tell it what moves.
| Bad video prompt | Good video prompt |
|---|---|
| "A car on a rainy street starts to drive" | "The engine roars to life, tires spin and spray water, the car rockets forward as the camera tracks alongside" |
| "A woman sitting at a table looks up" | "She stirs her coffee slowly, then looks up with a faint smile as lightning illuminates the street outside" |
| "A dragon flies away" | "The dragon spreads its wings with explosive force, launches off the tower, beats once and banks hard left, fire trailing from its jaws" |
More verbs, more physics, more camera direction. That's the formula.
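If you generate in batches, it can help to encode that formula as a template. A minimal sketch — the function and field names are my own, for illustration only, not anything Kling 3.0 or VicSee expects:

```python
def build_video_prompt(action: str, physics: str, camera: str) -> str:
    """Assemble a motion prompt from the three ingredients above:
    an action led by a strong verb, its physical consequence, and a camera move."""
    return f"{action} {physics} {camera}"

prompt = build_video_prompt(
    action="The car rockets forward, rear tires spinning,",
    physics="spraying water across the wet asphalt in a wide arc.",
    camera="The camera tracks alongside at speed, neon reflections streaking past on the ground.",
)
print(prompt)
```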
Settings Quick Reference
Quality
| Mode | Best for | Cost |
|---|---|---|
| Standard | Nature, architecture, no close-up faces | Lower |
| Professional | Faces, products, detailed action | Higher |
Duration
| Seconds | Use case |
|---|---|
| 3-5s | Product shots, single action |
| 6-8s | Character moments, two-beat sequences |
| 9-12s | Complex action, environmental storytelling |
Aspect Ratio
| Ratio | Platform |
|---|---|
| 16:9 | YouTube, cinematic, horizontal ads |
| 9:16 | TikTok, Reels, Shorts, Story ads |
| 1:1 | Instagram feed, product shots |
Audio
- On: Action scenes, dialogue, ambience, anything with impacts or environmental sound
- Off: Content you'll add custom music or voiceover to
Try These Prompts Now
- Generate a base image on Nano Banana Pro
- Go to vicsee.com/kling-3
- Select Image to Video
- Upload your base image
- Paste the video prompt
- Match the recommended settings
- Generate
Free credits on signup. No credit card required. Need more? Check the pricing plans.
API access: VicSee Developers
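For scripted generation, the flow mirrors the steps above. The sketch below is illustrative only — the endpoint URL, field names, and response handling are hypothetical placeholders I've invented, not VicSee's actual API; check the VicSee Developers docs for the real contract:

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
# Hypothetical endpoint and parameter names, for illustration only.
ENDPOINT = "https://api.example.com/v1/kling-3/image-to-video"

with open("base_image.png", "rb") as f:
    response = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"image": f},  # the base image you designed in step 1
        data={
            "prompt": "The dragon beats its wings and launches off the tower...",
            "duration": "8",          # seconds
            "mode": "professional",   # quality setting
            "aspect_ratio": "16:9",
            "audio": "true",
        },
        timeout=120,
    )

response.raise_for_status()
# Assumption: the response references the generated video or a job to poll.
print(response.json())
```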
FAQ
What base image model works best?
We used Nano Banana Pro for all base images in this guide. It produces the best photorealism, detail, and prompt adherence of any image model on VicSee.
Does the base image aspect ratio matter?
Yes. Match your base image aspect ratio to your video aspect ratio. A 16:9 image for 16:9 video, 9:16 for vertical, 1:1 for square.
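If your base image doesn't match the target ratio, a center crop before upload is usually enough. A minimal Pillow sketch, with placeholder filenames:

```python
from PIL import Image

def center_crop_to_ratio(path: str, out_path: str, ratio_w: int, ratio_h: int) -> None:
    """Center-crop an image to the given aspect ratio, e.g. 16:9 for horizontal video."""
    img = Image.open(path)
    w, h = img.size
    target = ratio_w / ratio_h
    if w / h > target:            # too wide: trim the sides
        new_w = int(h * target)
        left = (w - new_w) // 2
        box = (left, 0, left + new_w, h)
    else:                         # too tall: trim top and bottom
        new_h = int(w / target)
        top = (h - new_h) // 2
        box = (0, top, w, top + new_h)
    img.crop(box).save(out_path)

center_crop_to_ratio("base_image.png", "base_image_16x9.png", 16, 9)
```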
How much does it cost?
Base image generation uses separate credits from video. Video costs depend on duration, quality, and audio. Check the pricing page for full details.
Can I use these for commercial projects?
Yes. No additional licensing fees beyond credit usage.
What image resolution should I use?
1K or 2K resolution for the base image is enough. The video model will produce its own output resolution. You don't need 4K base images.
Can I add dialogue to image-to-video?
Yes. Include quoted speech in your video prompt. The model will animate lip movement on faces in your base image. Keep dialogue to 1-2 short sentences for best results.
All prompts tested by JZ on VicSee using Kling 3.0. Base images generated with Nano Banana Pro. Last updated: February 2026.

