Veo 3.1
Generate cinematic videos with Google Veo 3.1 through the VicSee API. Native audio synthesis, with text-to-video and image-to-video modes in Fast and Quality variants. 58-300 credits per video at 720p/1080p, 175-960 credits at 4K.
Try it now: Use the Veo 3.1 Generator to create cinematic videos with native audio.
Pricing
| Variant | Resolution | Credits | Price (Pro Yearly) | Price (Pro Monthly) |
|---|---|---|---|---|
| Veo 3.1 (Fast) | 720p / 1080p | 58 | $0.348 | $0.696 |
| Veo 3.1 (Fast) | 4K | 175 | $1.05 | $2.10 |
| Veo 3.1 Quality | 720p / 1080p | 300 | $1.80 | $3.60 |
| Veo 3.1 Quality | 4K | 960 | $5.76 | $11.52 |
Credits are deducted only on successful generation.
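The prices in the table work out to a flat per-credit rate: $0.006 on Pro Yearly and $0.012 on Pro Monthly (for example, 58 × 0.006 = 0.348). A minimal Python sketch for estimating spend follows. The names `CREDITS`, `USD_PER_CREDIT`, and `estimate_cost` are illustrative, and the per-credit rates are inferred from the table rather than published constants.

```python
from decimal import Decimal

# Credits per video, copied from the pricing table.
# 720p and 1080p share the same tier.
CREDITS = {
    ("fast", "720p"): 58,
    ("fast", "1080p"): 58,
    ("fast", "4k"): 175,
    ("quality", "720p"): 300,
    ("quality", "1080p"): 300,
    ("quality", "4k"): 960,
}

# Assumption: rates derived by dividing each table price by its credits.
USD_PER_CREDIT = {
    "pro_yearly": Decimal("0.006"),
    "pro_monthly": Decimal("0.012"),
}


def estimate_cost(variant: str, resolution: str, plan: str, videos: int = 1) -> Decimal:
    """Return the estimated USD cost of `videos` successful generations."""
    credits = CREDITS[(variant, resolution)] * videos
    return credits * USD_PER_CREDIT[plan]
```

Under these assumptions, ten Fast 4K videos on Pro Yearly use 1,750 credits, or about $10.50. Since credits are deducted only on success, failed generations would not count toward the total.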
Endpoint
POST https://vicsee.com/api/v1/generate
See Authentication for API key setup.
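For callers who prefer Python over curl, the request can be sketched with the standard library alone. The body shape (`model` plus a nested `input` object) and the Bearer header mirror the curl examples on this page. The helper name `build_generate_request` is hypothetical, and the function only builds the request so that it can be inspected before anything is sent.

```python
import json
import urllib.request

API_URL = "https://vicsee.com/api/v1/generate"


def build_generate_request(api_key: str, model: str, input_params: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the generate endpoint."""
    body = json.dumps({"model": model, "input": input_params}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` sends it. Error handling and retries are left out of this sketch.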
Text to Video
Generate videos from text descriptions with native audio.
Model Variants
| Model ID | Quality | Use Case |
|---|---|---|
| veo-3-1-text-to-video | Fast | Quick iterations, testing, drafts |
| veo-3-1-quality-text-to-video | Quality | Final production, highest fidelity |
Parameters
model · string · required
The model ID for text-to-video generation.
Supported values:
- "veo-3-1-text-to-video": Fast mode (58 credits)
- "veo-3-1-quality-text-to-video": Quality mode (300 credits)
prompt · string · required
Text description of the video to generate. Include both visual and audio elements for best results with native audio.
Tips for native audio:
- Describe sounds explicitly: "waves crashing", "birds chirping"
- Include ambient sounds: "busy city street with traffic"
- Mention speech if needed: "person speaking to camera"
Example: "Drone shot flying over a tropical beach at golden hour, waves crashing on shore, seagulls calling in the distance"
aspect_ratio · string · optional
Video aspect ratio.
Supported values:
- "16:9": Landscape (default)
- "9:16": Portrait
- "Auto": Auto-detect from content
Default: "16:9"
resolution · string · optional
Output video resolution. Higher resolutions are served through the APIMart provider, and 4K costs more credits than 720p or 1080p.
Supported values (credits shown as Fast / Quality):
- "720p": HD (default, 58 / 300 credits)
- "1080p": Full HD (58 / 300 credits)
- "4k": Ultra HD (175 / 960 credits)
Default: "720p"
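The parameters above can be checked on the client before a request is sent. The following is a sketch rather than an official client: `build_text_to_video_payload` is a hypothetical helper, and the exact-match checks (for instance rejecting "4K" in favor of the documented "4k") assume the API is case-sensitive, which this page does not confirm.

```python
# Allowed values copied from the parameter lists above.
TEXT_TO_VIDEO_MODELS = {"veo-3-1-text-to-video", "veo-3-1-quality-text-to-video"}
ASPECT_RATIOS = {"16:9", "9:16", "Auto"}
RESOLUTIONS = {"720p", "1080p", "4k"}


def build_text_to_video_payload(
    model: str,
    prompt: str,
    aspect_ratio: str = "16:9",  # documented default
    resolution: str = "720p",    # documented default
) -> dict:
    """Validate the inputs and return a request body for text-to-video."""
    if model not in TEXT_TO_VIDEO_MODELS:
        raise ValueError(f"unknown text-to-video model: {model!r}")
    if not prompt.strip():
        raise ValueError("prompt is required and must not be empty")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    return {
        "model": model,
        "input": {
            "prompt": prompt,
            "aspect_ratio": aspect_ratio,
            "resolution": resolution,
        },
    }
```

Failing early on the client avoids a round trip for requests that the parameter lists already rule out.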
Example Request (Fast)
curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "veo-3-1-text-to-video",
    "input": {
      "prompt": "Drone shot flying over a tropical beach at golden hour, waves crashing on shore, seagulls calling in the distance",
      "aspect_ratio": "16:9"
    }
  }'
Example Request (Quality, 1080p)
curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "veo-3-1-quality-text-to-video",
    "input": {
      "prompt": "Cinematic shot of a chef plating a gourmet dish in a professional kitchen, sizzling sounds, ambient restaurant chatter",
      "aspect_ratio": "16:9",
      "resolution": "1080p"
    }
  }'
Image to Video
Animate images into videos with native audio. Veo 3.1 supports first-frame and first+last frame transitions.
Model Variants
| Model ID | Quality | Use Case |
|---|---|---|
| veo-3-1-image-to-video | Fast | Quick iterations, testing |
| veo-3-1-quality-image-to-video | Quality | Final production |
Parameters
model · string · required
The model ID for image-to-video generation.
Supported values:
- "veo-3-1-image-to-video": Fast mode (58 credits)
- "veo-3-1-quality-image-to-video": Quality mode (300 credits)
prompt · string · required
Text description of how to animate the image(s). Describe the motion and any audio elements.
Example: "The scene comes alive with gentle motion, leaves rustling in the wind"
image_urls · array<string> · required
Array of image URLs to animate.
Modes:
- 1 image — Animates from the single frame
- 2 images — Creates a transition video from first frame to last frame
Constraints:
- Minimum: 1 image
- Maximum: 2 images
- Must be publicly accessible (http:// or https://)
- Supported formats: .jpg, .jpeg, .png, .webp
- Maximum file size: 10MB per image
Example (single frame): ["https://example.com/start.jpg"]
Example (first+last): ["https://example.com/start.jpg", "https://example.com/end.jpg"]
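Several of the `image_urls` constraints can be checked offline before submitting. The sketch below covers the count, the URL scheme, and the file extension, and reports which mode the array would trigger. `validate_image_urls` is a hypothetical helper. Checking the extension in the URL path is an assumption, since the API may inspect the actual file content instead. Public accessibility and the 10MB limit cannot be verified without fetching the image, so they are not checked here.

```python
import posixpath
from typing import Sequence
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def validate_image_urls(image_urls: Sequence[str]) -> str:
    """Return "single_frame" or "first_last_frame"; raise ValueError on bad input."""
    if not 1 <= len(image_urls) <= 2:
        raise ValueError("image_urls must contain 1 or 2 URLs")
    for url in image_urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an http(s) URL: {url!r}")
        # Assumption: the format is judged by the extension in the URL path.
        extension = posixpath.splitext(parsed.path)[1].lower()
        if extension not in ALLOWED_EXTENSIONS:
            raise ValueError(f"unsupported image format: {url!r}")
    return "single_frame" if len(image_urls) == 1 else "first_last_frame"
```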
aspect_ratio · string · optional
Video aspect ratio. Should match your input image(s) for best results.
Supported values:
- "16:9": Landscape
- "9:16": Portrait
- "Auto": Auto-detect from image
Default: "16:9"
resolution · string · optional
Output video resolution. Same values and pricing as text-to-video.
Supported values: "720p" (default), "1080p", "4k"
Example Request (Single Frame)
curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "veo-3-1-image-to-video",
    "input": {
      "prompt": "The scene comes alive with gentle motion, birds flying across the sky",
      "image_urls": ["https://example.com/landscape.jpg"],
      "aspect_ratio": "16:9"
    }
  }'
Example Request (First + Last Frame Transition)
curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "veo-3-1-quality-image-to-video",
    "input": {
      "prompt": "Smooth cinematic transition from day to night, time passing",
      "image_urls": [
        "https://example.com/scene-day.jpg",
        "https://example.com/scene-night.jpg"
      ],
      "aspect_ratio": "16:9"
    }
  }'
Tips for Native Audio
Veo 3.1 generates audio based on your prompt. Include audio cues for best results:
Good prompts include:
- "waves crashing on shore"
- "birds chirping in the forest"
- "busy city street with traffic sounds"
- "person speaking to the camera"
- "rain pattering on a window"
- "coffee shop with espresso machine sounds"
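When prompts are assembled in code, one simple approach is to keep the visual description and the audio cues separate and join them at the end. The helper below is only an illustration: `compose_prompt` is a hypothetical name, and comma-joining simply follows the style of the example prompts on this page rather than any documented requirement.

```python
from typing import Sequence


def compose_prompt(visual: str, audio_cues: Sequence[str]) -> str:
    """Join a visual description with explicit audio cues into one prompt."""
    # Drop empty cues so stray blanks do not produce doubled commas.
    cues = [cue.strip() for cue in audio_cues if cue.strip()]
    return ", ".join([visual.strip(), *cues])
```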
Example with rich audio:
{
  "model": "veo-3-1-text-to-video",
  "input": {
    "prompt": "A cozy coffee shop interior, barista steaming milk, espresso machine hissing, soft jazz playing in background, customers chatting quietly",
    "aspect_ratio": "16:9"
  }
}
Response
Success (200)
{
  "success": true,
  "data": {
    "id": "task_abc123xyz",
    "model": "veo-3-1-text-to-video",
    "status": "pending",
    "creditsUsed": 58,
    "creditsRemaining": 440,
    "createdAt": "2026-01-17T12:00:00.000Z"
  }
}
Poll for completion using the Tasks API.
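A polling loop might look like the sketch below. The Tasks API endpoint is not described on this page, so the loop takes a caller-supplied `fetch_task` function (hypothetical) that performs the lookup and returns the decoded JSON in the `{"success": ..., "data": {...}}` shape used by the responses here. Only the "pending" and "completed" statuses appear on this page. Any failure status the Tasks API defines is not handled, and every status other than "completed" is treated as still in progress until the timeout.

```python
import time
from typing import Callable


def wait_for_task(
    task_id: str,
    fetch_task: Callable[[str], dict],  # hypothetical: wraps the Tasks API call
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task reports status "completed", then return its data."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_task(task_id)["data"]
        if task["status"] == "completed":
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not completed after {timeout_s}s")
        sleep(interval_s)
```

The interval and timeout defaults are arbitrary placeholders. The `sleep` parameter exists so the loop can be exercised in tests without real delays.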
Task Complete
{
  "success": true,
  "data": {
    "id": "task_abc123xyz",
    "model": "veo-3-1-text-to-video",
    "status": "completed",
    "output": {
      "url": "https://cdn.vicsee.com/outputs/video_xyz.mp4",
      "duration": 8,
      "format": "mp4",
      "hasAudio": true
    },
    "createdAt": "2026-01-17T12:00:00.000Z",
    "completedAt": "2026-01-17T12:03:00.000Z"
  }
}
Related Models
- Sora 2 — 10-15s videos with physics-accurate motion
- Kling 2.6 — Videos with dialogue and lip-sync
- Hailuo 2.3 — Image-to-video with motion control