Veo 3.1 API — Cinematic Video with Native Audio

Generate cinematic videos with Google Veo 3.1 through the VicSee API. Supports native audio synthesis, text-to-video and image-to-video modes, and Fast and Quality variants. Each generation costs 58 to 960 credits, depending on variant and resolution.

Try it now: Use the Veo 3.1 Generator to create cinematic videos with native audio.

Pricing

Variant	Resolution	Credits	Price (Pro Yearly)	Price (Pro Monthly)
Veo 3.1 (Fast)	720p / 1080p	58	$0.348	$0.696
Veo 3.1 (Fast)	4K	175	$1.05	$2.10
Veo 3.1 Quality	720p / 1080p	300	$1.80	$3.60
Veo 3.1 Quality	4K	960	$5.76	$11.52

Credits are deducted only on successful generation.
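
Every row of the table divides out to the same per-credit rate: $0.006 on Pro Yearly and $0.012 on Pro Monthly. The sketch below encodes the credit table and estimates the dollar cost of a batch of generations. The names `CREDITS`, `USD_PER_CREDIT`, and `estimate_cost` are illustrative, and the per-credit rates are inferred from the table rather than published separately.

```python
from decimal import Decimal

# Credits per successful generation, copied from the pricing table.
CREDITS = {
    ("fast", "720p"): 58,
    ("fast", "1080p"): 58,
    ("fast", "4k"): 175,
    ("quality", "720p"): 300,
    ("quality", "1080p"): 300,
    ("quality", "4k"): 960,
}

# Assumption: rates inferred by dividing each table price by its credits.
USD_PER_CREDIT = {
    "pro_yearly": Decimal("0.006"),
    "pro_monthly": Decimal("0.012"),
}


def estimate_cost(variant: str, resolution: str, plan: str, count: int = 1) -> Decimal:
    """Estimate the USD cost of `count` successful generations."""
    credits = CREDITS[(variant, resolution)]
    return credits * USD_PER_CREDIT[plan] * count
```

For example, `estimate_cost("quality", "4k", "pro_monthly")` reproduces the $11.52 shown in the last row of the table.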

Endpoint

POST https://vicsee.com/api/v1/generate

See Authentication for API key setup.

Text to Video

Generate videos from text descriptions with native audio.

Model Variants

Model ID	Quality	Use Case
`veo-3-1-text-to-video`	Fast	Quick iterations, testing, drafts
`veo-3-1-quality-text-to-video`	Quality	Final production, highest fidelity

Parameters

`model` · `string` · required

The model ID for text-to-video generation.

Supported values:

"veo-3-1-text-to-video" — Fast mode (58 credits at 720p / 1080p)
"veo-3-1-quality-text-to-video" — Quality mode (300 credits at 720p / 1080p)

`prompt` · `string` · required

Text description of the video to generate. Include both visual and audio elements for best results with native audio.

Tips for native audio:

Describe sounds explicitly: "waves crashing", "birds chirping"
Include ambient sounds: "busy city street with traffic"
Mention speech if needed: "person speaking to camera"

Example: "Drone shot flying over a tropical beach at golden hour, waves crashing on shore, seagulls calling in the distance"

`aspect_ratio` · `string` · optional

Video aspect ratio.

Supported values:

"16:9" — Landscape (default)
"9:16" — Portrait
"Auto" — Auto-detect from content

Default: "16:9"

`resolution` · `string` · optional

Output video resolution. Higher resolutions use the APIMart provider, and 4K costs more credits than 720p or 1080p.

Supported values:

"720p" — HD (default, 58 / 300 credits)
"1080p" — Full HD (58 / 300 credits)
"4k" — Ultra HD (175 / 960 credits)

Default: "720p"
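
A client can reject bad parameter values before spending a request. The sketch below builds a text-to-video request body and checks each field against the values listed above. `build_text_to_video_payload` is a hypothetical helper, and treating the values as case-sensitive is an assumption, since the page does not say whether the API normalizes them.

```python
from typing import Any

# Allowed values as listed in the parameter reference.
TEXT_MODELS = {"veo-3-1-text-to-video", "veo-3-1-quality-text-to-video"}
ASPECT_RATIOS = {"16:9", "9:16", "Auto"}
RESOLUTIONS = {"720p", "1080p", "4k"}


def build_text_to_video_payload(
    model: str,
    prompt: str,
    aspect_ratio: str = "16:9",
    resolution: str = "720p",
) -> dict[str, Any]:
    """Return a request body, raising ValueError on unsupported values."""
    if model not in TEXT_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    return {
        "model": model,
        "input": {
            "prompt": prompt,
            "aspect_ratio": aspect_ratio,
            "resolution": resolution,
        },
    }
```

The helper always sends `aspect_ratio` and `resolution` explicitly, using the documented defaults when the caller omits them.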

Example Request (Fast)

curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "veo-3-1-text-to-video",
    "input": {
      "prompt": "Drone shot flying over a tropical beach at golden hour, waves crashing on shore, seagulls calling in the distance",
      "aspect_ratio": "16:9"
    }
  }'

Example Request (Quality, 1080p)

curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "veo-3-1-quality-text-to-video",
    "input": {
      "prompt": "Cinematic shot of a chef plating a gourmet dish in a professional kitchen, sizzling sounds, ambient restaurant chatter",
      "aspect_ratio": "16:9",
      "resolution": "1080p"
    }
  }'
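
The same requests can be sent from Python using only the standard library. This is a minimal sketch that mirrors the curl examples above: `make_request` builds the POST request without sending it, and `submit` sends it and decodes the JSON body. Both function names are illustrative, and error handling for non-2xx responses is left out.

```python
import json
import urllib.request

API_URL = "https://vicsee.com/api/v1/generate"


def make_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request with the same headers as the curl examples."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(api_key: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = make_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `submit("YOUR_API_KEY", payload)` with the body from either curl example should return the task object described in the Response section.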

Image to Video

Animate images into videos with native audio. Veo 3.1 supports first-frame and first+last frame transitions.

Model Variants

Model ID	Quality	Use Case
`veo-3-1-image-to-video`	Fast	Quick iterations, testing
`veo-3-1-quality-image-to-video`	Quality	Final production

Parameters

`model` · `string` · required

The model ID for image-to-video generation.

Supported values:

"veo-3-1-image-to-video" — Fast mode (58 credits at 720p / 1080p)
"veo-3-1-quality-image-to-video" — Quality mode (300 credits at 720p / 1080p)

`prompt` · `string` · required

Text description of how to animate the image(s). Describe the motion and any audio elements.

Example: "The scene comes alive with gentle motion, leaves rustling in the wind"

`image_urls` · `array<string>` · required

Array of image URLs to animate.

Modes:

1 image — Animates from the single frame
2 images — Creates a transition video from first frame to last frame

Constraints:

Minimum: 1 image
Maximum: 2 images
Must be publicly accessible (http:// or https://)
Supported formats: .jpg, .jpeg, .png, .webp
Maximum file size: 10MB per image

Example (single frame): ["https://example.com/start.jpg"]

Example (first+last): ["https://example.com/start.jpg", "https://example.com/end.jpg"]
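
Some of the constraints above can be checked on the client before submitting. The sketch below validates the URL count, scheme, and file extension, and reports which mode the list selects. `validate_image_urls` and the returned mode labels are hypothetical. The extension check on the URL path is an assumption about how formats are detected, and the 10MB limit and public accessibility cannot be verified without fetching the images.

```python
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = (".jpg", ".jpeg", ".png", ".webp")


def validate_image_urls(image_urls: list[str]) -> str:
    """Check the documented constraints and return the selected mode."""
    if not 1 <= len(image_urls) <= 2:
        raise ValueError("image_urls must contain 1 or 2 URLs")
    for url in image_urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an http(s) URL: {url!r}")
        # Assumption: the format is judged by the extension in the URL path.
        if not parsed.path.lower().endswith(ALLOWED_EXTENSIONS):
            raise ValueError(f"unsupported image format: {url!r}")
    return "single_frame" if len(image_urls) == 1 else "first_last_frame"
```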

`aspect_ratio` · `string` · optional

Video aspect ratio. Should match your input image(s) for best results.

Supported values:

"16:9" — Landscape
"9:16" — Portrait
"Auto" — Auto-detect from image

Default: "16:9"
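
When the image dimensions are already known, the matching value can be chosen on the client. The sketch below is a simple orientation heuristic, not part of the API: `pick_aspect_ratio` is a hypothetical helper that maps landscape and square images to "16:9" and portrait images to "9:16". Passing "Auto" instead leaves the choice to the service.

```python
def pick_aspect_ratio(width: int, height: int) -> str:
    """Choose the supported aspect ratio closest to the image orientation."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Square images fall back to the documented default, "16:9".
    return "16:9" if width >= height else "9:16"
```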

`resolution` · `string` · optional

Output video resolution. Same values and pricing as text-to-video.

Supported values: "720p" (default), "1080p", "4k"

Example Request (Single Frame)

curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "veo-3-1-image-to-video",
    "input": {
      "prompt": "The scene comes alive with gentle motion, birds flying across the sky",
      "image_urls": ["https://example.com/landscape.jpg"],
      "aspect_ratio": "16:9"
    }
  }'

Example Request (First + Last Frame Transition)

curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "veo-3-1-quality-image-to-video",
    "input": {
      "prompt": "Smooth cinematic transition from day to night, time passing",
      "image_urls": [
        "https://example.com/scene-day.jpg",
        "https://example.com/scene-night.jpg"
      ],
      "aspect_ratio": "16:9"
    }
  }'

Tips for Native Audio

Veo 3.1 generates audio based on your prompt. Include audio cues for best results:

Good prompts include:

"waves crashing on shore"
"birds chirping in the forest"
"busy city street with traffic sounds"
"person speaking to the camera"
"rain pattering on a window"
"coffee shop with espresso machine sounds"

Example with rich audio:

{
  "model": "veo-3-1-text-to-video",
  "input": {
    "prompt": "A cozy coffee shop interior, barista steaming milk, espresso machine hissing, soft jazz playing in background, customers chatting quietly",
    "aspect_ratio": "16:9"
  }
}
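
One way to keep audio cues from being forgotten is to assemble prompts from a visual description plus an explicit list of sounds. The sketch below does that with a hypothetical `compose_prompt` helper. Joining with commas simply mirrors the style of the examples on this page and is not an API requirement.

```python
def compose_prompt(visual: str, audio_cues: list[str]) -> str:
    """Join a visual description and audio cues into one prompt string."""
    if not audio_cues:
        raise ValueError("add at least one audio cue for native audio")
    parts = [visual.strip()] + [cue.strip() for cue in audio_cues]
    return ", ".join(parts)


prompt = compose_prompt(
    "A cozy coffee shop interior",
    [
        "barista steaming milk",
        "espresso machine hissing",
        "soft jazz playing in background",
        "customers chatting quietly",
    ],
)
```

Here `prompt` is identical to the prompt string in the JSON example above.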

Response

Success (200)

{
  "success": true,
  "data": {
    "id": "task_abc123xyz",
    "model": "veo-3-1-text-to-video",
    "status": "pending",
    "creditsUsed": 58,
    "creditsRemaining": 440,
    "createdAt": "2026-01-17T12:00:00.000Z"
  }
}

Poll for completion using the Tasks API.
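
A polling loop might look like the sketch below. The Tasks API endpoint is not described on this page, so the loop takes a caller-supplied `fetch_task` function instead of assuming a URL. `wait_for_task`, its parameters, and the default interval and timeout are illustrative. Only the "pending" and "completed" statuses appear on this page, so the loop stops on "completed" and otherwise relies on the timeout. A real client should also stop on whatever failure status the Tasks API defines.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_task: Callable[[str], dict],
    task_id: str,
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task is completed and return its `data` object."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_task(task_id)["data"]
        if data["status"] == "completed":
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not complete in {timeout}s")
        sleep(interval)
```

Injecting `fetch_task` and `sleep` keeps the loop independent of any HTTP client and lets it be exercised without network access.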

Task Complete

{
  "success": true,
  "data": {
    "id": "task_abc123xyz",
    "model": "veo-3-1-text-to-video",
    "status": "completed",
    "output": {
      "url": "https://cdn.vicsee.com/outputs/video_xyz.mp4",
      "duration": 8,
      "format": "mp4",
      "hasAudio": true
    },
    "createdAt": "2026-01-17T12:00:00.000Z",
    "completedAt": "2026-01-17T12:03:00.000Z"
  }
}
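
Once a task is complete, the fields a client usually needs can be pulled out of the response. The sketch below assumes the exact response shape shown above. `summarize_task` and `parse_timestamp` are hypothetical helpers, and the unit of `duration` is not stated on this page, so the value is passed through unchanged.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' (UTC)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_task(response: dict) -> Optional[dict]:
    """Return the video URL and timing for a completed task, else None."""
    data = response.get("data", {})
    if not response.get("success") or data.get("status") != "completed":
        return None
    output = data["output"]
    elapsed = parse_timestamp(data["completedAt"]) - parse_timestamp(data["createdAt"])
    return {
        "url": output["url"],
        "has_audio": output["hasAudio"],
        "duration": output["duration"],
        "elapsed_seconds": elapsed.total_seconds(),
    }
```

For the example response above, `elapsed_seconds` is 180.0, the three minutes between `createdAt` and `completedAt`.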

Seedance 2.0 — Multimodal reference video with native audio
Kling 2.6 — Videos with dialogue and lip-sync
Hailuo 2.3 — Image-to-video with motion control
