Veo 3.1

Generate cinematic videos with Google Veo 3.1 through the VicSee API. Supports native audio synthesis, text-to-video and image-to-video modes, and Fast and Quality variants. A generation costs 58-300 credits at 720p/1080p and 175-960 credits at 4K.

Try it now: Use the Veo 3.1 Generator to create cinematic videos with native audio.

Pricing

Variant            Resolution      Credits   Price (Pro Yearly)   Price (Pro Monthly)
Veo 3.1 (Fast)     720p / 1080p    58        $0.348               $0.696
Veo 3.1 (Fast)     4K              175       $1.05                $2.10
Veo 3.1 Quality    720p / 1080p    300       $1.80                $3.60
Veo 3.1 Quality    4K              960       $5.76                $11.52

Credits are deducted only on successful generation.
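
For budgeting, the table can be turned into a small lookup. The sketch below is illustrative only: the names `CREDITS`, `USD_PER_CREDIT` and `estimate_cost` are made up for this example, and the per-credit rates ($0.006 yearly, $0.012 monthly) are inferred by dividing the listed prices by the listed credits rather than quoted from VicSee.

```python
from decimal import Decimal

# Credits per (variant, resolution), copied from the pricing table.
CREDITS = {
    ("fast", "720p"): 58,
    ("fast", "1080p"): 58,
    ("fast", "4k"): 175,
    ("quality", "720p"): 300,
    ("quality", "1080p"): 300,
    ("quality", "4k"): 960,
}

# Assumption: rates inferred from the table (price / credits), not an official figure.
USD_PER_CREDIT = {
    "pro_yearly": Decimal("0.006"),
    "pro_monthly": Decimal("0.012"),
}


def estimate_cost(variant: str, resolution: str, plan: str) -> tuple[int, Decimal]:
    """Return (credits, estimated USD) for a single generation."""
    credits = CREDITS[(variant, resolution.lower())]
    return credits, credits * USD_PER_CREDIT[plan]


credits, usd = estimate_cost("quality", "4k", "pro_yearly")
print(credits, usd)  # 960 5.760
```

Because credits are deducted only on successful generation, this is an upper bound per request rather than a guaranteed charge.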

Endpoint

POST https://vicsee.com/api/v1/generate

See Authentication for API key setup.
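
For callers who prefer Python to curl, the following is a minimal sketch of the same POST using only the standard library. It assumes the Bearer-token header and the `{"model": ..., "input": {...}}` body shape used in the curl examples on this page; the helper name `build_generate_request` is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

API_URL = "https://vicsee.com/api/v1/generate"


def build_generate_request(api_key: str, model: str, input_params: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the generate endpoint."""
    body = json.dumps({"model": model, "input": input_params}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_generate_request(
    "YOUR_API_KEY",
    "veo-3-1-text-to-video",
    {"prompt": "Waves crashing on a beach at sunset", "aspect_ratio": "16:9"},
)
# To send it, pass `request` to urllib.request.urlopen() and decode the JSON reply.
```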


Text to Video

Generate videos from text descriptions with native audio.

Model Variants

Model ID                         Quality   Use Case
veo-3-1-text-to-video            Fast      Quick iterations, testing, drafts
veo-3-1-quality-text-to-video    Quality   Final production, highest fidelity

Parameters


model · string · required

The model ID for text-to-video generation.

Supported values:

  • "veo-3-1-text-to-video" — Fast mode (58 credits)
  • "veo-3-1-quality-text-to-video" — Quality mode (300 credits)

prompt · string · required

Text description of the video to generate. Include both visual and audio elements for best results with native audio.

Tips for native audio:

  • Describe sounds explicitly: "waves crashing", "birds chirping"
  • Include ambient sounds: "busy city street with traffic"
  • Mention speech if needed: "person speaking to camera"

Example: "Drone shot flying over a tropical beach at golden hour, waves crashing on shore, seagulls calling in the distance"


aspect_ratio · string · optional

Video aspect ratio.

Supported values:

  • "16:9" — Landscape (default)
  • "9:16" — Portrait
  • "Auto" — Auto-detect from content

Default: "16:9"


resolution · string · optional

Output video resolution. Higher resolutions use the APIMart provider, and 4K costs more credits than 720p or 1080p.

Supported values:

  • "720p" — HD (default, 58 / 300 credits)
  • "1080p" — Full HD (58 / 300 credits)
  • "4k" — Ultra HD (175 / 960 credits)

Default: "720p"


Example Request (Fast)

curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "veo-3-1-text-to-video",
    "input": {
      "prompt": "Drone shot flying over a tropical beach at golden hour, waves crashing on shore, seagulls calling in the distance",
      "aspect_ratio": "16:9"
    }
  }'

Example Request (Quality, 1080p)

curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "veo-3-1-quality-text-to-video",
    "input": {
      "prompt": "Cinematic shot of a chef plating a gourmet dish in a professional kitchen, sizzling sounds, ambient restaurant chatter",
      "aspect_ratio": "16:9",
      "resolution": "1080p"
    }
  }'
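
The text-to-video parameters above can be checked locally before a request is sent. The sketch below is one possible way to do that, not part of the VicSee API: the function name `text_to_video_payload` and its `quality` flag are hypothetical, while the model IDs, aspect ratios, resolutions and defaults are the ones documented in this section.

```python
ASPECT_RATIOS = {"16:9", "9:16", "Auto"}
RESOLUTIONS = {"720p", "1080p", "4k"}


def text_to_video_payload(
    prompt: str,
    quality: bool = False,
    aspect_ratio: str = "16:9",
    resolution: str = "720p",
) -> dict:
    """Validate the arguments and return the JSON-serializable request body."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    # Pick the Fast or Quality model ID from the Model Variants table.
    model = "veo-3-1-quality-text-to-video" if quality else "veo-3-1-text-to-video"
    return {
        "model": model,
        "input": {
            "prompt": prompt,
            "aspect_ratio": aspect_ratio,
            "resolution": resolution,
        },
    }


payload = text_to_video_payload(
    "Cinematic shot of a chef plating a gourmet dish, sizzling sounds",
    quality=True,
    resolution="1080p",
)
```

Failing fast on an unsupported value avoids a round trip to the API for a request that would presumably be rejected anyway.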

Image to Video

Animate images into videos with native audio. Veo 3.1 supports single-image animation (first frame) and first+last frame transitions.

Model Variants

Model ID                          Quality   Use Case
veo-3-1-image-to-video            Fast      Quick iterations, testing
veo-3-1-quality-image-to-video    Quality   Final production

Parameters


model · string · required

The model ID for image-to-video generation.

Supported values:

  • "veo-3-1-image-to-video" — Fast mode (58 credits)
  • "veo-3-1-quality-image-to-video" — Quality mode (300 credits)

prompt · string · required

Text description of how to animate the image(s). Describe the motion and any audio elements.

Example: "The scene comes alive with gentle motion, leaves rustling in the wind"


image_urls · array<string> · required

Array of image URLs to animate.

Modes:

  • 1 image — Animates from the single frame
  • 2 images — Creates a transition video from first frame to last frame

Constraints:

  • Minimum: 1 image
  • Maximum: 2 images
  • Must be publicly accessible (http:// or https://)
  • Supported formats: .jpg, .jpeg, .png, .webp
  • Maximum file size: 10MB per image

Example (single frame): ["https://example.com/start.jpg"]

Example (first+last): ["https://example.com/start.jpg", "https://example.com/end.jpg"]
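
Some of the `image_urls` constraints can be checked on the client before submitting. The sketch below covers the count, the URL scheme and the file extension; the function name `check_image_urls` is hypothetical. It assumes the format can be read from the extension in the URL path, which may not hold for every host, and it does not check the 10MB limit or public accessibility, since those require fetching the image.

```python
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = (".jpg", ".jpeg", ".png", ".webp")


def check_image_urls(image_urls: list[str]) -> str:
    """Return the generation mode if the list passes the documented checks."""
    if not 1 <= len(image_urls) <= 2:
        raise ValueError("image_urls must contain 1 or 2 URLs")
    for url in image_urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an http(s) URL: {url}")
        # Assumption: the image format is visible in the URL path.
        if not parsed.path.lower().endswith(ALLOWED_EXTENSIONS):
            raise ValueError(f"unsupported image format: {url}")
    # One image animates a single frame; two images define first and last frame.
    return "single-frame" if len(image_urls) == 1 else "first-last"


mode = check_image_urls(["https://example.com/start.jpg", "https://example.com/end.jpg"])
print(mode)  # first-last
```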


aspect_ratio · string · optional

Video aspect ratio. Should match your input image(s) for best results.

Supported values:

  • "16:9" — Landscape
  • "9:16" — Portrait
  • "Auto" — Auto-detect from image

Default: "16:9"


resolution · string · optional

Output video resolution. Same values and pricing as text-to-video.

Supported values: "720p" (default), "1080p", "4k"


Example Request (Single Frame)

curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "veo-3-1-image-to-video",
    "input": {
      "prompt": "The scene comes alive with gentle motion, birds flying across the sky",
      "image_urls": ["https://example.com/landscape.jpg"],
      "aspect_ratio": "16:9"
    }
  }'

Example Request (First + Last Frame Transition)

curl -X POST https://vicsee.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "veo-3-1-quality-image-to-video",
    "input": {
      "prompt": "Smooth cinematic transition from day to night, time passing",
      "image_urls": [
        "https://example.com/scene-day.jpg",
        "https://example.com/scene-night.jpg"
      ],
      "aspect_ratio": "16:9"
    }
  }'

Tips for Native Audio

Veo 3.1 generates audio based on your prompt. Include audio cues for best results:

Good prompts include:

  • "waves crashing on shore"
  • "birds chirping in the forest"
  • "busy city street with traffic sounds"
  • "person speaking to the camera"
  • "rain pattering on a window"
  • "coffee shop with espresso machine sounds"

Example with rich audio:

{
  "model": "veo-3-1-text-to-video",
  "input": {
    "prompt": "A cozy coffee shop interior, barista steaming milk, espresso machine hissing, soft jazz playing in background, customers chatting quietly",
    "aspect_ratio": "16:9"
  }
}
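
When prompts are assembled programmatically, one simple approach is to keep the visual description and the audio cues separate and join them at the end. The helper below is a hypothetical sketch of that idea; the comma-separated layout simply mirrors the example prompts on this page and is not a format the API requires.

```python
def compose_prompt(visual: str, audio_cues: list[str]) -> str:
    """Join a visual description and audio cues into one comma-separated prompt."""
    parts = [visual.strip().rstrip(".,")]
    # Skip empty cues so the prompt never contains dangling commas.
    parts += [cue.strip() for cue in audio_cues if cue.strip()]
    return ", ".join(parts)


prompt = compose_prompt(
    "A cozy coffee shop interior",
    ["barista steaming milk", "espresso machine hissing", "soft jazz playing in background"],
)
print(prompt)
# A cozy coffee shop interior, barista steaming milk, espresso machine hissing, soft jazz playing in background
```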

Response

Success (200)

{
  "success": true,
  "data": {
    "id": "task_abc123xyz",
    "model": "veo-3-1-text-to-video",
    "status": "pending",
    "creditsUsed": 58,
    "creditsRemaining": 440,
    "createdAt": "2026-01-17T12:00:00.000Z"
  }
}

Poll for completion using the Tasks API.
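
A polling loop might look like the sketch below. The Tasks API request itself is not described on this page, so it is passed in as a caller-supplied `fetch_task` function that returns the decoded JSON response. The function name `wait_for_task`, the interval and timeout defaults, and the `"failed"` status are all assumptions; only `"pending"` and `"completed"` appear in the examples here.

```python
import time
from typing import Callable


def wait_for_task(
    task_id: str,
    fetch_task: Callable[[str], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task is completed and return its "data" object."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_task(task_id)["data"]
        if data["status"] == "completed":
            return data
        if data["status"] == "failed":  # hypothetical status, not documented here
            raise RuntimeError(f"task {task_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {data['status']} after {timeout}s")
        sleep(interval)


# Usage with a stand-in fetcher; a real one would call the Tasks API.
_replies = iter([
    {"success": True, "data": {"id": "task_abc123xyz", "status": "pending"}},
    {"success": True, "data": {"id": "task_abc123xyz", "status": "completed"}},
])
result = wait_for_task("task_abc123xyz", lambda _id: next(_replies), sleep=lambda _s: None)
print(result["status"])  # completed
```

Injecting `fetch_task` and `sleep` keeps the loop testable without network access or real delays.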

Task Complete

{
  "success": true,
  "data": {
    "id": "task_abc123xyz",
    "model": "veo-3-1-text-to-video",
    "status": "completed",
    "output": {
      "url": "https://cdn.vicsee.com/outputs/video_xyz.mp4",
      "duration": 8,
      "format": "mp4",
      "hasAudio": true
    },
    "createdAt": "2026-01-17T12:00:00.000Z",
    "completedAt": "2026-01-17T12:03:00.000Z"
  }
}
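
Once a task reports `completed`, the output fields can be pulled into a typed object. The sketch below parses a response shaped like the example above; the `VideoOutput` class and `parse_completed_task` function are hypothetical names, and the unit of `duration` is not stated on this page, so the value is kept as returned.

```python
import json
from dataclasses import dataclass


@dataclass
class VideoOutput:
    url: str
    duration: int  # as returned by the API; unit not stated on this page
    format: str
    has_audio: bool


def parse_completed_task(body: str) -> VideoOutput:
    """Extract the output of a completed task from a JSON response body."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise ValueError("response does not indicate success")
    data = payload["data"]
    if data["status"] != "completed":
        raise ValueError(f"task is {data['status']}, not completed")
    out = data["output"]
    return VideoOutput(out["url"], out["duration"], out["format"], out["hasAudio"])


example = """{
  "success": true,
  "data": {
    "id": "task_abc123xyz",
    "status": "completed",
    "output": {
      "url": "https://cdn.vicsee.com/outputs/video_xyz.mp4",
      "duration": 8,
      "format": "mp4",
      "hasAudio": true
    }
  }
}"""
video = parse_completed_task(example)
print(video.url)  # https://cdn.vicsee.com/outputs/video_xyz.mp4
```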

Related Models

  • Sora 2 — 10-15s videos with physics-accurate motion
  • Kling 2.6 — Videos with dialogue and lip-sync
  • Hailuo 2.3 — Image-to-video with motion control