Grok Imagine Video
Generate cinematic videos with native audio using xAI's Grok Imagine through the VicSee API. Supports text-to-video (three creative modes) and image-to-video (two creative modes). Each video costs 15 to 55 credits, depending on duration and resolution.
Try it now: Select Grok Imagine from the model picker in the Studio.
Pricing
| Duration | Resolution | Credits | Price (Pro Yearly) | Price (Pro Monthly) |
|---|---|---|---|---|
| 6s | 480p | 15 | $0.09 | $0.18 |
| 6s | 720p | 28 | $0.17 | $0.34 |
| 10s | 480p | 28 | $0.17 | $0.34 |
| 10s | 720p | 40 | $0.24 | $0.48 |
| 15s | 480p | 40 | $0.24 | $0.48 |
| 15s | 720p | 55 | $0.33 | $0.66 |
Credits are deducted only on successful generation.
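To budget a batch of generations, the pricing table can be encoded as a lookup. This is a minimal sketch: the names `CREDITS`, `credits_for`, and `batch_cost` are illustrative and not part of the VicSee API.

```python
# Credit cost per (duration in seconds, resolution), copied from the pricing table.
CREDITS = {
    (6, "480p"): 15,
    (6, "720p"): 28,
    (10, "480p"): 28,
    (10, "720p"): 40,
    (15, "480p"): 40,
    (15, "720p"): 55,
}


def credits_for(duration: int, resolution: str) -> int:
    """Return the credit cost of one video, or raise ValueError if unsupported."""
    try:
        return CREDITS[(duration, resolution)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {duration}s at {resolution}"
        ) from None


def batch_cost(jobs) -> int:
    """Sum the credits for an iterable of (duration, resolution) pairs."""
    return sum(credits_for(duration, resolution) for duration, resolution in jobs)
```

For example, `batch_cost([(6, "480p"), (10, "720p")])` adds 15 and 40 for a total of 55 credits.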
Endpoint
POST https://vicsee.com/api/v1/generate
See Authentication for API key setup.
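For callers who prefer a script over curl, the same POST can be prepared with Python's standard library. This is a sketch, not an official client: `build_request` and `submit` are hypothetical helper names, and the headers simply mirror the curl examples on this page.

```python
import json
import urllib.request

ENDPOINT = "https://vicsee.com/api/v1/generate"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the generate endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def submit(api_key: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect the headers and body without touching the network or spending credits.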
Text to Video
Generate videos from text prompts with native audio.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | grok-imagine-text-to-video |
| input.prompt | string | Yes | Description of the video (max 5000 chars) |
| input.aspect_ratio | string | No | "16:9", "9:16", "1:1", "3:2", "2:3" (default: "16:9") |
| input.duration | number | No | 6, 10, or 15 seconds (default: 6) |
| input.resolution | string | No | "480p" or "720p" (default: "720p") |
| input.mode | string | No | "fun", "normal", or "spicy" (default: "normal") |
Creative Modes
- fun — Playful, exaggerated motion with creative flair
- normal — Balanced, realistic motion (default)
- spicy — Intense, dramatic motion with bold creative choices (text-to-video only)
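The parameter table and mode list above can be checked on the client before a request is sent. The sketch below assumes nothing beyond the documented values and defaults; the function name `text_to_video_payload` is illustrative.

```python
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "3:2", "2:3"}
DURATIONS = {6, 10, 15}
RESOLUTIONS = {"480p", "720p"}
MODES = {"fun", "normal", "spicy"}
MAX_PROMPT_CHARS = 5000


def text_to_video_payload(prompt, aspect_ratio="16:9", duration=6,
                          resolution="720p", mode="normal"):
    """Validate arguments against the documented values and build the request body."""
    if not prompt or len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt is required and limited to 5000 characters")
    checks = [
        ("aspect_ratio", aspect_ratio, ASPECT_RATIOS),
        ("duration", duration, DURATIONS),
        ("resolution", resolution, RESOLUTIONS),
        ("mode", mode, MODES),
    ]
    for name, value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed, key=str)}")
    return {
        "model": "grok-imagine-text-to-video",
        "input": {
            "prompt": prompt,
            "aspect_ratio": aspect_ratio,
            "duration": duration,
            "resolution": resolution,
            "mode": mode,
        },
    }
```

Rejecting an unsupported value locally, such as a duration of 8, avoids a round trip to the API for a request that cannot succeed.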
Example Request
curl -X POST https://vicsee.com/api/v1/generate \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "grok-imagine-text-to-video",
"input": {
"prompt": "A golden retriever runs through a sunlit meadow, wildflowers swaying in the breeze, cinematic slow motion",
"aspect_ratio": "16:9",
"duration": 10,
"resolution": "720p",
"mode": "normal"
}
}'
Image to Video
Animate an image, optionally guided by a motion description.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | grok-imagine-image-to-video |
| input.prompt | string | No | Motion description (max 5000 chars) |
| input.image_urls | string[] | Yes | Array with one image URL |
| input.aspect_ratio | string | No | "16:9", "9:16", "1:1", "3:2", "2:3" (default: "16:9") |
| input.duration | number | No | 6, 10, or 15 seconds (default: 6) |
| input.resolution | string | No | "480p" or "720p" (default: "720p") |
| input.mode | string | No | "fun" or "normal" (default: "normal") |
Note: "spicy" mode is not available for image-to-video. If "spicy" is provided, the mode is automatically set to "normal".
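A client can apply the same fallback itself, so the mode it records matches what the API will actually use. This is a sketch: `image_to_video_payload` is a hypothetical helper, and any extra keyword options (`aspect_ratio`, `duration`, `resolution`) are passed through without validation.

```python
IMAGE_TO_VIDEO_MODES = {"fun", "normal"}


def image_to_video_payload(image_url, prompt=None, mode="normal", **options):
    """Build an image-to-video request body for a single image URL."""
    if mode == "spicy":
        # Mirrors the documented behaviour: "spicy" falls back to "normal".
        mode = "normal"
    if mode not in IMAGE_TO_VIDEO_MODES:
        raise ValueError(f"mode must be one of {sorted(IMAGE_TO_VIDEO_MODES)}")
    inputs = {"image_urls": [image_url], "mode": mode, **options}
    if prompt is not None:
        # The prompt is optional for image-to-video, so it is only sent when given.
        inputs["prompt"] = prompt
    return {"model": "grok-imagine-image-to-video", "input": inputs}
```

Because the helper takes one URL and wraps it in a list, the `image_urls` array always contains exactly one entry.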
Example Request
curl -X POST https://vicsee.com/api/v1/generate \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "grok-imagine-image-to-video",
"input": {
"prompt": "Camera slowly zooms in as the character turns to face the viewer",
"image_urls": ["https://example.com/your-image.jpg"],
"aspect_ratio": "16:9",
"duration": 6,
"resolution": "720p"
}
}'
Input Image Requirements
- Formats: JPEG, PNG, WebP
- Max size: 10MB
- One image only — the array must contain exactly one URL
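These requirements can be checked before an image is uploaded to wherever its URL will point. The sketch below makes two assumptions that the page does not confirm: the format is inferred from the file extension, and "10MB" is read as 10 × 1024 × 1024 bytes. The function names are illustrative.

```python
from pathlib import PurePosixPath

ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" interpreted as 10 MiB


def image_problems(filename: str, size_bytes: int) -> list:
    """Return a list of human-readable problems; an empty list means the file looks fine."""
    problems = []
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or 'no extension'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_BYTES}")
    return problems


def validate_image_urls(image_urls: list) -> None:
    """Raise ValueError unless the array contains exactly one URL."""
    if len(image_urls) != 1:
        raise ValueError(f"expected exactly one image URL, got {len(image_urls)}")
```

An extension check is only a heuristic; a file named `photo.jpg` may still hold another format, in which case the API remains the final authority.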
Response
Success (200)
{
"success": true,
"data": {
"id": "task_abc123xyz",
"model": "grok-imagine-text-to-video",
"status": "pending",
"creditsUsed": 40,
"creditsRemaining": 960,
"createdAt": "2026-03-07T12:00:00Z"
}
}
Poll for completion using the Tasks API.
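A polling loop might look like the sketch below. The Tasks API endpoint is not described on this page, so the loop takes a caller-supplied `fetch_task` function (hypothetical) that returns the decoded task JSON. Only the "pending" and "completed" statuses appear here, so any other status is treated as "not finished yet" until the timeout expires.

```python
import time


def wait_for_video(fetch_task, task_id, interval=5.0, timeout=600.0,
                   sleep=time.sleep):
    """Poll fetch_task(task_id) until the task is completed, then return the video URL.

    fetch_task is supplied by the caller and must return a dict shaped like
    the Task Complete payload (with "status" and, once done, "output").
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task(task_id)
        if task.get("status") == "completed":
            return task["output"]["url"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not completed after {timeout}s")
        sleep(interval)
```

The `interval` and `timeout` defaults are arbitrary starting points rather than documented limits. Passing `sleep` as a parameter lets tests replace the delay with a no-op.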
Task Complete
{
"taskId": "task_abc123xyz",
"status": "completed",
"output": {
"url": "https://cdn.vicsee.com/outputs/video_xyz.mp4",
"duration": 10,
"resolution": "720p",
"format": "mp4"
}
}
Related Models
- Veo 3.1 - Cinematic videos with native audio
- Kling 3.0 - Multi-shot videos with flexible duration
- Seedance 1.5 Pro - Multilingual audio with reference images