AI Gateway now supports video generation, so you can create cinematic videos with photorealistic quality and synchronized audio, and generate personalized content with consistent identity, all through AI SDK 6.
Two ways to get started
Video generation is in beta and currently available for Pro and Enterprise plans and paid AI Gateway users.
AI SDK 6: Generate videos programmatically with the same interface you use for text and images. One API, one authentication flow, one observability dashboard across your entire AI pipeline.
AI Gateway Playground: Experiment with video models in the configurable AI Gateway playground that's embedded in each model page. Compare providers, tweak prompts, and download results without writing code. To access it, click any video generation model in the model list.
Four initial video models; 17 variations
Grok Imagine from xAI is fast and great at instruction following. Create and edit videos with style transfer, all in seconds.
Wan from Alibaba specializes in reference-based generation and multi-shot storytelling, with the ability to preserve identity across scenes.
Kling excels at image-to-video and native audio. The new 3.0 models support multi-shot video with automatic scene transitions.
Veo from Google delivers high visual fidelity and physics realism, with native audio generation and cinematic lighting.
Understanding video requests
Video models require more than just describing what you want. Unlike image generation, video prompts can include motion cues (camera movement, object actions, timing) and, optionally, audio direction. Each provider exposes different capabilities through providerOptions that unlock fundamentally different generation modes. See the documentation for model-specific options.

Generation types
AI Gateway initially supports four types of video generation:
| Type | Inputs | Description | Example use cases |
| --- | --- | --- | --- |
| Text-to-video | Text prompt | Describe a scene, get a video | Ad creative, explainer videos, social content |
| Image-to-video | Image, optional text prompt | Animate a still image with motion | Product showcases, logo reveals, photo animation |
| First and last frame | Two images, optional text prompt | Define start and end states; the model fills in between | Before/after reveals, time-lapses, transitions |
| Reference-to-video | Images or videos | Extract a character from reference images or videos and place them in new scenes | Spokesperson content, consistent brand characters |
Each model creator's current capabilities across their models on AI Gateway are listed below:
| Model Creator | Capabilities |
| --- | --- |
| xAI | Text-to-video, image-to-video, video editing, audio |
| Wan | Text-to-video, image-to-video, reference-to-video, audio |
| Kling | Text-to-video, image-to-video, first and last frame, audio |
| Veo | Text-to-video, image-to-video, audio |
Text-to-video
Describe what you want, get a video. The model handles visuals, motion, and optionally audio. Great for hyperrealistic, production-quality footage with just a simple text prompt.
Example: Programmatic video at scale. Generate videos on demand for your app, platform, or content pipeline. No licensing fees or production required, just prompts and outputs.
This example uses klingai/kling-v2.6-t2v to generate video from a text prompt with a specified aspect ratio and duration.

Example: Creative content generation. Turn a simple prompt into polished video clips for social media, ads, or storytelling with natural motion and cinematic quality.
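A minimal sketch of that klingai/kling-v2.6-t2v request, assuming AI SDK 6 exposes an experimental generateVideo function. The import, function name, and option names here are assumptions; check the Video Generation documentation for the exact API:

```typescript
// Assumed API shape; verify the real import and option names in the docs.
// import { experimental_generateVideo as generateVideo } from "ai";

const request = {
  model: "klingai/kling-v2.6-t2v",
  prompt:
    "A barista pours latte art in slow motion, warm window light, shallow depth of field",
  aspectRatio: "16:9", // assumed option name
  duration: 5, // seconds; assumed option name
};

// With an AI Gateway API key configured:
// const { video } = await generateVideo(request);
// await writeFile("latte.mp4", video.uint8Array);

console.log(`${request.model}, ${request.aspectRatio}, ${request.duration}s`);
```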
By setting a very specific and descriptive prompt, google/veo-3.1-generate-001 generates video with immense detail and the exact desired motion.

Image-to-video
Provide a starting image and animate it. Control the initial composition, then let the model generate motion.
Example: Animate product images. Turn existing product photos into interactive videos.
The klingai/kling-v2.6-i2v model animates a product image after you pass an image URL and motion description in the prompt.

Example: Animated illustrations. Bring static artwork to life with subtle motion. Perfect for thematic content or marketing at scale.
Example: Lifestyle and product photography. Add subtle motion to food, beverage, or lifestyle shots for social content.
Here, a picture of coffee is animated into a more engaging video, with lighting direction and minute details specified in the prompt.
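The coffee example might look like the sketch below, assuming AI SDK 6 exposes an experimental generateVideo function. The function name and parameter names are assumptions, and the image URL is a placeholder:

```typescript
// Assumed request shape for image-to-video; parameter names are illustrative.
const request = {
  model: "klingai/kling-v2.6-i2v",
  image: "https://example.com/coffee.jpg", // placeholder URL for the still photo
  prompt:
    "Steam rises gently from the cup while soft morning light sweeps across the table",
};

// const { video } = await generateVideo(request); // assumed AI SDK 6 call

console.log(request.model);
```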
First and last frame
Define the start and end states, and the model generates a seamless transition between them.
Example: Before/after reveals. Outfit swaps, product comparisons, changes over time. Upload two images, get a seamless transition.
The start and end states are defined here with two images that are passed in the prompt and provider options.
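A sketch of such a request, assuming an experimental generateVideo function in AI SDK 6. Whether these parameters sit at the top level or under providerOptions is an assumption, and the URLs are placeholders:

```typescript
// Assumed request shape for first-and-last-frame generation.
const request = {
  model: "klingai/kling-v3.0-i2v",
  prompt: "A smooth outfit-swap transition between the two frames",
  image: "https://example.com/outfit-before.jpg", // start frame
  lastFrameImage: "https://example.com/outfit-after.jpg", // end frame
};

// const { video } = await generateVideo(request); // assumed AI SDK 6 call

console.log(Object.keys(request).join(", "));
```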
In this example, klingai/kling-v3.0-i2v lets you define the start frame in image and the end frame in lastFrameImage. The model generates the transition between them.

Reference-to-video
Provide reference videos or images of a person/character, and the model extracts their appearance and voice to generate new scenes starring them with consistent identity.
In this example, two reference images of dogs are used to generate the final video.
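A sketch of that reference-to-video request, assuming an experimental generateVideo function in AI SDK 6. The references parameter name is illustrative and the URLs are placeholders:

```typescript
// Assumed request shape for reference-to-video generation.
const request = {
  model: "alibaba/wan-v2.6-r2v-flash",
  // Wan suggests addressing references as character1, character2, ... in the prompt.
  prompt:
    "character1 and character2 chase each other across a sunny beach, handheld camera",
  references: [
    "https://example.com/dog-1.jpg",
    "https://example.com/dog-2.jpg",
  ],
};

// const { video } = await generateVideo(request); // assumed AI SDK 6 call

console.log(request.references.length);
```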
Using alibaba/wan-v2.6-r2v-flash here, you can instruct the model to use the referenced people or characters within the prompt. For multi-reference-to-video, Wan suggests referring to character1, character2, and so on in the prompt to get the best results.

Video editing
Transform existing videos with style transfer. Provide a video URL and describe the transformation you want. The model applies the new style while preserving the original motion.
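Such an editing request might look like the sketch below, assuming an experimental generateVideo function in AI SDK 6. The video parameter name is illustrative and the source URL is a placeholder:

```typescript
// Assumed request shape for video editing / style transfer.
const request = {
  model: "xai/grok-imagine-video",
  video: "https://example.com/source-clip.mp4", // placeholder source video URL
  prompt:
    "Re-render the clip in a soft watercolor painting style, preserving the original motion",
};

// const { video: edited } = await generateVideo(request); // assumed AI SDK 6 call

console.log(request.model);
```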
Here, xai/grok-imagine-video takes a source video from a previous generation and edits it into a watercolor style.

Get started
For more examples and detailed configuration options for video models, check out the Video Generation Documentation. You can also find simple getting-started scripts in the Video Generation Quick Start.
Check out the changelogs for these video models for more detailed examples and prompts.