AI Gateway now supports video generation, so you can create cinematic videos with photorealistic quality and synchronized audio, and generate personalized content with consistent identity, all through AI SDK 6.
Two ways to get started
Video generation is in beta and currently available for Pro and Enterprise plans and paid AI Gateway users.
AI SDK 6: Generate videos programmatically with the same interface you use for text and images. One API, one authentication flow, one observability dashboard across your entire AI pipeline.
AI Gateway Playground: Experiment with video models in the configurable AI Gateway playground that's embedded in each model page. Compare providers, tweak prompts, and download results without writing code. To access it, click any video generation model in the model list.
Four initial video models; 17 variations
Grok Imagine from xAI is fast and great at instruction following. Create and edit videos with style transfer, all in seconds.
Wan from Alibaba specializes in reference-based generation and multi-shot storytelling, with the ability to preserve identity across scenes.
Kling excels at image-to-video and native audio. The new 3.0 models support multi-shot video with automatic scene transitions.
Veo from Google delivers high visual fidelity and physics realism, with native audio generation and cinematic lighting.
Understanding video requests
Video models require more than just describing what you want. Unlike image generation, video prompts can include motion cues (camera movement, object actions, timing) and, optionally, audio direction. Each provider exposes different capabilities through providerOptions that unlock fundamentally different generation modes. See the documentation for model-specific options.

Generation types
AI Gateway initially supports four types of video generation:
| Type | Inputs | Description | Example use cases |
| --- | --- | --- | --- |
| Text-to-video | Text prompt | Describe a scene, get a video | Ad creative, explainer videos, social content |
| Image-to-video | Image, optional text prompt | Animate a still image with motion | Product showcases, logo reveals, photo animation |
| First and last frame | Two images, optional text prompt | Define start and end states; the model fills in between | Before/after reveals, time-lapses, transitions |
| Reference-to-video | Images or videos | Extract a character from reference images or videos and place them in new scenes | Spokesperson content, consistent brand characters |
Each model creator's current capabilities across their models on AI Gateway are listed below:
| Model Creator | Capabilities |
| --- | --- |
| xAI | Text-to-video, image-to-video, video editing, audio |
| Wan | Text-to-video, image-to-video, reference-to-video, audio |
| Kling | Text-to-video, image-to-video, first and last frame, audio |
| Veo | Text-to-video, image-to-video, audio |
Text-to-video
Describe what you want, get a video. The model handles visuals, motion, and optionally audio. Great for hyperrealistic, production-quality footage with just a simple text prompt.
Example: Programmatic video at scale. Generate videos on demand for your app, platform, or content pipeline. No licensing fees or production required, just prompts and outputs.
This example uses klingai/kling-v2.6-t2v to generate video from a text prompt with a specified aspect ratio and duration.

Example: Creative content generation. Turn a simple prompt into polished video clips for social media, ads, or storytelling with natural motion and cinematic quality.
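A minimal sketch of that klingai/kling-v2.6-t2v request, assuming AI SDK 6 exposes an experimental generateVideo function. The import, function name, and option names here are assumptions; check the Video Generation documentation for the exact API:

```typescript
// Assumed API shape; verify the real import and option names in the docs.
// import { experimental_generateVideo as generateVideo } from "ai";

const request = {
  model: "klingai/kling-v2.6-t2v",
  prompt:
    "A barista pours latte art in slow motion, warm window light, shallow depth of field",
  aspectRatio: "16:9", // assumed option name
  duration: 5, // seconds; assumed option name
};

// With an AI Gateway API key configured:
// const { video } = await generateVideo(request);
// await writeFile("latte.mp4", video.uint8Array);

console.log(`${request.model}, ${request.aspectRatio}, ${request.duration}s`);
```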
By setting a very specific and descriptive prompt, google/veo-3.1-generate-001 generates video with immense detail and the exact desired motion.

Image-to-video
Provide a starting image and animate it. Control the initial composition, then let the model generate motion.
Example: Animate product images. Turn existing product photos into interactive videos.
The klingai/kling-v2.6-i2v model animates a product image after you pass an image URL and motion description in the prompt.

Example: Animated illustrations. Bring static artwork to life with subtle motion. Perfect for thematic content or marketing at scale.
Example: Lifestyle and product photography. Add subtle motion to food, beverage, or lifestyle shots for social content.
Here, a picture of coffee is animated into a more engaging video, with lighting direction and minute details specified in the prompt.
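The coffee example might look like the sketch below, assuming AI SDK 6 exposes an experimental generateVideo function. The function name and parameter names are assumptions, and the image URL is a placeholder:

```typescript
// Assumed request shape for image-to-video; parameter names are illustrative.
const request = {
  model: "klingai/kling-v2.6-i2v",
  image: "https://example.com/coffee.jpg", // placeholder URL for the still photo
  prompt:
    "Steam rises gently from the cup while soft morning light sweeps across the table",
};

// const { video } = await generateVideo(request); // assumed AI SDK 6 call

console.log(request.model);
```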
First and last frame
Define the start and end states, and the model generates a seamless transition between them.
Example: Before/after reveals. Outfit swaps, product comparisons, changes over time. Upload two images, get a seamless transition.
The start and end states are defined here with two images that are passed in the prompt and provider options.
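A sketch of such a request, assuming an experimental generateVideo function in AI SDK 6. Whether these parameters sit at the top level or under providerOptions is an assumption, and the URLs are placeholders:

```typescript
// Assumed request shape for first-and-last-frame generation.
const request = {
  model: "klingai/kling-v3.0-i2v",
  prompt: "A smooth outfit-swap transition between the two frames",
  image: "https://example.com/outfit-before.jpg", // start frame
  lastFrameImage: "https://example.com/outfit-after.jpg", // end frame
};

// const { video } = await generateVideo(request); // assumed AI SDK 6 call

console.log(Object.keys(request).join(", "));
```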
In this example, klingai/kling-v3.0-i2v lets you define the start frame in image and the end frame in lastFrameImage. The model generates the transition between them.

Reference-to-video
Provide reference videos or images of a person/character, and the model extracts their appearance and voice to generate new scenes starring them with consistent identity.
In this example, two reference images of dogs are used to generate the final video.
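A sketch of that reference-to-video request, assuming an experimental generateVideo function in AI SDK 6. The references parameter name is illustrative and the URLs are placeholders:

```typescript
// Assumed request shape for reference-to-video generation.
const request = {
  model: "alibaba/wan-v2.6-r2v-flash",
  // Wan suggests addressing references as character1, character2, ... in the prompt.
  prompt:
    "character1 and character2 chase each other across a sunny beach, handheld camera",
  references: [
    "https://example.com/dog-1.jpg",
    "https://example.com/dog-2.jpg",
  ],
};

// const { video } = await generateVideo(request); // assumed AI SDK 6 call

console.log(request.references.length);
```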
Using alibaba/wan-v2.6-r2v-flash here, you can instruct the model to use the referenced people or characters within the prompt. For multi-reference-to-video, Wan suggests referring to character1, character2, and so on in the prompt to get the best results.

Video editing
Transform existing videos with style transfer. Provide a video URL and describe the transformation you want. The model applies the new style while preserving the original motion.
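Such an editing request might look like the sketch below, assuming an experimental generateVideo function in AI SDK 6. The video parameter name is illustrative and the source URL is a placeholder:

```typescript
// Assumed request shape for video editing / style transfer.
const request = {
  model: "xai/grok-imagine-video",
  video: "https://example.com/source-clip.mp4", // placeholder source video URL
  prompt:
    "Re-render the clip in a soft watercolor painting style, preserving the original motion",
};

// const { video: edited } = await generateVideo(request); // assumed AI SDK 6 call

console.log(request.model);
```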
Here, xai/grok-imagine-video takes a source video from a previous generation and edits it into a watercolor style.

Get started
For more examples and detailed configuration options for video models, check out the Video Generation Documentation. You can also find simple getting-started scripts in the Video Generation Quick Start.
Check out the changelogs for these video models for more detailed examples and prompts.