High VRAM requirements. Latency sensitivity for interactive use. Throughput demands for batch asset pipelines. Match your deployment mode to the right GPU.
SDXL base needs 10GB+ at fp16 just for the UNet. Add ControlNet, LoRA adapters, and a VAE decode, and you're pushing 20GB. Flux.1 dev runs north of 24GB. Shared GPUs don't have room.
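A quick way to sanity-check these numbers is to multiply parameter count by bytes per parameter. The counts below are approximate public figures, and weights are only the floor — activations, attention buffers, and CUDA overhead add several GB on top:

```python
# Back-of-envelope VRAM estimate for model weights: params * bytes-per-param.
# Parameter counts are approximate public figures; actual usage is higher
# because activations and framework overhead sit on top of weights.

def weight_vram_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Memory for weights only, in GB (fp16 = 2 bytes/param)."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# SDXL UNet (~2.6B params) at fp16:
sdxl_unet = weight_vram_gb(2.6)    # ~5.2 GB for weights alone
# Flux.1 dev transformer (~12B params) at fp16:
flux_dev = weight_vram_gb(12.0)    # ~24 GB for weights alone
```

Stack ControlNet and a VAE decode on top of the SDXL figure and the 20GB total above follows quickly.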
Generative APIs serving users in real time need sub-2s end-to-end latency. Contention on shared GPUs makes that unpredictable. Dedicated endpoints give you consistent generation time.
Asset pipelines — product images, synthetic datasets, creative workflows — care about images-per-hour at minimum cost. Spot pricing with async jobs is the right model here.
An always-on dedicated endpoint keeps your diffusion model loaded. No cold start, no VRAM eviction between requests. Generation time is predictable because it's yours.
Works with SDXL, Flux.1, ControlNet pipelines, img2img, inpainting, and any HuggingFace Diffusers-compatible model. Bring your own LoRA adapters.
Two-stage base + refiner pipeline. An L40S handles it comfortably: ~3s generation at 1024×1024.
24–32GB VRAM. L40S or A100. Schnell gives 4-step generation for interactive latency.
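The step count is why schnell stays interactive: total latency is roughly steps × per-step time plus fixed overhead (text encode, VAE decode, network). The per-step and overhead figures here are illustrative assumptions, not benchmarks:

```python
# Rough latency model: total ≈ steps * sec-per-step + fixed overhead.
# sec_per_step and overhead_s are illustrative placeholders, not measurements.

def gen_latency_s(steps: int, sec_per_step: float, overhead_s: float = 0.3) -> float:
    return steps * sec_per_step + overhead_s

# Flux.1-schnell: 4-step generation stays interactive.
schnell = gen_latency_s(steps=4, sec_per_step=0.25)   # 1.3 s
# Flux.1-dev at a typical ~28 steps and the same per-step cost:
dev = gen_latency_s(steps=28, sec_per_step=0.25)      # 7.3 s
```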
Add ControlNet on top of SDXL base. Needs 20GB+. A100 is the right call here.
Load your style or character LoRAs at startup. They stay resident — no per-request loading overhead.
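Keeping LoRAs resident is cheap because a rank-r adapter only adds two low-rank matrices, A (d_in × r) and B (r × d_out), per adapted weight. The layer dimensions and counts below are illustrative, not taken from any specific model:

```python
# LoRA footprint: rank * (d_in + d_out) * bytes-per-param per adapted weight.
# Dimensions and layer count below are illustrative assumptions.

def lora_bytes(d_in: int, d_out: int, rank: int, bytes_per_param: int = 2) -> int:
    return rank * (d_in + d_out) * bytes_per_param

# One 2048x2048 attention projection at rank 16, fp16:
per_layer = lora_bytes(2048, 2048, 16)     # 131,072 bytes (~0.13 MB)
# ~100 adapted layers -> roughly 13 MB per LoRA,
# so dozens of adapters can stay resident next to the base model.
total_mb = per_layer * 100 / 1e6
```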
Generating thousands of product images, synthetic training data, or creative variants doesn't need low interactive latency. It needs throughput and low cost per image.
Async batch jobs on spot instances. Output images write directly to your storage bucket. Cost drops 40–60% vs on-demand.
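For batch work the metric that matters is cost per image: hourly rate divided by images per hour. The rates below are placeholders — substitute your provider's actual on-demand and spot pricing:

```python
# Cost-per-image model for async batch jobs.
# hourly_rate values are placeholders, not real pricing.

def cost_per_image(hourly_rate: float, images_per_hour: float) -> float:
    return hourly_rate / images_per_hour

on_demand = cost_per_image(hourly_rate=2.00, images_per_hour=1000)  # $0.0020/image
spot      = cost_per_image(hourly_rate=0.90, images_per_hour=1000)  # $0.0009/image
savings   = 1 - spot / on_demand    # 0.55 -> 55%, inside the 40-60% range
```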
Video generation models (CogVideoX, Wan, AnimateDiff, Mochi) carry significantly higher VRAM requirements than image-only diffusion. Temporal attention alone pushes memory usage well above what a 24GB card can handle at useful resolutions.
For production video generation, treat H100 as the baseline. A100 80GB works for shorter clips or lower resolutions.
Video Model Reference
VRAM estimates at default precision. fp8 or INT8 quantization can reduce requirements.
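Weight memory scales linearly with bits per parameter, so fp8 or INT8 halves the fp16 figure for weights. Note this only covers weights — activation and temporal-attention buffers don't shrink proportionally:

```python
# Scale fp16 weight memory to a lower-precision format (weights only).

def quantized_vram_gb(fp16_gb: float, bits: int) -> float:
    """fp16 is 16 bits/param; fp8 and INT8 are 8 bits/param."""
    return fp16_gb * bits / 16

# e.g. 24 GB of fp16 weights quantized to fp8 or INT8:
eight_bit = quantized_vram_gb(24.0, 8)   # 12.0 GB
```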
Best $/image ratio for standard diffusion pipelines. Handles ControlNet and LoRA stacks comfortably.
Flux.1-dev needs A100 for comfortable throughput. Schnell runs on L40S with acceptable latency.
AnimateDiff, CogVideoX-2B, shorter Wan clips. A100 80GB is the minimum for useful video generation.
High-resolution video synthesis at usable generation speed. H100 NVLink bandwidth matters here.
Multiple LoRA adapters plus base model. A100 80GB gives the memory headroom.
Community tier. Great for testing pipelines at low cost before scaling to production GPU.
Interactive endpoint for real-time generation, or async batch jobs at spot pricing for large pipelines. Both available today.