LongLive: NVIDIA’s Secret Sauce for Generating Long Videos Without the Usual Slowdown

If you’ve ever tried generating a video longer than a few seconds with a diffusion model, you know the pain. It either crashes your GPU, takes forever, or produces flickering artifacts because the model forgets what it did earlier. That’s because the model has to produce a long sequence of frames while keeping them temporally consistent, and that is brutal on both memory and time.

But NVIDIA’s new repo, LongLive, takes a different approach. It leans hard on parallelism and quantization to keep things moving — even for really long videos. Think 128 frames or more, without the usual slowdown.

What It Does

LongLive is a framework for long video generation using diffusion transformers. The core idea is to break video generation into chunks that can be processed in parallel, then stitch them together coherently. It also uses model quantization (FP8) to reduce memory and compute overhead, letting you generate more frames on the same hardware.

The repo includes pretrained models and inference code, so you don’t need to train from scratch. You just feed it text prompts (or image prompts) and get a video out.

Why It’s Cool

Two things stand out:

Parallel chunk generation: Many long-video pipelines are autoregressive, generating one frame or short clip after another, which is slow and limits length. LongLive splits the video into overlapping segments that are predicted in parallel, then fused. For a 128-frame video, that means the segments are processed side by side instead of one after another, and the overlapping frames are what keep neighboring segments consistent. The result is generation time that depends more on your GPU count than on the video length.
FP8 quantization: They shove the model weights and activations into 8-bit floating point, cutting memory usage by nearly half compared with 16-bit formats. That means you can generate longer videos on a single GPU without hitting OOM errors. It’s not a gimmick: they show it works with negligible quality loss.
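To make the chunking idea concrete, here is a minimal sketch of how a 128-frame video could be split into overlapping segments and then fused. Everything in it is an assumption for illustration: `plan_chunks`, `fuse`, the chunk length of 32, the overlap of 8, and the simple averaging rule are hypothetical and are not taken from the LongLive code. Each "frame" is a single float standing in for real pixel or latent data.

```python
from typing import List, Tuple


def plan_chunks(num_frames: int, chunk_len: int, overlap: int) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges that cover num_frames with overlap."""
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must be non-negative and smaller than chunk_len")
    stride = chunk_len - overlap
    chunks = []
    start = 0
    while True:
        end = min(start + chunk_len, num_frames)
        chunks.append((start, end))
        if end == num_frames:
            break
        start += stride
    return chunks


def fuse(chunks: List[Tuple[int, int]], outputs: List[List[float]], num_frames: int) -> List[float]:
    """Average per-frame values where chunks overlap (hypothetical fusion rule)."""
    total = [0.0] * num_frames
    count = [0] * num_frames
    for (start, _end), values in zip(chunks, outputs):
        for offset, value in enumerate(values):
            total[start + offset] += value
            count[start + offset] += 1
    return [t / c for t, c in zip(total, count)]


# Assumed sizes: 128 frames, 32-frame chunks, 8 frames shared between neighbors.
chunks = plan_chunks(num_frames=128, chunk_len=32, overlap=8)

# Stand-in for the model: each chunk "generates" its own frame indices as floats.
# These per-chunk calls are independent, so they could run on separate GPUs.
outputs = [[float(i) for i in range(start, end)] for start, end in chunks]

video = fuse(chunks, outputs, num_frames=128)
print(chunks)
```

The point of the sketch is the shape of the work: the per-chunk calls do not depend on each other, so the only sequential step left is the cheap fusion at the end.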

Bonus: it supports frame interpolation, so you can generate keyframes and let the model fill in between. That’s huge for consistency.
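As a rough picture of what "fill in between" means, the sketch below blends linearly between keyframes. This is a deliberately naive stand-in: a real model would predict the in-between frames rather than cross-fade them, and `interpolate_keyframes` is a hypothetical helper, not a function from the repo. Each frame is again a single float for readability.

```python
from typing import Dict, List


def interpolate_keyframes(keyframes: Dict[int, float], num_frames: int) -> List[float]:
    """Linearly blend between neighboring keyframes (naive stand-in for a model)."""
    indices = sorted(keyframes)
    if len(indices) < 2:
        raise ValueError("need at least two keyframes")
    if indices[0] != 0 or indices[-1] != num_frames - 1:
        raise ValueError("keyframes must include the first and last frame")
    frames = [0.0] * num_frames
    for left, right in zip(indices, indices[1:]):
        for i in range(left, right + 1):
            t = (i - left) / (right - left)  # 0.0 at the left keyframe, 1.0 at the right
            frames[i] = (1 - t) * keyframes[left] + t * keyframes[right]
    return frames


# Three keyframes at frames 0, 4, and 8; the six frames between them are filled in.
frames = interpolate_keyframes({0: 0.0, 4: 8.0, 8: 0.0}, num_frames=9)
print(frames)
```

Because every in-between frame is tied to the keyframes on either side, the output cannot drift far from them, which is the consistency benefit in miniature.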

How to Try It

You can jump in from the GitHub repo:

git clone https://github.com/NVlabs/LongLive.git
cd LongLive
pip install -r requirements.txt

Then run inference with a config file. Example:

python scripts/inference.py configs/long_live_128.yaml \
  --prompt "A cat walking through a forest, cinematic lighting"

They provide pretrained checkpoints (download links in the repo). You’ll need a GPU with at least 16 GB of VRAM for 128 frames, but with quantization you might squeeze more frames into the same memory.
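For a feel of where the memory saving comes from, here is a back-of-the-envelope calculation for the weights alone. The parameter count of 1.3 billion is a hypothetical figure chosen for illustration, not a number from the repo, and `weight_memory_gib` is a made-up helper.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


params = 1.3e9  # hypothetical model size, for illustration only

mem_16bit = weight_memory_gib(params, 16)  # FP16 or BF16 weights
mem_8bit = weight_memory_gib(params, 8)    # FP8 weights

print(f"16-bit weights: {mem_16bit:.2f} GiB")
print(f" 8-bit weights: {mem_8bit:.2f} GiB")
```

Weights shrink by exactly half, but activations, caches, and anything kept in higher precision still take space, which is why the overall saving is closer to "nearly half" than a clean 2x.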

Check the docs/ folder in the repo for detailed setup and options.

Final Thoughts

LongLive isn’t trying to be a magic bullet for video generation — it’s a pragmatic fix for a real bottleneck. If you’re building anything that needs long, temporally coherent video (think game cinematics, AI movie clips, or even training data for robotics), this repo removes the typical “can’t do long videos” barrier.

It’s also refreshing to see a research release that focuses on making existing models run faster rather than just adding complexity. The parallelism trick is clever, and the quantization is a nice bonus for people on consumer GPUs.

If you’re already playing with video diffusion models, give it a spin. Your GPU will thank you.

Repository: https://github.com/NVlabs/LongLive

Last updated: May 25, 2026 at 05:40 PM