Commit d7ffe60
Hunyuan Video Framepack (#11428)
* add transformer * add pipeline * fixes * make fix-copies * update * add flux mu shift * update example snippet * debug * cleanup * batch_size=1 optimization * add pipeline test * fix for model cpu offloading' * add last_image support; credits: lllyasviel/FramePack#167 * update example with flf2v * update penguin url * fix test * address review comment: #11428 (comment) * address review comment: #11428 (comment) * Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video_framepack.py --------- Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
1 parent 10bee52 commit d7ffe60

File tree

12 files changed: +1871 −0 lines changed


docs/source/en/api/pipelines/hunyuan_video.md

+1
```diff
@@ -52,6 +52,7 @@ The following models are available for the image-to-video pipeline:
 | [`Skywork/SkyReels-V1-Hunyuan-I2V`](https://huggingface.co/Skywork/SkyReels-V1-Hunyuan-I2V) | Skywork's custom finetune of HunyuanVideo (de-distilled). Performs best at `97x544x960` resolution, `guidance_scale=1.0`, `true_cfg_scale=6.0` and a negative prompt. |
 | [`hunyuanvideo-community/HunyuanVideo-I2V-33ch`](https://huggingface.co/hunyuanvideo-community/HunyuanVideo-I2V) | Tencent's official HunyuanVideo 33-channel I2V model. Performs best at resolutions of 480, 720, 960, 1280. A higher `shift` value when initializing the scheduler is recommended (good values are between 7 and 20). |
 | [`hunyuanvideo-community/HunyuanVideo-I2V`](https://huggingface.co/hunyuanvideo-community/HunyuanVideo-I2V) | Tencent's official HunyuanVideo 16-channel I2V model. Performs best at resolutions of 480, 720, 960, 1280. A higher `shift` value when initializing the scheduler is recommended (good values are between 7 and 20). |
+| [`lllyasviel/FramePackI2V_HY`](https://huggingface.co/lllyasviel/FramePackI2V_HY) | lllyasviel's paper introducing a new technique for long-context video generation called [Framepack](https://arxiv.org/abs/2504.12626). |
 
 ## Quantization
 
```
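
The new table row points readers at [`lllyasviel/FramePackI2V_HY`](https://huggingface.co/lllyasviel/FramePackI2V_HY). Below is a minimal usage sketch for the pipeline added in this commit, assuming it follows the usual diffusers image-to-video call pattern; the auxiliary SigLIP checkpoint, the input image path, and the generation arguments are illustrative assumptions, not taken from this diff.

```python
# Hedged sketch: argument names and auxiliary checkpoints are assumptions,
# not confirmed by this commit; check the pipeline docstring for the real API.
import torch
from diffusers import HunyuanVideoFramepackPipeline, HunyuanVideoFramepackTransformer3DModel
from diffusers.utils import export_to_video, load_image
from transformers import SiglipImageProcessor, SiglipVisionModel

# The Framepack transformer is loaded separately and plugged into the base HunyuanVideo pipeline.
transformer = HunyuanVideoFramepackTransformer3DModel.from_pretrained(
    "lllyasviel/FramePackI2V_HY", torch_dtype=torch.bfloat16
)
# Assumed SigLIP image encoder; the base HunyuanVideo repo does not ship one.
feature_extractor = SiglipImageProcessor.from_pretrained(
    "lllyasviel/flux_redux_bfl", subfolder="feature_extractor"
)
image_encoder = SiglipVisionModel.from_pretrained(
    "lllyasviel/flux_redux_bfl", subfolder="image_encoder", torch_dtype=torch.float16
)
pipe = HunyuanVideoFramepackPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    transformer=transformer,
    feature_extractor=feature_extractor,
    image_encoder=image_encoder,
    torch_dtype=torch.float16,
)
pipe.vae.enable_tiling()
pipe.to("cuda")

image = load_image("path/to/first_frame.png")  # e.g. the penguin example image from the docs
output = pipe(
    image=image,
    prompt="A penguin flapping its wings and dancing in the snow",
    height=832,
    width=480,
    num_frames=91,
    num_inference_steps=30,
    guidance_scale=9.0,
    generator=torch.Generator().manual_seed(0),
).frames[0]
export_to_video(output, "output.mp4", fps=30)
```

Extra inputs mentioned in the commit message, such as `last_image` support for first/last-frame conditioning, are best checked against the pipeline's docstring.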

src/diffusers/__init__.py

+4
```diff
@@ -175,6 +175,7 @@
         "HunyuanDiT2DControlNetModel",
         "HunyuanDiT2DModel",
         "HunyuanDiT2DMultiControlNetModel",
+        "HunyuanVideoFramepackTransformer3DModel",
         "HunyuanVideoTransformer3DModel",
         "I2VGenXLUNet",
         "Kandinsky3UNet",
@@ -376,6 +377,7 @@
         "HunyuanDiTPAGPipeline",
         "HunyuanDiTPipeline",
         "HunyuanSkyreelsImageToVideoPipeline",
+        "HunyuanVideoFramepackPipeline",
         "HunyuanVideoImageToVideoPipeline",
         "HunyuanVideoPipeline",
         "I2VGenXLPipeline",
@@ -770,6 +772,7 @@
         HunyuanDiT2DControlNetModel,
         HunyuanDiT2DModel,
         HunyuanDiT2DMultiControlNetModel,
+        HunyuanVideoFramepackTransformer3DModel,
         HunyuanVideoTransformer3DModel,
         I2VGenXLUNet,
         Kandinsky3UNet,
@@ -950,6 +953,7 @@
         HunyuanDiTPAGPipeline,
         HunyuanDiTPipeline,
         HunyuanSkyreelsImageToVideoPipeline,
+        HunyuanVideoFramepackPipeline,
         HunyuanVideoImageToVideoPipeline,
         HunyuanVideoPipeline,
         I2VGenXLPipeline,
```
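
The four hunks above register the new transformer and pipeline in both the lazy `_import_structure` map and the type-checking import block, so after this change both classes are importable straight from the package root. A small sanity-check sketch (the printed module path is inferred from the file added in this commit):

```python
# Both new classes are now exposed at the top level of diffusers.
from diffusers import (
    HunyuanVideoFramepackPipeline,
    HunyuanVideoFramepackTransformer3DModel,
)

# The transformer class is defined in the submodule registered below, e.g.:
print(HunyuanVideoFramepackTransformer3DModel.__module__)
# diffusers.models.transformers.transformer_hunyuan_video_framepack
```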

src/diffusers/models/__init__.py

+2
```diff
@@ -79,6 +79,7 @@
     _import_structure["transformers.transformer_flux"] = ["FluxTransformer2DModel"]
     _import_structure["transformers.transformer_hidream_image"] = ["HiDreamImageTransformer2DModel"]
     _import_structure["transformers.transformer_hunyuan_video"] = ["HunyuanVideoTransformer3DModel"]
+    _import_structure["transformers.transformer_hunyuan_video_framepack"] = ["HunyuanVideoFramepackTransformer3DModel"]
     _import_structure["transformers.transformer_ltx"] = ["LTXVideoTransformer3DModel"]
     _import_structure["transformers.transformer_lumina2"] = ["Lumina2Transformer2DModel"]
     _import_structure["transformers.transformer_mochi"] = ["MochiTransformer3DModel"]
@@ -156,6 +157,7 @@
         FluxTransformer2DModel,
         HiDreamImageTransformer2DModel,
         HunyuanDiT2DModel,
+        HunyuanVideoFramepackTransformer3DModel,
         HunyuanVideoTransformer3DModel,
         LatteTransformer3DModel,
         LTXVideoTransformer3DModel,
```
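
Both hunks in this file follow diffusers' lazy-import convention: the string entry in `_import_structure` tells the lazy module loader where to find the class at runtime, while the direct import in the type-checking branch keeps static analyzers aware of it. A simplified sketch of that pattern (the guards and the `_LazyModule` wiring are paraphrased from the existing file, not introduced by this commit):

```python
# Simplified sketch of the lazy-import pattern in src/diffusers/models/__init__.py;
# paraphrased for illustration, see the actual file for the full version.
from typing import TYPE_CHECKING

from ..utils import _LazyModule, is_torch_available

_import_structure = {}

if is_torch_available():
    # Runtime registration: submodule path -> exported class names.
    _import_structure["transformers.transformer_hunyuan_video_framepack"] = [
        "HunyuanVideoFramepackTransformer3DModel"
    ]

if TYPE_CHECKING:
    # Static-analysis branch: a real import so type checkers can resolve the symbol.
    from .transformers import HunyuanVideoFramepackTransformer3DModel
else:
    import sys

    # At runtime the module is replaced by a _LazyModule that imports each
    # registered submodule only when one of its attributes is first accessed.
    sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure)
```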

src/diffusers/models/transformers/__init__.py

+1
```diff
@@ -23,6 +23,7 @@
     from .transformer_flux import FluxTransformer2DModel
     from .transformer_hidream_image import HiDreamImageTransformer2DModel
     from .transformer_hunyuan_video import HunyuanVideoTransformer3DModel
+    from .transformer_hunyuan_video_framepack import HunyuanVideoFramepackTransformer3DModel
     from .transformer_ltx import LTXVideoTransformer3DModel
     from .transformer_lumina2 import Lumina2Transformer2DModel
     from .transformer_mochi import MochiTransformer3DModel
```
