Cheap FramePack camera control LoRAs with one training video

Published June 1, 2025

First, a disclaimer: this method is not intended to rival existing methods. It's an idea that has been in my mind for a while, and I needed to get it out. I chose FramePack to try it, but I would expect it to work similarly with other I2V models.

This all started with two assumptions:

  1. You can create a camera LoRA for video models using computer-generated graphics. This is of course true, because good enough CG is impossible to distinguish from real footage. The real question, then: can you make a camera LoRA using bad CG? Luckily, I have the perfect skill set to try that one out!

  2. An I2V model can filter out the style of the training data and pick up the motion alone.

I'm not one to dwell too much on theory, so I tried it out. I created a video of the camera rotating around a character in Blender, exported it as 48 frames, and then trained a FramePack LoRA on it.

This is the modest training video:

[video]

The prompt was "The camera rotates 360 degrees around the subject." It was trained at 320x192 resolution. Note the janky rotation, which comes from just interpolating between four points.
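
For anyone who wants to set up something similar, a rough sketch of an orbiting camera using Blender's Python API could look like the following. This isn't the actual scene file; the object names, camera offset, and two-keyframe simplification are just illustrative:

```python
import math
import bpy

scene = bpy.context.scene
scene.frame_start = 1
scene.frame_end = 48                    # 48 frames, as used for training
scene.render.resolution_x = 320
scene.render.resolution_y = 192

# Empty at the origin acts as the pivot; the subject is assumed to stand there.
pivot = bpy.data.objects.new("CameraPivot", None)
scene.collection.objects.link(pivot)

# Camera parented to the pivot, offset back and tilted towards the subject.
cam = bpy.data.objects.new("OrbitCam", bpy.data.cameras.new("OrbitCam"))
cam.location = (0.0, -6.0, 1.5)
cam.rotation_euler = (math.radians(80), 0.0, 0.0)
cam.parent = pivot
scene.collection.objects.link(cam)
scene.camera = cam

# Keyframe a full rotation of the pivot. Two keyframes are enough for the
# sketch; the actual training video interpolated between four points.
pivot.rotation_euler = (0.0, 0.0, 0.0)
pivot.keyframe_insert(data_path="rotation_euler", frame=1)
pivot.rotation_euler = (0.0, 0.0, math.radians(360))
pivot.keyframe_insert(data_path="rotation_euler", frame=48)

# Linear interpolation so the orbit moves at constant speed.
for fcurve in pivot.animation_data.action.fcurves:
    for keyframe in fcurve.keyframe_points:
        keyframe.interpolation = 'LINEAR'
```

Render the animation (Render → Render Animation) and that single clip is the whole training set.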

This is a result with the LoRA. Somewhat cherry-picked, but not overly so:

[video]

[video]

So, something happens, at least. We got 270 degrees out of 360. I trained another one on what I call a "hero zoom", for lack of a better term: a low-angle zoom that accelerates towards the face, used for dramatic effect.

[video]

[video]

[video]

You can clearly see the two-part zoom: slow at the start, then faster.

I imagined this could be used as a sort of style transfer (image + video to video), similar to what VACE does, with the exception that a specific LoRA has to be trained for each cut. Example:

Training video:

[video]

Input image:

[image]

Result:

[video]

Another result (I guess it was set up to fail with that background):

[video]

OK. We have trained a single-video LoRA based on the semblance of a humanoid character. But what if I had just opened Blender for the first time? Can we do something with the dreaded "Default Cube"?

Here are some training videos demonstrating certain camera actions. While testing the previous batch, I noticed that the LoRA could struggle if there was any background in the input image. So, to give it some more reference, I added an environment texture.
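
If you prefer to script that part too, hooking an environment texture into the world is only a few lines of bpy. A minimal sketch, assuming an equirectangular HDRI on disk (the path is a placeholder):

```python
import bpy

world = bpy.context.scene.world
world.use_nodes = True
nodes = world.node_tree.nodes
links = world.node_tree.links

# Environment texture node feeding the default Background shader.
env = nodes.new("ShaderNodeTexEnvironment")
env.image = bpy.data.images.load("/path/to/your_environment.hdr")  # placeholder path
links.new(env.outputs["Color"], nodes["Background"].inputs["Color"])
```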

[video]

[video]

Hmm, those paintings look familiar. So there is definitely style bleed, if it can get away with it.

[video]

[video]

The subject stays pretty still in those shots. Can we add something to the prompt to make him more lively?

[video]

OK, that works as well.

Limitations:

Flexibility is limited with such a narrow LoRA, and it will most likely break down with complex prompts (or prompts that differ too much from what it was trained on).

Style will bleed into previously unseen areas. At high LoRA strength, or after training for too many steps, the default cube would show up in the resulting video.

It doesn't seem to play well with other LoRAs, especially ones that don't have much camera motion to begin with. A certain "cancel each other out" effect can sometimes be noticed.

Conclusion:

The method works! You can train a camera LoRA very cheaply, in 30-60 minutes on a 3090. Even when it doesn't work exactly as intended, it helps prod the video model in the right direction.

It won't be able to do everything. Deviating too far from the training input, mixing with other LoRAs, or using complex prompts may degrade the LoRA's performance.

Training regime:

A single video, at resolutions ranging from 192 to 320 pixels, 40-48 frames. Rank 32 LoRA. Between 250 and 500 training steps. I trained using https://github.com/a-r-r-o-w/finetrainers (v0.0.1) and https://github.com/neph1/finetrainers-ui.

Each LoRA took between 30 and 60 minutes to train, depending on the number of steps.
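
For reference, a single-video dataset can be laid out with a few lines of Python. This assumes the prompts.txt / videos.txt layout that finetrainers used around that version, so double-check the repo's README for the exact format; the file names below are placeholders:

```python
from pathlib import Path
import shutil

# Build a one-video dataset folder: one caption in prompts.txt and the
# matching relative video path in videos.txt.
root = Path("camera_orbit_dataset")
(root / "videos").mkdir(parents=True, exist_ok=True)

shutil.copy("orbit_320x192_48f.mp4", root / "videos" / "orbit.mp4")  # placeholder file name
(root / "prompts.txt").write_text("The camera rotates 360 degrees around the subject.\n")
(root / "videos.txt").write_text("videos/orbit.mp4\n")
```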

I've made some of the models available here: https://huggingface.co/neph1/framepack-camera-controls
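
If you'd rather try them from Python than through a UI, a rough sketch using the FramePack pipeline in diffusers could look like the following. It assumes the LoRA files load through the pipeline's standard LoRA loading; the weight file name, input image, and generation settings are placeholders, so check the repository for the actual file names:

```python
import torch
from diffusers import HunyuanVideoFramepackPipeline, HunyuanVideoFramepackTransformer3DModel
from diffusers.utils import export_to_video, load_image
from transformers import SiglipImageProcessor, SiglipVisionModel

# FramePack transformer plus the SigLIP image encoder it uses for the input image.
transformer = HunyuanVideoFramepackTransformer3DModel.from_pretrained(
    "lllyasviel/FramePackI2V_HY", torch_dtype=torch.bfloat16
)
feature_extractor = SiglipImageProcessor.from_pretrained(
    "lllyasviel/flux_redux_bfl", subfolder="feature_extractor"
)
image_encoder = SiglipVisionModel.from_pretrained(
    "lllyasviel/flux_redux_bfl", subfolder="image_encoder", torch_dtype=torch.float16
)
pipe = HunyuanVideoFramepackPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    transformer=transformer,
    feature_extractor=feature_extractor,
    image_encoder=image_encoder,
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# Placeholder weight name; pick one of the .safetensors files in the repo.
pipe.load_lora_weights(
    "neph1/framepack-camera-controls", weight_name="camera_rotation.safetensors"
)

image = load_image("input.png")  # placeholder input image
video = pipe(
    image=image,
    prompt="The camera rotates 360 degrees around the subject.",
    height=320,
    width=512,
    num_frames=91,
    num_inference_steps=30,
    guidance_scale=9.0,
).frames[0]
export_to_video(video, "output.mp4", fps=30)
```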

This was written without much thought put into it. Let me know if you want clarifications, or if I missed something.
