Disclosure: I am in the Vidu artists program, so I'm pretty sure this technically counts as sponsored content. This is a fixed version of the original post.
Of course, it's another Fauxstalgia Fake-Em-Up from yours truly, this time demonstrating the beta-access potential of the new Vidu Q2 reference-to-video mode (get 100 free credits if you click here to try it out).
So, let's talk about how it was done (an overview)
GIFs are unedited wherever possible to show raw results.
The big problem with using AI for anything actually productive is inconsistency. While there's always wobble and bad gens, Vidu's reference-to-video option is the best solution I've seen thus far.
Essentially, you build a little profile with up to three reference pics and a basic prompt for each of your characters, props, etc. From then on, you can use the reference in your prompts to generate new scenes. It decreases confusion about which character is which, and it lets you establish styles of movement and other effects a character should always have going without having to use prompt-space in every scene.
You can see this with Tilly Tepesh's jacket and the translucency of her uncle Drac's ghost, and in the fact that I can have four different kinds of near-identical blokroid in the same scene without them morphing into each other or making weird hybrids.
Images can also be used with ref-to-video without making a full reference profile. For the various Drac-and-Tilly playing scenes, the robot kept making Tilly play the back of the keyboard and spaced them too far apart, so I composited a demonstration image together to help guide the process.
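If it helps to think of a reference profile as data, here's a rough sketch in Python. To be clear, this is only my mental model and not Vidu's actual API or file format: the class, the function, the field names, and the image file names are all made up for illustration. The only parts taken from the description above are the limit of three reference pics and the idea that each profile carries its own basic prompt.

```python
from dataclasses import dataclass, field

# "Up to three reference pics" comes from the description above.
MAX_REFERENCE_IMAGES = 3


@dataclass
class ReferenceProfile:
    """Hypothetical stand-in for a reference profile (not Vidu's real data model)."""
    name: str
    base_prompt: str  # traits the character or prop should always have
    image_paths: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A profile needs at least one image and at most three.
        if not 1 <= len(self.image_paths) <= MAX_REFERENCE_IMAGES:
            raise ValueError(
                f"{self.name}: expected 1 to {MAX_REFERENCE_IMAGES} images, "
                f"got {len(self.image_paths)}"
            )


def build_scene_request(scene_prompt: str, profiles: list[ReferenceProfile]) -> dict:
    """Bundle a short scene prompt with the profiles it refers to (hypothetical shape)."""
    # Every profile passed in should actually be named in the scene prompt.
    missing = [p.name for p in profiles if p.name not in scene_prompt]
    if missing:
        raise ValueError(f"Scene prompt never mentions: {', '.join(missing)}")
    return {
        "prompt": scene_prompt,
        "references": [p.name for p in profiles],
    }


# Example usage with made-up file names.
tilly = ReferenceProfile("Tilly", "always wears her jacket", ["tilly_front.png", "tilly_side.png"])
drac = ReferenceProfile("Drac", "translucent ghost", ["drac.png"])
request = build_scene_request("Tilly and Drac play a keyboard duet", [tilly, drac])
```

The point of the split is that the per-character details live in the profile, so the scene prompt itself only has to describe what happens in the shot.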
Which helped a lot.
I used a variation on my typical asset-creation process on this (tutorials), utilizing a lot of early gens and ref-to-images in Vidu's system (tutorial) to bulk up the process, along with Midjourney, Sora, and Civit.
For the audio, I recorded the lines myself, then loaded them into Suno to modify my voice into something more announcer-like and bring the music in under it.
All the editing, compositing, logos, etc. were done the old-fashioned way.
Quality Improvements (also an overview)
Basics:
Up to 8-second gens.
4:3 and 3:4 aspect ratios (in addition to 1:1, 9:16 and 16:9)
General improvements to quality, coherence, and prompt understanding.
Extend clips multiple times up to 5 mins (launched after the video above was finished)
I'll get into the details of the quality improvements, plus a number of other new features (up to 8-second generations, clip extending, and very early sound support), in other posts. For now, those who have followed TyrannoMax may remember the issue with boulders.
Well, after a lot of coaching and getting Max a new trailer, he's finally stopped screwing around on set.
In short, a lot of things that just weren't doable before now work, and the general quality of gens is higher and sharper than ever before.
What I was not expecting was for it to emulate Gerry Anderson-style puppetry well. Ironically, the kind of jank you get from pre-digital media is hard for AI to duplicate. If you want something to look polished, smooth, and modern, that's easy.
It did take a lot of prompting for non-moving faces/immobile dolls, and editing is always needed, but the difference between Q1 and Q2 is apparent when you compare the same prompt in both:
The deadly starmantis gingerly setting the Queen Seltza prop down is adorable, but not what the director called for.
All in all, it took me about a week to go from concept to video, mostly on the back of my needing to make base character, prop, and set assets for everything that wasn't TyrannoMax more or less from scratch. If you want to give it a try yourself, here's that link again, and I'm posting a bunch of tutorials, old and (hopefully) new this week.
And as always, like, comment and subscribe.








