So it turns out that this system can interpret any 2D image sequence, not just live webcam feeds. I decided to test that by feeding it some animation reference clips. It's not perfect, but I'm quite impressed with how closely it tracks!
The clips are originally from Kevin Parry, who btw has an amazing library of animation reference: https://youtu.be/31VloqP-Fpo?si=LoK72RvGCZtz_OIb















