AI: Inventing the Future
Well, it’s officially the beginning of December, and if you happen to be a university student like myself, you’re probably reading this post through an endless stream of tears, holding two mitt-fulls of hair that you’ve pulled out, while desperately trying to keep track of 37 open web-browser tabs as assignment due dates relentlessly tap on your window with fully extended claws. The stress is very real for students this time of year: final assignments and essays take up the time needed to work on group projects, which in turn eat into study time for final exams. And those readings you’re supposed to finish for those last couple of classes? Good luck. Sometimes it feels as if it’s never going to end, like you’re stuck on the spin cycle, like every crest on the mountain hides more in the clouds. Imagine if you could watch a video that looks ahead to the version of yourself in the near future, leaning back, sipping some tea, triumphantly closing all 37 browser tabs one by one with the world’s most relieved smile on your face. Wouldn’t that be great? Wouldn’t that give you some hope that school mountain is somehow conquerable?
PhD student Carl Vondrick and Professor Antonio Torralba of MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), along with Hamed Pirsiavash, a former CSAIL postdoc who is now a professor at the University of Maryland Baltimore County, and their CSAIL team, are currently using their computer-science-magic abilities to create very brief videos that simulate the future of given still images. These predictive videos are based on a deep-learning algorithm the team at MIT is developing. In other words, researchers input a still image, the algorithm goes to work, and it outputs a video that predicts the next 1-2 seconds of action in the scene. The algorithm was originally trained on 2 million videos representing about 2 years’ worth of footage.
(Simple, right?)
MIT News reporters Adam Conner-Simons and Rachel Gordon report that the process works using two competing neural networks, one of which “generates video, while the second discriminates between real and generated videos”. If a generated video can fool the discriminator network, then it could potentially also fool people. After running into issues with frame-by-frame predictions, the researchers used techniques (that I most certainly do not possess the capacity to explain) to train the model to generate all of the frames at once, which increases the accuracy of the predictions while drastically increasing the complexity. The researchers also had to teach the model to generate backgrounds separately from foregrounds and to determine which objects in the images move and which remain stationary. The project explanation video shows generated scenes on a beach, at a train station, in a hospital, and of sporting events. Conner-Simons and Gordon report that “the algorithm generated videos that human subjects deemed to be realistic 20 percent more often than a baseline model”.
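For the curious, here is a toy Python sketch of the two ideas in that paragraph. Fair warning: this is my own simplified illustration, not the CSAIL team’s code. The loss functions are the textbook formulas for a pair of competing networks, and the mask-based blending is only my assumption about how a static background and a moving foreground might be combined. Every function name here is made up for the example.

```python
import math


def discriminator_loss(p_real, p_fake):
    """Textbook penalty for the discriminator network.

    p_real and p_fake are the probabilities (between 0 and 1) that the
    discriminator assigns to "this video is real" for a real clip and a
    generated clip. The loss is low when it gets both right.
    """
    return -(math.log(p_real) + math.log(1.0 - p_fake))


def generator_loss(p_fake):
    """Textbook penalty for the generator network.

    The loss is low when the discriminator is fooled, i.e. when it
    believes the generated clip is real (p_fake close to 1).
    """
    return -math.log(p_fake)


def blend_frame(mask, foreground, background):
    """Mix one frame pixel by pixel (assumed scheme, see the note above).

    A mask value of 1.0 means "this pixel moves, use the foreground";
    0.0 means "this pixel stays put, use the background".
    """
    return [m * f + (1.0 - m) * b
            for m, f, b in zip(mask, foreground, background)]


def compose_video(masks, foregrounds, background):
    """Build every frame at once, reusing the same still background."""
    return [blend_frame(m, f, background)
            for m, f in zip(masks, foregrounds)]
```

In this sketch a frame is just a flat list of pixel brightness values. Calling `generator_loss(0.9)` gives a smaller number than `generator_loss(0.1)`, which is the whole game: the generator is rewarded for fooling its rival. And because `compose_video` hands the same `background` to every frame, only the pixels the mask marks as moving ever change.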
Here is the part where I offer my opinion, and more specifically some questions about this project. Namely: is what this algorithm creates art? Composition, lighting, movement, focus, depth of field, perception: these are all factors that professional directors and creators practice and train to master. The current generated videos are, by technical necessity, extremely short and rudimentary, but the fact that an algorithm can create videos that can fool a person into believing them to be real is pretty darn mind-blowing. It is also, in my opinion, slightly concerning when it comes to the future: can math and computing take even our creativity away from us?
Or perhaps I’m a bit loopy from all the homework. Maybe I can send MIT a picture of my blank Word document and the algorithm can “predict” my next essay for me? No? Darn.
Stay safe, and good luck with finals, my friends.