Can AI be Used to Create Hollywood Movies?
With large-scale global films raking in billions at the box office (Avengers: Endgame made 2.796 billion USD), anyone with the means would try to do the same. But given the production costs of blockbuster films (Avengers: Endgame had a budget of 356 million USD), only the most profitable companies can afford to produce such monumental successes. But what if more people with creative ideas had the means to produce Hollywood movies at a fraction of the cost? Through the power of Artificial Intelligence, and a mix of other technologies and techniques, we may just be on the verge of a technological revolution unlike any other to date.
How could AI influence the quality of a scene?
Production time on large-scale movies varies, and the level of detail varies with scene priority, film priority, last-minute changes, and many other factors. Time is the main factor in the difference between stunning 4K digital images that look like filmed scenes (James Cameron’s Avatar) and things that don’t look all that great, like the fight scene in Marvel’s Black Panther where the protagonist and antagonist careen into a mine near the end of the film. An AI could compile complex images much faster than even our most talented artists.
But how would it work? And do we have the technology to make it happen?
This article attempts to answer those questions by discussing some of the technology we currently have and how it can be used to build a Hollywood AI.
We already use facial recognition on a daily basis, and AI can already recognize faces quite accurately. A facial recognition system verifies a person’s image by analyzing the patterns present on a face. By taking these data points and comparing them to a set that includes how faces move in certain contexts, such as speech or emotional reaction, a computer can assign values and manipulate facial data to produce a desired outcome. These outcomes can be used to make a digital mouth move to match certain data. The AI would use this to construct still frames, one by one, of a character’s face moving toward a certain goal, such as reciting a given string of speech. This can be made even more complex by adding the rest of a scene: reactions to environments, to other characters, or to circumstances. Finding the data to train such an AI is probably the easy part, since plenty of source material for this kind of data can be found in motion capture footage. Of course, this is an oversimplified explanation, but within the scope of this post, there’s no devil in these details.
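To make the frame-by-frame idea a little more concrete, here is a minimal sketch in Python. It assumes a face is reduced to a handful of 2D landmark points and that the motion between two mouth poses is a straight-line blend; the landmark values and the function name are made up for illustration, and a real system would learn far richer motion from data.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical mouth landmarks: left corner, top lip, right corner, bottom lip.
MOUTH_CLOSED: List[Point] = [(0.0, 0.0), (1.0, 0.25), (2.0, 0.0), (1.0, -0.25)]
MOUTH_OPEN: List[Point] = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0), (1.0, -1.0)]


def interpolate_landmarks(
    start: List[Point], end: List[Point], frames: int
) -> List[List[Point]]:
    """Return `frames` landmark sets moving linearly from `start` to `end`."""
    if len(start) != len(end):
        raise ValueError("landmark sets must have the same number of points")
    if frames < 2:
        raise ValueError("need at least two frames")
    sequence = []
    for i in range(frames):
        t = i / (frames - 1)  # 0.0 on the first frame, 1.0 on the last
        sequence.append(
            [
                (sx + (ex - sx) * t, sy + (ey - sy) * t)
                for (sx, sy), (ex, ey) in zip(start, end)
            ]
        )
    return sequence


if __name__ == "__main__":
    # Print the landmark positions for each of five still frames.
    for frame in interpolate_landmarks(MOUTH_CLOSED, MOUTH_OPEN, 5):
        print(frame)
```

Each list in the result is one still frame of the mouth; rendering them in order gives the movement from closed to open.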
Artificial world generation and simulation
Another key point in the quest for a movie-building AI is artificial world generation and simulation. A movie built entirely by an AI would take place in a fully digital world. Although one could insert digital footage for the computer to manipulate, that isn’t nearly as cool as a world built entirely free of human influence. We already have this type of technology, and it is used in video games. Procedurally generated worlds are nothing new and have been around in some form since at least 1991, with the release of Sid Meier’s Civilization. Procedural technology has steadily advanced in the decades since and reached a perceptible peak in 2016’s No Man’s Sky, where extraterrestrial worlds, flora, and fauna are procedurally generated across two hundred and fifty-six separate galaxies. But what does this mean for films? When a script specifies something like “a futuristic dystopian world with early 20th century design cues”, audiences would not get a recoloured version of New York City, but a procedurally generated world that draws influence from early 20th century architecture. This would be a welcome change, to be sure. No longer would Resident Evil be filmed in Toronto, but in a procedurally generated Raccoon City. Varied and unrecognizable locations can only add to audience immersion.
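The core trick behind procedural generation is that a world is computed from a seed rather than stored, so the same seed always rebuilds the same world. The sketch below illustrates only that idea, using Python’s standard random module; the tile names and the generate_world function are invented for this example, and using a script description as the seed is an assumption made for illustration, not how any particular game works.

```python
import random
from typing import List

# Hypothetical terrain types for a toy world map.
TILES = ["water", "plain", "forest", "mountain"]


def generate_world(seed: str, width: int, height: int) -> List[List[str]]:
    """Build a width x height grid of tiles determined entirely by `seed`."""
    rng = random.Random(seed)  # a private generator, so the seed fully fixes the output
    return [[rng.choice(TILES) for _ in range(width)] for _ in range(height)]


if __name__ == "__main__":
    world = generate_world("a futuristic dystopian world", width=8, height=4)
    for row in world:
        print(" ".join(tile[0] for tile in row))  # first letter of each tile
```

Calling generate_world twice with the same description returns an identical map, which is what lets a huge world be regenerated on demand instead of being stored or built by hand.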
In addition to reacting to our environment, we are creating entirely new environments digitally. And the most interesting part is that the environments we create react to us in nearly the same way our natural one does. These environments, typically used in video games, model the real world in terms of textures, lighting, and physics. With a few more years, or perhaps a decade or two, of work, AI could be used to model the behaviours of individuals and animals in large groups to construct cities and large-scale behaviours that mirror our own. Nvidia, one of the leading producers of computer graphics technology, released its PhysX simulator as open source last year. Because of decisions like this, anyone with the know-how can contribute to the wealth of knowledge that we currently have and are expanding on.
Deepfakes are images and videos that use computer vision and similar technologies to overlay faces and audio to produce something new. Videos using these techniques have been floating around the internet for some time now. Aside from the potential security risks that may arise as this tech develops, the results so far have proven to be both astonishing and amusing. There is already a precedent for using Deepfake-like techniques in film. Famously, Peter Cushing was digitally resurrected to reprise the role of Tarkin in Star Wars Rogue One. And, at a glance, the performance was very convincing. Perfectly blending digital images with a living performer’s face is quite difficult, however. When a younger version of Carrie Fisher was required to portray Leia Organa, also in Rogue One, artists blended images of the late actress onto the living Ingvild Deila. The techniques employed, although remarkable, are not perfect. Any person watching the scene would be able to point out that what they were seeing was in fact not Carrie Fisher. But what if the Deepfake was done not by an artist, but by a machine? Would it be better? Probably. What the world saw in Rogue One was the work of motion capture artists, but what if motion capture was no longer needed?
Behavioural algorithms are another key point. Although not immediately apparent as a necessity, these kinds of algorithms would be needed in any scene involving large numbers of people or animals. People and animals behave in predictable ways in large groups in response to certain stimuli. For example, if a movie were to involve an alien attack or a horde of undead, the computer would need to be able to model the group’s behaviour to produce an accurate scene. Using data such as this, more massive scenarios could be created and analyzed through the extrapolation of data points, and a film would sport natural human reactions to threats, leading to a much more convincing and terrifying scene.
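As a toy illustration of group behaviour in response to a stimulus, the sketch below moves every member of a crowd directly away from a threat once it comes within a "panic radius". The rule, the function name, and all the numbers are assumptions chosen for simplicity; real crowd simulation combines many such rules (avoiding collisions, following neighbours, heading for exits) and would be tuned against observed data.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def flee_step(
    agents: List[Vec], threat: Vec, panic_radius: float, speed: float
) -> List[Vec]:
    """Advance the crowd one time step: agents near the threat run straight away from it."""
    moved = []
    for x, y in agents:
        dx, dy = x - threat[0], y - threat[1]
        dist = math.hypot(dx, dy)
        if 0 < dist < panic_radius:
            # Step `speed` units along the unit vector pointing away from the threat.
            moved.append((x + speed * dx / dist, y + speed * dy / dist))
        else:
            # Too far away to notice (or standing exactly on the threat): stay put.
            moved.append((x, y))
    return moved


if __name__ == "__main__":
    crowd = [(1.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
    for _ in range(3):
        crowd = flee_step(crowd, threat=(0.0, 0.0), panic_radius=5.0, speed=1.0)
        print(crowd)
```

Running the loop shows the two nearby agents retreating each step until they leave the panic radius, while the distant one never moves.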
Digital Characters From Celebrity Likenesses
One of the most exciting prospects of this potential technology is the possibility of computer-generated personalities based on those of celebrities of the past. There are many stars of stage and screen who have passed on but are remembered fondly for their exceptional talents. The ability to include these personas in future films may soon no longer be the stuff of magic or science fiction. By analyzing patterns, machines could model the behaviour of performers and reproduce an accurate model. Although this may not be ethically kosher, there is already precedent for this type of technology in movies now. Of course, this is not limited to resurrecting performers; the same type of technology could be applied to contemporary actors/actresses to reimagine their characters as a different age or species. This would have the added benefit of performers not needing to wear uncomfortable prosthetics or undergo bodily changes to play a certain role.
Microsoft text to image
Although it is still in its infancy, Microsoft has developed a technology that generates images from text. Users can input a simple description and the program will generate an image based on it. Up close, it is not fantastically detailed. From a distance, though, the image is surprisingly accurate. One can easily imagine how clear these types of images will become in the future. With a more advanced version of this, a user could input much more detailed data and receive a life-like image in return. Combine all of these images together and you have a segment of film.
Speech Translation - Movies Produced Naturally in Other Languages
Global releases such as Star Wars and the Marvel movies use teams of voice-over actors to perform characters’ lines so speakers of languages other than English can watch the film as well. But dubbed and subbed films cannot carry the level of immersion that native-language films can. Humans are wired to see speech as well as hear it; it’s part of how we learn to speak as children. When we hear recorded speech, like in a film, we naturally expect what we see (the actors’ mouths) to reflect the audio. In dubbed films this isn’t the case. However, by combining technologies, we may find a solution to this problem. By taking elements of Deepfake algorithms and mixing them with services such as Microsoft’s Speech Translation, we may be able to create a program that automatically translates performer speech, with the added bonus of near-to-life vocal tract articulations upon which we may feast our ever-critical eyes.
Modelling director styles
So much in a movie depends on the director. The director is often what separates a terrible waste of time from an artistic masterpiece. Directors, like performers in films, are people too, and like all people they don’t last forever. And like actors, each director brings an entire dataset of advanced filming techniques, innovations, and practices that can be analyzed by machines and faithfully reproduced. With techniques like this, the world could see the Napoleon film Stanley Kubrick never made.
AI Movies may be More Accurate Than “real-life” Movies
One of the most interesting factors in this topic is that of hyper-realism. An AI interacts with mathematics. It does not have human eyes, ears, or emotions. It deals in numbers and patterns. It is the product of these operations that is the most intriguing aspect of Artificial Intelligence. The movie Interstellar showed audiences the product of advanced computational algorithms working in tandem with visual effects to produce a stunning image of a black hole. The black hole in Interstellar was first shown to audiences in 2014. We had no actual image of a black hole until April of 2019. The image produced by a computer is strikingly similar to the actual photo. By feeding scenarios into a movie-making AI, we may well see with our own eyes what advanced science is rushing to discover.
What it Means for Education
The boons to education are unprecedented with this type of technology. Historical films are just that, historical. Although plenty of work has been done colourizing old photos, the process is very time-consuming and expensive. A picture may be worth a thousand words, but what if those pictures could speak? For instance, Lincoln’s Gettysburg Address was delivered before the advent of recording technology, so the sound of its delivery has been lost to time. But events as historically significant as this could be digitally reconstructed using advanced technologies. We know what Lincoln looked like, how he spoke, and how he thought through testimonials given by his contemporaries. By feeding this information into a program, a new digital version of America’s 16th president could be constructed to educate youth on topics contemporary to one of the most volatile periods of American history. The Gettysburg Address is just one small example of what could be achieved.
This overview has discussed the potential of using AI to construct films. We could use Machine Learning and pattern recognition to completely revamp films, TV, and education. By combining aspects of technologies and techniques we currently employ, an entirely new process of filmmaking could emerge and change the media world forever. Deepfakes, speech recognition, language processing, and behavioural algorithms all play a part in the future of film and TV. The future is promising, but we need to work together to ensure it is handled responsibly.
If none of the previous points have convinced you how important this technology is now and will be very soon, just bear in mind… It could be used to remake Game of Thrones season 8.
Thanks for reading!