This article is part of Demystifying AI, a series of posts that (try to) disambiguate the jargon and myths surrounding AI.
In 2018, a big fan of Nicolas Cage showed us what The Fellowship of the Ring would look like if Cage starred as Frodo, Aragorn, Gimli, and Legolas. The technology he used was deepfake, a type of application that uses artificial intelligence algorithms to manipulate videos.
Deepfakes are mostly known for their capability to swap the faces of actors from one video to another. They first appeared in 2018 and quickly rose to fame after they were used to modify adult videos to feature the faces of Hollywood actors and politicians.
In the past couple of years, deepfakes have caused much concern about the rise of a new wave of AI-doctored videos that can spread fake news and enable forgers and scammers.
The “deep” in deepfake comes from the use of deep learning, the branch of AI that has become very popular in the past decade. Deep learning algorithms roughly mimic the experience-based learning capabilities of humans and animals. If you train them on enough examples of a task, they will be able to replicate it under specific conditions.
The basic idea is to train a set of artificial neural networks, the main component of deep learning algorithms, on many examples of the actor’s and the target’s faces. With enough training, the neural networks will be able to create a numerical representation of each face’s features. Then, all you need to do is rewire the neural networks to map the actor’s face onto the target.
Deep learning algorithms come in different formats. Many people think deepfakes are created with generative adversarial networks (GAN), a deep learning algorithm that learns to generate realistic images from noise. And it is true that there are variations of GANs that can create deepfakes.
But the main type of neural network used in deepfakes is the “autoencoder.” An autoencoder is a special type of deep learning algorithm that performs two tasks. First, it encodes an input image into a small set of numerical values. (In reality, it could be any other type of data, but since we’re talking about deepfakes, we’ll stick to images.) The encoding is done through a series of layers that start with many variables and gradually become smaller until they reach a “bottleneck” layer. The bottleneck layer contains the target number of variables.
During the training, the autoencoder is provided with a series of images. The goal of the training is to find a way to tune the parameters in the encoder and decoder layers so that the output image is as similar to the input image as possible.
You can think of an autoencoder as a super-smart compression-decompression algorithm. For instance, you can run an image through the encoder half of the neural network and keep only the bottleneck representation, which takes up less storage and is faster to send over a network. When you want to view the image, you only need to run the encoded values through the decoder half to return it to its original state.
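The encode-bottleneck-decode loop described above can be sketched in a few lines of NumPy. This is only a toy illustration: it uses a single linear layer on each side rather than the deep convolutional networks real deepfake tools use, and all the data and dimensions here are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 "images" flattened to 64 values each, generated
# from 4 hidden factors so that heavy compression is possible.
latent = rng.normal(size=(200, 4))
X = latent @ rng.normal(size=(4, 64))

# A linear autoencoder: 64 inputs -> 4-value bottleneck -> 64 outputs.
W_enc = rng.normal(scale=0.1, size=(64, 4))  # encoder weights
W_dec = rng.normal(scale=0.1, size=(4, 64))  # decoder weights

def reconstruct(batch):
    code = batch @ W_enc   # compress down to the bottleneck
    return code @ W_dec    # decompress back to the input size

init_loss = np.mean((reconstruct(X) - X) ** 2)

# Training: gradient descent on the mean squared reconstruction
# error, i.e. "make the output as similar to the input as possible."
lr = 5e-3
for _ in range(1000):
    code = X @ W_enc
    err = code @ W_dec - X
    W_dec -= lr * code.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

final_loss = np.mean((reconstruct(X) - X) ** 2)
```

After training, `final_loss` is lower than `init_loss`: the network has learned to squeeze each 64-value input through the 4-value bottleneck and rebuild it, which is exactly the compression-decompression behavior described above.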
But there are other things that the autoencoder can do. For instance, you can use it for noise reduction or generating new images.
Deepfake applications use a special configuration of autoencoders. In fact, a deepfake generator uses two autoencoders, one trained on the face of the actor and another trained on the target.
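In most deepfake tools the two autoencoders are not fully independent: they share a single encoder and differ only in their decoders, which is what makes the face swap possible. The sketch below (plain NumPy, untrained random weights, all names my own) shows how the pieces fit together; real implementations use deep convolutional networks in place of these single matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, bottleneck = 64, 8

# One shared encoder learns a common face representation;
# each identity gets its own decoder.
W_shared_enc = rng.normal(scale=0.1, size=(dim, bottleneck))
W_dec_actor = rng.normal(scale=0.1, size=(bottleneck, dim))   # actor's face
W_dec_target = rng.normal(scale=0.1, size=(bottleneck, dim))  # target's face

def encode(face):
    return face @ W_shared_enc

def swap(target_face):
    # The trick: encode the target's face with the shared encoder,
    # then decode it with the *actor's* decoder, rendering the
    # actor's face in the target's pose and expression.
    return encode(target_face) @ W_dec_actor

frame = rng.normal(size=(1, dim))  # stand-in for a cropped face image
fake = swap(frame)                 # same shape as the input frame
```

The shared encoder is the reason both faces must be trained together: it forces pose, expression, and lighting into one common representation that either decoder can render.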
Training the deepfake autoencoder
The concept of deepfake is very simple. But training it requires considerable effort. Say you want to create a deepfake version of Forrest Gump that stars John Travolta instead of Tom Hanks.
First, you need to assemble training datasets for the actor (John Travolta) and target (Tom Hanks) autoencoders. This means gathering thousands of video frames of each person and cropping them to show only the face. Ideally, you should include images from different angles and lighting conditions, so your neural networks can learn to encode and transfer the different nuances of the faces and their environments. Therefore, you can’t just take a single video of each person and crop its frames. You must use multiple videos. There are tools that automate the cropping process, but they aren’t perfect and still require manual effort.
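The frame-cropping step might look roughly like the following sketch. Note that `detect_face` here is a hypothetical placeholder: a real pipeline would plug in an actual face detector (for example, OpenCV’s Haar cascades), and would use proper image resampling instead of the naive nearest-neighbor resize shown.

```python
import numpy as np

def detect_face(frame):
    # Hypothetical detector returning a bounding box (x, y, w, h).
    # A real tool would run an actual face-detection model here.
    return (40, 40, 128, 128)

def build_face_dataset(frames, size=64):
    """Crop the face out of each frame and resize to a uniform shape."""
    faces = []
    for frame in frames:
        x, y, w, h = detect_face(frame)
        crop = frame[y:y + h, x:x + w]
        # Naive nearest-neighbor resize to the training resolution.
        rows = np.linspace(0, h - 1, size).astype(int)
        cols = np.linspace(0, w - 1, size).astype(int)
        faces.append(crop[np.ix_(rows, cols)])
    return np.stack(faces)

# Stand-ins for frames extracted from multiple videos.
frames = [np.random.rand(256, 256, 3) for _ in range(10)]
dataset = build_face_dataset(frames)  # shape: (10, 64, 64, 3)
```

The imperfect-detector problem mentioned above shows up right here: whenever `detect_face` returns a bad box (a profile view, an occluded face, a false positive), someone has to weed that crop out of the dataset by hand.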
The need for large datasets is why most deepfake videos you see target celebrities. You can’t create a deepfake of your neighbor unless you have hours of videos of them in different settings.
After collecting the datasets, you must train the neural networks. If you know how to code machine learning algorithms, you can create your own autoencoders. Alternatively, you can use a deepfake application such as Faceswap, which provides an intuitive user interface and shows the progress of the AI model as the training of the neural networks proceeds.
Depending on the type of hardware you use, the deepfake training and generation can take from several hours to several days. Once the process is over, you’ll have your deepfake video. Sometimes the result will not be optimal and even extending the training process won’t improve the quality. This can be due to bad training data or choosing the wrong configuration of your deep learning models. In this case, you’ll need to readjust the settings and restart the training from scratch.
Manipulated videos are nothing new. Movie studios have been using them in cinema for decades. But previously, they required tremendous effort from experts and access to expensive studio gear. Although still not trivial, deepfakes put video manipulation at the disposal of everyone. Basically, anyone who has a few hundred dollars to spare and the nerve to go through the process can create a deepfake from their own basement.
Naturally, deepfakes have become a source of concern and are perceived as a threat to public trust. Government agencies, academic research labs, and social media companies are all engaged in efforts to build tools that can detect AI-doctored videos.
Facebook is looking into deepfake detection to prevent the spread of fake news on its social network. The Defense Advanced Research Projects Agency (DARPA), the research arm of the U.S. Department of Defense, has also launched an initiative to stop deepfakes and other automated deception tools. And Microsoft recently launched a deepfake detection tool ahead of the U.S. presidential election.
AI researchers have developed various tools to detect deepfakes. For instance, early deepfakes contained visual artifacts such as unblinking eyes and unnatural skin color variations. One tool flagged videos in which people didn’t blink or blinked at abnormal intervals.
But the fight against deepfakes has effectively turned into a cat-and-mouse chase. As deepfakes constantly get better, many of these tools lose their efficiency. As one computer vision professor told me last year: “I think deepfakes are almost like an arms race. People are producing ever-more-convincing deepfakes, and one day it might become impossible to detect them.”