Deep learning 3D face reconstruction

Snapchat was made popular by putting funny dog ears on people’s heads, swapping faces and other tricks that, beyond being funny, look impossible, even magical. I work in the digital visual effects industry, so I am familiar with that magic, and with the desire to understand how it works behind the scenes.

Behind the magic

Modifying people’s faces is routine work in Hollywood visual effects. It’s a well-understood craft nowadays, but it typically requires tens of digital artists to achieve a photorealistic face transformation. How can we automate that?

Here’s a simplified breakdown of the steps these artists follow:

  1. Tracking the position, shape and movement of the face relative to the camera in 3D

  2. Animation of the 3D models to snap onto the tracked face (e.g. a dog nose)

  3. Lighting and rendering of the 3D models into 2D images

  4. Compositing of the rendered CGI images with the live action footage

Automating steps 2 and 3 is not very different from what happens in video games; it’s relatively straightforward. Compositing can be simplified to placing the 3D foreground over the live background, which is easy. The challenge is the tracking: how can a program ‘see’ the complex motion of a human head?
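
The "foreground over background" step can be made concrete with straight (non-premultiplied) alpha. The standard ‘over’ operator blends each rendered foreground pixel F into the background footage B as C = αF + (1 − α)B, where α is the coverage written out by the renderer. Below is a minimal per-pixel sketch in Python. It assumes RGB components in [0, 1]. `composite_over` is a hypothetical helper, not part of any library:

```python
def composite_over(fg, alpha, bg):
    """Porter-Duff 'over' for one pixel with straight (non-premultiplied) alpha.

    fg, bg: (r, g, b) tuples with components in [0, 1]
    alpha:  coverage of the rendered foreground pixel, in [0, 1]
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # Each channel: foreground weighted by alpha, background by what is left.
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


# A half-transparent red CGI pixel over a blue live-action pixel.
print(composite_over((1.0, 0.0, 0.0), 0.5, (0.0, 0.0, 1.0)))  # (0.5, 0.0, 0.5)
```

A production compositor applies this to whole images, usually with premultiplied alpha, where the operator simplifies to C = F + (1 − α)B. The per-pixel arithmetic is the same.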

Tracking faces with Artificial Intelligence

The Computer Science community has been trying to track faces automatically for a long time, and it’s hard. In recent years, Machine Learning has come to the rescue, and many Deep Learning papers on the topic are published every year. I spent a while looking for the “state of the art” and realised that doing this in real-time is VERY HARD! That is a good reason to try and tackle the challenge (and it would work nicely with the AR beauty mode I have implemented).

“trying to track faces… it’s hard… doing this in real-time is VERY HARD!”

Here’s how I did it.

Designing the network

Convolutional Neural Networks are popular for visual analysis of images and commonly used for applications such as object detection and image recognition.

(Image from this publication⁹)

For a deep neural network to be evaluated in real-time (at least 30 times per second), a compact network is desirable¹. With the popularity of Machine Learning and smartphones, new models that push the limits of efficiency appear every year, each offering a trade-off between accuracy and computational cost. Among them, MobileNet, SqueezeNet and ShuffleNet are popular for applications on mobile devices thanks to their compactness.
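
In concrete terms, “at least 30 times per second” fixes the time available for each evaluation of the network:

```latex
$$
t_{\text{frame}} \le \frac{1\ \text{s}}{30} \approx 33.3\ \text{ms}
$$
```

If rendering and compositing run in the same per-frame loop, they share this budget with the network. In that case the network’s own share is smaller still.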

Architecture of ShuffleNet V2 for different levels of complexity (from the authors¹)

ShuffleNet V2¹ was recently introduced and offers state-of-the-art performance, coming in various sizes to balance speed and accuracy. It ships with PyTorch (in torchvision), one more reason to pick that model.

Choosing the features to learn

(Image from this paper²)

Next, I need to decide which features the CNN should learn. A common approach is to define a list of anchor points for the key parts of the face, also called ‘facial landmarks’.

The points are numbered and placed strategically around the eyes, eyebrows, nose, mouth and jawline. I want to train the network to predict the coordinates of each point, so that I can later reconstruct masks or geometric meshes from them.
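
As an illustration of turning predicted landmarks into a mask, the sketch below rasterises a closed landmark contour into a binary mask using the even-odd ray-casting test. An example contour would be the outer lip points. The contour, the image size and the helper names are hypothetical. Which landmark indices form a given contour depends on the annotation scheme.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: count crossings of a horizontal ray from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y-coordinate can cross it.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets the horizontal line through y.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def landmarks_to_mask(contour, width, height):
    """Rasterise a closed contour of (x, y) landmarks into a 0/1 mask.

    Each pixel is tested at its centre, (col + 0.5, row + 0.5).
    """
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, contour) else 0
         for col in range(width)]
        for row in range(height)
    ]


# Hypothetical 4-point contour (e.g. a tiny mouth region) in a 6x5 image.
mask = landmarks_to_mask([(1, 1), (5, 1), (5, 4), (1, 4)], width=6, height=5)
for row in mask:
    print("".join("#" if v else "." for v in row))
```

A 3D mesh would instead connect the same points into triangles. The mask is simply the cheapest thing to build from a contour.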

Building a training dataset

Because I want to augment videos with 3D effects, I looked for a dataset with 3D landmark coordinates. 300W-LP is one of the few datasets that come with 3D positions; it’s pretty large and, as a bonus, offers a good diversity of face angles. I also want to benchmark my solution against the state of the art, and recent publications test their models on AFLW2000-3D. So I use 300W-LP for training and AFLW2000-3D for testing and comparison.

(Images from 300W-LP³; profile views are generated mathematically)

A note on these datasets: they are intended for the research community and are generally not free for commercial use.

Augmenting the dataset

Dataset augmentation improves the accuracy of the training by adding more variations of the samples the set already contains. I create new samples by applying the following transformations, each by a random amount, to every image and its landmarks:

- rotation of up to ±40° around the centre
- translation and scaling of up to 10%
- horizontal flip

For additional augmentation, a different random transformation is applied in memory to each image on every learning pass (epoch).
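
Here is a minimal sketch of the landmark side of this augmentation. It assumes 2D (x, y) landmarks in continuous pixel coordinates of a square image of side `size`. The image itself would receive the same affine transform through any image library. The ranges mirror the values above, but the functions are hypothetical. After a horizontal flip, a real landmark scheme also needs its left/right point indices swapped (the left eye becomes the right eye). That step is omitted because the mapping depends on the annotation scheme.

```python
import math
import random


def random_affine(size, rng, max_rot_deg=40.0, max_shift=0.10, max_scale=0.10):
    """Draw random rotation, scale, translation and flip parameters."""
    return {
        "angle": math.radians(rng.uniform(-max_rot_deg, max_rot_deg)),
        "scale": 1.0 + rng.uniform(-max_scale, max_scale),
        "dx": rng.uniform(-max_shift, max_shift) * size,
        "dy": rng.uniform(-max_shift, max_shift) * size,
        "flip": rng.random() < 0.5,
    }


def apply_to_landmarks(points, params, size):
    """Transform (x, y) landmarks given in continuous pixel coordinates [0, size]."""
    c = size / 2.0
    cos_a, sin_a = math.cos(params["angle"]), math.sin(params["angle"])
    out = []
    for x, y in points:
        if params["flip"]:
            x = size - x  # mirror horizontally
        # Rotate and scale about the image centre, then translate.
        u, v = x - c, y - c
        xr = params["scale"] * (cos_a * u - sin_a * v) + c + params["dx"]
        yr = params["scale"] * (sin_a * u + cos_a * v) + c + params["dy"]
        out.append((xr, yr))
    # A real landmark scheme must also swap left/right indices after a flip.
    return out


# Draw a new transform per image and per epoch; seeded here only for repeatability.
rng = random.Random(0)
params = random_affine(size=128, rng=rng)
print(apply_to_landmarks([(40.0, 50.0), (88.0, 50.0)], params, size=128))
```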

It’s also necessary to crop the input image close to the bounding box of the landmarks, so that the CNN learns to recognise the landmarks at their relative locations. This is done as a preprocessing step, to save on disk loading time during training.
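
Here is one way such a crop could be computed, sketched under some assumptions. The box is made square around the landmarks’ 2D bounding box and enlarged by a margin; the 20% used here is a hypothetical value, not from the original. The landmarks are then expressed relative to the crop and normalised to [0, 1], so that the training targets don’t depend on the image resolution. That normalisation is also an assumption. Clamping the box to the image borders is left out.

```python
def square_crop_box(points, margin=0.2):
    """Square box (left, top, side) around (x, y) landmarks, enlarged by `margin`."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    cx = (min(xs) + max(xs)) / 2.0
    cy = (min(ys) + max(ys)) / 2.0
    # Use the longer side so the crop is square, then add the margin.
    side = max(max(xs) - min(xs), max(ys) - min(ys)) * (1.0 + margin)
    return cx - side / 2.0, cy - side / 2.0, side


def to_crop_coords(points, box):
    """Express landmarks relative to the crop, normalised to [0, 1]."""
    left, top, side = box
    return [((x - left) / side, (y - top) / side) for x, y in points]


landmarks = [(100.0, 120.0), (180.0, 120.0), (140.0, 200.0)]
box = square_crop_box(landmarks)
print(box)
print(to_crop_coords(landmarks, box))
```

Doing this once, offline, means the training loop only ever reads the small cropped images from disk.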

Designing the loss function

(Image from the publication⁴)

An L2 loss function is typically used to measure the prediction error of landmark positions. A recent publication⁴ describes a so-called Wing loss function that performs better for this application, which I was able to confirm. I parametrise it with w = 10 and ε = 2, as suggested in the paper, and sum the result over all landmark coordinates.
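
For reference, the Wing loss as defined in that paper behaves logarithmically for small errors and like L1 for large ones. The constant C = w − w·ln(1 + w/ε) joins the two pieces continuously at |x| = w. The sketch below is plain Python with the parameters above, summed over all coordinates of one sample. In practice it would be written with tensor operations so it can be back-propagated.

```python
import math


def wing(x, w=10.0, eps=2.0):
    """Wing loss for a single coordinate error x = prediction - target."""
    ax = abs(x)
    if ax < w:
        # Log region: amplifies small and medium errors compared with L1.
        return w * math.log(1.0 + ax / eps)
    # Linear region, shifted by C so both pieces meet at |x| = w.
    c = w - w * math.log(1.0 + w / eps)
    return ax - c


def wing_loss(pred, target, w=10.0, eps=2.0):
    """Sum of Wing losses over all landmark coordinates of one sample."""
    return sum(wing(p - t, w, eps) for p, t in zip(pred, target))


print(wing_loss([1.0, 12.0, -3.0], [0.0, 0.0, 0.0]))
```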

Training the network

Training a deep neural network is a very expensive operation that requires powerful computers. On my laptop, a single training run would literally have taken weeks, and building a decent setup costs thousands of dollars. I decided to use the cloud instead, so that I pay only for the compute power I need.

I chose Genesis Cloud, which offers very competitive prices and $50 of free credit to get started. I built a Linux VM with a GeForce GTX 1080 Ti and prepared an OS and storage image on which I set up PyTorch and uploaded my code and the datasets, all through ssh. Once the system is set up, it can be started and shut down on demand, and creating a snapshot lets me resume the work where I left off.

Plot of mean error for each epoch

The inner training loop processes mini-batches of 32 images to maximise parallel computation on the GPU. A learning pass (epoch) processes the entire set of about 60,000 images and takes about 4 minutes. The training converges at around 70 epochs, so I let it run overnight for 100 epochs to be safe.
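
Putting these numbers together gives a rough idea of the cost of a run. The result is approximate, since both the dataset size and the epoch time are approximate:

```latex
$$
\begin{aligned}
\frac{60\,000\ \text{images}}{32\ \text{images per batch}} &= 1875\ \text{iterations per epoch} \\
100\ \text{epochs} \times 4\ \text{min per epoch} &= 400\ \text{min} \approx 6.7\ \text{h}
\end{aligned}
$$
```

Reaching convergence at around 70 epochs takes about 280 minutes (roughly 4.7 hours), so an overnight run leaves comfortable headroom.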

I use the popular Adam optimiser, which automatically adapts the step size for each parameter, starting from a learning rate of 0.001. I found that setting the initial learning rate correctly is critical. If it’s too small, the training converges too early to a sub-optimal solution; if it’s too large, it has difficulty converging at all. I found the right value through trial and error, which is time-consuming, and actually costly when paying for the cloud by usage!
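
Here is what “adapting the step size” means for Adam. It keeps running averages of each parameter’s gradient and squared gradient, and divides one by the square root of the other. Each step therefore has a magnitude on the order of the learning rate, whatever the raw gradient scale. The sketch below is plain Python for a single scalar parameter, minimising a toy quadratic. β₁ = 0.9, β₂ = 0.999 and ε = 1e-8 are the commonly used defaults, also used by PyTorch’s `torch.optim.Adam`. The toy objective is purely illustrative and has no local minima, so it can’t reproduce the sub-optimal convergence described above.

```python
import math


def adam_minimise(grad, x0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Run Adam on a single scalar parameter, given its gradient function."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def quad_grad(x):
    """Gradient of the toy objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


# The first step moves by about lr, regardless of the gradient's magnitude.
print(adam_minimise(quad_grad, x0=0.0, lr=0.001, steps=1))
print(adam_minimise(quad_grad, x0=0.0, lr=0.1, steps=2000))
```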

Evaluation

All these efforts paid off. With the bigger network, ShuffleNet V2 2x, I obtain a Normalised Mean Error (NME) of 2.796 on AFLW2000-3D. That beats the state-of-the-art model⁵ on that dataset, which has an NME of 3.07, by a good margin, despite that model being much heavier!
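
For readers who want to compute this kind of number, here is a sketch of a normalised mean error. The normaliser is an assumption, since the text doesn’t say which one was used. A common convention on AFLW2000-3D divides the mean point-to-point distance by the size of the ground-truth bounding box, sqrt(width × height), computed on 2D coordinates, and reports the result as a percentage.

```python
import math


def nme(pred, gt):
    """Normalised mean error in percent, for lists of (x, y) landmarks.

    Assumed normaliser: sqrt(width * height) of the ground-truth bounding box.
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and the same length")
    xs = [p[0] for p in gt]
    ys = [p[1] for p in gt]
    box_size = math.sqrt((max(xs) - min(xs)) * (max(ys) - min(ys)))
    mean_dist = sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)
    return 100.0 * mean_dist / box_size


def dataset_nme(samples):
    """Average the per-image NME over (pred, gt) pairs."""
    return sum(nme(p, g) for p, g in samples) / len(samples)


gt = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
pred = [(x + 3.0, y) for x, y in gt]  # every point off by 3 px in x
print(nme(pred, gt))
```

For scale, going from 3.07 to 2.796 is a relative reduction of about 8.9%.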
