Paper reading: Generating Talking Face Landmarks from Speech

Contents

Preface
Main text
- Abstract
- Method
  - Face Landmark Alignment
  - Removing Identity Information from Landmarks
  - LSTM Network
Code analysis
Appendix
- 1. Face detection with dlib
- 2. Face processing
  - Face Morphing
  - Delaunay Triangulation
  - Voronoi Diagram
  - Back to the main topic

Preface

Give civilization to years, not years to civilization.

Main text

Abstract

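The paper is built around an LSTM network. It first transforms the video frames to a fixed position, then maps all landmarks onto a mean face to remove identity information. The network takes the first- and second-order temporal differences of the log-mel spectrogram as input and predicts the landmarks. The error is computed with an MSE loss, possibly also on the first- and second-order temporal differences (?).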

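Method

The authors train on the GRID dataset: 720×576 video, frames extracted at 25 fps, audio sampled at 44.1 kHz.
A 64-band log-mel spectrogram is computed with a 40 ms Hann window and no overlap, so that each audio frame lines up with one video frame. The first- and second-order temporal differences of the log-mel spectrogram are then computed and used as the network input: a 128-dimensional feature sequence (64 + 64).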
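A minimal NumPy sketch of this feature pipeline, assuming an HTK-style mel scale and simple frame-to-frame differences (the post does not say which mel variant or difference scheme the authors used; `log_mel_deltas` and `mel_filterbank` are hypothetical helper names, and no audio library is required):

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (assumption: the post does not specify the variant)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr / 2."""
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * hz_points / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising edge of the triangle
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fb[m - 1, k] = (right - k) / (right - center)
    return fb


def log_mel_deltas(signal, sr=44100, win_ms=40, n_mels=64, eps=1e-10):
    """Return (n_frames, 2 * n_mels): first- and second-order differences of the log-mel spectrogram."""
    win = int(sr * win_ms / 1000)        # 1764 samples = one 25 fps video frame
    n_frames = len(signal) // win        # hop == window length, i.e. no overlap
    frames = signal[: n_frames * win].reshape(n_frames, win) * np.hanning(win)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    log_mel = np.log(power @ mel_filterbank(sr, win, n_mels).T + eps)
    # Frame-to-frame differences; the first row is padded so the length stays n_frames
    d1 = np.diff(log_mel, axis=0, prepend=log_mel[:1])
    d2 = np.diff(d1, axis=0, prepend=d1[:1])
    return np.concatenate([d1, d2], axis=1)
```

For one second of 44.1 kHz audio this yields 25 rows of 128 features, one row per video frame.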

Face Landmark Alignment

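In the first frame of each video, the two outer eye corners are mapped to two fixed image coordinates, (180, 200) and (420, 200), using a 6-DOF affine transform.
The same transform is then applied to the landmarks in every frame of that video (the exact details still need to be checked in the code).
This assumes the head does not move significantly during the video; otherwise a single transform cannot align the faces across frames.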
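A minimal NumPy sketch of this per-video alignment, with two assumptions labeled: the dlib 68-point indices 36 and 45 are taken as the outer eye corners, and because two point pairs only pin down a 4-DOF similarity transform (rotation, uniform scale, translation), the sketch solves for that rather than the full 6-DOF affine mentioned above. `align_video` is a hypothetical name.

```python
import numpy as np

EYE_LEFT_OUTER, EYE_RIGHT_OUTER = 36, 45        # dlib 68-point indices (assumption)
TARGET = np.array([[180.0, 200.0], [420.0, 200.0]])


def similarity_from_two_points(src, dst):
    """2x3 similarity matrix mapping src[0] -> dst[0] and src[1] -> dst[1]."""
    s = src[:, 0] + 1j * src[:, 1]              # treat points as complex numbers
    d = dst[:, 0] + 1j * dst[:, 1]
    a = (d[1] - d[0]) / (s[1] - s[0])           # rotation + uniform scale
    b = d[0] - a * s[0]                         # translation
    return np.array([[a.real, -a.imag, b.real],
                     [a.imag, a.real, b.imag]])


def align_video(landmarks):
    """landmarks: (T, 68, 2) for one video; the first frame defines the transform."""
    first = landmarks[0, [EYE_LEFT_OUTER, EYE_RIGHT_OUTER]]
    M = similarity_from_two_points(first, TARGET)
    ones = np.ones(landmarks.shape[:2] + (1,))
    homog = np.concatenate([landmarks, ones], axis=2)   # (T, 68, 3)
    return homog @ M.T                                  # same transform for every frame
```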

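Removing Identity Information from Landmarks

After alignment, faces of different speakers have similar sizes and rough positions, but their face shapes and mouth positions still differ, so identity information is removed from the landmarks before training.
The procedure is:

1. Compute the mean face shape by averaging all aligned landmarks in the training set.
2. For each set of face landmarks, compute the transform between the mean face shape and the first frame.
3. Compute the difference between the current frame and the first frame, then multiply that difference by the transform matrix from step 2 (?).
4. Add the mean shape to obtain identity-free face landmarks (I did not fully understand this step).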
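One possible reading of the steps above, as a NumPy sketch. The least-squares affine fit from the first frame to the mean shape, and applying only its linear part to the frame differences (the translation cancels for differences), are my assumptions rather than something the post confirms; `remove_identity` and `fit_affine` are hypothetical names.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares affine (A, t) with dst ≈ src @ A.T + t; src, dst: (N, 2)."""
    X = np.hstack([src, np.ones((len(src), 1))])    # rows [x, y, 1]
    P, *_ = np.linalg.lstsq(X, dst, rcond=None)     # (3, 2)
    return P[:2].T, P[2]


def remove_identity(video_lms, mean_shape):
    """video_lms: (T, 68, 2) aligned landmarks of one video; mean_shape: (68, 2)."""
    first = video_lms[0]
    A, _ = fit_affine(first, mean_shape)            # transform: first frame -> mean face
    diffs = video_lms - first                       # motion relative to the first frame
    return mean_shape + diffs @ A.T                 # re-express the motion on the mean face
```

The mean shape itself (the first step) would be `all_aligned.reshape(-1, 68, 2).mean(axis=0)` over the whole training set.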

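LSTM Network

Let's take a look at the network.

It has four LSTM layers. The input is the first- and second-order temporal differences of the log-mel spectrogram for the current frame and the previous N frames. The output is the x and y coordinates of the face landmarks for the current frame (if no delay is added) or for an earlier frame (if a delay is added). The delay ranges from 1 frame (40 ms) to 5 frames (200 ms), since one second is split into 25 frames. The loss is the MSE.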
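The LSTM itself is not shown here, but the input/target pairing is easy to get wrong. A NumPy sketch of how windows of the current plus N previous feature frames could be paired with targets delayed by `delay` frames; `make_windows`, `n_prev`, and `delay` are hypothetical names, and the 136-dimensional target assumes 68 landmarks × (x, y).

```python
import numpy as np


def make_windows(features, landmarks, n_prev=4, delay=1):
    """features: (T, 128) audio features; landmarks: (T, 136) flattened x/y coordinates."""
    X, Y = [], []
    for t in range(max(n_prev, delay), len(features)):
        X.append(features[t - n_prev: t + 1])   # current frame + N previous frames
        Y.append(landmarks[t - delay])          # target lags the audio by `delay` frames
    return np.stack(X), np.stack(Y)


def mse(pred, target):
    # Mean squared error over all coordinates, as used for training
    return np.mean((pred - target) ** 2)
```

With 25 frames, `n_prev=4` and `delay=1` this gives 21 training pairs, each an input of shape (5, 128) and a target of shape (136,).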

Code analysis

Some of the background knowledge used by the code is collected in the appendix.

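Appendix

1. Face detection with dlib

Here is a small standalone example.
It is very simple: detect the faces in an image, then predict the landmark coordinates of each face. That's it.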

```python
import cv2
import dlib
import matplotlib.pyplot as plt

image = cv2.imread('../0001.jpeg')  # path to an image that contains a face
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')
# gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = detector(image)  # pass 1 or 2 as a second argument to upsample and find smaller faces

for face in faces:
    cv2.rectangle(image, (face.left(), face.top()), (face.right(), face.bottom()), (122, 122, 123), 3)
    shape = predictor(image, face)  # the 68 landmark coordinates of this face
    print(shape.parts())
    # Collect points per face so that several faces do not mix their landmarks
    pos = [(pt.x, pt.y) for pt in shape.parts()]
    for position in pos[48:68]:  # indices 48-67 are the mouth in the 68-point scheme
        print(position)
        cv2.circle(image, position, 3, (123, 123, 0), -1)

plt.imshow(image[:, :, ::-1])  # BGR -> RGB for matplotlib
plt.axis('off')
plt.show()
```

2. Face processing

One piece of the code uses this technique, so it is introduced first.
The technique is face morphing: turning one photo into another.

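The idea behind it is quite naive: create the intermediate images by blending the two images $I$ and $J$, as in the following formula:

$$M(x, y) = (1 - \alpha)\, I(x, y) + \alpha\, J(x, y)$$

When $\alpha = 0$, $M$ looks like $I$; when $\alpha = 1$, it looks like $J$. The blend is done pixel by pixel. Naturally, the result looks poor, as in the figure above.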
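The naive per-pixel blend described above (a plain cross-dissolve with no alignment) can be sketched in NumPy; `cross_dissolve` is a hypothetical name and both images are assumed to have the same shape.

```python
import numpy as np


def cross_dissolve(img_i, img_j, alpha):
    """Naive per-pixel blend (1 - alpha) * I + alpha * J, with no point correspondence."""
    i = img_i.astype(np.float32)
    j = img_j.astype(np.float32)
    return ((1.0 - alpha) * i + alpha * j).round().astype(np.uint8)
```

Because nothing is aligned first, features such as the eyes appear twice (ghosting) for intermediate values of alpha.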

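The cause is easy to see: corresponding pixels do not match. If we could magically find the correspondence for every pixel in the images, we could apply the following formula to each pixel:

$$x_m = (1 - \alpha)\, x_i + \alpha\, x_j, \qquad y_m = (1 - \alpha)\, y_i + \alpha\, y_j$$

Here $(x_i, y_i)$ is the pixel coordinate in image $I$, $(x_j, y_j)$ is the corresponding pixel coordinate in image $J$, and $(x_m, y_m)$ is the pixel position in the morphed image $M$. For example, the eyes sit at different positions in the two images; adjusting $\alpha$ determines where the eyes end up in the morph.

Then we determine the intensity, i.e. the color, of each pixel in the morphed image:

$$M(x_m, y_m) = (1 - \alpha)\, I(x_i, y_i) + \alpha\, J(x_j, y_j)$$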
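The coordinate formula above, applied to a list of corresponding points, is a one-line NumPy interpolation; `interpolate_points` is a hypothetical name, and the two lists are assumed to be in the same order.

```python
import numpy as np


def interpolate_points(points_i, points_j, alpha):
    """Positions in the morph: (1 - alpha) * p_i + alpha * p_j for each corresponding pair."""
    p_i = np.asarray(points_i, dtype=np.float64)
    p_j = np.asarray(points_j, dtype=np.float64)
    return (1.0 - alpha) * p_i + alpha * p_j
```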

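Finding the correspondence for every pixel is complicated and unnecessary, and there is a better way: fix the positions of a small set of points and interpolate the remaining pixels. Let's see how this works in practice.
In short, the images are divided into triangles, and the transformation is carried out triangle by triangle.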

Face Morphing

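Here I follow the description of the original author; the source link is here.
The author first detects 68 points with dlib, then adds 1 point on the ear on the person's right side, 1 point on the neck, 2 points on the left and right shoulders, and 8 points around the image border, for 80 points in total (more points generally give better results). See the figure below: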
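The post does not say where exactly the 8 border points go. A sketch of one plausible layout, assumed here to be the 4 corners plus the 4 edge midpoints; `border_points` is a hypothetical helper, and the ear, neck, and shoulder points are clicked by hand, so they are not generated.

```python
def border_points(width, height):
    """Eight anchor points on the image border: 4 corners + 4 edge midpoints (assumed layout)."""
    w, h = width - 1, height - 1
    return [(0, 0), (w // 2, 0), (w, 0),
            (0, h // 2), (w, h // 2),
            (0, h), (w // 2, h), (w, h)]
```

Appending these to the 68 dlib points and the 4 manual points gives the 80-point set.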

Delaunay Triangulation

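Delaunay triangulation is one algorithm for triangulating a set of points. So what makes a triangulation Delaunay?
In a Delaunay triangulation, the triangles are chosen so that no point lies inside the circumcircle of any triangle. In the figure below, C must lie outside the circumcircle of $\Delta ABD$.
An interesting property of Delaunay triangulation is that it avoids "skinny" triangles, i.e. triangles with one very large angle.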

In the figure above, points B and D have moved. To split the angle $\angle BCD$ and keep it from becoming too large, the triangulation changes.

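The most obvious (but not the most efficient) method is to start from any triangulation and check whether the circumcircle of any triangle contains another point. If it does, flip the shared edge and continue until no triangle's circumcircle contains a point. To understand Delaunay triangulation, it also helps to know the Voronoi diagram.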
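The check at the heart of that naive flip algorithm is the in-circle test. A pure-Python sketch: `in_circumcircle` uses the standard 3×3 determinant predicate (valid for a counter-clockwise triangle), and `is_delaunay`, a hypothetical helper, verifies the empty-circumcircle property for a given list of triangles. It does not perform the flips.

```python
def orient(a, b, c):
    """Positive if a, b, c are in counter-clockwise order."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circumcircle of the CCW triangle abc."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 0


def is_delaunay(points, triangles):
    """triangles: list of index triples into points."""
    for i, j, k in triangles:
        a, b, c = points[i], points[j], points[k]
        if orient(a, b, c) < 0:          # make the triangle counter-clockwise
            b, c = c, b
        for m, p in enumerate(points):
            if m not in (i, j, k) and in_circumcircle(a, b, c, p):
                return False             # an edge flip would be needed here
    return True
```

For the four points (0, 0), (4, 0), (2, 1), (2, -1), splitting along the long diagonal fails the test, while splitting along the short diagonal passes: exactly the skinny-triangle avoidance described above.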

Voronoi Diagram

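Each Voronoi cell contains all locations that are closer to its point than to any other point, so the cell edges lie on the perpendicular bisectors between pairs of points. If you connect the points whose Voronoi cells are neighbors, you get the Delaunay triangulation, as shown below.
"Neighbors" here means that the two cells share a border.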
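A raster sketch of this duality in NumPy: label each pixel with its nearest point (a discrete Voronoi diagram), then collect pairs of labels that touch. `voronoi_labels` and `neighbor_pairs` are hypothetical names; on a finite grid this is only an approximation, and cells whose shared border falls outside the image will be missed.

```python
import numpy as np


def voronoi_labels(points, width, height):
    """Label every pixel with the index of its nearest point."""
    pts = np.asarray(points, dtype=np.float64)                  # (K, 2) as (x, y)
    ys, xs = np.mgrid[0:height, 0:width]
    grid = np.stack([xs, ys], axis=-1).astype(np.float64)      # (H, W, 2)
    d2 = ((grid[:, :, None, :] - pts[None, None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=-1)                                   # (H, W)


def neighbor_pairs(labels):
    """Pairs of cells that share a border: these are the Delaunay edges."""
    pairs = set()
    for a, b in [(labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])]:
        mask = a != b
        for u, v in zip(a[mask], b[mask]):
            pairs.add((int(min(u, v)), int(max(u, v))))
    return pairs
```

For three non-collinear points, all three cells touch, so the edges form the single Delaunay triangle.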

Code time!

Use subdiv.getTriangleList to get the list of Delaunay triangles:

```python
#!/usr/bin/python
import random

import cv2
import numpy as np


# Check if a point is inside a rectangle
def rect_contains(rect, point):
    if point[0] < rect[0]:
        return False
    elif point[1] < rect[1]:
        return False
    elif point[0] > rect[2]:
        return False
    elif point[1] > rect[3]:
        return False
    return True


# Draw a point
def draw_point(img, p, color):
    cv2.circle(img, p, 2, color, cv2.FILLED, cv2.LINE_AA, 0)


# Draw delaunay triangles
def draw_delaunay(img, subdiv, delaunay_color):
    triangleList = subdiv.getTriangleList()
    size = img.shape
    r = (0, 0, size[1], size[0])
    for t in triangleList:
        print(t)
        pt1 = (int(t[0]), int(t[1]))
        pt2 = (int(t[2]), int(t[3]))
        pt3 = (int(t[4]), int(t[5]))
        # Only draw triangles whose three vertices all lie inside the image
        if rect_contains(r, pt1) and rect_contains(r, pt2) and rect_contains(r, pt3):
            cv2.line(img, pt1, pt2, delaunay_color, 1, cv2.LINE_AA, 0)
            cv2.line(img, pt2, pt3, delaunay_color, 1, cv2.LINE_AA, 0)
            cv2.line(img, pt3, pt1, delaunay_color, 1, cv2.LINE_AA, 0)


# Draw voronoi diagram
def draw_voronoi(img, subdiv):
    (facets, centers) = subdiv.getVoronoiFacetList([])
    for i in range(0, len(facets)):
        ifacet_arr = []
        for f in facets[i]:
            ifacet_arr.append(f)
        # np.int was removed in NumPy 1.24; use an explicit integer type
        ifacet = np.array(ifacet_arr, np.int32)
        color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))
        cv2.fillConvexPoly(img, ifacet, color, cv2.LINE_AA, 0)
        ifacets = np.array([ifacet])
        cv2.polylines(img, ifacets, True, (0, 0, 0), 1, cv2.LINE_AA, 0)
        cv2.circle(img, (int(centers[i][0]), int(centers[i][1])), 3, (0, 0, 0), cv2.FILLED, cv2.LINE_AA, 0)


if __name__ == '__main__':
    # Define window names
    win_delaunay = "Delaunay Triangulation"
    win_voronoi = "Voronoi Diagram"
    # Turn on animation while drawing triangles
    animate = True
    # Define colors for drawing.
    delaunay_color = (255, 255, 255)
    points_color = (0, 0, 255)
    # Read in the image.
    img = cv2.imread("ted.jpg")
    # Keep a copy around
    img_orig = img.copy()
    # Rectangle to be used with Subdiv2D
    size = img.shape
    print(size)
    rect = (0, 0, size[1], size[0])
    # Create an instance of Subdiv2D
    subdiv = cv2.Subdiv2D(rect)
    # Read in the points from a text file
    points = []
    with open("ted_points.txt") as file:
        for line in file:
            x, y = line.split()
            points.append((int(x), int(y)))
    # Insert points into subdiv one by one, redrawing after each insertion
    for p in points:
        subdiv.insert(p)
        # Show animation
        if animate:
            img_copy = img_orig.copy()
            draw_delaunay(img_copy, subdiv, delaunay_color)
            cv2.imshow(win_delaunay, img_copy)
            cv2.waitKey(100)
    # Draw delaunay triangles
    draw_delaunay(img, subdiv, delaunay_color)
    # Draw points
    for p in points:
        draw_point(img, p, points_color)
    # Allocate space for Voronoi Diagram
    img_voronoi = np.zeros(img.shape, dtype=img.dtype)
    # Draw Voronoi diagram
    draw_voronoi(img_voronoi, subdiv)
    # Show results
    cv2.imshow(win_delaunay, img)
    cv2.imshow(win_voronoi, img_voronoi)
    cv2.waitKey(0)
```

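Back to the main topic

Our goal is to transform one image into another. Now that we have corresponding triangular regions, we can carry out the transformation.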

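First, find the positions of the feature points in the morphed image, where $(x_i, y_i)$ and $(x_j, y_j)$ are corresponding points in image 1 ($I$) and image 2 ($J$):

$$x_m = (1 - \alpha)\, x_i + \alpha\, x_j, \qquad y_m = (1 - \alpha)\, y_i + \alpha\, y_j$$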

Compute the affine transforms

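We now have the 80 points of image 1, the 80 points of image 2, and the 80 points of the morphed image.
For each triangle, OpenCV's getAffineTransform computes the affine transform from the triangle in image 1 to the corresponding triangle in the morphed image; likewise for image 2 and the morphed image. The 80 points give 149 triangles.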
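What getAffineTransform returns for three point pairs is the unique 2×3 matrix mapping each source vertex to its destination vertex. A NumPy sketch of the same computation, as a 3×3 linear solve; `affine_from_triangles` is a hypothetical name, and the triangle must not be degenerate.

```python
import numpy as np


def affine_from_triangles(src_tri, dst_tri):
    """2x3 matrix M with M @ [x, y, 1] = [x', y'] for each of the three vertex pairs."""
    src = np.asarray(src_tri, dtype=np.float64)
    dst = np.asarray(dst_tri, dtype=np.float64)
    A = np.hstack([src, np.ones((3, 1))])    # rows [x, y, 1]
    return np.linalg.solve(A, dst).T          # solve A @ M.T = dst
```

For example, mapping the unit triangle (0, 0), (1, 0), (0, 1) to (2, 3), (4, 3), (2, 6) gives `[[2, 0, 2], [0, 3, 3]]`: scale x by 2, y by 3, then shift by (2, 3).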

Warp triangles

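In the previous step we obtained the affine transform matrices. Now we can map all pixels inside a triangle of image 1 into the morphed image, and repeat this for every triangle to get the warped version of image 1; the same is done for image 2. The corresponding OpenCV function is warpAffine. However, warpAffine takes an image, not a triangle, so the trick is to create a bounding box for each triangle, warp all pixels inside the bounding box with warpAffine, and then mask out the pixels that lie inside the box but outside the triangle. The triangular mask is created with fillConvexPoly. Make sure to call warpAffine with borderMode BORDER_REFLECT_101, which hides the seams well.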
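A NumPy sketch of the masking step, with a simple edge-function rasterizer standing in for cv2.fillConvexPoly (an assumption: the code below in this post uses fillConvexPoly with anti-aliasing, so its triangle edges are soft, whereas this mask is hard 0/1). `triangle_mask` and `paste_triangle` are hypothetical names; `tri` is given in patch-local coordinates.

```python
import numpy as np


def triangle_mask(tri, width, height):
    """1.0 for pixels inside (or on the edge of) the triangle, 0.0 elsewhere."""
    ys, xs = np.mgrid[0:height, 0:width]
    (x0, y0), (x1, y1), (x2, y2) = tri

    def edge(ax, ay, bx, by):
        # Sign tells which side of the directed edge a->b each pixel is on
        return (bx - ax) * (ys - ay) - (by - ay) * (xs - ax)

    e0, e1, e2 = edge(x0, y0, x1, y1), edge(x1, y1, x2, y2), edge(x2, y2, x0, y0)
    inside = ((e0 >= 0) & (e1 >= 0) & (e2 >= 0)) | ((e0 <= 0) & (e1 <= 0) & (e2 <= 0))
    return inside.astype(np.float32)


def paste_triangle(dst, patch, tri, x, y):
    """Copy only the triangular part of a warped rectangular patch into dst at (x, y)."""
    h, w = patch.shape[:2]
    mask = triangle_mask(tri, w, h)[..., None]      # broadcast over color channels
    region = dst[y:y + h, x:x + w]
    dst[y:y + h, x:x + w] = region * (1 - mask) + patch * mask
```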

Alpha blend warped images
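In the previous step we obtained warped versions of image 1 and image 2, call them $\tilde{I}$ and $\tilde{J}$. These two images are alpha-blended with $M(x, y) = (1 - \alpha)\, \tilde{I}(x, y) + \alpha\, \tilde{J}(x, y)$, which gives the final morphed image.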

Here's the code!

```python
#!/usr/bin/env python
import cv2
import numpy as np


# Read points from text file
def readPoints(path):
    # Create an array of points.
    points = []
    # Read points
    with open(path) as file:
        for line in file:
            x, y = line.split()
            points.append((int(x), int(y)))
    return points


# Apply affine transform calculated using srcTri and dstTri to src and
# output an image of size.
def applyAffineTransform(src, srcTri, dstTri, size):
    # Given a pair of triangles, find the affine transform.
    warpMat = cv2.getAffineTransform(np.float32(srcTri), np.float32(dstTri))
    # Apply the Affine Transform just found to the src image
    dst = cv2.warpAffine(src, warpMat, (size[0], size[1]), None,
                         flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_REFLECT_101)
    return dst


# Warps and alpha blends triangular regions from img1 and img2 to img
def morphTriangle(img1, img2, img, t1, t2, t, alpha):
    # Find bounding rectangle for each triangle
    r1 = cv2.boundingRect(np.float32([t1]))
    r2 = cv2.boundingRect(np.float32([t2]))
    r = cv2.boundingRect(np.float32([t]))

    # Offset points by left top corner of the respective rectangles
    t1Rect = []
    t2Rect = []
    tRect = []
    for i in range(0, 3):
        tRect.append(((t[i][0] - r[0]), (t[i][1] - r[1])))
        t1Rect.append(((t1[i][0] - r1[0]), (t1[i][1] - r1[1])))
        t2Rect.append(((t2[i][0] - r2[0]), (t2[i][1] - r2[1])))

    # Get mask by filling triangle (anti-aliased edges)
    mask = np.zeros((r[3], r[2], 3), dtype=np.float32)
    cv2.fillConvexPoly(mask, np.int32(tRect), (1.0, 1.0, 1.0), cv2.LINE_AA, 0)

    # Apply warpImage to small rectangular patches
    img1Rect = img1[r1[1]:r1[1] + r1[3], r1[0]:r1[0] + r1[2]]
    img2Rect = img2[r2[1]:r2[1] + r2[3], r2[0]:r2[0] + r2[2]]

    size = (r[2], r[3])
    warpImage1 = applyAffineTransform(img1Rect, t1Rect, tRect, size)
    warpImage2 = applyAffineTransform(img2Rect, t2Rect, tRect, size)

    # Alpha blend rectangular patches
    imgRect = (1.0 - alpha) * warpImage1 + alpha * warpImage2

    # Copy triangular region of the rectangular patch to the output image
    img[r[1]:r[1] + r[3], r[0]:r[0] + r[2]] = (
        img[r[1]:r[1] + r[3], r[0]:r[0] + r[2]] * (1 - mask) + imgRect * mask)


if __name__ == '__main__':
    filename1 = 'hillary.jpg'
    filename2 = 'ted.jpg'
    alpha = 0.5

    # Read images
    img1 = cv2.imread(filename1)
    img2 = cv2.imread(filename2)

    # Convert Mat to float data type
    img1 = np.float32(img1)
    img2 = np.float32(img2)

    # Read arrays of corresponding points.
    # points1 must belong to img1 (hillary) and points2 to img2 (ted);
    # the original listing had these two files swapped.
    points1 = readPoints('hillary.txt')
    points2 = readPoints('ted_points.txt')
    points = []

    # Compute weighted average point coordinates
    for i in range(0, len(points1)):
        x = (1 - alpha) * points1[i][0] + alpha * points2[i][0]
        y = (1 - alpha) * points1[i][1] + alpha * points2[i][1]
        points.append((x, y))

    # Allocate space for final output
    imgMorph = np.zeros(img1.shape, dtype=img1.dtype)

    # Read triangles (triples of point indices) from tri.txt
    with open("tri.txt") as file:
        for line in file:
            x, y, z = line.split()
            x = int(x)
            y = int(y)
            z = int(z)

            t1 = [points1[x], points1[y], points1[z]]
            t2 = [points2[x], points2[y], points2[z]]
            t = [points[x], points[y], points[z]]

            # Morph one triangle at a time.
            morphTriangle(img1, img2, imgMorph, t1, t2, t, alpha)

    # Display Result
    cv2.imshow("Morphed Face", np.uint8(imgMorph))
    cv2.waitKey(0)
```