PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer

Abstract / Introduction / Related Work

Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields, which neglect the long-range spatio-temporal perception and interaction for rPPG modeling. (i.e., long-range temporal relationships)

Counterexample: Unifying frame rate and temporal dilations for improved remote pulse detection (a low-quality paper in an SCI Q3 journal)

the temporal difference transformers

Proposed: global spatio-temporal attention based on the fine-grained temporal skin color differences

Distinctive aspects of the task:

subtle skin color changes
long-time monitoring task
a video sequence to signal sequence problem

we also propose the label distribution learning and a curriculum learning inspired dynamic constraint in frequency domain, which provide elaborate supervisions for PhysFormer and alleviate overfitting.
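The frequency-domain supervision mentioned above operates on the power spectrum of the predicted rPPG signal. A minimal numpy sketch of that idea, assuming a 30 fps clip of 160 frames and a candidate range of 40 to 179 bpm (both assumptions, as are the helper names `power_at_bpm` and `estimate_hr`): the power at each candidate heart rate is computed with a direct DFT, and the heart rate is read off as the bin with the largest power.

```python
import numpy as np


def power_at_bpm(signal, fs, bpm_range):
    """Power of a Hann-windowed signal at each candidate heart rate.

    signal    : 1-D rPPG waveform
    fs        : sampling rate in Hz (video frame rate)
    bpm_range : candidate heart rates in beats per minute
    """
    x = np.asarray(signal, dtype=float)
    x = (x - x.mean()) * np.hanning(len(x))  # remove DC, reduce spectral leakage
    n = np.arange(len(x))
    freqs_hz = np.asarray(bpm_range, dtype=float) / 60.0
    # Direct DFT evaluated only at the requested frequencies: shape (num_bpm, N)
    phase = 2.0 * np.pi * np.outer(freqs_hz, n) / fs
    real = (x * np.cos(phase)).sum(axis=1)
    imag = (x * np.sin(phase)).sum(axis=1)
    return real ** 2 + imag ** 2


def estimate_hr(signal, fs, bpm_range=np.arange(40, 180)):
    """Heart rate (bpm) = candidate frequency with the largest spectral power."""
    power = power_at_bpm(signal, fs, bpm_range)
    return bpm_range[np.argmax(power)]


fs = 30.0                                   # assumed frame rate
t = np.arange(160) / fs                     # 160-frame clip
rppg = np.sin(2 * np.pi * (72 / 60.0) * t)  # synthetic 72 bpm pulse
print(estimate_hr(rppg, fs))
```

Restricting the DFT to the physiologically plausible band keeps the spectrum small (140 bins) and aligned one-to-one with heart-rate classes, which is what makes a classification-style loss in the frequency domain possible.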

Network Architecture

TDC Module

A teaser, to be covered in a later post: Applications of Difference Convolution in Computer Vision - Zhihu (zhihu.com)

```python
import math

import torch.nn as nn
import torch.nn.functional as F


class CDC_T(nn.Module):
    """Temporal difference convolution (TDC): a 3D convolution whose output is
    corrected by a temporal central-difference term weighted by theta."""

    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1,
                 padding=1, dilation=1, groups=1, bias=False, theta=0.6):
        super(CDC_T, self).__init__()
        self.conv = nn.Conv3d(in_channels, out_channels, kernel_size=kernel_size,
                              stride=stride, padding=padding, dilation=dilation,
                              groups=groups, bias=bias)
        self.theta = theta

    def forward(self, x):
        out_normal = self.conv(x)

        if math.fabs(self.theta - 0.0) < 1e-8:
            # theta == 0: plain 3D convolution
            return out_normal
        else:
            [C_out, C_in, t, kernel_size, kernel_size] = self.conv.weight.shape

            # only CD works on temporal kernel size > 1
            if self.conv.weight.shape[2] > 1:
                # Sum the weights of the previous (index 0) and next (index 2) frames;
                # this assumes a temporal kernel size of 3.
                kernel_diff = (self.conv.weight[:, :, 0, :, :].sum(2).sum(2)
                               + self.conv.weight[:, :, 2, :, :].sum(2).sum(2))
                kernel_diff = kernel_diff[:, :, None, None, None]
                # 1x1x1 conv: centre value x(p0) times the summed neighbour-frame weights
                out_diff = F.conv3d(input=x, weight=kernel_diff, bias=self.conv.bias,
                                    stride=self.conv.stride, padding=0,
                                    dilation=self.conv.dilation, groups=self.conv.groups)
                return out_normal - self.theta * out_diff
            else:
                return out_normal
```
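Written out, for a single channel CDC_T computes TDC(x)(p0) = Σ_{pn∈R} w(pn)·x(p0+pn) − θ·x(p0)·Σ_{pn∈R'} w(pn), where R is the full 3×3×3 neighbourhood and R' is the part of it lying in the previous and next frames. The numpy sketch below (single channel, no padding; the function names are hypothetical) implements that formula the same way CDC_T does and checks it against the equivalent difference form Σ_{pn∈R'} w(pn)·(x(p0+pn) − θ·x(p0)) plus the ordinary middle-frame term.

```python
import numpy as np


def conv3d_valid(x, w):
    """Naive single-channel 3-D cross-correlation (stride 1, no padding).
    x: (T, H, W) clip, w: (kt, kh, kw) kernel."""
    kt, kh, kw = w.shape
    T, H, W = x.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(x[t:t + kt, i:i + kh, j:j + kw] * w)
    return out


def tdc_valid(x, w, theta=0.6):
    """TDC the CDC_T way: vanilla conv minus theta * x(p0) * (sum of weights
    on frames t-1 and t+1). Requires a 3x3x3 kernel."""
    assert w.shape == (3, 3, 3)
    out_normal = conv3d_valid(x, w)
    kernel_diff = w[0].sum() + w[2].sum()  # same reduction as kernel_diff in CDC_T
    center = x[1:-1, 1:-1, 1:-1]           # x(p0) for every output position
    return out_normal - theta * kernel_diff * center


def tdc_explicit(x, w, theta=0.6):
    """Same operator as explicit temporal differences against the centre voxel."""
    T, H, W = x.shape
    out = np.zeros((T - 2, H - 2, W - 2))
    for t in range(T - 2):
        for i in range(H - 2):
            for j in range(W - 2):
                patch = x[t:t + 3, i:i + 3, j:j + 3]
                p0 = patch[1, 1, 1]
                out[t, i, j] = (np.sum(w[1] * patch[1])                 # middle frame
                                + np.sum(w[0] * (patch[0] - theta * p0))  # frame t-1
                                + np.sum(w[2] * (patch[2] - theta * p0)))  # frame t+1
    return out


rng = np.random.default_rng(0)
clip = rng.standard_normal((6, 8, 8))
kernel = rng.standard_normal((3, 3, 3))
print(np.allclose(tdc_valid(clip, kernel), tdc_explicit(clip, kernel)))  # True
```

The difference form shows why TDC suits rPPG: with θ = 1, a region whose intensity does not change over time contributes nothing through the neighbouring frames, so the neighbour-frame weights respond only to frame-to-frame colour changes. With θ = 0 the operator falls back to an ordinary 3D convolution.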

Attention Module

```python
    def forward(self, x, gra_sharp):    # x: [B, 4*4*40, dim]
        """
        x, q(query), k(key), v(value) : (B(batch_size), S(seq_len), D(dim))
        * split D(dim) into (H(n_heads), W(width of head)) ; D = H * W
        """
        # (B, S, D) -proj-> (B, S, D) -split-> (B, S, H, W) -trans-> (B, H, S, W)
        [B, P, C] = x.shape
        x = x.transpose(1, 2).view(B, C, P // 16, 4, 4)      # [B, dim, 40, 4, 4]
        # proj_q / proj_k / proj_v are convolutional projections applied to the
        # 5-D token grid (defined in __init__, not shown here)
        q, k, v = self.proj_q(x), self.proj_k(x), self.proj_v(x)
        q = q.flatten(2).transpose(1, 2)  # [B, 4*4*40, dim]
        k = k.flatten(2).transpose(1, 2)  # [B, 4*4*40, dim]
        v = v.flatten(2).transpose(1, 2)  # [B, 4*4*40, dim]
        # split_last / merge_last are helper functions from the repo (not shown)
        q, k, v = (split_last(x, (self.n_heads, -1)).transpose(1, 2) for x in [q, k, v])
        # (B, H, S, W) @ (B, H, W, S) -> (B, H, S, S) -softmax-> (B, H, S, S)
        # The logits are divided by the temperature gra_sharp, not by sqrt(W).
        scores = q @ k.transpose(-2, -1) / gra_sharp
        scores = self.drop(F.softmax(scores, dim=-1))
        # (B, H, S, S) @ (B, H, S, W) -> (B, H, S, W) -trans-> (B, S, H, W)
        h = (scores @ v).transpose(1, 2).contiguous()
        # -merge-> (B, S, D)
        h = merge_last(h, 2)
        self.scores = scores
        return h, scores
```
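In this attention the logits are divided by gra_sharp rather than the usual sqrt(head width), so gra_sharp acts as a temperature that controls how peaked the attention maps are. A minimal numpy sketch of the same multi-head computation (projections and dropout omitted; `multi_head_attention` is a hypothetical name) makes the effect visible: lowering gra_sharp raises the largest weight in every attention row.

```python
import numpy as np


def softmax(s, axis=-1):
    s = s - s.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(s)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(q, k, v, n_heads, gra_sharp):
    """q, k, v: (B, S, D) token features, D divisible by n_heads.
    Returns (B, S, D) outputs and (B, H, S, S) attention maps."""
    B, S, D = q.shape
    W = D // n_heads

    def split_heads(a):
        # (B, S, D) -> (B, S, H, W) -> (B, H, S, W), like split_last(...).transpose(1, 2)
        return a.reshape(B, S, n_heads, W).transpose(0, 2, 1, 3)

    qh, kh, vh = split_heads(q), split_heads(k), split_heads(v)
    scores = softmax(qh @ kh.transpose(0, 1, 3, 2) / gra_sharp)  # (B, H, S, S)
    h = (scores @ vh).transpose(0, 2, 1, 3).reshape(B, S, D)     # merge heads
    return h, scores


rng = np.random.default_rng(1)
q, k, v = (rng.standard_normal((2, 16, 8)) for _ in range(3))
_, soft = multi_head_attention(q, k, v, n_heads=4, gra_sharp=2.0)
_, sharp = multi_head_attention(q, k, v, n_heads=4, gra_sharp=0.5)
print(soft.max(-1).mean(), sharp.max(-1).mean())
```

A smaller temperature concentrates each query on fewer tokens, which is the knob the PhysFormer code exposes through the gra_sharp argument passed down from the top-level forward.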

Overall Structure

```python
    def forward(self, x, gra_sharp):
        b, c, t, fh, fw = x.shape

        x = self.Stem0(x)
        x = self.Stem1(x)
        x = self.Stem2(x)  # [B, 64, 160, 64, 64]

        x = self.patch_embedding(x)  # [B, 64, 40, 4, 4]
        x = x.flatten(2).transpose(1, 2)  # [B, 40*4*4, 64]

        Trans_features, Score1 = self.transformer1(x, gra_sharp)  # [B, 4*4*40, 64]
        Trans_features2, Score2 = self.transformer2(Trans_features, gra_sharp)  # [B, 4*4*40, 64]
        Trans_features3, Score3 = self.transformer3(Trans_features2, gra_sharp)  # [B, 4*4*40, 64]

        # upsampling heads
        features_last = Trans_features3.transpose(1, 2).view(b, self.dim, t // 4, 4, 4)  # [B, 64, 40, 4, 4]

        features_last = self.upsample(features_last)   # [B, 64, 80, 4, 4]
        features_last = self.upsample2(features_last)  # [B, 32, 160, 4, 4]

        features_last = torch.mean(features_last, 3)   # [B, 32, 160, 4]
        features_last = torch.mean(features_last, 3)   # [B, 32, 160]
        rPPG = self.ConvBlockLast(features_last)       # [B, 1, 160]

        rPPG = rPPG.squeeze(1)

        return rPPG, Score1, Score2, Score3
```
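The tail of forward turns the token sequence back into a waveform. A numpy sketch of just the reshape, temporal upsampling, spatial averaging and 1×1 projection steps, assuming upsample and upsample2 each double the temporal length with nearest-neighbour interpolation (inferred from the 40 → 80 → 160 shape comments). The 3D convolutions inside the upsampling blocks are left out, so the channel count stays at dim instead of dropping to dim//2; `rppg_head_sketch`, `proj_w` and `proj_b` are hypothetical names.

```python
import numpy as np


def rppg_head_sketch(tokens, t, dim, proj_w, proj_b=0.0):
    """tokens: (B, (t//4)*4*4, dim) transformer output -> (B, t) waveform."""
    b = tokens.shape[0]
    # (B, P, dim) -> (B, dim, P) -> (B, dim, t//4, 4, 4), like transpose(1, 2).view(...)
    feat = tokens.transpose(0, 2, 1).reshape(b, dim, t // 4, 4, 4)
    # two 2x nearest-neighbour temporal upsamplings: t//4 -> t//2 -> t
    feat = np.repeat(feat, 2, axis=2)
    feat = np.repeat(feat, 2, axis=2)
    # average out the 4x4 spatial grid (the two torch.mean(..., 3) calls)
    feat = feat.mean(axis=3).mean(axis=3)  # (B, dim, t)
    # a 1x1 Conv1d to one channel is a per-time-step linear map over channels
    return np.einsum("c,bct->bt", proj_w, feat) + proj_b


tokens = np.random.default_rng(2).standard_normal((2, 40 * 4 * 4, 64))
wave = rppg_head_sketch(tokens, t=160, dim=64, proj_w=np.ones(64) / 64)
print(wave.shape)  # (2, 160)
```

Averaging over the 4×4 grid discards spatial layout only at the very end, after the transformer has already mixed information across the whole face; the learned temporal convolutions in the real upsampling blocks smooth out the step-like repetition that plain nearest-neighbour upsampling leaves behind.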

Label Distribution Learning

A new way of computing the loss
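Label distribution learning replaces the one-hot heart-rate class with a soft label. A minimal sketch, assuming the ground-truth HR is encoded as a discretised Gaussian over 40 to 179 bpm bins, the prediction is a softmax over the spectral power of the predicted rPPG, and the two are compared with the KL divergence; sigma=1.0 and the function names are assumptions, not confirmed settings from the paper.

```python
import numpy as np


def gaussian_label_distribution(hr_gt, bpm_range, sigma=1.0):
    """Soft label: discretised Gaussian centred on the ground-truth heart rate."""
    d = np.exp(-0.5 * ((bpm_range - hr_gt) / sigma) ** 2)
    d = np.maximum(d, 1e-15)  # keep every bin positive so log() is finite
    return d / d.sum()


def ldl_kl_loss(psd, hr_gt, bpm_range, sigma=1.0):
    """KL(label || prediction), prediction = softmax over the PSD bins."""
    label = gaussian_label_distribution(hr_gt, bpm_range, sigma)
    logits = psd - psd.max()
    log_pred = logits - np.log(np.exp(logits).sum())  # stable log-softmax
    return float(np.sum(label * (np.log(label) - log_pred)))


bpm = np.arange(40, 180)
psd_good = 10 * np.exp(-0.5 * ((bpm - 72) / 2.0) ** 2)   # spectrum peaking at 72 bpm
psd_bad = 10 * np.exp(-0.5 * ((bpm - 100) / 2.0) ** 2)   # spectrum peaking at 100 bpm
print(ldl_kl_loss(psd_good, 72, bpm), ldl_kl_loss(psd_bad, 72, bpm))
```

Compared with a hard cross-entropy target, the Gaussian label penalises a prediction at 73 bpm far less than one at 100 bpm, which encodes the ordinal structure of heart rates and is one way such supervision can reduce overfitting.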

Curriculum Learning Guided Dynamic Loss
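The dynamic loss changes the weight of the frequency-domain constraint during training, in a curriculum-learning spirit. A hedged sketch: a negative Pearson loss as the time-domain term plus a frequency-domain term whose weight beta grows exponentially with the epoch. The values alpha=0.1, beta0=1.0, eta=5.0, the exponential form, and all function names are illustrative assumptions.

```python
import numpy as np


def neg_pearson_loss(pred, gt):
    """Time-domain loss: 1 - Pearson correlation of predicted and true waveforms."""
    p = pred - pred.mean()
    g = gt - gt.mean()
    r = np.sum(p * g) / (np.sqrt(np.sum(p ** 2) * np.sum(g ** 2)) + 1e-8)
    return 1.0 - r


def dynamic_beta(epoch, total_epochs, beta0=1.0, eta=5.0):
    """Hypothetical schedule: beta grows from beta0 (epoch 1) towards beta0 * eta."""
    return beta0 * eta ** ((epoch - 1) / total_epochs)


def overall_loss(l_time, l_freq, epoch, total_epochs, alpha=0.1):
    """L = alpha * L_time + beta(epoch) * L_freq, with L_freq e.g. CE + LDL terms."""
    return alpha * l_time + dynamic_beta(epoch, total_epochs) * l_freq


for epoch in (1, 10, 25):
    print(epoch, round(dynamic_beta(epoch, 25), 3))
```

Starting with a small frequency weight lets the network first learn a waveform that correlates with the ground truth; the growing weight then pushes it to get the dominant frequency (the heart rate) exactly right.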
