Paper reading: Channel Augmented Joint Learning for Visible-Infrared Recognition

code: https://gitee.com/mindspore/contrib/tree/master/papers/CAJ

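Motivation:

Existing image augmentation strategies are mainly designed for single-modality visible images and do not take into account the image characteristics that matter when matching visible images against infrared images.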

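Main contributions:

Data augmentation: color-independent images are generated by randomly exchanging the color channels. This can be combined with existing augmentation methods and improves robustness to color variation. A channel-level random erasing step additionally simulates random occlusion, which enriches the diversity of the images.
Cross-modality metric learning: a channel-mixed learning strategy is proposed that uses a squared difference to handle intra- and inter-modality variations at the same time. On top of it, a channel-augmented joint learning strategy explicitly optimizes the outputs of the augmented images.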

Image augmentation

Random Channel Exchangeable Augmentation

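This augmentation can be understood as uniformly generating a three-channel image from the visible image: one randomly chosen channel (R, G or B) replaces the other two. This encourages the model to learn the relationship between each color channel of the visible image and the single-channel infrared image.
Channel-Level Random Erasing (CRE)

The erased region is filled with the per-channel mean values of the R, G and B channels, taken from ImageNet.
In addition, grayscale transformation (GA) and random horizontal flip (FP) are also used.

The code is as follows: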

from __future__ import absolute_import
import random
import math


class ChannelAdap():
    """Randomly copy one channel (R, G or B) into the other two, or keep the image.

    Expects a mutable (3, H, W) image in RGB order and modifies it in place.

    Args:
        probability: Stored but unused here (the check below is commented out).
    """

    def __init__(self, probability=0.5):
        self.probability = probability

    def __call__(self, img):
        # if random.uniform(0, 1) > self.probability:
        #     return img
        idx = random.randint(0, 3)  # inclusive bounds: 0, 1, 2 or 3
        if idx == 0:
            # R channel selected
            img[1, :, :] = img[0, :, :]
            img[2, :, :] = img[0, :, :]
        elif idx == 1:
            # G channel selected
            img[0, :, :] = img[1, :, :]
            img[2, :, :] = img[1, :, :]
        elif idx == 2:
            # B channel selected
            img[0, :, :] = img[2, :, :]
            img[1, :, :] = img[2, :, :]
        # idx == 3: leave the image unchanged
        return img


class ChannelAdapGray():
    """Like ChannelAdap, but the fourth case may convert the image to grayscale.

    Args:
        probability: Probability of applying the grayscale conversion when
            no single channel was selected (idx == 3).
    """

    def __init__(self, probability=0.5):
        self.probability = probability

    def __call__(self, img):
        idx = random.randint(0, 3)  # inclusive bounds: 0, 1, 2 or 3
        if idx == 0:
            # R channel selected
            img[1, :, :] = img[0, :, :]
            img[2, :, :] = img[0, :, :]
        elif idx == 1:
            # G channel selected
            img[0, :, :] = img[1, :, :]
            img[2, :, :] = img[1, :, :]
        elif idx == 2:
            # B channel selected
            img[0, :, :] = img[2, :, :]
            img[1, :, :] = img[2, :, :]
        elif random.uniform(0, 1) <= self.probability:
            # weighted grayscale conversion, written to all three channels
            tmp_img = 0.2989 * img[0, :, :] + 0.5870 * img[1, :, :] + 0.1140 * img[2, :, :]
            img[0, :, :] = tmp_img
            img[1, :, :] = tmp_img
            img[2, :, :] = tmp_img
        return img


class ChannelRandomErasing():
    """Randomly selects a rectangle region in an image and erases its pixels.

    'Random Erasing Data Augmentation' by Zhong et al.

    Args:
        probability: The probability that the Random Erasing operation will be performed.
        sl: Minimum proportion of erased area against input image.
        sh: Maximum proportion of erased area against input image.
        r1: Minimum aspect ratio of erased area.
    """

    def __init__(self, probability=0.5, sl=0.02, sh=0.4, r1=0.3):
        self.probability = probability
        # Per-channel erasing values. Note: these numbers coincide with the
        # commonly used CIFAR-10 channel means; the commonly quoted ImageNet
        # means are (0.485, 0.456, 0.406).
        self.mean = [0.4914, 0.4822, 0.4465]
        self.sl = sl
        self.sh = sh
        self.r1 = r1

    def __call__(self, img):
        if random.uniform(0, 1) > self.probability:
            return img

        # try up to 100 times to sample a rectangle that fits inside the image
        for _ in range(100):
            area = img.shape[1] * img.shape[2]
            target_area = random.uniform(self.sl, self.sh) * area
            aspect_ratio = random.uniform(self.r1, 1 / self.r1)

            h = int(round(math.sqrt(target_area * aspect_ratio)))
            w = int(round(math.sqrt(target_area / aspect_ratio)))

            if w < img.shape[2] and h < img.shape[1]:
                x1 = random.randint(0, img.shape[1] - h)
                y1 = random.randint(0, img.shape[2] - w)
                if img.shape[0] == 3:
                    img[0, x1:x1+h, y1:y1+w] = self.mean[0]
                    img[1, x1:x1+h, y1:y1+w] = self.mean[1]
                    img[2, x1:x1+h, y1:y1+w] = self.mean[2]
                # TODO when will img.shape != 3
                else:
                    img[0, x1:x1+h, y1:y1+w] = self.mean[0]
                return img

        return img


class ChannelExchange():
    """Copy one randomly selected channel into the other two, optionally using grayscale.

    Args:
        gray: Inclusive upper bound of the random index. With 2, only R, G or B
            is selected; with 3, grayscale conversion becomes a fourth option.
    """

    def __init__(self, gray=2):
        self.gray = gray

    def __call__(self, img):
        idx = random.randint(0, self.gray)
        if idx == 0:
            # R channel selected
            img[1, :, :] = img[0, :, :]
            img[2, :, :] = img[0, :, :]
        elif idx == 1:
            # G channel selected
            img[0, :, :] = img[1, :, :]
            img[2, :, :] = img[1, :, :]
        elif idx == 2:
            # B channel selected
            img[0, :, :] = img[2, :, :]
            img[1, :, :] = img[2, :, :]
        else:
            # weighted grayscale conversion, written to all three channels
            tmp_img = 0.2989 * img[0, :, :] + 0.5870 * img[1, :, :] + 0.1140 * img[2, :, :]
            img[0, :, :] = tmp_img
            img[1, :, :] = tmp_img
            img[2, :, :] = tmp_img
        return img
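The classes above overwrite the channels of the input image in place, which can be surprising when the original visible image is still needed (for example, to keep both the RGB and the channel-exchanged version in one batch). A minimal sketch of a non-mutating variant on NumPy arrays is shown below. The function name `channel_exchange` and the `include_gray` flag are hypothetical names chosen for illustration, and the input is assumed to be a float array of shape (3, H, W) in RGB order.

```python
import numpy as np


def channel_exchange(img, rng, include_gray=False):
    """Return a copy of a (3, H, W) image with one channel copied to all three.

    Hypothetical helper for illustration; not part of the released code.
    """
    out = img.copy()  # never touch the caller's array
    n_choices = 4 if include_gray else 3
    idx = int(rng.integers(0, n_choices))  # upper bound is exclusive here
    if idx < 3:
        out[:] = img[idx]  # broadcast the chosen channel over R, G and B
    else:
        # same grayscale weights as in the classes above
        out[:] = 0.2989 * img[0] + 0.5870 * img[1] + 0.1140 * img[2]
    return out


rng = np.random.default_rng(0)
visible = rng.random((3, 4, 2))          # toy "visible" image
augmented = channel_exchange(visible, rng)

print(np.array_equal(augmented[0], augmented[1]))  # all channels now identical
print(np.array_equal(visible[0], visible[1]))      # the input still has distinct channels
```

Note that `random.randint(a, b)` in the original code includes `b`, while `Generator.integers(low, high)` excludes `high` by default, which is why the number of choices is passed directly here.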

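Cross-modality metric learning

1. Enhanced Channel-Mixed Learning
A batch is built from images of different modalities, and the relationships between them are optimized directly, without considering the modality difference. Two losses are optimized: the identity loss and the weighted regularization triplet loss.

Note that pj and pk can come from the same modality or from different modalities. This is presumably what the authors mean by "mixed": images are picked at random from a batch composed of mixed modalities, so the modality difference is ignored and intra- and inter-modality learning are optimized directly.

Here d is the Euclidean distance.

The weighting strategy adaptively accounts for the contribution of each triplet and increases the contribution of hard samples (positive pairs with large distances and negative pairs with small distances), so that all triplets in the batch are fully exploited.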
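The weighting described above can be sketched in NumPy as follows. The formula itself is not reproduced in these notes, so the sketch assumes the softmax-weighted form commonly used for the weighted regularization triplet loss: positive pairs are weighted by a softmax over their distances, negative pairs by a softmax over their negated distances, and the loss is log(1 + exp(weighted positive distance - weighted negative distance)). The function names are hypothetical, the anchor itself is excluded from its positives, and every identity is assumed to appear at least twice in the batch.

```python
import numpy as np


def masked_softmax(scores, mask):
    """Row-wise softmax over the entries of `scores` where `mask` is True."""
    shifted = np.where(mask, scores, -np.inf)
    shifted = shifted - shifted.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(shifted)  # masked-out entries become exactly 0
    return e / e.sum(axis=1, keepdims=True)


def weighted_regularization_triplet(features, labels):
    """Assumed form of the loss; modality is deliberately ignored."""
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))            # pairwise Euclidean distance d
    same = labels[:, None] == labels[None, :]
    pos = same & ~np.eye(len(labels), dtype=bool)       # positives, excluding the anchor
    neg = ~same
    w_pos = masked_softmax(dist, pos)                   # far positives weigh more
    w_neg = masked_softmax(-dist, neg)                  # near negatives weigh more
    d_pos = (w_pos * dist).sum(axis=1)
    d_neg = (w_neg * dist).sum(axis=1)
    return np.logaddexp(0.0, d_pos - d_neg).mean()      # stable log(1 + exp(x))


# Toy mixed batch: which sample is visible or infrared plays no role in the loss.
labels = np.array([0, 0, 1, 1])
separated = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]])
mixed_up = np.array([[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.0, 5.1]])

loss_good = weighted_regularization_triplet(separated, labels)
loss_bad = weighted_regularization_triplet(mixed_up, labels)
print(loss_good < loss_bad)
```

Because every positive and negative pair of an anchor contributes through its weight, no explicit hard-sample mining step and no margin hyperparameter appear in this sketch.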

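Enhanced Squared Difference
The commonly used form is the L1 difference; this paper uses an enhanced squared difference instead.

The authors analyze the benefit of this choice by plotting the function curves.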
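To get a feel for how the two curves differ, the short sketch below compares a linear and a squared variant numerically. The exact formula is not reproduced in these notes, so both functions are assumptions: `delta` stands for the weighted positive distance minus the weighted negative distance, the linear term is taken to be log(1 + exp(delta)), and the squared term is taken to be log(1 + exp(sign(delta) * delta^2)), which keeps the sign of the violation while squaring its magnitude.

```python
import numpy as np


def linear_term(delta):
    """Assumed baseline: softplus of the raw difference."""
    return np.logaddexp(0.0, delta)


def squared_term(delta):
    """Assumed squared variant: softplus of the signed squared difference."""
    return np.logaddexp(0.0, np.sign(delta) * delta ** 2)


deltas = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for d, lin, sq in zip(deltas, linear_term(deltas), squared_term(deltas)):
    print(f"delta={d:+.1f}  linear={lin:.4f}  squared={sq:.4f}")
```

Under these assumptions, the two terms agree at delta = 0. For delta > 1 the squared term penalizes a violated triplet more strongly than the linear one, and for delta < -1 it falls toward zero faster once the triplet is well separated.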

Experimental results

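2. Channel-Augmented Joint Learning
The channel-augmented images are explicitly treated as an auxiliary modality, so that one batch contains visible RGB images, channel-augmented images and infrared images at the same time. This enlarges the batch. As before, the classification and metric learning models are shared across all of them. The authors also tried using different models, but this did not give better results.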
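The batch layout described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_joint_batch` is hypothetical, the infrared images are assumed to be replicated to three channels so that one shared model can process all inputs, and the channel-augmented images here are a simple stand-in (the R channel copied to all three channels) rather than the full augmentation pipeline.

```python
import numpy as np


def build_joint_batch(visible, visible_aug, infrared, labels_v, labels_ir):
    """Stack the three 'modalities' into one batch for a shared model.

    The channel-augmented images reuse the labels of the visible images
    they were generated from.
    """
    images = np.concatenate([visible, visible_aug, infrared], axis=0)
    labels = np.concatenate([labels_v, labels_v, labels_ir], axis=0)
    return images, labels


rng = np.random.default_rng(0)
visible = rng.random((4, 3, 8, 4))                          # N x C x H x W
visible_aug = np.repeat(visible[:, :1], 3, axis=1)          # stand-in augmentation
infrared = np.repeat(rng.random((4, 1, 8, 4)), 3, axis=1)   # 1 channel -> 3 channels
labels_v = np.array([0, 0, 1, 1])
labels_ir = np.array([0, 0, 1, 1])

images, labels = build_joint_batch(visible, visible_aug, infrared, labels_v, labels_ir)
print(images.shape, labels.shape)
```

With this layout the batch is three times the size of each single-modality batch, and the identity and triplet losses can be computed over all samples at once.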

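Experiments

Analysis of the effect of the individual augmentation methods
Analysis of the performance of the squared difference
Analysis of the performance of the different learning strategies
Performance comparison with other methods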