Global Average Pooling (GAP)

The Network In Network paper describes GAP as follows:
In this paper, we propose another strategy called global average pooling to replace the traditional fully connected layers in CNN. The idea is to generate one feature map for each corresponding category of the classification task in the last mlpconv layer. Instead of adding fully connected layers on top of the feature maps, we take the average of each feature map, and the resulting vector is fed directly into the softmax layer. One advantage of global average pooling over the fully connected layers is that it is more native to the convolution structure by enforcing correspondences between feature maps and categories. Thus the feature maps can be easily interpreted as categories confidence maps. Another advantage is that there is no parameter to optimize in the global average pooling thus overfitting is avoided at this layer. Furthermore, global average pooling sums out the spatial information, thus it is more robust to spatial translations of the input.
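A minimal sketch of the idea in the quoted passage, assuming PyTorch: the last convolutional stage emits one feature map per class, each map is averaged over its spatial positions, and the resulting vector goes straight into softmax. The helper name `gap_classifier_head` and the tensor sizes are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F


def gap_classifier_head(feature_maps):
    """Global average pooling followed by softmax (hypothetical helper).

    feature_maps: tensor of shape (batch, num_classes, H, W),
    i.e. one feature map per category.
    """
    # Average each feature map over its spatial positions -> (batch, num_classes)
    pooled = feature_maps.mean(dim=(2, 3))
    # The pooled vector is fed directly into softmax; nothing is learned here.
    return F.softmax(pooled, dim=1)


torch.manual_seed(0)
maps = torch.randn(2, 10, 6, 6)  # assumed sizes: 2 images, 10 classes, 6x6 maps
probs = gap_classifier_head(maps)

# A circular shift of the maps leaves each spatial average unchanged,
# so the class probabilities stay the same (up to float rounding).
shifted = torch.roll(maps, shifts=(1, 2), dims=(2, 3))
probs_shifted = gap_classifier_head(shifted)
```

The circular shift is an idealized stand-in for the translation robustness mentioned in the quote; with real images, content moving in or out at the borders changes the averages slightly.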




import torch
import torch.nn.functional as F


def find_vgg_layer(arch, target_layer_name):
    """Find vgg layer to calculate GradCAM and GradCAM++

    Args:
        arch: default torchvision vgg models
        target_layer_name (str): the name of layer with its hierarchical information. please refer to usages below.
            target_layer_name = 'features'
            target_layer_name = 'features_42'
            target_layer_name = 'classifier'
            target_layer_name = 'classifier_0'

    Return:
        target_layer: found layer. this layer will be hooked to get forward/backward pass information.
    """
    hierarchy = target_layer_name.split('_')

    # Note: as written, only the `features` block is searched,
    # whatever prefix the name carries.
    if len(hierarchy) >= 1:
        target_layer = arch.features

    if len(hierarchy) == 2:
        target_layer = target_layer[int(hierarchy[1])]

    return target_layer


class GradCAM(object):
    """Calculate GradCAM saliency map.

    A simple example:

        # initialize a model, model_dict and gradcam
        vgg = torchvision.models.vgg16(pretrained=True)
        vgg.eval()
        model_dict = dict(model_type='vgg', arch=vgg, layer_name='features_29', input_size=(224, 224))
        gradcam = GradCAM(model_dict)

        # get an image and normalize with mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)
        img = load_img()
        normed_img = normalizer(img)

        # get a GradCAM saliency map on the class index 10.
        mask, logit = gradcam(normed_img, class_idx=10)

        # make heatmap from mask and synthesize saliency map using heatmap and img
        heatmap, cam_result = visualize_cam(mask, img)

    Args:
        model_dict (dict): a dictionary that contains 'model_type', 'arch', 'layer_name', 'input_size'(optional) as keys.
        verbose (bool): whether to print output size of the saliency map given 'layer_name' and 'input_size' in model_dict.
    """

    def __init__(self, model_dict, verbose=False):
        # The key is 'model_type', matching the docstring (the original read 'type').
        model_type = model_dict['model_type']
        layer_name = model_dict['layer_name']
        self.model_arch = model_dict['arch']

        self.gradients = dict()
        self.activations = dict()

        def backward_hook(module, grad_input, grad_output):
            # Gradient of the score w.r.t. the target layer's output
            self.gradients['value'] = grad_output[0]
            return None

        def forward_hook(module, input, output):
            # Feature maps produced by the target layer
            self.activations['value'] = output
            return None

        if 'vgg' in model_type.lower():
            target_layer = find_vgg_layer(self.model_arch, layer_name)
        else:
            # Only the VGG branch is kept in this trimmed version.
            raise ValueError('unsupported model_type: {}'.format(model_type))

        target_layer.register_forward_hook(forward_hook)
        target_layer.register_backward_hook(backward_hook)

        if verbose:
            try:
                input_size = model_dict['input_size']
            except KeyError:
                print("please specify size of input image in model_dict. e.g. {'input_size':(224, 224)}")
            else:
                device = 'cuda' if next(self.model_arch.parameters()).is_cuda else 'cpu'
                self.model_arch(torch.zeros(1, 3, *(input_size), device=device))
                print('saliency_map size :', self.activations['value'].shape[2:])

    def forward(self, input, class_idx=None, retain_graph=False):
        """
        Args:
            input: input image with shape of (1, 3, H, W)
            class_idx (int): class index for calculating GradCAM.
                If not specified, the class index that makes the highest model prediction score will be used.
        Return:
            mask: saliency map of the same spatial dimension with input
            logit: model output
        """
        b, c, h, w = input.size()

        logit = self.model_arch(input)
        print(logit.shape)
        if class_idx is None:
            score = logit[:, logit.max(1)[-1]].squeeze()  # get the max score
            print(score)
        else:
            score = logit[:, class_idx].squeeze()

        self.model_arch.zero_grad()
        score.backward(retain_graph=retain_graph)
        gradients = self.gradients['value']
        activations = self.activations['value']
        # print(gradients.shape, activations.shape)  # torch.Size([1, 512, 14, 14]) torch.Size([1, 512, 14, 14])
        b, k, u, v = gradients.size()

        alpha = gradients.view(b, k, -1).mean(2)  # torch.Size([1, 512])
        # alpha = F.relu(gradients.view(b, k, -1)).mean(2)
        weights = alpha.view(b, k, 1, 1)  # torch.Size([1, 512, 1, 1])

        saliency_map = (weights * activations).sum(1, keepdim=True)
        saliency_map = F.relu(saliency_map)
        print('saliency_map', saliency_map.shape)
        # F.upsample is deprecated; F.interpolate takes the same arguments.
        saliency_map = F.interpolate(saliency_map, size=(h, w), mode='bilinear', align_corners=False)
        saliency_map_min, saliency_map_max = saliency_map.min(), saliency_map.max()
        saliency_map = (saliency_map - saliency_map_min).div(saliency_map_max - saliency_map_min).data

        return saliency_map, logit

    def __call__(self, input, class_idx=None, retain_graph=False):
        return self.forward(input, class_idx, retain_graph)
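The class relies on hooks to capture the target layer's output during the forward pass and the gradient of the score with respect to that output during the backward pass. The sketch below isolates that mechanism on a tiny stand-in network. The network, the sizes, and the `captured` dict are assumptions for illustration only. It uses `register_full_backward_hook`, the non-deprecated hook API in recent PyTorch versions. Note that this variant can fail on modules that modify their input in place, such as `ReLU(inplace=True)`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy network standing in for VGG16; only the hook mechanics matter.
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),                  # target layer (not in-place)
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 5),
)
model.eval()

captured = {}


def forward_hook(module, inputs, output):
    # Called after the layer's forward pass: store its feature maps.
    captured['activations'] = output


def backward_hook(module, grad_input, grad_output):
    # Called during backward: grad_output[0] is d(score)/d(layer output).
    captured['gradients'] = grad_output[0]


target_layer = model[1]
fwd_handle = target_layer.register_forward_hook(forward_hook)
bwd_handle = target_layer.register_full_backward_hook(backward_hook)

x = torch.randn(1, 3, 8, 8)
logit = model(x)
class_idx = int(logit.argmax(dim=1))  # highest-scoring class
model.zero_grad()
logit[0, class_idx].backward()

# Remove the hooks once the values have been captured.
fwd_handle.remove()
bwd_handle.remove()
```

After the backward pass, `captured['activations']` and `captured['gradients']` have the same shape, which is what lets Grad-CAM pool one and weight the other channel by channel.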

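The inputs are a pretrained model and the layer to extract (here the relu() before the last MaxPool2d() of vgg16, i.e. features_29). Hooks capture the activations and gradients of the features_29 layer; the feature maps (activations) and the gradients at this layer both have size $1 \times 512 \times 14 \times 14$. Applying GAP to the gradient of each feature map over its spatial positions gives the neuron importance weights $\alpha_k^c$. The corresponding formula and code are: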
$$\alpha_k^c=\frac{1}{Z}\sum_i \sum_j{\frac{\partial y^c}{\partial A^k_{ij}}}$$

alpha = gradients.view(b, k, -1).mean(2)  # torch.Size([1, 512])
weights = alpha.view(b, k, 1, 1)  # torch.Size([1, 512, 1, 1])
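As a small worked check, assuming a toy gradient tensor with 3 channels of size 2 × 2 (values chosen only for easy arithmetic), the `view(...).mean(2)` call matches the explicit formula with Z = u · v:

```python
import torch

b, k, u, v = 1, 3, 2, 2
# Toy gradients: channel 0 holds 0..3, channel 1 holds 4..7, channel 2 holds 8..11
gradients = torch.arange(12.0).view(b, k, u, v)

# Same computation as in the snippet above
alpha = gradients.view(b, k, -1).mean(2)          # shape (1, 3)

# Explicit form of the formula: sum over i, j divided by Z = u * v
Z = u * v
alpha_explicit = gradients.sum(dim=(2, 3)) / Z    # shape (1, 3)

# Reshape so each weight broadcasts over its feature map
weights = alpha.view(b, k, 1, 1)                  # shape (1, 3, 1, 1)

print(alpha)  # tensor([[1.5000, 5.5000, 9.5000]])
```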

We perform a weighted combination of forward activation maps, and follow it by a ReLU to obtain:
$$L_{\text{Grad-CAM}}^c=\mathrm{ReLU}\left(\sum_k{\alpha_k^c A^k}\right)$$

saliency_map = (weights * activations).sum(1, keepdim=True)
saliency_map = F.relu(saliency_map)
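A tiny worked example of this step, with made-up activations and weights (two 2 × 2 feature maps, weights 1 and -1), shows how the ReLU discards positions whose weighted sum is negative:

```python
import torch
import torch.nn.functional as F

# Hypothetical activations A^k: shape (1, 2, 2, 2)
activations = torch.tensor([[[[1.0, 2.0], [3.0, 4.0]],
                             [[2.0, 1.0], [5.0, 0.0]]]])
# Hypothetical importance weights alpha_k^c, shaped for broadcasting
weights = torch.tensor([1.0, -1.0]).view(1, 2, 1, 1)

# Weighted sum over the channel dimension -> shape (1, 1, 2, 2)
combined = (weights * activations).sum(1, keepdim=True)
# combined is [[-1, 1], [-2, 4]]

# ReLU keeps only positions with a positive influence on the class score
saliency_map = F.relu(combined)
print(saliency_map.squeeze())  # tensor([[0., 1.], [0., 4.]])
```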

For details on upsampling, see the post on implementing upsampling with pytorch torch.nn (nn.Upsample).
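The last two steps of `forward` upsample the coarse saliency map to the input resolution and rescale it to [0, 1]. A minimal sketch of those steps follows, using `F.interpolate`. The helper name `upsample_and_normalize` and the small `eps` guard against a constant map are assumptions added here, not part of the original code.

```python
import torch
import torch.nn.functional as F


def upsample_and_normalize(saliency_map, size, eps=1e-8):
    """Bilinearly upsample a (N, 1, u, v) map to `size`, then min-max normalize.

    `eps` (an added assumption) avoids division by zero when the map is constant.
    """
    up = F.interpolate(saliency_map, size=size, mode='bilinear', align_corners=False)
    lo, hi = up.min(), up.max()
    return (up - lo) / (hi - lo + eps)


# Coarse 2x2 map, upsampled to a hypothetical 8x8 input resolution
coarse = torch.tensor([[[[0.0, 1.0], [0.0, 4.0]]]])
mask = upsample_and_normalize(coarse, size=(8, 8))

# A constant map stays finite thanks to eps
flat = upsample_and_normalize(torch.zeros(1, 1, 2, 2), size=(8, 8))
```

Without the guard, a saliency map that is entirely zero after the ReLU would produce NaN values in the normalization.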



