
Content C + Style S = Generated image G

What Are Deep ConvNets Learning?

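Deeper layers learn more abstract features: early layers respond to simple patterns such as edges and colors, while deeper layers respond to larger, more complex structures.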

Cost Function

$$\operatorname{loss}(G; C, S) = \alpha \operatorname{loss}_{content}(C, G) + \beta \operatorname{loss}_{style}(S, G)$$

Content Cost Function

  • Say you use hidden layer $l$ to compute the content cost.
  • Use a pre-trained ConvNet.
  • Let $a^{[l](C)}$ and $a^{[l](G)}$ be the activations of layer $l$ on the content image and the generated image.
  • If $a^{[l](C)}$ and $a^{[l](G)}$ are similar, the two images have similar content.

$$\operatorname{loss}_{content}(C, G) = \dfrac{1}{2} \left\lVert a^{[l](C)} - a^{[l](G)} \right\rVert^{2}$$
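
The content cost can be sketched in a few lines of NumPy. This is a minimal illustration, not a full style-transfer pipeline: the function name `content_loss` and the random arrays standing in for layer-$l$ activations are assumptions made for the example. In practice the activations would come from a pre-trained ConvNet.

```python
import numpy as np


def content_loss(a_C, a_G):
    """Half the squared L2 norm of the difference between two activation volumes.

    a_C, a_G: activations of the same hidden layer on the content image and
    the generated image, with identical shapes (e.g. height x width x channels).
    """
    a_C = np.asarray(a_C, dtype=np.float64)
    a_G = np.asarray(a_G, dtype=np.float64)
    if a_C.shape != a_G.shape:
        raise ValueError("activations must have the same shape")
    diff = a_C - a_G
    return 0.5 * float(np.sum(diff ** 2))


# Hypothetical activations: random values stand in for real ConvNet outputs.
rng = np.random.default_rng(0)
a_C = rng.standard_normal((4, 4, 3))
a_G = a_C + 0.1  # a generated image whose activations are slightly off

print(content_loss(a_C, a_C))  # identical activations give zero cost
print(content_loss(a_C, a_G))  # grows as the activations drift apart
```

The cost is zero only when the two activation volumes match exactly, which is why minimizing it pulls the content of G toward the content of C.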

Style Cost Function

  • Say you use hidden layer $l$ to compute the style cost.
  • Define style as the correlation between activations across different channels.
  • Style matrix $G^{[l]}$:
    Let $a^{[l]}_{i,j,k}$ be the activation at $(i, j, k)$. Then
    $$G^{[l]}_{k,k'} = \sum_{i=1}^{n^{[l]}_{h}} \sum_{j=1}^{n^{[l]}_{w}} a^{[l]}_{i,j,k} \, a^{[l]}_{i,j,k'}, \quad 1 \le k, k' \le n^{[l]}_{c}$$
    $G^{[l](S)}$ and $G^{[l](G)}$ denote this matrix computed on the style image and on the generated image.
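
The style matrix is a Gram matrix over channels, and a short NumPy sketch may make the double sum concrete. The function name `style_matrix` and the height x width x channels layout are assumptions for this example. Flattening the two spatial axes turns the double sum into a single matrix product.

```python
import numpy as np


def style_matrix(a):
    """Channel-by-channel Gram matrix of one layer's activations.

    a: array of shape (n_h, n_w, n_c). Entry [k, k'] of the result is the
    sum over all positions (i, j) of a[i, j, k] * a[i, j, k'].
    """
    a = np.asarray(a, dtype=np.float64)
    n_h, n_w, n_c = a.shape
    flat = a.reshape(n_h * n_w, n_c)  # one row per spatial position
    return flat.T @ flat              # shape (n_c, n_c)


# Hypothetical 2 x 2 activation volume with 3 channels.
a = np.arange(12, dtype=np.float64).reshape(2, 2, 3)
G = style_matrix(a)
print(G.shape)  # (3, 3)
print(G)
```

The result is symmetric, and its diagonal entries measure how strongly each channel fires overall, while off-diagonal entries measure how often two channels fire together.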

$$\operatorname{loss}^{[l]}_{style}(S, G) = \left\lVert G^{[l](S)} - G^{[l](G)} \right\rVert^{2}_{F}$$
$$\operatorname{loss}_{style}(S, G) = \sum_{l=1}^{L} \beta^{[l]} \operatorname{loss}^{[l]}_{style}(S, G)$$
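
As a rough sketch of how the per-layer and total style costs fit together, the NumPy code below computes the squared Frobenius distance between style matrices and then the weighted sum over layers. The function names, the height x width x channels layout, the random activations, and the layer weights are all assumptions for illustration. No normalization constant is applied, matching the formulas above.

```python
import numpy as np


def style_matrix(a):
    """Gram matrix over channels for activations of shape (n_h, n_w, n_c)."""
    a = np.asarray(a, dtype=np.float64)
    flat = a.reshape(-1, a.shape[-1])
    return flat.T @ flat


def layer_style_loss(a_S, a_G):
    """Squared Frobenius norm of the difference between two style matrices."""
    diff = style_matrix(a_S) - style_matrix(a_G)
    return float(np.sum(diff ** 2))


def style_loss(acts_S, acts_G, layer_weights):
    """Weighted sum of per-layer style losses over the chosen layers."""
    if not (len(acts_S) == len(acts_G) == len(layer_weights)):
        raise ValueError("need one weight and one activation pair per layer")
    return sum(
        w * layer_style_loss(a_S, a_G)
        for w, a_S, a_G in zip(layer_weights, acts_S, acts_G)
    )


# Hypothetical activations from two layers of different sizes.
rng = np.random.default_rng(0)
acts_S = [rng.standard_normal((8, 8, 4)), rng.standard_normal((4, 4, 6))]
acts_G = [rng.standard_normal((8, 8, 4)), rng.standard_normal((4, 4, 6))]
weights = [0.5, 0.5]  # assumed per-layer weights

print(style_loss(acts_S, acts_S, weights))  # identical activations give zero
print(style_loss(acts_S, acts_G, weights))
```

Using several layers lets the style cost capture both fine textures from shallow layers and larger patterns from deeper ones, with the per-layer weights controlling the balance.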

