Loss Functions and Optimization


Goals of this lecture

  1. Define a loss function.
  2. Come up with a way of finding the parameters W that minimize the loss from (1)
    (optimization)

The Remaining Problem from last lecture

  • How do we choose the parameters W?

![](https://s1.ax1x.com/2020/11/08/BTZxgK.png)

Loss function

A loss function tells us how good (or bad) our current classifier is.

Given a dataset $\{(x_i, y_i)\}_{i=1}^N$, where $x_i$ is an image and $y_i$ is its (integer) label.

The total loss is defined as:

$L = \frac{1}{N}\sum\limits_i L_i(f(x_i, W), y_i)$

which is the average of the per-example losses over the whole training set.


Multiclass SVM loss

Given an example $(x_i, y_i)$, where $x_i$ is the image and $y_i$ is the (integer) label, and using the shorthand $s = f(x_i, W)$ for the score vector,

the SVM loss has the form:

$L_i = \sum\limits_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$

If an incorrect class's score is smaller than the correct class's score by at least the margin, its contribution to the loss is 0.
In this case the safety margin is set to one.
The choice of margin depends on our needs.

  • Then we loop over all the incorrect classes and sum their hinge losses

![](https://s1.ax1x.com/2020/11/08/BTZLNR.png)
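As a small worked example (the scores here are made up for illustration; the correct class is index 0 and the margin is 1):

```python
import numpy as np

# Hypothetical score vector for one example; class 0 is the correct label.
scores = np.array([3.2, 5.1, -1.7])
y = 0
margin = 1.0

# Hinge loss: for each incorrect class, max(0, s_j - s_y + margin)
margins = np.maximum(0, scores - scores[y] + margin)
margins[y] = 0  # the correct class does not contribute
loss = np.sum(margins)
print(loss)  # max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) ≈ 2.9
```

The second class is 1.9 above the correct class, so it pays 2.9 after adding the margin; the third class is safely below the correct score, so it pays nothing.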

  • What if we use the squared version instead?

$L = \frac{1}{N}\sum\limits_i L_i(f(x_i, W), y_i)^2$

This is not a linear function of the per-example losses and gives a genuinely different classifier: large errors are penalized much more heavily. It may be useful sometimes, depending on how much you care about big errors versus small ones.

Example Code

```python
import numpy as np

def L_i_vectorized(x, y, W):
    scores = W.dot(x)                                    # class scores for one example
    margins = np.maximum(0, scores - scores[y] + 1)      # hinge with margin 1
    margins[y] = 0                                       # don't count the correct class
    loss_i = np.sum(margins)
    return loss_i
# pretty easy
```

![](https://s1.ax1x.com/2020/11/08/BTZO41.png)

Scaling W just changes the gaps between the scores; for example, if W gives zero loss, so does 2W.

![](https://s1.ax1x.com/2020/11/08/BTZzjO.png)

![](https://s1.ax1x.com/2020/11/08/BTe9De.png)

We often use L2 regularization, which is just the squared Euclidean norm of the weights:

$L = \frac{1}{N}\sum\limits_i L_i(f(x_i, W), y_i) + \lambda R(W), \qquad R(W) = \sum_k \sum_l W_{k,l}^2$

![](https://s1.ax1x.com/2020/11/08/BTepuD.png)

In this case $w_1$ and $w_2$ produce the same score, and their L1 penalties are equal; but L1 generally prefers sparse solutions like $w_1$ (more zeros), while L2 prefers $w_2$ because it spreads the weight evenly across the input dimensions.
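The slide's example can be checked directly. A minimal sketch, assuming $x = [1, 1, 1, 1]$ with $w_1 = [1, 0, 0, 0]$ and $w_2 = [0.25, 0.25, 0.25, 0.25]$:

```python
import numpy as np

x = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])       # sparse weights
w2 = np.array([0.25, 0.25, 0.25, 0.25])   # evenly spread weights

# Both give the same score, so the data loss cannot tell them apart.
print(w1.dot(x), w2.dot(x))                 # 1.0 1.0

# L1 penalty treats them equally; L2 penalty prefers the spread-out w2.
print(np.sum(np.abs(w1)), np.sum(np.abs(w2)))   # 1.0 1.0
print(np.sum(w1**2), np.sum(w2**2))             # 1.0 0.25
```

Only the regularizer breaks the tie between the two solutions, which is exactly its job.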

The multiclass SVM loss only cares about the gap between the correct class's score and the incorrect ones.

Softmax Classifier

![](https://s1.ax1x.com/2020/11/08/BTeiEd.png)

We just want to make the probability of the true class closer to 1 (the closer the better; equal is best), so the loss function can be chosen as $-\log$ of that probability:

$L_i = -\log\dfrac{e^{s_{y_i}}}{\sum_j e^{s_j}}$

![](https://s1.ax1x.com/2020/11/08/BTeCHH.png)

To get exactly zero loss, the correct class's score would have to go to infinity. But computers don't like that!
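A minimal sketch of the softmax loss for one example, using the standard max-subtraction trick so the exponentials cannot overflow (the score values are illustrative):

```python
import numpy as np

def softmax_loss_i(scores, y):
    # Shift scores so the largest is 0; this avoids overflow in exp
    # without changing the resulting probabilities.
    shifted = scores - np.max(scores)
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    return -np.log(probs[y])  # cross-entropy: -log P(correct class)

scores = np.array([3.2, 5.1, -1.7])  # hypothetical scores; class 0 is correct
print(softmax_loss_i(scores, 0))     # ≈ 2.04
```

Shifting by the maximum is why the score going to infinity is only a theoretical limit: in practice we keep the computation finite and simply accept a small positive loss.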

  • Debugging tip
    At initialization, when all scores are small and roughly equal, the loss should come out to about $\log C$, where $C$ is the number of classes.
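This sanity check is easy to verify: with all scores equal, each class gets probability $1/C$, so the loss is exactly $\log C$ (here $C = 10$, as in CIFAR-10):

```python
import numpy as np

C = 10                   # number of classes, e.g. CIFAR-10
scores = np.zeros(C)     # all scores equal, as at initialization
probs = np.exp(scores) / np.sum(np.exp(scores))  # each is 1/C
loss = -np.log(probs[0])
print(loss, np.log(C))   # both ≈ 2.302
```

If the very first loss your training code prints is far from this value, something is wrong before you have trained at all.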

![](https://s1.ax1x.com/2020/11/08/BTek4I.png)

![](https://s1.ax1x.com/2020/11/08/BTeECt.png)


Optimization

Random Search - The Naive but Simplest way

Really Slow !!!
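Random search can be sketched in a few lines (the loss function `L` and the data arguments are placeholders for whatever classifier loss you are using):

```python
import numpy as np

def random_search(L, X, y, dim, num_tries=1000):
    # Try random weight matrices and keep the best one found so far.
    best_loss, best_W = float("inf"), None
    for _ in range(num_tries):
        W = np.random.randn(*dim) * 0.001  # small random weights
        loss = L(X, y, W)
        if loss < best_loss:
            best_loss, best_W = loss, W
    return best_W, best_loss
```

The course notes report this gets roughly 15% accuracy on CIFAR-10, barely better than the 10% of random guessing, which is why it is only a strawman.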

Gradient Descent

We just compute the gradient of the loss with respect to W and step downhill toward the bottom (possibly only a local minimum).

Code

```python
# Vanilla Gradient Descent
while True:
    weight_grad = evaluate_gradient(loss_fun, data, weights)
    weights += -step_size * weight_grad  # parameter update
```

The step size is also called the learning rate; it is an important hyperparameter.
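The loop above can be run end-to-end on a toy quadratic loss to see it converge (the loss, data, and step size here are illustrative, not from the lecture):

```python
import numpy as np

def evaluate_gradient(loss_fun, data, weights):
    # Analytic gradient of the toy loss (weights - data)^2:
    # d/dw (w - d)^2 = 2 (w - d)
    return 2.0 * (weights - data)

data = np.array([3.0, -1.0])   # the minimizer of the toy loss
weights = np.zeros(2)
step_size = 0.1                # the learning rate

for _ in range(100):
    weight_grad = evaluate_gradient(None, data, weights)
    weights += -step_size * weight_grad

print(weights)  # converges toward [3.0, -1.0]
```

Too small a step size and this loop crawls; too large and it overshoots and diverges, which is exactly the trade-off the slide illustrates.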

![](https://s1.ax1x.com/2020/11/08/BTeV8P.png)

Since N might be very large, we sample a small subset called a minibatch and use it to estimate the true gradient.
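Minibatch SGD is the same loop with a sampled subset per step. A self-contained sketch on a toy linear-regression loss (the dataset, batch size of 256, and step count are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 2))   # toy dataset, N = 10000
true_w = np.array([2.0, -3.0])
y = X.dot(true_w)                 # noiseless linear targets

weights = np.zeros(2)
step_size = 0.05

for _ in range(500):
    # Sample a minibatch of 256 examples to estimate the full gradient.
    idx = rng.integers(0, X.shape[0], size=256)
    Xb, yb = X[idx], y[idx]
    # Gradient of the mean squared error (1/B) * sum (x.w - y)^2
    grad = 2.0 * Xb.T.dot(Xb.dot(weights) - yb) / len(yb)
    weights += -step_size * grad

print(weights)  # ≈ [2.0, -3.0]
```

Each step touches only 256 of the 10000 examples, yet the noisy gradient estimates still drive the weights to the right answer, which is the whole point of the minibatch trick.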

![](https://s1.ax1x.com/2020/11/08/BTeZgf.png)

![](https://s1.ax1x.com/2020/11/08/BTenKS.png)

![](https://s1.ax1x.com/2020/11/08/BTeuDg.png)

Color feature
![](https://s1.ax1x.com/2020/11/08/BTeQEj.png)

Gradient features extract the edge information
![](https://s1.ax1x.com/2020/11/08/BTelUs.png)

Bag of words (an idea borrowed from NLP)
![](https://s1.ax1x.com/2020/11/08/BTeG80.png)

Clustering different image patches from the training images to build a visual vocabulary

![](https://s1.ax1x.com/2020/11/08/BTe15n.png)

  • Differences
  1. Classical pipeline: extract the features first, then feed them into a linear classifier.
  2. A convolutional neural network learns the features automatically during the training process.
