Loss Functions and Optimization


Goals of this lecture

  1. Define a loss function.
  2. Come up with a way of finding the parameters W that minimize the loss from (1)
    (optimization)

The Remaining Problem from last lecture

  • How do we choose the parameters W?

![](https://s1.ax1x.com/2020/11/08/BTZxgK.png)

Loss function

A loss function tells us how good (or bad) our current classifier is.

Given a dataset $\{(x_i, y_i)\}_{i=1}^N$, where $x_i$ is an image and $y_i$ is its (integer) label.

The total loss is defined as:

$L = \frac{1}{N}\sum\limits_i L_i(f(x_i, W), y_i)$

which is the average of the per-example losses over the whole training set.


Multiclass SVM loss

Given an example $(x_i, y_i)$, where $x_i$ is the image and $y_i$ is the (integer) label, and using the shorthand $s = f(x_i, W)$ for the score vector,

the SVM loss has the form:

$L_i = \sum\limits_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$

If an incorrect class's score is smaller than the correct class's score by at least the margin, its contribution to the loss is 0.
In this case the safety margin is set to one.
The choice of margin depends on our needs.

  • Then we loop over all the incorrect classes and sum their hinge losses

![](https://s1.ax1x.com/2020/11/08/BTZLNR.png)
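As a small worked example (the scores here are made up for illustration; the correct class is index 0 and the margin is 1):

```python
import numpy as np

# Hypothetical score vector for one example; class 0 is the correct label.
scores = np.array([3.2, 5.1, -1.7])
y = 0
margin = 1.0

# Hinge loss: for each incorrect class, max(0, s_j - s_y + margin)
margins = np.maximum(0, scores - scores[y] + margin)
margins[y] = 0  # the correct class does not contribute
loss = np.sum(margins)
print(loss)  # max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) ≈ 2.9
```

The second class is 1.9 above the correct class, so it pays 2.9 after adding the margin; the third class is safely below the correct score, so it pays nothing.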

  • What if we use the squared version instead?

$L = \frac{1}{N}\sum\limits_i L_i(f(x_i, W), y_i)^2$

This is not a linear function of the per-example losses and gives a genuinely different classifier: large errors are penalized much more heavily. It may be useful sometimes, depending on how much you care about big errors versus small ones.

Example Code

```python
import numpy as np

def L_i_vectorized(x, y, W):
    scores = W.dot(x)                                    # class scores for one example
    margins = np.maximum(0, scores - scores[y] + 1)      # hinge with margin 1
    margins[y] = 0                                       # don't count the correct class
    loss_i = np.sum(margins)
    return loss_i
# pretty easy
```

![](https://s1.ax1x.com/2020/11/08/BTZO41.png)

Scaling W just changes the gaps between the scores; for example, if W gives zero loss, so does 2W.

![](https://s1.ax1x.com/2020/11/08/BTZzjO.png)

![](https://s1.ax1x.com/2020/11/08/BTe9De.png)

We often use L2 regularization, which is just the squared Euclidean norm of the weights:

$L = \frac{1}{N}\sum\limits_i L_i(f(x_i, W), y_i) + \lambda R(W), \qquad R(W) = \sum_k \sum_l W_{k,l}^2$

![](https://s1.ax1x.com/2020/11/08/BTepuD.png)

In this case $w_1$ and $w_2$ produce the same score, and their L1 penalties are equal; but L1 generally prefers sparse solutions like $w_1$ (more zeros), while L2 prefers $w_2$ because it spreads the weight evenly across the input dimensions.
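The slide's example can be checked directly. A minimal sketch, assuming $x = [1, 1, 1, 1]$ with $w_1 = [1, 0, 0, 0]$ and $w_2 = [0.25, 0.25, 0.25, 0.25]$:

```python
import numpy as np

x = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])       # sparse weights
w2 = np.array([0.25, 0.25, 0.25, 0.25])   # evenly spread weights

# Both give the same score, so the data loss cannot tell them apart.
print(w1.dot(x), w2.dot(x))                 # 1.0 1.0

# L1 penalty treats them equally; L2 penalty prefers the spread-out w2.
print(np.sum(np.abs(w1)), np.sum(np.abs(w2)))   # 1.0 1.0
print(np.sum(w1**2), np.sum(w2**2))             # 1.0 0.25
```

Only the regularizer breaks the tie between the two solutions, which is exactly its job.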

The multiclass SVM loss only cares about the gap between the correct class's score and the incorrect ones.

Softmax Classifier

![](https://s1.ax1x.com/2020/11/08/BTeiEd.png)

We just want to make the probability of the true class closer to 1 (the closer the better; equal is best), so the loss function can be chosen as $-\log$ of that probability:

$L_i = -\log\dfrac{e^{s_{y_i}}}{\sum_j e^{s_j}}$

![](https://s1.ax1x.com/2020/11/08/BTeCHH.png)

To get exactly zero loss, the correct class's score would have to go to infinity. But computers don't like that!
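A minimal sketch of the softmax loss for one example, using the standard max-subtraction trick so the exponentials cannot overflow (the score values are illustrative):

```python
import numpy as np

def softmax_loss_i(scores, y):
    # Shift scores so the largest is 0; this avoids overflow in exp
    # without changing the resulting probabilities.
    shifted = scores - np.max(scores)
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    return -np.log(probs[y])  # cross-entropy: -log P(correct class)

scores = np.array([3.2, 5.1, -1.7])  # hypothetical scores; class 0 is correct
print(softmax_loss_i(scores, 0))     # ≈ 2.04
```

Shifting by the maximum is why the score going to infinity is only a theoretical limit: in practice we keep the computation finite and simply accept a small positive loss.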

  • Debugging tip
    At initialization, when all scores are small and roughly equal, the loss should come out to about $\log C$, where $C$ is the number of classes.
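This sanity check is easy to verify: with all scores equal, each class gets probability $1/C$, so the loss is exactly $\log C$ (here $C = 10$, as in CIFAR-10):

```python
import numpy as np

C = 10                   # number of classes, e.g. CIFAR-10
scores = np.zeros(C)     # all scores equal, as at initialization
probs = np.exp(scores) / np.sum(np.exp(scores))  # each is 1/C
loss = -np.log(probs[0])
print(loss, np.log(C))   # both ≈ 2.302
```

If the very first loss your training code prints is far from this value, something is wrong before you have trained at all.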

![](https://s1.ax1x.com/2020/11/08/BTek4I.png)

![](https://s1.ax1x.com/2020/11/08/BTeECt.png)


Optimization

Random Search - The Naive but Simplest way

Really Slow !!!
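Random search can be sketched in a few lines (the loss function `L` and the data arguments are placeholders for whatever classifier loss you are using):

```python
import numpy as np

def random_search(L, X, y, dim, num_tries=1000):
    # Try random weight matrices and keep the best one found so far.
    best_loss, best_W = float("inf"), None
    for _ in range(num_tries):
        W = np.random.randn(*dim) * 0.001  # small random weights
        loss = L(X, y, W)
        if loss < best_loss:
            best_loss, best_W = loss, W
    return best_W, best_loss
```

The course notes report this gets roughly 15% accuracy on CIFAR-10, barely better than the 10% of random guessing, which is why it is only a strawman.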

Gradient Descent

We just compute the gradient of the loss with respect to W and step downhill toward the bottom (possibly only a local minimum).

Code

```python
# Vanilla Gradient Descent
while True:
    weight_grad = evaluate_gradient(loss_fun, data, weights)
    weights += -step_size * weight_grad  # parameter update
```

The step size is also called the learning rate; it is an important hyperparameter.
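The loop above can be run end-to-end on a toy quadratic loss to see it converge (the loss, data, and step size here are illustrative, not from the lecture):

```python
import numpy as np

def evaluate_gradient(loss_fun, data, weights):
    # Analytic gradient of the toy loss (weights - data)^2:
    # d/dw (w - d)^2 = 2 (w - d)
    return 2.0 * (weights - data)

data = np.array([3.0, -1.0])   # the minimizer of the toy loss
weights = np.zeros(2)
step_size = 0.1                # the learning rate

for _ in range(100):
    weight_grad = evaluate_gradient(None, data, weights)
    weights += -step_size * weight_grad

print(weights)  # converges toward [3.0, -1.0]
```

Too small a step size and this loop crawls; too large and it overshoots and diverges, which is exactly the trade-off the slide illustrates.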

![](https://s1.ax1x.com/2020/11/08/BTeV8P.png)

Since N might be very large, we sample a small subset called a minibatch and use it to estimate the true gradient.
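Minibatch SGD is the same loop with a sampled subset per step. A self-contained sketch on a toy linear-regression loss (the dataset, batch size of 256, and step count are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 2))   # toy dataset, N = 10000
true_w = np.array([2.0, -3.0])
y = X.dot(true_w)                 # noiseless linear targets

weights = np.zeros(2)
step_size = 0.05

for _ in range(500):
    # Sample a minibatch of 256 examples to estimate the full gradient.
    idx = rng.integers(0, X.shape[0], size=256)
    Xb, yb = X[idx], y[idx]
    # Gradient of the mean squared error (1/B) * sum (x.w - y)^2
    grad = 2.0 * Xb.T.dot(Xb.dot(weights) - yb) / len(yb)
    weights += -step_size * grad

print(weights)  # ≈ [2.0, -3.0]
```

Each step touches only 256 of the 10000 examples, yet the noisy gradient estimates still drive the weights to the right answer, which is the whole point of the minibatch trick.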

![](https://s1.ax1x.com/2020/11/08/BTeZgf.png)

![](https://s1.ax1x.com/2020/11/08/BTenKS.png)

![](https://s1.ax1x.com/2020/11/08/BTeuDg.png)

Color feature
![](https://s1.ax1x.com/2020/11/08/BTeQEj.png)

Gradient features extract the edge information
![](https://s1.ax1x.com/2020/11/08/BTelUs.png)

Bag of words (an idea borrowed from NLP)
![](https://s1.ax1x.com/2020/11/08/BTeG80.png)

Clustering different image patches from the training images to build a visual vocabulary

![](https://s1.ax1x.com/2020/11/08/BTe15n.png)

  • Differences
  1. Classical pipeline: extract the features first, then feed them into a linear classifier.
  2. A convolutional neural network learns the features automatically during the training process.
