Regularized Normal Equation for Linear Regression. Given a data set $\{x^{(i)}, y^{(i)}\}_{i=1,\dots,m}$ with $x^{(i)}\in\mathbb{R}^n$ and $y^{(i)}\in\mathbb{R}$, the general form of regularized linear regression is as follows:

$$J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2+\lambda\sum_{j=1}^{n}\theta_j^2\right] \tag{1}$$

Derive the normal equation.


Let $X=\begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}$, $Y=\begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$,

so that $X\theta-Y=\begin{bmatrix} (x^{(1)})^T\theta \\ (x^{(2)})^T\theta \\ \vdots \\ (x^{(m)})^T\theta \end{bmatrix}-\begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}=\begin{bmatrix} h_\theta(x^{(1)})-y^{(1)} \\ h_\theta(x^{(2)})-y^{(2)} \\ \vdots \\ h_\theta(x^{(m)})-y^{(m)} \end{bmatrix}$.

The cost function can then be written as $J(\theta)=\frac{1}{2m}\left[(X\theta-Y)^T(X\theta-Y)+\lambda\theta^T\theta\right]$ (writing the penalty as $\theta^T\theta$; strictly, the sum in (1) excludes $\theta_0$).

$\nabla_{\theta}J(\theta)=\nabla_{\theta}\frac{1}{2m}\left[(X\theta-Y)^T(X\theta-Y)+\lambda\theta^T\theta\right]=\frac{1}{2m}\left[\nabla_{\theta}(X\theta-Y)^T(X\theta-Y)+\lambda\nabla_{\theta}\theta^T\theta\right]$

For the penalty term, $\lambda\nabla_{\theta}\theta^T\theta=\lambda\nabla_{\theta}\mathrm{tr}(\theta\theta^T)=2\lambda\theta=2\lambda L\theta$, where $L$ is the identity matrix (take $L=\mathrm{diag}(0,1,\dots,1)$ instead if $\theta_0$ is not penalized). For the squared-error term, $\nabla_{\theta}(X\theta-Y)^T(X\theta-Y)=2X^TX\theta-2X^TY$.

Therefore, $\nabla_{\theta}J(\theta)=\frac{1}{2m}\left(2X^TX\theta-2X^TY+2\lambda L\theta\right)=\frac{1}{m}\left(X^TX\theta-X^TY+\lambda L\theta\right)$.

Setting $\nabla_{\theta}J(\theta)=0$: when $X^TX+\lambda L$ is invertible (which holds, in particular, whenever the columns of $X$ are linearly independent), there is a unique solution $\theta=(X^TX+\lambda L)^{-1}X^TY$.
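The closed form can be checked numerically. This is a minimal sketch with NumPy on hypothetical random data: it solves $(X^TX+\lambda L)\theta=X^TY$ and verifies that the gradient of $J$ vanishes at the solution.

```python
import numpy as np

# Hypothetical data: m samples, n features, plus an intercept column x_0 = 1.
rng = np.random.default_rng(0)
m, n = 50, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
Y = rng.normal(size=m)
lam = 0.5

# L is the identity with L[0, 0] = 0, so the intercept theta_0 is not penalized.
L = np.eye(n + 1)
L[0, 0] = 0.0

# Regularized normal equation: theta = (X^T X + lam * L)^{-1} X^T Y.
theta = np.linalg.solve(X.T @ X + lam * L, X.T @ Y)

# The gradient (1/m)(X^T X theta - X^T Y + lam * L theta) should vanish at theta.
grad = (X.T @ X @ theta - X.T @ Y + lam * L @ theta) / m
assert np.allclose(grad, 0.0)
```

`np.linalg.solve` is preferred over forming the explicit inverse, since it is more numerically stable for the same linear system.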

Gaussian Discriminant Analysis Model. Given $m$ training data $\{x^{(i)}, y^{(i)}\}_{i=1,\dots,m}$, assume that $y\sim\mathrm{Bernoulli}(\phi)$, $x\mid y=0\sim\mathcal{N}(\mu_0,\Sigma)$, $x\mid y=1\sim\mathcal{N}(\mu_1,\Sigma)$. Hence, we have

$p(y)=\phi^y(1-\phi)^{1-y}$,
$p(x\mid y=0)=\frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu_0)^T\Sigma^{-1}(x-\mu_0)\right)$,
$p(x\mid y=1)=\frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu_1)^T\Sigma^{-1}(x-\mu_1)\right)$.

The log-likelihood function is

$\ell(\phi,\mu_0,\mu_1,\Sigma)=\log\prod_{i=1}^{m}p(x^{(i)},y^{(i)};\phi,\mu_0,\mu_1,\Sigma)=\log\prod_{i=1}^{m}p(x^{(i)}\mid y^{(i)};\mu_0,\mu_1,\Sigma)\,p(y^{(i)};\phi)$.

Solve for $\phi,\mu_0,\mu_1$ and $\Sigma$ by maximizing $\ell(\phi,\mu_0,\mu_1,\Sigma)$. Hint: $\nabla_X\mathrm{tr}(AX^{-1}B)=-(X^{-1}BAX^{-1})^T$, $\nabla_A|A|=|A|(A^{-1})^T$.


The full derivation is given in the linked post "高斯判别分析(GDA)公式推导" (Derivation of the GDA formulas).
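For reference, maximizing $\ell$ yields the standard closed-form estimates (these are the well-known GDA maximum-likelihood results, stated here without the intermediate steps of the linked derivation):

```latex
\phi = \frac{1}{m}\sum_{i=1}^{m} 1\{y^{(i)}=1\}, \qquad
\mu_k = \frac{\sum_{i=1}^{m} 1\{y^{(i)}=k\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{y^{(i)}=k\}} \quad (k=0,1),
\qquad
\Sigma = \frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)}-\mu_{y^{(i)}}\right)\left(x^{(i)}-\mu_{y^{(i)}}\right)^{T}.
```

Intuitively, $\phi$ is the fraction of positive labels, each $\mu_k$ is the mean of the samples in class $k$, and $\Sigma$ is the pooled covariance of the class-centered samples.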

3. MLE for Naive Bayes. Consider the following definition of the MLE problem for multinomials. The input to the problem is a finite set $\mathcal{Y}$, and a weight $c_y\ge 0$ for each $y\in\mathcal{Y}$. The output from the problem is the distribution $p^*$ that solves the following maximization problem:

$$p^*=\arg\max_{p}\sum_{y\in\mathcal{Y}}c_y\log p_y$$

(i) Prove that the vector $p^*$ has components $p^*_y=\frac{c_y}{N}$ for all $y\in\mathcal{Y}$, where $N=\sum_{y\in\mathcal{Y}}c_y$. (Hint: use the theory of Lagrange multipliers.)
(ii) Using the above consequence, prove that the maximum-likelihood estimates for the Naive Bayes model are as follows:

$$p(y)=\frac{\sum_{i=1}^{m}1(y^{(i)}=y)}{m} \qquad \text{and} \qquad p_j(x\mid y)=\frac{\sum_{i=1}^{m}1(y^{(i)}=y\wedge x_j^{(i)}=x)}{\sum_{i=1}^{m}1(y^{(i)}=y)}$$



(i) Define the Lagrangian $L(p,\alpha)=\sum_{y\in\mathcal{Y}}c_y\log p_y-\alpha\left(\sum_{y\in\mathcal{Y}}p_y-1\right)$, where $\alpha$ is the Lagrange multiplier.

Taking the partial derivative with respect to $p_y$ and setting $\frac{\partial}{\partial p_y}L(p,\alpha)=0$

gives $p_y^{*}=\frac{c_y}{\alpha}$; substituting into the constraint $\sum_{y\in\mathcal{Y}}p_y^{*}=1$ yields $\frac{\sum_{y\in\mathcal{Y}}c_y}{\alpha}=1$.

Since $N=\sum_{y\in\mathcal{Y}}c_y$, we get $\alpha=N$ and hence $p_y^{*}=\frac{c_y}{N}$.
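The result of part (i) can be sanity-checked numerically. This is a small sketch with hypothetical weights $c_y$: it compares the objective at $p^*_y=c_y/N$ against many random distributions drawn from a Dirichlet.

```python
import numpy as np

# Hypothetical weights c_y for a 3-element set Y.
c = np.array([3.0, 1.0, 2.0])
N = c.sum()
p_star = c / N  # claimed maximizer: p*_y = c_y / N

def objective(p):
    # sum_y c_y log p_y
    return np.sum(c * np.log(p))

# No random distribution should beat p_star (Gibbs' inequality).
rng = np.random.default_rng(1)
for _ in range(1000):
    p = rng.dirichlet(np.ones(3))
    assert objective(p) <= objective(p_star) + 1e-12
```

This is only a spot check, not a proof; the Lagrange-multiplier argument above is what establishes the result.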

(ii) The objective of Naive Bayes maximum likelihood is

$$\max\ \sum_{i=1}^{m}\log p(y^{(i)})+\sum_{i=1}^{m}\sum_{j=1}^{n}\log p_j(x_j^{(i)}\mid y^{(i)})$$

Let $k$ be the number of label classes. Then $p(y)$ satisfies the constraint $\sum_{y}p(y)=1$ (summing over the $k$ classes), and for each feature $j$ and each label $y$, $p_j(x\mid y)$ satisfies $\sum_{x}p_j(x\mid y)=1$, where the sum runs over the possible values $x$ of feature $j$; all probabilities are nonnegative.

Note that the two sums can be optimized independently. For the first sum, consider the optimization problem:

$$\max\ \sum_{i=1}^{m}\log p(y^{(i)})$$

$$\text{s.t.}\ \sum_{y}p(y)=1$$

Treat the number of occurrences $cnt(y)=\sum_{i=1}^{m}1(y^{(i)}=y)$ of label $y$ in the training set as the weight $c_y$. Then

$\max\ \sum_{i=1}^{m}\log p(y^{(i)})=\max\ \sum_{y}cnt(y)\log p(y)$, and by the conclusion of part (i), with $N=\sum_{y}cnt(y)=m$, we obtain $p^*(y)=\frac{cnt(y)}{m}=\frac{\sum_{i=1}^{m}1(y^{(i)}=y)}{m}$.

Similarly, for each feature $j$ and each label $y$, treat $cnt_j(x\mid y)=\sum_{i=1}^{m}1(y^{(i)}=y\wedge x_j^{(i)}=x)$, the number of label-$y$ samples whose $j$-th feature equals $x$, as the weight. Then

$$\max\ \sum_{i=1}^{m}\sum_{j=1}^{n}\log p_j(x_j^{(i)}\mid y^{(i)})=\max\ \sum_{j=1}^{n}\sum_{i=1}^{m}\log p_j(x_j^{(i)}\mid y^{(i)})=\max\ \sum_{j=1}^{n}\sum_{y}\sum_{x}cnt_j(x\mid y)\log p_j(x\mid y)$$

By the conclusion of part (i), applied for each fixed $j$ and $y$ with $N=\sum_{x}cnt_j(x\mid y)=cnt(y)$, we obtain $p^*_j(x\mid y)=\frac{cnt_j(x\mid y)}{cnt(y)}=\frac{\sum_{i=1}^{m}1(y^{(i)}=y\wedge x_j^{(i)}=x)}{\sum_{i=1}^{m}1(y^{(i)}=y)}$, which completes the proof.
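The counting formulas above are straightforward to implement. This is a minimal sketch on a tiny hypothetical dataset (5 samples, 2 binary features, binary labels); the function and variable names are illustrative only.

```python
import numpy as np

# Hypothetical training set: rows are samples, columns are binary features.
X = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [0, 0],
              [1, 0]])
y = np.array([1, 1, 0, 0, 1])

def p_y(label):
    # p*(y) = (1/m) * sum_i 1(y^(i) = y)
    return np.mean(y == label)

def p_j_x_given_y(j, x, label):
    # p*_j(x | y) = sum_i 1(y^(i)=y and x_j^(i)=x) / sum_i 1(y^(i)=y)
    mask = (y == label)
    return np.sum(X[mask, j] == x) / np.sum(mask)

assert p_y(1) == 3 / 5                 # three of the five samples have y = 1
assert p_j_x_given_y(0, 1, 1) == 1.0   # all three y=1 samples have x_0 = 1
```

In practice these raw counts are usually smoothed (e.g. Laplace/add-one smoothing) so that unseen feature values do not get probability zero, but the unsmoothed counts are exactly the maximum-likelihood estimates derived here.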
