Cost function of Logistic Regression and Neural Network
Logistic / Sigmoid function
$$g(x) = \dfrac{1}{1 + e^{-x}} = \dfrac{e^{x}}{1 + e^{x}}$$
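The two equal forms above also yield a numerically stable implementation: evaluating the first form for $x \ge 0$ and the second for $x < 0$ keeps every `exp` argument non-positive, so neither branch can overflow. A minimal NumPy sketch (the function name is my choice):

```python
import numpy as np

def sigmoid(x):
    """Numerically stable logistic function g(x) = 1 / (1 + e^{-x})."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0: 1 / (1 + e^{-x}) keeps the exponent non-positive.
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0: e^{x} / (1 + e^{x}) does the same.
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out
```

A naive `1 / (1 + np.exp(-x))` would overflow for large negative `x`; the split form avoids that while computing the same function.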
Cost function
Logistic Regression
$$h_{\theta}\left(X\right) = g\left(X^{\intercal}\theta\right) = P\left(y = 1 \mid X; \theta\right)$$

Let $z = X^{\intercal}\theta$. Then
$$\ln P\left(y = \mathrm{y} \mid X; \theta\right) = \mathrm{y} \ln P\left(y = 1 \mid X; \theta\right) + \left(1 - \mathrm{y}\right) \ln P\left(y = 0 \mid X; \theta\right)$$

$$= \mathrm{y} \ln h_{\theta}\left(X\right) + \left(1 - \mathrm{y}\right) \ln\left[1 - h_{\theta}\left(X\right)\right]$$

$$= \mathrm{y} \ln g(z) + \left(1 - \mathrm{y}\right) \ln\left[1 - g(z)\right]$$

Therefore, using $g'(z) = g(z)\left[1 - g(z)\right]$,

$$\operatorname{d} \ln P\left(y = \mathrm{y} \mid X; \theta\right) = \mathrm{y} \operatorname{d} \ln g(z) + \left(1 - \mathrm{y}\right) \operatorname{d} \ln\left[1 - g(z)\right]$$

$$= \mathrm{y} \cdot \dfrac{1}{g(z)}\, g(z)\left[1 - g(z)\right] \operatorname{d}z + \left(1 - \mathrm{y}\right) \dfrac{1}{1 - g(z)}\,(-1)\, g(z)\left[1 - g(z)\right] \operatorname{d}z$$

$$= \left\{ \mathrm{y}\left[1 - g(z)\right] - \left(1 - \mathrm{y}\right) g(z) \right\} \operatorname{d}z$$

$$= \left[\mathrm{y} - g(z)\right] \operatorname{d}z$$

$$= \left[\mathrm{y} - g\left(X^{\intercal}\theta\right)\right] X^{\intercal} \operatorname{d}\theta$$
The log-likelihood is

$$\operatorname{L}\left(\theta\right) = \ln\left[\prod\limits_{i=1}^{m} P\left(y = y_i \mid X_i; \theta\right)\right] = \sum\limits_{i=1}^{m} \ln P\left(y = y_i \mid X_i; \theta\right)$$
Define

$$\operatorname{cost}(\theta) = -\dfrac{1}{m} \operatorname{L}\left(\theta\right) = -\dfrac{1}{m} \sum\limits_{i=1}^{m} \ln P\left(y = y_i \mid X_i; \theta\right)$$

$$= -\dfrac{1}{m} \sum\limits_{i=1}^{m} \left\{ y_i \ln h_{\theta}\left(X_i\right) + \left(1 - y_i\right) \ln\left[1 - h_{\theta}\left(X_i\right)\right] \right\}$$

$$= -\dfrac{1}{m} \sum\limits_{i=1}^{m} \left\{ y_i \ln g(z_i) + \left(1 - y_i\right) \ln\left[1 - g(z_i)\right] \right\}, \quad \text{where } z_i = X_i^{\intercal}\theta.$$

Then $\max \operatorname{L}\left(\theta\right) = -m \min \operatorname{cost}\left(\theta\right)$, so maximizing the likelihood is equivalent to minimizing $\operatorname{cost}(\theta)$, which is the cost function.
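A minimal NumPy sketch of this cost (the names `cost`, `X`, `y` and the clipping guard against $\ln 0$ are my assumptions, not from the derivation):

```python
import numpy as np

def cost(theta, X, y):
    """Average negative log-likelihood:
    cost = -(1/m) * sum_i [ y_i ln h(X_i) + (1 - y_i) ln(1 - h(X_i)) ].
    X: (m, n+1) design matrix (rows X_i^T); y: (m,) labels in {0, 1}."""
    m = X.shape[0]
    h = 1.0 / (1.0 + np.exp(-X @ theta))   # h_theta(X_i) = g(X_i^T theta)
    # Clip to avoid log(0) when predictions saturate.
    h = np.clip(h, 1e-12, 1 - 1e-12)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
```

As a sanity check, at $\theta = 0$ every prediction is $0.5$, so the cost equals $\ln 2$ regardless of the labels.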
Let $g\left(\theta\right) = -\operatorname{L}\left(\theta\right)$ (overloading $g$; context distinguishes it from the sigmoid). Then

$$\operatorname{d}\left[g\left(\theta\right)\right] = -\sum\limits_{i=1}^{m} \left[y_i - g\left(X_i^{\intercal}\theta\right)\right] X_i^{\intercal} \operatorname{d}\theta = \sum\limits_{i=1}^{m} \left[g\left(X_i^{\intercal}\theta\right) - y_i\right] X_i^{\intercal} \operatorname{d}\theta$$

Therefore

$$\nabla\left[g\left(\theta\right)\right] = \sum\limits_{i=1}^{m} \left[g\left(X_i^{\intercal}\theta\right) - y_i\right] X_i = \mathbf{X}^{\intercal}\left[g\left(\mathbf{X}\theta\right) - \mathrm{y}\right]$$

where

$$\mathbf{X} = \begin{pmatrix} X_1^{\intercal} \\ \vdots \\ X_m^{\intercal} \end{pmatrix}, \quad \mathrm{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}, \quad g\left(\mathbf{X}\theta\right) = \begin{pmatrix} g\left(X_1^{\intercal}\theta\right) \\ \vdots \\ g\left(X_m^{\intercal}\theta\right) \end{pmatrix}.$$
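The vectorized gradient $\mathbf{X}^{\intercal}\left[g(\mathbf{X}\theta) - \mathrm{y}\right]$ is a one-liner in NumPy; a sketch (the function name is mine):

```python
import numpy as np

def neg_log_likelihood_grad(theta, X, y):
    """Gradient of g(theta) = -L(theta): X^T (g(X theta) - y),
    with X stacking the rows X_i^T and y the label vector."""
    h = 1.0 / (1.0 + np.exp(-X @ theta))   # sigmoid applied row-wise
    return X.T @ (h - y)
```

A finite-difference check against the un-averaged negative log-likelihood is a quick way to validate the formula.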
Then

$$\operatorname{d}\left\{\nabla\left[g\left(\theta\right)\right]\right\} = \sum\limits_{i=1}^{m} \operatorname{d}\left[g\left(X_i^{\intercal}\theta\right)\right] X_i = \sum\limits_{i=1}^{m} g'\left(X_i^{\intercal}\theta\right)\left(X_i^{\intercal}\operatorname{d}\theta\right) X_i = \sum\limits_{i=1}^{m} g'\left(X_i^{\intercal}\theta\right) X_i X_i^{\intercal} \operatorname{d}\theta$$

Therefore the Hessian is

$$\operatorname{H}_{g(\theta)} = \sum\limits_{i=1}^{m} g'\left(X_i^{\intercal}\theta\right) X_i X_i^{\intercal}$$
Note: componentwise,

$$\dfrac{\partial}{\partial\theta_j} g\left(\theta\right) = \sum\limits_{i=1}^{m} \left[g\left(X_i^{\intercal}\theta\right) - y_i\right] x_{ij}, \quad j \in \mathbb{N},\ 0 \le j \le n$$
Regularized Logistic Regression
$$\operatorname{cost}(\theta) = -\dfrac{1}{m} \sum\limits_{i=1}^{m} \left\{ y_i \ln h_{\theta}\left(X_i\right) + \left(1 - y_i\right) \ln\left[1 - h_{\theta}\left(X_i\right)\right] \right\} + \dfrac{\lambda}{2n} \sum\limits_{j=1}^{n} \theta_j^2$$

Then, since the data term carries the factor $\dfrac{1}{m}$ and the second derivative of the penalty is $\dfrac{\lambda}{n}$ per component (the bias term $\theta_0$ is not penalized, hence the zero in the first diagonal entry):

$$\operatorname{H}_{\operatorname{cost}(\theta)} = \dfrac{1}{m}\sum\limits_{i=1}^{m} g'\left(X_i^{\intercal}\theta\right) X_i X_i^{\intercal} + \dfrac{\lambda}{n} \begin{pmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix}$$

Property

$\operatorname{H}_{\operatorname{cost}(\theta)}$ is positive definite.

Proof

For any $Z = \begin{pmatrix} z_0 \\ \vdots \\ z_n \end{pmatrix} \in \mathbb{R}^{n+1}$:

$$Z^{\intercal} \operatorname{H}_{\operatorname{cost}(\theta)} Z = \dfrac{1}{m}\sum\limits_{i=1}^{m} g'\left(X_i^{\intercal}\theta\right) Z^{\intercal} X_i X_i^{\intercal} Z + \dfrac{\lambda}{n} \sum\limits_{j=1}^{n} z_j^2$$

$$= \dfrac{1}{m}\sum\limits_{i=1}^{m} g'\left(X_i^{\intercal}\theta\right) \left(X_i^{\intercal} Z\right)^2 + \dfrac{\lambda}{n} \sum\limits_{j=1}^{n} z_j^2 \ge 0,$$

since $g'(z) = g(z)\left[1 - g(z)\right] > 0$.

If $Z^{\intercal} \operatorname{H}_{\operatorname{cost}(\theta)} Z = 0$, then $z_j = 0$ for all $j \in \mathbb{N}$ with $1 \le j \le n$; since $x_{i0} = 1$ for every sample, this leaves

$$Z^{\intercal} \operatorname{H}_{\operatorname{cost}(\theta)} Z = \dfrac{1}{m}\sum\limits_{i=1}^{m} g'\left(X_i^{\intercal}\theta\right) z_0^2 = 0 \;\Rightarrow\; z_0 = 0,$$

hence $Z = 0$. Therefore $\operatorname{H}_{\operatorname{cost}(\theta)}$ is positive definite.
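Because the Hessian is positive definite (hence invertible), Newton's method applies directly to the regularized cost. A NumPy sketch (function name and the $\lambda/n$ penalty scaling are my assumptions following this derivation; many references scale the penalty by $1/m$ instead):

```python
import numpy as np

def newton_step(theta, X, y, lam):
    """One Newton update theta <- theta - H^{-1} grad for a regularized
    logistic-regression cost. X is (m, n+1) with a leading column of
    ones; theta_0 is not penalized. A sketch, not the article's code."""
    m, d = X.shape
    n = d - 1
    h = 1.0 / (1.0 + np.exp(-X @ theta))   # g(X_i^T theta)
    reg = np.ones(d); reg[0] = 0.0         # exclude the bias term
    grad = X.T @ (h - y) / m + (lam / n) * reg * theta
    W = h * (1.0 - h)                      # g'(X_i^T theta) > 0
    H = (X.T * W) @ X / m + (lam / n) * np.diag(reg)
    # H is positive definite (see the proof above), so the solve is safe.
    return theta - np.linalg.solve(H, grad)
```

On a one-dimensional separable toy problem a handful of Newton steps already drives the fit to the obvious decision boundary.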
Neural Network for Classification
$$\operatorname{cost}(\Theta) = -\dfrac{1}{m} \sum\limits_{i=1}^{m} \sum\limits_{k=1}^{K} \left\{ y_{ik} \left(\ln h_{\Theta}\left(X_i\right)\right)_k + \left(1 - y_{ik}\right) \left(\ln\left[1 - h_{\Theta}\left(X_i\right)\right]\right)_k \right\} + \dfrac{\lambda}{2m} \sum\limits_{l=1}^{L-1} \sum\limits_{i=1}^{s_{l+1}} \sum\limits_{j=1}^{s_l} \left(\Theta^{(l)}_{ij}\right)^2$$

where $K$ is the number of output classes, $L$ the number of layers, $s_l$ the number of units in layer $l$ (excluding the bias unit), and $\Theta^{(l)}$ the weight matrix mapping layer $l$ to layer $l+1$; the bias weights $\Theta^{(l)}_{i0}$ are not penalized.
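A sketch of this cost for a feed-forward sigmoid network in NumPy. The weight layout is an assumption on my part: `Thetas[l]` has shape $(s_{l+1}, s_l + 1)$ with the bias column first, matching the convention that the penalty skips column 0.

```python
import numpy as np

def nn_cost(Thetas, X, Y, lam):
    """Cross-entropy cost of a feed-forward sigmoid network.
    Thetas: list of weight matrices Theta^(l), each (s_{l+1}, s_l + 1);
    X: (m, n) inputs; Y: (m, K) one-hot labels."""
    m = X.shape[0]
    A = X
    for Theta in Thetas:                      # forward propagation
        A = np.column_stack([np.ones(m), A])  # prepend the bias unit
        A = 1.0 / (1.0 + np.exp(-A @ Theta.T))
    H = np.clip(A, 1e-12, 1 - 1e-12)          # h_Theta(X), shape (m, K)
    data = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Bias column Theta[:, 0] is excluded from the penalty.
    reg = sum(np.sum(Theta[:, 1:] ** 2) for Theta in Thetas)
    return data + lam / (2 * m) * reg
```

With all weights zero every output is $0.5$, so the cost reduces to $K \ln 2$, which gives a quick sanity check of an implementation.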