ML Notes: Week 2 - Multivariate Linear Regression
1. Basic theory of multivariate linear regression
Hypothesis: $h_\theta(x)=\theta_0x_0+\theta_1x_1+\ldots+\theta_nx_n=\theta^Tx$ (where $x_0=1$)
Parameters: $\theta_0, \theta_1, \ldots, \theta_n$
Cost Function: $J(\theta_0, \theta_1, \ldots, \theta_n)=\frac{1}{2m}\sum\limits_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$
We can also use the gradient descent method to find the optimized $\theta$.
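As a reminder, the gradient descent update rule from Week 1 extends directly to $n$ features (all $\theta_j$ updated simultaneously on each iteration):

$$\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)},\qquad j=0,1,\ldots,n$$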
2. Feature scaling
- Method 1: $\dfrac{x_i}{\max-\min}$
- Method 2 (Mean Normalization): $\dfrac{x_i-\mu}{\max-\min}$
After scaling, each feature falls roughly in the range $-1\le x_i\le1$, or $-0.5\le x_i\le0.5$ with mean normalization.
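A minimal Octave sketch of mean normalization, done per column (variable names here are illustrative; `X` is the $m\times n$ feature matrix):

```octave
% Mean normalization: subtract each column's mean, divide by its range
mu = mean(X);                         % 1 x n vector of column means
feature_range = max(X) - min(X);      % 1 x n vector of column ranges
X_norm = (X - mu) ./ feature_range;   % broadcasts over the m rows
```

Octave broadcasts the row vectors `mu` and `feature_range` across the rows of `X` automatically, so no explicit loop is needed.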
3. Learning rate
- Too small: slow convergence
- Too large: (a) may fail to converge; (b) $J(\theta)$ may not decrease on every iteration; (c) slow convergence
TRY!!! e.g. $\alpha = 0.0001, 0.01, 0.1, 1$
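In practice you can run a short burst of iterations for each candidate $\alpha$ and compare the cost curves (an Octave sketch, assuming `X`, `y`, and `m` are already defined as in the exercises):

```octave
% Compare several learning rates by tracking J(theta) per iteration
alphas = [0.0001 0.01 0.1 1];
for k = 1:length(alphas)
    alpha = alphas(k);
    theta = zeros(size(X,2),1);
    J_history = zeros(50,1);
    for iter = 1:50
        theta = theta - (alpha/m) * (X' * (X*theta - y));
        J_history(iter) = (1/(2*m)) * sum((X*theta - y).^2);
    end
    % plot(J_history): with a good alpha, J decreases on every iteration
end
```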
4. Normal equation
We can use the normal equation to solve for $\theta$ directly:
$$\theta=(X^TX)^{-1}X^Ty$$
Derivation of the formula:
Cost Function: $J(\theta)=\frac{1}{2m}\sum\limits_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$
So we can vectorize the cost function as follows:
$$\begin{aligned} J(\theta) &=\frac{1}{2}\underbrace{(X\theta-y)^T}_{1\times m}\,\underbrace{(X\theta-y)}_{m\times 1}\\ &=\frac{1}{2}\left(\theta^TX^TX\theta-\theta^TX^Ty-y^TX\theta+y^Ty\right) \end{aligned}$$
*The constant factor $\frac{1}{m}$ can be ignored here. The $\theta$ satisfying $\frac{\partial J(\theta)}{\partial \theta}=0$ is the optimum, so
$$\begin{aligned} \frac{\partial J(\theta)}{\partial \theta} &=\frac{1}{2}\left(2X^TX\theta-X^Ty-(y^TX)^T+0\right)\\ &=\frac{1}{2}\left(2X^TX\theta-X^Ty-X^Ty\right)\\ &=X^TX\theta-X^Ty=0 \end{aligned}$$
$$X^TX\theta=X^Ty$$
from which we can solve for $\underset{n\times1}{\theta}=\big(\underset{n\times m}{X^T}\,\underset{m\times n}{X}\big)^{-1}\underset{n\times m}{X^T}\,\underset{m\times1}{y}$
*(1) $\frac{\partial (A\theta)}{\partial\theta} = A^T$
*(2) $\frac{\partial (\theta^TA\theta)}{\partial\theta} = 2A\theta$ (for symmetric $A$; here $A=X^TX$)
%% ============= normal equation ==========
theta_normal = pinv(X'*X) * X' * y;   % pinv is safer than inv if X'*X is near-singular
More information: Derivation of the Normal Equation for linear regression
5. Vectorization in univariate gradient descent
- Vectorization
% Vectorization to calculate theta (assumes X, y, alpha, m are already defined)
itera = 3000;
theta_matrix = [0 0];
theta_itera = zeros(itera,2);   % record all the theta values during the process
for j = 1:itera
    theta_itera(j,:) = theta_matrix;
    hypothesis = X * theta_matrix';
    theta_matrix = theta_matrix - (alpha/m) * ((hypothesis - y)' * X);
end
- “for” Loop
% "for" loop to calculate theta (assumes X, y, theta, alpha, m are already defined)
itera = 3000;
theta_itera = zeros(itera,2);
for j = 1:itera
    theta_itera(j,:) = theta';   % record all the theta values during the process
    hypothesis = X * theta;      % computed once from the old theta, so all
                                 % theta(i) below are updated simultaneously
    for i = 1:length(theta)
        theta(i) = theta(i) - (alpha/m) * ((hypothesis - y)' * X(:,i));
    end
end
**** What if $X^TX$ is non-invertible?
(1) Delete linearly dependent features (e.g. $x_2=2x_1$);
(2) If there are too many features ($m \le n$), delete some features so that $m$ (# samples) $> n$ (# features);
(3) Use regularization.
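For reference, regularization (introduced later in the course) also resolves non-invertibility: for any $\lambda>0$, the matrix in

$$\theta=\left(X^TX+\lambda\begin{bmatrix}0&&&\\&1&&\\&&\ddots&\\&&&1\end{bmatrix}\right)^{-1}X^Ty$$

is guaranteed to be invertible.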