1. The basic theory of multivariate linear regression

Hypothesis: $h_\theta(x)=\theta_0x_0+\theta_1x_1+\ldots+\theta_nx_n=\theta^Tx$ (where $x_0=1$)

Parameters: $\theta_0, \theta_1, \ldots, \theta_n$

Cost Function: $J(\theta_0, \theta_1, \ldots, \theta_n)=\frac{1}{2m}\sum\limits_{i=1}^m\big(h_\theta(x^{(i)})-y^{(i)}\big)^2$
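As a minimal Octave sketch (assuming `X` is the $m\times(n+1)$ design matrix with a leading column of ones, `y` the $m\times 1$ target vector, and `theta` the $(n+1)\times 1$ parameter vector), the cost can be computed in one line:

% vectorized cost J(theta); m is the number of training examples
m = length(y);
J = sum((X * theta - y).^2) / (2*m);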

We can also use the gradient descent method to find the optimized $\theta$.
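For reference, gradient descent repeats the following simultaneous update for every $j = 0, \ldots, n$ until convergence (the Octave implementations are shown in Section 5 below):

$\theta_j := \theta_j - \frac{\alpha}{m}\sum\limits_{i=1}^m\big(h_\theta(x^{(i)})-y^{(i)}\big)x_j^{(i)}$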

2. Feature scaling

  • Method 1: $\dfrac{x_i}{\max-\min}$
  • Method 2 (Mean Normalization): $\dfrac{x_i-\mu}{\max-\min}$

After scaling, each feature falls into a range such as $-1\le x_i\le1$ or $-0.5\le x_i\le0.5$.
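A minimal Octave sketch of Method 2 (mean normalization), assuming `X` is an $m\times n$ matrix of raw feature values without the bias column:

% scale every column of X to roughly [-0.5, 0.5]
mu = mean(X);                       % 1-by-n row of feature means
feat_range = max(X) - min(X);       % 1-by-n row of feature ranges
X_norm = (X - mu) ./ feat_range;    % implicit broadcasting (Octave / MATLAB R2016b+)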

3. Learning rate

  • Too small: slow convergence
  • Too large: (a) may fail to converge; (b) $J(\theta)$ may not decrease on every iteration; (c) slow convergence

TRY!!!
$\alpha = 0.0001, 0.01, 0.1, 1$
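A minimal sketch for comparing learning rates, assuming `X` and `y` as above; plot the recorded cost per iteration and keep the largest $\alpha$ whose curve still decreases on every iteration:

% try several learning rates and record the cost history for each
alphas = [0.0001 0.01 0.1 1];
itera  = 100;
m = length(y);
J_history = zeros(itera, length(alphas));
for k = 1:length(alphas)
    theta = zeros(size(X,2), 1);
    for j = 1:itera
        theta = theta - (alphas(k)/m) * (X' * (X*theta - y));
        J_history(j,k) = sum((X*theta - y).^2) / (2*m);
    end
end
plot(J_history);   % one curve per alpha; a rising curve means alpha is too large
legend('0.0001','0.01','0.1','1');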

4. Normal equation

We can use the normal equation to solve for $\theta$ directly:
$\theta=(X^TX)^{-1}X^Ty$

Derivation of the formula:
Cost Function: $J(\theta)=\frac{1}{2m}\sum\limits_{i=1}^m\big(h_\theta(x^{(i)})-y^{(i)}\big)^2$
so we can vectorize the cost function as follows:
$$\begin{aligned} J(\theta) &=\frac{1}{2}\underbrace{(X\theta-y)^T}_{1\times m}\ \underbrace{(X\theta-y)}_{m\times 1}\\ &=\frac{1}{2}\big(\theta^TX^TX\theta-\theta^TX^Ty-y^TX\theta+y^Ty\big) \end{aligned}$$
*The constant factor $\frac{1}{m}$ can be ignored, since it does not change the minimizer.

The $\theta$ that satisfies $\frac{\partial J(\theta)}{\partial \theta} =0$ is the optimum, so (using identities (1) and (2) below)
$$\begin{aligned} \frac{\partial J(\theta)}{\partial \theta} &=\frac{1}{2}\big(2X^TX\theta-X^Ty-(y^TX)^T+0\big)\\ &=\frac{1}{2}\big(2X^TX\theta-X^Ty-X^Ty\big)\\ &=X^TX\theta-X^Ty=0 \end{aligned}$$
$X^TX\theta=X^Ty$
so we can solve for $\theta$ (dimensions written underneath):
$\underset{n\times1}{\theta} =\Big(\underset{n\times m}{X^T}\ \underset{m\times n}{X}\Big)^{-1}\ \underset{n\times m}{X^T}\ \underset{m\times1}{y}$

*(1) $\frac{\partial (A\theta)}{\partial\theta} = A^T$

*(2) $\frac{\partial (\theta^T A\theta)}{\partial\theta} = 2A\theta$ (this holds for symmetric $A$; here $A=X^TX$ is symmetric)

%% ============= normal equation ==========
% closed-form solution; no initialization or iteration needed
theta_normal = inv(X'*X) * X' * y;
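In practice, `inv` can be numerically unreliable when $X^TX$ is close to singular; a more robust sketch uses `pinv` or the backslash operator:

theta_normal = pinv(X'*X) * X' * y;   % pseudo-inverse also works when X'*X is singular
% or equivalently: theta_normal = X \ y;   % least squares via QR factorization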

More information: Derivation of the Normal Equation for linear regression

5. Vectorization in univariate gradient descent

  • Vectorization
% Vectorization to calculate theta
itera = 3000;
theta_matrix = [0 0];
theta_itera = zeros(itera,2); % record all the theta values during the process
for j = 1:itera
    theta_itera(j,:) = theta_matrix;
    hypothesis = X * theta_matrix';
    theta_matrix = theta_matrix - (alpha/m) * ((hypothesis - y)' * X);
end
  • “for” Loop
% "for" loop to calculate the \theta
itera = 3000;
theta_itera = zeros(length(y),2);
for j = 1:iteratheta_itera(j,:) = theta';  % record all the theta values during the processhypothesis = X * theta;for i = 1:theta_lengththeta(i) = theta(i) - (alpha/m) * ((hypothesis - y)'* X(:,i));  endend
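As a quick sanity check (assuming the vectorized snippet above has been run on the same `X` and `y`), the converged gradient-descent parameters should roughly match the normal-equation solution:

% after convergence the two approaches should agree (up to small error)
theta_gd = theta_matrix';          % final theta from the vectorized loop
theta_ne = pinv(X'*X) * X' * y;    % normal equation
disp(norm(theta_gd - theta_ne));   % should be close to 0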

What if $X^TX$ is non-invertible?

(1) Delete linearly dependent features (e.g. $x_2=2x_1$);
(2) If $m$ (number of samples) $\le n$ (number of features), delete some features so that $m > n$;
(3) Use regularization (see the sketch below).
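As a minimal Octave sketch of the problem (the data here is made up for illustration), `inv(X'*X)` is unreliable when one feature is linearly dependent on another, while `pinv` still returns a minimum-norm least-squares solution:

% x2 = 2*x1, so X'*X is singular and inv(X'*X) is unreliable
X = [ones(4,1), (1:4)', 2*(1:4)'];
y = [2; 4; 6; 8];
theta = pinv(X'*X) * X' * y;   % pseudo-inverse still yields a valid fit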
