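6-4 Using Gradient Descent in a Linear Regression Model (04-Implement-Gradient-Descent-in-Linear-Regression)

Training with gradient descent

Encapsulating our linear regression algorithm

LinearRegression.py

6-5 Vectorizing gradient descent and standardizing data

Vectorizing gradient descent

Using gradient descent

Normalizing the data before using gradient descent

Advantages of gradient descent

6-6 Stochastic gradient descent

Stochastic gradient descent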

Using Gradient Descent in a Linear Regression Model (04-Implement-Gradient-Descent-in-Linear-Regression)

Training with gradient descent

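import numpy as np

def J(theta, X_b, y):
    # Mean squared error; fall back to inf if the computation raises
    try:
        return np.sum((y - X_b.dot(theta))**2) / len(X_b)
    except Exception:
        return float('inf')

def dJ(theta, X_b, y):
    # Gradient of J, computed one component at a time
    res = np.empty(len(theta))
    res[0] = np.sum(X_b.dot(theta) - y)
    for i in range(1, len(theta)):
        res[i] = (X_b.dot(theta) - y).dot(X_b[:,i])
    return res * 2 / len(X_b)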

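Question: I did not follow the formula for res[0] at first; aren't y and X_b both matrices? (X_b is an m-by-(n+1) matrix, but theta and y are 1-D arrays, so X_b.dot(theta) is a vector of length m and X_b.dot(theta) - y is a vector as well. np.sum reduces it to a single number. Because the first column of X_b is all ones, this is the same dot product as the other components, taken with X_b[:, 0].)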
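The shapes involved in res[0] can be checked directly. The following is a minimal sketch on made-up data (the sample size, the random values and the names residual, res0 and res0_as_dot are assumptions for illustration, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5
x = rng.normal(size=m)                                   # made-up 1-D feature
y = rng.normal(size=m)                                   # made-up targets, shape (m,)
X_b = np.hstack([np.ones((m, 1)), x.reshape(-1, 1)])     # shape (m, 2)
theta = np.array([0.5, 2.0])                             # arbitrary parameters

residual = X_b.dot(theta) - y          # matrix times vector gives a vector, shape (m,)
res0 = np.sum(residual)                # a single number
res0_as_dot = residual.dot(X_b[:, 0])  # same value: column 0 of X_b is all ones

print(residual.shape, np.ndim(res0), np.isclose(res0, res0_as_dot))
```

So res[0] is just the i = 0 case of the loop, written with np.sum because multiplying by a column of ones changes nothing.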

def gradient_descent(X_b, y, initial_theta, eta, n_iters = 1e4, epsilon=1e-8):
    theta = initial_theta
    cur_iter = 0

    while cur_iter < n_iters:
        gradient = dJ(theta, X_b, y)
        last_theta = theta
        theta = theta - eta * gradient
        # Stop once the loss no longer changes by more than epsilon
        if(abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon):
            break
        cur_iter += 1

    return theta

X_b = np.hstack([np.ones((len(x), 1)), x.reshape(-1,1)])
initial_theta = np.zeros(X_b.shape[1])
eta = 0.01
theta = gradient_descent(X_b, y, initial_theta, eta)

Encapsulating our linear regression algorithm

LinearRegression.py

import numpy as np
from .metrics import r2_score

class LinearRegression:

    def __init__(self):
        """Initialize the Linear Regression model"""
        self.coef_ = None
        self.intercept_ = None
        self._theta = None

    def fit_normal(self, X_train, y_train):
        """Train the Linear Regression model on the training set X_train, y_train"""
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"

        X_b = np.hstack([np.ones((len(X_train), 1)), X_train])
        self._theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_train)

        self.intercept_ = self._theta[0]
        self.coef_ = self._theta[1:]

        return self

    def fit_gd(self, X_train, y_train, eta=0.01, n_iters=1e4):
        """Train the Linear Regression model on X_train, y_train using gradient descent"""
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"

        def J(theta, X_b, y):
            try:
                return np.sum((y - X_b.dot(theta)) ** 2) / len(y)
            except Exception:
                return float('inf')

        def dJ(theta, X_b, y):
            # Component-by-component gradient
            res = np.empty(len(theta))
            res[0] = np.sum(X_b.dot(theta) - y)
            for i in range(1, len(theta)):
                res[i] = (X_b.dot(theta) - y).dot(X_b[:, i])
            return res * 2 / len(X_b)

        def gradient_descent(X_b, y, initial_theta, eta, n_iters=1e4, epsilon=1e-8):
            theta = initial_theta
            cur_iter = 0

            while cur_iter < n_iters:
                gradient = dJ(theta, X_b, y)
                last_theta = theta
                theta = theta - eta * gradient
                if (abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon):
                    break
                cur_iter += 1

            return theta

        X_b = np.hstack([np.ones((len(X_train), 1)), X_train])
        initial_theta = np.zeros(X_b.shape[1])
        self._theta = gradient_descent(X_b, y_train, initial_theta, eta, n_iters)

        self.intercept_ = self._theta[0]
        self.coef_ = self._theta[1:]

        return self

    def predict(self, X_predict):
        """Given the data set X_predict, return the vector of predictions for it"""
        assert self.intercept_ is not None and self.coef_ is not None, \
            "must fit before predict!"
        assert X_predict.shape[1] == len(self.coef_), \
            "the feature number of X_predict must be equal to X_train"

        X_b = np.hstack([np.ones((len(X_predict), 1)), X_predict])
        return X_b.dot(self._theta)

    def score(self, X_test, y_test):
        """Measure the accuracy of the current model on the test set X_test, y_test"""
        y_predict = self.predict(X_test)
        return r2_score(y_test, y_predict)

    def __repr__(self):
        return "LinearRegression()"

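6-5 Vectorizing gradient descent and standardizing data

A 1-D NumPy array does not distinguish between row vectors and column vectors, but the expression above is a 1×(n+1) row vector.

In the derivation the distinction does matter: the gradient is a column vector, so the expression has to be transposed.

This vectorizes the computation of the gradient.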
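As a quick check that the vectorized form agrees with the loop, the sketch below runs both versions of dJ from these notes on random data. The data, the sizes m and n, and the names dJ_loop and dJ_vectorized are assumptions made for the comparison:

```python
import numpy as np

def dJ_loop(theta, X_b, y):
    # Component-by-component gradient, as in the first version of the notes
    res = np.empty(len(theta))
    res[0] = np.sum(X_b.dot(theta) - y)
    for i in range(1, len(theta)):
        res[i] = (X_b.dot(theta) - y).dot(X_b[:, i])
    return res * 2 / len(X_b)

def dJ_vectorized(theta, X_b, y):
    # Same gradient as one matrix product: (2 / m) * X_b^T (X_b theta - y)
    return X_b.T.dot(X_b.dot(theta) - y) * 2. / len(y)

rng = np.random.default_rng(0)
m, n = 50, 3                              # made-up sizes
X = rng.normal(size=(m, n))
X_b = np.hstack([np.ones((m, 1)), X])
y = rng.normal(size=m)
theta = rng.normal(size=n + 1)

grad_loop = dJ_loop(theta, X_b, y)
grad_vec = dJ_vectorized(theta, X_b, y)
print(grad_vec.shape, np.allclose(grad_loop, grad_vec))
```

Because theta and y are 1-D arrays, X_b.T.dot(...) returns a 1-D array of length n+1, so no explicit reshape into a column is needed in code even though the derivation on paper needs the transpose.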

Vectorizing gradient descent

Solving with the normal equation

model_selection.py

import numpy as np

def train_test_split(X, y, test_ratio=0.2, seed=None):
    """Split X and y into X_train, X_test, y_train, y_test according to test_ratio"""
    assert X.shape[0] == y.shape[0], \
        "the size of X must be equal to the size of y"
    assert 0.0 <= test_ratio <= 1.0, \
        "test_ratio must be valid"

    # "is not None" so that seed=0 is honored too
    if seed is not None:
        np.random.seed(seed)

    shuffled_indexes = np.random.permutation(len(X))

    test_size = int(len(X) * test_ratio)
    test_indexes = shuffled_indexes[:test_size]
    train_indexes = shuffled_indexes[test_size:]

    X_train = X[train_indexes]
    y_train = y[train_indexes]

    X_test = X[test_indexes]
    y_test = y[test_indexes]

    return X_train, X_test, y_train, y_test

LinearRegression.py

import numpy as np
from .metrics import r2_score

class LinearRegression:

    def __init__(self):
        """Initialize the Linear Regression model"""
        self.coef_ = None
        self.intercept_ = None
        self._theta = None

    def fit_normal(self, X_train, y_train):
        """Train the Linear Regression model on the training set X_train, y_train"""
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"

        X_b = np.hstack([np.ones((len(X_train), 1)), X_train])
        self._theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_train)

        self.intercept_ = self._theta[0]
        self.coef_ = self._theta[1:]

        return self

    def fit_gd(self, X_train, y_train, eta=0.01, n_iters=1e4):
        """Train the Linear Regression model on X_train, y_train using gradient descent"""
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"

        def J(theta, X_b, y):
            try:
                return np.sum((y - X_b.dot(theta)) ** 2) / len(y)
            except Exception:
                return float('inf')

        def dJ(theta, X_b, y):
            # Vectorized gradient: one matrix product instead of a loop
            return X_b.T.dot(X_b.dot(theta) - y) * 2. / len(y)

        def gradient_descent(X_b, y, initial_theta, eta, n_iters=1e4, epsilon=1e-8):
            theta = initial_theta
            cur_iter = 0

            while cur_iter < n_iters:
                gradient = dJ(theta, X_b, y)
                last_theta = theta
                theta = theta - eta * gradient
                if (abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon):
                    break
                cur_iter += 1

            return theta

        X_b = np.hstack([np.ones((len(X_train), 1)), X_train])
        initial_theta = np.zeros(X_b.shape[1])
        self._theta = gradient_descent(X_b, y_train, initial_theta, eta, n_iters)

        self.intercept_ = self._theta[0]
        self.coef_ = self._theta[1:]

        return self

    def predict(self, X_predict):
        """Given the data set X_predict, return the vector of predictions for it"""
        assert self.intercept_ is not None and self.coef_ is not None, \
            "must fit before predict!"
        assert X_predict.shape[1] == len(self.coef_), \
            "the feature number of X_predict must be equal to X_train"

        X_b = np.hstack([np.ones((len(X_predict), 1)), X_predict])
        return X_b.dot(self._theta)

    def score(self, X_test, y_test):
        """Measure the accuracy of the current model on the test set X_test, y_test"""
        y_predict = self.predict(X_test)
        return r2_score(y_test, y_predict)

    def __repr__(self):
        return "LinearRegression()"

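Using gradient descent

Running it produces overflow warnings.

coef_ comes out as NaN because the values have blown up to infinity.

The earlier version that did not divide by m runs into the same situation.

This is a real data set, and its features are on very different scales: some are around 0.1, others are greater than 100.

One workaround is to set eta by hand to a very small value.

Now there is no error, but the result is poor. It has not reached the minimum because the descent is very slow, and it may take many more iterations to get a good result, for example n_iters = 1e6. That many iterations can be slow, so time the run.

The result is 0.754. More iterations might improve it, but that takes too long. The real problem is that the features are not on the same scale, so the data needs to be normalized.

When the features are not on the same scale, the step size ends up either too large or too small.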

Normalizing the data before using gradient descent
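A minimal sketch of the effect, on synthetic data rather than the data set used in these notes: two made-up features on very different scales, a simplified gradient_descent without the early-stopping check, and eta = 0.1 are all assumptions chosen for illustration. Standardization here means subtracting each feature's mean and dividing by its standard deviation.

```python
import numpy as np

def dJ(theta, X_b, y):
    # Vectorized gradient of the mean squared error
    return X_b.T.dot(X_b.dot(theta) - y) * 2. / len(y)

def gradient_descent(X_b, y, eta, n_iters=5000):
    # Simplified loop: fixed number of steps, no early stopping
    theta = np.zeros(X_b.shape[1])
    for _ in range(n_iters):
        theta = theta - eta * dJ(theta, X_b, y)
    return theta

rng = np.random.default_rng(0)
m = 200
# Hypothetical features: one in [0, 1], one in [0, 500]
X = np.column_stack([rng.uniform(0, 1, m), rng.uniform(0, 500, m)])
y = 3.0 + 2.0 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(0, 0.1, m)

# Without scaling, eta = 0.1 is far too large for the second feature
X_b_raw = np.hstack([np.ones((m, 1)), X])
with np.errstate(over="ignore", invalid="ignore"):
    theta_raw = gradient_descent(X_b_raw, y, eta=0.1)

# Standardize each feature, then run the same loop with the same eta
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
X_b_std = np.hstack([np.ones((m, 1)), X_std])
theta_std = gradient_descent(X_b_std, y, eta=0.1)

# Reference solution on the standardized data via least squares
theta_ref, *_ = np.linalg.lstsq(X_b_std, y, rcond=None)

print(np.all(np.isfinite(theta_raw)), np.allclose(theta_std, theta_ref, atol=1e-6))
```

When predicting on new data, the same mean and standard deviation computed from the training set would have to be applied to the new features.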

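Advantages of gradient descent

As the number of dimensions grows, the matrices that the normal equation has to process take a long time to handle.

Here the number of samples is smaller than the number of features. Gradient descent makes every sample take part in each gradient computation, which makes it slow; an improved scheme is stochastic gradient descent.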

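6-6 Stochastic gradient descent

The method above uses all the samples in every step, which is why it is called batch gradient descent, but it becomes too slow when the sample set is very large.

"Stochastic" means picking one sample at random, and that sample determines the search direction: X_b^(i) is a single row, with i chosen arbitrarily.

In batch gradient descent the direction is fixed and always moves steadily toward the minimum.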

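The stochastic version cannot guarantee the steepest descent, or even that each step goes downhill, so it is somewhat unpredictable, but experiments show that it usually still ends up close to the minimum.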

When the sample set is very large, we are willing to trade some precision for time.

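In the stochastic version the choice of learning rate matters a great deal. The learning rate should be large at first and small later; the simplest option is the reciprocal of the iteration count, eta = 1 / i_iters.

The problem is that while the iteration count is small the rate changes very quickly, so the relative drop differs too much between early and late steps.

This can be improved to eta = 1 / (i_iters + t1). A numerator of 1 sometimes does not work well either, so the formula is refined to eta = t0 / (i_iters + t1).

This gradual decrease follows the same idea as simulated annealing.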
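The difference between the two schedules can be seen with a little arithmetic. This is a small sketch; the function names inverse_rate and annealed_rate are made up here, and t0 = 5, t1 = 50 are the values used in the sgd code in these notes:

```python
def inverse_rate(i_iter):
    # Simplest schedule: reciprocal of the iteration count (i_iter >= 1)
    return 1.0 / i_iter

def annealed_rate(i_iter, t0=5, t1=50):
    # Refined schedule: t1 softens the early drop, t0 replaces the numerator 1
    return t0 / (i_iter + t1)

# Ratio between consecutive learning rates, early and late in training
early_drop_inverse = inverse_rate(1) / inverse_rate(2)            # rate halves in one step
late_drop_inverse = inverse_rate(10000) / inverse_rate(10001)     # almost no change
early_drop_annealed = annealed_rate(1) / annealed_rate(2)         # equals 52 / 51

print(early_drop_inverse, late_drop_inverse, early_drop_annealed)
```

With the plain reciprocal, the rate halves between the first two iterations but barely moves later on; adding t1 to the denominator makes the early steps shrink almost as gently as the late ones.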

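To show the advantage of the stochastic version, the sample set is made fairly large.

First, solve the problem with the earlier approach (the batch gradient descent code from above):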

def J(theta, X_b, y):
    try:
        return np.sum((y - X_b.dot(theta)) ** 2) / len(y)
    except Exception:
        return float('inf')

def dJ(theta, X_b, y):
    return X_b.T.dot(X_b.dot(theta) - y) * 2. / len(y)

def gradient_descent(X_b, y, initial_theta, eta, n_iters=1e4, epsilon=1e-8):
    theta = initial_theta
    cur_iter = 0

    while cur_iter < n_iters:
        gradient = dJ(theta, X_b, y)
        last_theta = theta
        theta = theta - eta * gradient
        if (abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon):
            break
        cur_iter += 1

    return theta

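Stochastic gradient descent

X_b_i is a single row of X_b, and y_i is a single value.

The stochastic run used only a third of the samples, so it is bound to be faster.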

def dJ_sgd(theta, X_b_i, y_i):
    # Gradient estimate from a single sample (row X_b_i, target y_i)
    return 2 * X_b_i.T.dot(X_b_i.dot(theta) - y_i)

def sgd(X_b, y, initial_theta, n_iters):
    t0, t1 = 5, 50

    def learning_rate(t):
        # Decreasing learning rate: t0 / (t + t1)
        return t0 / (t + t1)

    theta = initial_theta
    for cur_iter in range(n_iters):
        rand_i = np.random.randint(len(X_b))
        gradient = dJ_sgd(theta, X_b[rand_i], y[rand_i])
        theta = theta - learning_rate(cur_iter) * gradient

    return theta
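A possible way to try this out is sketched below. The synthetic data (true intercept 3, true slope 4, Gaussian noise), the sample size and the seed parameter are assumptions, and the function is restated with a seeded generator so the block runs on its own. The number of iterations is a third of the sample count, as in the note above.

```python
import numpy as np

def dJ_sgd(theta, X_b_i, y_i):
    # Gradient estimate from one sample; X_b_i is a 1-D row, y_i a scalar
    return 2 * X_b_i * (X_b_i.dot(theta) - y_i)

def sgd(X_b, y, initial_theta, n_iters, t0=5, t1=50, seed=None):
    # seed is an added, hypothetical parameter for reproducibility
    rng = np.random.default_rng(seed)
    theta = initial_theta
    for cur_iter in range(n_iters):
        rand_i = rng.integers(len(X_b))
        eta = t0 / (cur_iter + t1)          # decreasing learning rate
        theta = theta - eta * dJ_sgd(theta, X_b[rand_i], y[rand_i])
    return theta

# Made-up data: y = 4x + 3 plus noise
data_rng = np.random.default_rng(42)
m = 10000
x = data_rng.normal(size=m)
y = 4.0 * x + 3.0 + data_rng.normal(0, 1.0, size=m)

X_b = np.hstack([np.ones((m, 1)), x.reshape(-1, 1)])
theta = sgd(X_b, y, np.zeros(X_b.shape[1]), n_iters=m // 3, seed=0)
print(theta)   # expected to land near [3, 4], though not exactly
```

The result varies with the seed and is only approximate, which is the precision given up in exchange for the shorter running time.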
