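6-7 Stochastic Gradient Descent in scikit-learn

Encapsulating our own SGD

Using our own SGD on real data

SGD in scikit-learn

6-8 How to verify the gradient computation: debugging gradient descent

6-9 Further discussion of gradient descent

The significance of randomness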

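6-7 Stochastic Gradient Descent in scikit-learn

Encapsulating our own SGD

Every sample should be seen once per pass. Picking samples purely at random cannot guarantee that no sample is left out, so we shuffle the sample indices and then go through all of them.

As for how many updates to run: about one third of the sample count is enough, and twice the sample count is certainly enough.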
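To illustrate why the indices are shuffled rather than drawn at random, here is a minimal sketch. The sample count m = 10 and the fixed seed are assumptions chosen only for illustration. Drawing indices with replacement may skip some samples in a pass, while a permutation visits each sample exactly once.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an assumption for reproducibility
m = 10                          # hypothetical number of samples

# Drawing m indices with replacement: duplicates are possible,
# so some samples may never be visited in this pass.
drawn = rng.integers(0, m, size=m)

# Shuffling the indices: every sample is visited exactly once per pass.
shuffled = rng.permutation(m)

print("distinct samples seen when drawing:  ", len(np.unique(drawn)))
print("distinct samples seen when shuffling:", len(np.unique(shuffled)))
```

The shuffled version is what the sgd function in LinearRegression.py does at the start of every pass.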

LinearRegression.py

import numpy as np
from .metrics import r2_score


class LinearRegression:

    def __init__(self):
        """Initialize the Linear Regression model."""
        self.coef_ = None
        self.intercept_ = None
        self._theta = None

    def fit_normal(self, X_train, y_train):
        """Train the Linear Regression model on X_train, y_train (normal equation)."""
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"

        # Prepend a column of ones so that theta[0] is the intercept.
        X_b = np.hstack([np.ones((len(X_train), 1)), X_train])
        self._theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_train)

        self.intercept_ = self._theta[0]
        self.coef_ = self._theta[1:]

        return self

    def fit_bgd(self, X_train, y_train, eta=0.01, n_iters=1e4):
        """Train the Linear Regression model on X_train, y_train using batch gradient descent."""
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"

        def J(theta, X_b, y):
            try:
                return np.sum((y - X_b.dot(theta)) ** 2) / len(y)
            except Exception:
                return float('inf')

        def dJ(theta, X_b, y):
            return X_b.T.dot(X_b.dot(theta) - y) * 2. / len(y)

        def gradient_descent(X_b, y, initial_theta, eta, n_iters=1e4, epsilon=1e-8):

            theta = initial_theta
            cur_iter = 0

            while cur_iter < n_iters:
                gradient = dJ(theta, X_b, y)
                last_theta = theta
                theta = theta - eta * gradient
                # Stop once the loss no longer changes noticeably.
                if abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon:
                    break

                cur_iter += 1

            return theta

        X_b = np.hstack([np.ones((len(X_train), 1)), X_train])
        initial_theta = np.zeros(X_b.shape[1])
        self._theta = gradient_descent(X_b, y_train, initial_theta, eta, n_iters)

        self.intercept_ = self._theta[0]
        self.coef_ = self._theta[1:]

        return self

    def fit_sgd(self, X_train, y_train, n_iters=50, t0=5, t1=50):
        """Train the Linear Regression model on X_train, y_train using stochastic gradient descent.

        n_iters is the number of passes over the whole training set.
        """
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"
        assert n_iters >= 1

        def dJ_sgd(theta, X_b_i, y_i):
            # Gradient estimated from a single sample.
            return X_b_i * (X_b_i.dot(theta) - y_i) * 2.

        def sgd(X_b, y, initial_theta, n_iters=5, t0=5, t1=50):

            def learning_rate(t):
                # The learning rate decays as the number of updates t grows.
                return t0 / (t + t1)

            theta = initial_theta
            m = len(X_b)
            for i_iter in range(n_iters):
                # Shuffle the indices so that every sample is seen once per pass.
                indexes = np.random.permutation(m)
                X_b_new = X_b[indexes, :]
                y_new = y[indexes]
                for i in range(m):
                    gradient = dJ_sgd(theta, X_b_new[i], y_new[i])
                    theta = theta - learning_rate(i_iter * m + i) * gradient

            return theta

        X_b = np.hstack([np.ones((len(X_train), 1)), X_train])
        initial_theta = np.random.randn(X_b.shape[1])
        self._theta = sgd(X_b, y_train, initial_theta, n_iters, t0, t1)

        self.intercept_ = self._theta[0]
        self.coef_ = self._theta[1:]

        return self

    def predict(self, X_predict):
        """Given the data set X_predict, return the vector of predictions for X_predict."""
        assert self.intercept_ is not None and self.coef_ is not None, \
            "must fit before predict!"
        assert X_predict.shape[1] == len(self.coef_), \
            "the feature number of X_predict must be equal to X_train"

        X_b = np.hstack([np.ones((len(X_predict), 1)), X_predict])
        return X_b.dot(self._theta)

    def score(self, X_test, y_test):
        """Evaluate the model on the test set X_test, y_test (R^2 score)."""

        y_predict = self.predict(X_test)
        return r2_score(y_test, y_predict)

    def __repr__(self):
        return "LinearRegression()"

Using our own SGD on real data

model_selection.py

import numpy as np


def train_test_split(X, y, test_ratio=0.2, seed=None):
    """Split X and y into X_train, X_test, y_train, y_test according to test_ratio."""
    assert X.shape[0] == y.shape[0], \
        "the size of X must be equal to the size of y"
    assert 0.0 <= test_ratio <= 1.0, \
        "test_ratio must be valid"

    # Compare against None so that seed=0 is also honored.
    if seed is not None:
        np.random.seed(seed)

    shuffled_indexes = np.random.permutation(len(X))

    test_size = int(len(X) * test_ratio)
    test_indexes = shuffled_indexes[:test_size]
    train_indexes = shuffled_indexes[test_size:]

    X_train = X[train_indexes]
    y_train = y[train_indexes]

    X_test = X[test_indexes]
    y_test = y[test_indexes]

    return X_train, X_test, y_train, y_test

The algorithm in scikit-learn is more heavily optimized than ours.

SGD in scikit-learn
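The notes do not record the code for this part, so the following is only a minimal sketch of how SGDRegressor from sklearn.linear_model might be used. The synthetic data, the seed 666, and max_iter=100 are all assumptions made for illustration, not the original experiment. The features are standardized first because SGD is sensitive to feature scale.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption; the notes do not include a dataset here).
rng = np.random.default_rng(666)
X = rng.normal(size=(1000, 3))
true_coef = np.array([3.0, -2.0, 1.0])
y = X.dot(true_coef) + 4.0 + rng.normal(scale=0.1, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=666)

# Standardize using statistics of the training set only.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# max_iter is the maximum number of passes over the training data.
sgd_reg = SGDRegressor(max_iter=100, random_state=666)
sgd_reg.fit(X_train_std, y_train)

r2 = sgd_reg.score(X_test_std, y_test)
print("R^2 on the test set:", r2)
```

Note that SGDRegressor lives in sklearn.linear_model, so it only covers linear models.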

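6-8 How to verify the gradient computation: debugging gradient descent

Approximating the derivative at a point

This approach is easier to understand, but its time complexity is higher than that of the earlier formula: every component of the gradient needs two evaluations of J.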

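# Assumes J(theta, X_b, y) is the same loss function as in fit_bgd above.
def dJ_debug(theta, X_b, y, epsilon=0.01):
    res = np.empty(len(theta))
    for i in range(len(theta)):
        # Shift only the i-th component by +epsilon and -epsilon.
        theta_1 = theta.copy()
        theta_1[i] += epsilon
        theta_2 = theta.copy()
        theta_2[i] -= epsilon
        res[i] = (J(theta_1, X_b, y) - J(theta_2, X_b, y)) / (2 * epsilon)
    return res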

# The gradient function dJ is passed in, so dJ_debug and dJ_math are interchangeable.
def gradient_descent(dJ, X_b, y, initial_theta, eta, n_iters=1e4, epsilon=1e-8):

    theta = initial_theta
    cur_iter = 0

    while cur_iter < n_iters:
        gradient = dJ(theta, X_b, y)
        last_theta = theta
        theta = theta - eta * gradient
        if abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon:
            break

        cur_iter += 1

    return theta

X_b = np.hstack([np.ones((len(X), 1)), X])
initial_theta = np.zeros(X_b.shape[1])
eta = 0.01

%time theta = gradient_descent(dJ_debug, X_b, y, initial_theta, eta)
theta

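dJ_debug works, but it is very slow.

The dJ_debug algorithm does not depend on the form of J, so it can be used with any other loss function; dJ_math applies only to this one problem.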
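A typical way to use the two together is to check the analytic gradient against the numerical one on a small data set before training with the fast version. The sketch below assumes synthetic data and a random theta (both made up for illustration); dJ_math is taken to be the analytic MSE gradient used in fit_bgd.

```python
import numpy as np

def J(theta, X_b, y):
    """Mean squared error loss."""
    return np.sum((y - X_b.dot(theta)) ** 2) / len(y)

def dJ_math(theta, X_b, y):
    """Analytic gradient of the MSE loss (valid for this loss only)."""
    return X_b.T.dot(X_b.dot(theta) - y) * 2.0 / len(y)

def dJ_debug(theta, X_b, y, epsilon=0.01):
    """Central-difference gradient; it only needs to evaluate J."""
    res = np.empty(len(theta))
    for i in range(len(theta)):
        theta_1 = theta.copy()
        theta_1[i] += epsilon
        theta_2 = theta.copy()
        theta_2[i] -= epsilon
        res[i] = (J(theta_1, X_b, y) - J(theta_2, X_b, y)) / (2 * epsilon)
    return res

# Synthetic data, an assumption for illustration only.
rng = np.random.default_rng(666)
X = rng.normal(size=(200, 4))
y = X.dot(np.array([1.0, 2.0, 3.0, 4.0])) + 5.0 + rng.normal(size=200)
X_b = np.hstack([np.ones((len(X), 1)), X])

# Compare both gradients at an arbitrary point.
theta = rng.normal(size=X_b.shape[1])
max_diff = np.max(np.abs(dJ_debug(theta, X_b, y) - dJ_math(theta, X_b, y)))
print("largest difference between the two gradients:", max_diff)
```

If the difference is tiny, the analytic formula can be trusted and used for the real training run.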

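6-9 Further discussion of gradient descent

Batch gradient descent: every step looks at all the samples.

Stochastic gradient descent: every step looks at only one sample.

Batch is slow, but it moves in the direction of steepest descent; stochastic is fast but unstable.

Mini-batch gradient descent combines the two: every step looks at k samples (say 10 or 20), which is more stable than looking at only one and is still fast.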
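The mini-batch idea can be sketched as follows. This is not code from the notes: the function names dJ_batch and mini_batch_gd, the batch size k=20, the reuse of the t0/t1 learning-rate schedule from fit_sgd, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def dJ_batch(theta, X_b_k, y_k):
    """Gradient of the MSE loss estimated from a batch of k samples."""
    return X_b_k.T.dot(X_b_k.dot(theta) - y_k) * 2.0 / len(y_k)

def mini_batch_gd(X_b, y, initial_theta, n_epochs=50, k=20, t0=5, t1=50, seed=None):
    """Hypothetical mini-batch gradient descent with a decaying learning rate."""
    rng = np.random.default_rng(seed)
    theta = initial_theta
    m = len(X_b)
    step = 0
    for _ in range(n_epochs):
        # Shuffle once per pass, then walk through the data k samples at a time.
        indexes = rng.permutation(m)
        for start in range(0, m, k):
            batch = indexes[start:start + k]
            gradient = dJ_batch(theta, X_b[batch], y[batch])
            theta = theta - t0 / (step + t1) * gradient
            step += 1
    return theta

# Synthetic data: y = 4x + 3 plus noise (an assumption for illustration).
rng = np.random.default_rng(666)
X = rng.normal(size=(1000, 1))
y = 4.0 * X[:, 0] + 3.0 + rng.normal(scale=0.5, size=1000)
X_b = np.hstack([np.ones((len(X), 1)), X])

theta = mini_batch_gd(X_b, y, np.zeros(X_b.shape[1]), seed=666)
print("estimated [intercept, slope]:", theta)
```

Setting k=1 gives stochastic gradient descent, and k=len(X_b) gives batch gradient descent, so k is effectively one more hyperparameter.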

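The significance of randomness

To find a minimum, step against the gradient (minus sign); to find a maximum, step along the gradient (plus sign).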
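The sign rule can be seen on a one-dimensional toy problem. The two functions f and g below, the step size eta, and the starting point are made up for illustration.

```python
def df(x):
    """Derivative of f(x) = (x - 2)^2, which has its minimum at x = 2."""
    return 2.0 * (x - 2.0)

def dg(x):
    """Derivative of g(x) = -(x - 2)^2 + 3, which has its maximum at x = 2."""
    return -2.0 * (x - 2.0)

eta = 0.1      # step size, an assumption for this toy problem
x_min = 0.0    # starting point for the minimization
x_max = 0.0    # starting point for the maximization

for _ in range(200):
    x_min = x_min - eta * df(x_min)  # minus sign: gradient descent
    x_max = x_max + eta * dg(x_max)  # plus sign: gradient ascent

print("minimizer of f:", x_min)
print("maximizer of g:", x_max)
```

Both loops end up near x = 2; the only difference between them is the sign in front of the gradient.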
