First, the two animated figures:

During the winter break of my first year of grad school, my advisor asked a few of us to reproduce these two animated figures in code. At the time I searched all over the web and couldn't find the source code, and I even asked the original author of the figures on Twitter — I haven't logged in since, so I have no idea whether they ever replied, haha. The code for both figures is below; it may not be a perfectly faithful reproduction, and you can change the line colors, annotations, and so on yourself. For the basics of plotting with matplotlib, MorvanZhou's (莫烦) Bilibili tutorial is a good place to start: https://www.bilibili.com/video/BV1Jx411L7LU?spm_id_from=333.337.search-card.all.click (tutorial code: https://github.com/MorvanZhou/tutorials/tree/master/matplotlibTUT, site: https://mofanpy.com).

I also drew on a few blog posts that are well worth reading:
《机器学习十讲》第七讲 (Lecture 7 of "Ten Lectures on Machine Learning", on optimization), 博客园: https://www.cnblogs.com/lihaodeworld/p/14480929.html (course source, with the related case studies below the video: http://cookdata.cn/auditorium/course_room/10018/)
Louis Tiao, "Visualizing and Animating Optimization Algorithms with Matplotlib": http://louistiao.me/notes/visualizing-and-animating-optimization-algorithms-with-matplotlib/#3D-Surface-Plot

Here is the code, starting with the minimum-finding figure.
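A quick aside I'm adding before the script (not part of the original code): the objective used below is the Beale function, whose global minimum is at (3, 0.5) with value 0 — the red star in the contour plot marks roughly that point, and every optimizer starts from (1, 1.5). You can check both facts with a couple of evaluations:

# Sanity check (my addition): at (3, 0.5) every squared term of the Beale function vanishes.
import numpy as np

beale = lambda x, y: ((1.5 - x + x * y) ** 2
                      + (2.25 - x + x * y ** 2) ** 2
                      + (2.625 - x + x * y ** 3) ** 2)
print(beale(3.0, 0.5))  # 0.0   -> the global minimum
print(beale(1.0, 1.5))  # 41.25 -> value at the common starting point x0 = (1, 1.5)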

import matplotlib.pyplot as plt
import autograd.numpy as np
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.colors import LogNorm
from matplotlib import animation
from IPython.display import HTML
from autograd import elementwise_grad, value_and_grad, grad
from scipy.optimize import minimize
from collections import defaultdict
from itertools import zip_longest
from functools import partial

f = lambda x, y: (1.5 - x + x * y) ** 2 + (2.25 - x + x * y ** 2) ** 2 + (2.625 - x + x * y ** 3) ** 2

xmin, xmax, xstep = -4.5, 4.5, .01
ymin, ymax, ystep = -4.5, 4.5, .01
x, y = np.meshgrid(np.arange(xmin, xmax + xstep, xstep), np.arange(ymin, ymax + ystep, ystep))
z = f(x, y)

minima = np.array([2.95, .5])
minima_ = minima.reshape(-1, 1)
x0 = np.array([1., 1.5])


def make_minimize_cb(path=[]):
    def minimize_cb(xk):
        path.append(np.copy(xk))
    return minimize_cb


class TrajectoryAnimation(animation.FuncAnimation):
    def __init__(self, *paths, labels=[], fig=None, ax=None, frames=None,
                 interval=60, repeat_delay=5, blit=True, **kwargs):
        if fig is None:
            if ax is None:
                fig, ax = plt.subplots()
            else:
                fig = ax.get_figure()
        else:
            if ax is None:
                ax = fig.gca()
        self.fig = fig
        self.ax = ax
        self.paths = paths
        if frames is None:
            frames = max(path.shape[1] for path in paths)
        self.lines = [ax.plot([], [], label=label, lw=2)[0]
                      for _, label in zip_longest(paths, labels)]
        self.points = [ax.plot([], [], 'o', color=line.get_color())[0]
                       for line in self.lines]
        super(TrajectoryAnimation, self).__init__(fig, self.animate, init_func=self.init_anim,
                                                  frames=frames, interval=interval, blit=blit,
                                                  repeat_delay=repeat_delay, **kwargs)

    def init_anim(self):
        for line, point in zip(self.lines, self.points):
            line.set_data([], [])
            point.set_data([], [])
        return self.lines + self.points

    def animate(self, i):
        for line, point, path in zip(self.lines, self.points, self.paths):
            line.set_data(*path[::, :i])
            point.set_data(*path[::, i - 1:i])
        return self.lines + self.points


methods = ["SGD", "Momentum", "NAG", "Adagrad", "Adadelta", "Rmsprop", "Adam"]


def SGDUpdate(function, x0, y0, learning_rate, num_steps):
    allX = [x0]
    allY = [y0]
    x = x0
    y = y0
    for _ in range(num_steps):
        dz_dx = grad(function, argnum=0)(x, y)
        dz_dy = grad(function, argnum=1)(x, y)
        x = x - dz_dx * learning_rate
        y = y - dz_dy * learning_rate
        allX.append(x)
        allY.append(y)
    return np.array([allX, allY])


def MomentumUpdate(function, x0, y0, learning_rate, num_steps, momentum=0.9):
    allX = [x0]
    allY = [y0]
    x = x0
    y = y0
    x_v = 0
    y_v = 0
    for _ in range(num_steps):
        dz_dx = grad(function, argnum=0)(x, y)
        dz_dy = grad(function, argnum=1)(x, y)
        x_v = (momentum * x_v) - (dz_dx * learning_rate)
        y_v = (momentum * y_v) - (dz_dy * learning_rate)
        x = x + x_v
        y = y + y_v
        allX.append(x)
        allY.append(y)
    return np.array([allX, allY])


def NAGUpdate(function, x0, y0, learning_rate, num_steps, momentum=0.9):
    allX = [x0]
    allY = [y0]
    x = x0
    y = y0
    x_v = 0
    x_v_prev = 0
    y_v = 0
    y_v_prev = 0
    for _ in range(num_steps):
        dz_dx = grad(function, argnum=0)(x, y)
        dz_dy = grad(function, argnum=1)(x, y)
        x_v_prev = x_v
        x_v = (momentum * x_v) - (dz_dx * learning_rate)
        x = x - momentum * x_v_prev + (1 + momentum) * x_v
        y_v_prev = y_v
        y_v = (momentum * y_v) - (dz_dy * learning_rate)
        y = y - momentum * y_v_prev + (1 + momentum) * y_v
        allX.append(x)
        allY.append(y)
    return np.array([allX, allY])


def AdagradUpdate(function, x0, y0, learning_rate, num_steps):
    allX = [x0]
    allY = [y0]
    x = x0
    y = y0
    x_cache = 0
    y_cache = 0
    for _ in range(num_steps):
        dz_dx = grad(function, argnum=0)(x, y)
        dz_dy = grad(function, argnum=1)(x, y)
        x_cache = x_cache + dz_dx ** 2
        x = x - learning_rate * dz_dx / (np.sqrt(x_cache) + 1e-7)
        y_cache = y_cache + dz_dy ** 2
        y = y - learning_rate * dz_dy / (np.sqrt(y_cache) + 1e-7)
        allX.append(x)
        allY.append(y)
    return np.array([allX, allY])


def AdadeltaUpdate(function, x0, y0, learning_rate, num_steps, decay_rate=0.9):
    allX = [x0]
    allY = [y0]
    x = x0
    y = y0
    x_cache = 0
    y_cache = 0
    for _ in range(num_steps):
        dz_dx = grad(function, argnum=0)(x, y)
        dz_dy = grad(function, argnum=1)(x, y)
        x_cache = decay_rate * x_cache + (1 - decay_rate) * dz_dx ** 2
        x = x - learning_rate * dz_dx / (np.sqrt(x_cache) + 1e-7)
        y_cache = decay_rate * y_cache + (1 - decay_rate) * dz_dy ** 2
        y = y - learning_rate * dz_dy / (np.sqrt(y_cache) + 1e-7)
        allX.append(x)
        allY.append(y)
    return np.array([allX, allY])


def RmspropUpdate(function, x0, y0, learning_rate, num_steps, decay_rate=0.9):
    allX = [x0]
    allY = [y0]
    x = x0
    y = y0
    x_cache = 0
    y_cache = 0
    for _ in range(num_steps):
        dz_dx = grad(function, argnum=0)(x, y)
        dz_dy = grad(function, argnum=1)(x, y)
        x_cache = decay_rate * x_cache + (1 - decay_rate) * dz_dx ** 2
        x = x - learning_rate * dz_dx / (np.sqrt(x_cache) + 1e-7)
        y_cache = decay_rate * y_cache + (1 - decay_rate) * dz_dy ** 2
        y = y - learning_rate * dz_dy / (np.sqrt(y_cache) + 1e-7)
        allX.append(x)
        allY.append(y)
    return np.array([allX, allY])


def AdamUpdate(function, x0, y0, learning_rate, num_steps, beta1, beta2):
    allX = [x0]
    allY = [y0]
    x = x0
    y = y0
    m_x = 0
    m_y = 0
    x_v = 0
    y_v = 0
    for step in range(1, num_steps + 1):
        dz_dx = grad(function, argnum=0)(x, y)
        dz_dy = grad(function, argnum=1)(x, y)
        m_x = beta1 * m_x + (1 - beta1) * dz_dx
        m_step_x = m_x / (1 - beta1 ** step)
        x_v = beta2 * x_v + (1 - beta2) * (dz_dx ** 2)
        v_step_x = x_v / (1 - beta2 ** step)
        x = x - learning_rate * m_step_x / (np.sqrt(v_step_x) + 1e-8)
        m_y = beta1 * m_y + (1 - beta1) * dz_dy
        m_step_y = m_y / (1 - beta1 ** step)
        y_v = beta2 * y_v + (1 - beta2) * (dz_dy ** 2)
        v_step_y = y_v / (1 - beta2 ** step)
        y = y - learning_rate * m_step_y / (np.sqrt(v_step_y) + 1e-8)
        allX.append(x)
        allY.append(y)
    return np.array([allX, allY])


learning_rate = 0.003
num_steps = 100
SGDPath = SGDUpdate(f, x0[0], x0[1], learning_rate, num_steps)
MomentumPath = MomentumUpdate(f, x0[0], x0[1], learning_rate, num_steps)
NAGPath = NAGUpdate(f, x0[0], x0[1], learning_rate, num_steps)
AdagradPath = AdagradUpdate(f, x0[0], x0[1], 0.5, num_steps)
AdadeltaPath = AdadeltaUpdate(f, x0[0], x0[1], 0.1, num_steps=70)
RmspropPath = RmspropUpdate(f, x0[0], x0[1], 0.12, num_steps=80)
AdamPath = AdamUpdate(f, x0[0], x0[1], 0.1, num_steps, 0.9, 0.999)

paths = [SGDPath, MomentumPath, NAGPath, AdagradPath, AdadeltaPath, RmspropPath, AdamPath]

fig, ax = plt.subplots(figsize=(10, 6))
ax.contour(x, y, z, levels=np.logspace(0, 5, 35), norm=LogNorm(), cmap=plt.cm.jet)
ax.plot(*minima_, 'r*', markersize=24)
ax.set_xlabel('$x$')
ax.set_ylabel('$y$')
ax.set_xlim((xmin, xmax))
ax.set_ylim((ymin, ymax))
anim = TrajectoryAnimation(*paths, labels=methods, ax=ax)
ax.legend(loc='upper right')
plt.show()
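The script above ends with plt.show(), which opens an interactive window. If you want to export the animation as a GIF like the one at the top of the post, a small addition of my own (it assumes the Pillow package is installed, which backs the 'pillow' writer) is to save it instead:

# Optional: write the animation to a GIF file; 'minima.gif' is just an example filename.
anim.save('minima.gif', writer='pillow', fps=15)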

Next is the code for the saddle point.
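Before the code, a short note I'm adding: f(x, y) = -x^2 + 2.5*y^2 curves downward along x and upward along y, so the origin is a stationary point but not a minimum — a saddle. That is also why the starting point is x0 = (1e-7, 2): it sits almost exactly on the ridge, so methods with little sideways push linger near the saddle before sliding off. A two-line check of the (constant) Hessian makes this concrete:

# The Hessian of f(x, y) = -x**2 + 2.5*y**2 is constant: diag(-2, 5).
# One negative and one positive eigenvalue means (0, 0) is a saddle point.
import numpy as np
H = np.array([[-2.0, 0.0],
              [0.0, 5.0]])
print(np.linalg.eigvals(H))  # [-2.  5.]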

# Import the packages needed by the algorithms; matplotlib is used for plotting
# %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import torch
from torch.autograd import Variable
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import animation
from IPython.display import HTML
from autograd import elementwise_grad, value_and_grad, grad
from scipy.optimize import minimize
from scipy import optimize
from collections import defaultdict
from itertools import zip_longest

plt.rcParams['axes.unicode_minus'] = False  # display minus signs correctly

# Define the objective function with a Python lambda
f = lambda x, y: -x ** 2 + 2.5 * y ** 2  # function definition
f_grad = value_and_grad(lambda args: f(*args))  # gradient of the function

# Plot the function surface
## First use np.meshgrid to build the grid of coordinates; both dimensions run from -5 to 5.
## The function values at the grid points are stored in z.
x, y = np.meshgrid(np.linspace(-5.0, 5.0, 50), np.linspace(-5.0, 5.0, 50))
z = f(x, y)
minima = np.array([4, 0])
minima_ = minima.reshape(-1, 1)
## plot_surface draws the 3D surface
x0 = np.array([0.0000001, 2.])


def make_minimize_cb(path=[]):
    def minimize_cb(xk):
        path.append(np.copy(xk))
    return minimize_cb


class TrajectoryAnimation3D(animation.FuncAnimation):
    def __init__(self, *paths, zpaths, labels=[], fig=None, ax=None, frames=None,
                 interval=60, repeat_delay=5, blit=True, **kwargs):
        if fig is None:
            if ax is None:
                fig, ax = plt.subplots()
            else:
                fig = ax.get_figure()
        else:
            if ax is None:
                ax = fig.gca()
        self.fig = fig
        self.ax = ax
        self.paths = paths
        self.zpaths = zpaths
        if frames is None:
            frames = max(path.shape[1] for path in paths)
        self.lines = [ax.plot([], [], [], label=label, lw=2)[0]
                      for _, label in zip_longest(paths, labels)]
        super(TrajectoryAnimation3D, self).__init__(fig, self.animate, init_func=self.init_anim,
                                                    frames=frames, interval=interval, blit=blit,
                                                    repeat_delay=repeat_delay, **kwargs)

    def init_anim(self):
        for line in self.lines:
            line.set_data([], [])
            line.set_3d_properties([])
        return self.lines

    def animate(self, i):
        for line, path, zpath in zip(self.lines, self.paths, self.zpaths):
            line.set_data(*path[::, :i])
            line.set_3d_properties(zpath[:i])
        return self.lines


methods = ["SGD", "Momentum", "NAG", "Adagrad", "Adadelta", "Rmsprop", "Adam"]

# The trajectories reuse the same seven update functions defined in the first script:
# SGDUpdate, MomentumUpdate, NAGUpdate, AdagradUpdate, AdadeltaUpdate, RmspropUpdate
# and AdamUpdate. Make sure they are defined before running the rest of this script.

learning_rate = 0.01
num_steps = 200
SGDPath = SGDUpdate(f, x0[0], x0[1], learning_rate, num_steps)
MomentumPath = MomentumUpdate(f, x0[0], x0[1], 0.05, num_steps)
NAGPath = NAGUpdate(f, x0[0], x0[1], 0.1, num_steps)
AdagradPath = AdagradUpdate(f, x0[0], x0[1], 0.5, num_steps)
AdadeltaPath = AdadeltaUpdate(f, x0[0], x0[1], 0.1, num_steps=70)
RmspropPath = RmspropUpdate(f, x0[0], x0[1], 0.1, num_steps=80)
AdamPath = AdamUpdate(f, x0[0], x0[1], 0.1, num_steps, 0.9, 0.999)

paths = [SGDPath, MomentumPath, NAGPath, AdagradPath, AdadeltaPath, RmspropPath, AdamPath]
zpaths = [f(*path) for path in paths]

fig = plt.figure(figsize=(8, 8))
ax = plt.axes(projection='3d', elev=50, azim=-50)
ax.plot_surface(x, y, z, alpha=.7, cmap='coolwarm')
ax.plot([minima[0]], [minima[1]], [f(*minima)], 'b*', markersize=6)
ax.set_xlabel('$x1$')
ax.set_ylabel('$y$')
ax.set_zlabel('$f$')
ax.set_xlim((-5, 5))
ax.set_ylim((-5, 5))
anim = TrajectoryAnimation3D(*paths, zpaths=zpaths, labels=methods, ax=ax)
ax.legend(loc='upper left')
plt.show()
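Both scripts import HTML from IPython.display without actually using it. If you run the code in a Jupyter notebook instead of PyCharm, one option (my addition, not in the original scripts) is to embed the animation inline; to_jshtml() does not require ffmpeg:

# In a notebook, display the animation inline instead of (or after) plt.show().
from IPython.display import HTML
HTML(anim.to_jshtml())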

Note: I ran this in PyCharm with a pytorch / Python 3.9 environment. If you hit a "No module named XXXX" error, you can open the interpreter settings from the top-left menu (File → Settings → Project → Python Interpreter) and install the missing package there.
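If you prefer to set things up from the command line instead, installing the packages the two scripts import should be enough (I haven't recorded exact versions, so treat this as a starting point): pip install matplotlib numpy scipy autograd torch ipython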
