Gradient Descent for a One-Hidden-Layer Neural Network

  • Problem description
  • Answers to questions

Problem description

This second computer programming assignment is to train a one-hidden-layer neural network with one-dimensional input, one-dimensional output, and m=1 or 2 nodes in the hidden layer:

to fit the Runge function on the given eleven grid points

where

  • set $\sigma(x)=\frac{1}{1+e^{-x}}$ if m=2 is used
  • set $\sigma(x)=e^{-x^2}$ if m=1 is used

Note that the variables to minimize are $c_i,\ w_i,\ b_i$.
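The original formula images are not reproduced here; a standard formulation consistent with this setup is sketched below (the exact network form, grid, and objective are assumptions based on the usual statement of this assignment):

$$
N(x;\,c,w,b)=\sum_{i=1}^{m} c_i\,\sigma(w_i x + b_i),
\qquad
f(x)=\frac{1}{1+25x^{2}},
$$

$$
\min_{c,\,w,\,b}\; F(c,w,b)=\frac{1}{2}\sum_{j=1}^{11}\bigl(N(x_j;\,c,w,b)-f(x_j)\bigr)^{2},
$$

where $x_1,\dots,x_{11}$ are the given grid points (e.g. equally spaced on $[-1,1]$).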

Apply gradient descent to this example and, for a fixed step size, test the convergence rate of your gradient descent and identify the constant $\gamma$ in Convergence Theorem 2 for Gradient Descent.

  • Is this a convex optimization problem? Why?
  • The plots of the objective function and its gradient magnitude
  • What is the convergence rate? Verify it by numerical examples.
  • What happens if you use different initial points?
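A minimal NumPy sketch of plain gradient descent on this objective for m=2 is given below; the grid on $[-1,1]$, the initial point, the step size, and the iteration count are assumptions chosen to match the experiments reported in the answers.

```python
import numpy as np
import matplotlib.pyplot as plt

# Eleven grid points and Runge-function targets (grid on [-1, 1] is an assumption)
x = np.linspace(-1.0, 1.0, 11)
y = 1.0 / (1.0 + 25.0 * x**2)

m = 2                                            # hidden nodes; sigmoid activation for m = 2

def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))

def loss_and_grad(theta):
    """Objective F(c, w, b) = 1/2 * sum_j (N(x_j) - y_j)^2 and its gradient."""
    c, w, b = theta[:m], theta[m:2*m], theta[2*m:]
    z = np.outer(w, x) + b[:, None]              # pre-activations, shape (m, 11)
    s = sigma(z)
    r = (c[:, None] * s).sum(axis=0) - y         # residuals N(x_j) - y_j
    ds = s * (1.0 - s)                           # sigma'(z)
    grad = np.concatenate([
        s @ r,                                   # dF/dc_i = sum_j r_j * sigma(z_ij)
        (c[:, None] * ds * x) @ r,               # dF/dw_i = sum_j r_j * c_i * sigma'(z_ij) * x_j
        (c[:, None] * ds) @ r,                   # dF/db_i = sum_j r_j * c_i * sigma'(z_ij)
    ])
    return 0.5 * np.dot(r, r), grad

step, iters = 0.05, 10000                        # fixed step size, as in the experiments below
theta = np.array([0.5, -0.5, 1.0, -1.0, 0.0, 0.0])   # illustrative initial point (c, w, b)
obj_hist, grad_hist = [], []
for _ in range(iters):
    f, g = loss_and_grad(theta)
    obj_hist.append(f)
    grad_hist.append(np.linalg.norm(g))
    theta -= step * g

# Plots of the objective and of the gradient magnitude versus iteration
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(obj_hist);  ax1.set_xlabel("iteration"); ax1.set_ylabel("objective F")
ax2.plot(grad_hist); ax2.set_xlabel("iteration"); ax2.set_ylabel("||grad F||")
plt.show()
```

For m=1, the same loop applies with $\sigma(x)=e^{-x^2}$ and its derivative $\sigma'(x)=-2xe^{-x^2}$.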

Answers to questions

(1) This is not a convex optimization problem. We can check convexity by fixing two of the three groups of variables w, b, c and looking at the objective along the direction of the third. For m=2, Figures 1-3 below are three-dimensional plots of the objective with b, c fixed (varying w), with w, c fixed (varying b), and with w, b fixed (varying c), respectively.

Figure 1

Figure 2

Figure 3

We can see that Figures 1 and 2 are nonconvex, while Figure 3 may be convex; but a convex function must be convex along every direction, so the objective is nonconvex.
For m=1, we again fix two directions and observe the third. Figures 4-6 show the objective with b, c fixed (varying w), with w, c fixed (varying b), and with w, b fixed (varying c), respectively.

Figure 4    Figure 5    Figure 6
We again see that the function is nonconvex in the w and b directions and convex in the c direction, so overall it is still a nonconvex function. Hence the problem is nonconvex for both m=1 and m=2.
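The exact slices used for Figures 1-6 are not specified; the following is a minimal sketch of how one such surface can be generated for m=2 (the fixed values of c and b and the plotting ranges are illustrative assumptions).

```python
import numpy as np
import matplotlib.pyplot as plt

# Eleven grid points and Runge-function targets (grid on [-1, 1] is an assumption)
x = np.linspace(-1.0, 1.0, 11)
y = 1.0 / (1.0 + 25.0 * x**2)

def sigma(t):                                   # sigmoid activation (m = 2 case)
    return 1.0 / (1.0 + np.exp(-t))

def loss(c, w, b):
    """Least-squares objective for the m-node one-hidden-layer network."""
    pred = (c[:, None] * sigma(np.outer(w, x) + b[:, None])).sum(axis=0)
    return 0.5 * np.sum((pred - y) ** 2)

# Fix c and b (illustrative values) and plot the surface over (w1, w2)
c0 = np.array([1.0, -1.0])
b0 = np.array([0.5, -0.5])
W1, W2 = np.meshgrid(np.linspace(-10, 10, 80), np.linspace(-10, 10, 80))
F = np.array([[loss(c0, np.array([w1, w2]), b0) for w1, w2 in zip(r1, r2)]
              for r1, r2 in zip(W1, W2)])

ax = plt.figure().add_subplot(projection="3d")
ax.plot_surface(W1, W2, F, cmap="viridis")
ax.set_xlabel("w1"); ax.set_ylabel("w2"); ax.set_zlabel("F")
plt.show()
```

Repeating this with (b1, b2) or (c1, c2) as the free pair gives the surfaces along the other directions.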
(2)
Figures 7-8 show the evolution of the objective function and the gradient magnitude for m=2, using a step size of 0.05 and 10000 iterations.

Figure 7    Figure 8
If the step size is changed to 0.005, the objective function and gradient change as shown in the figures below:

Figure 9    Figure 10

If the step size is changed to 0.0005, the objective function and gradient change as shown in the figures below:

Figure 11    Figure 12

From the plots above, the best step size appears to be around 0.05.
Figures 13-14 show the evolution of the objective function and the gradient magnitude for m=1, using a step size of 0.05 and 10000 iterations.

Figure 13    Figure 14
If the step size is changed to 0.005, the objective function and gradient change as shown in the figures below:

Figure 15    Figure 16
If the step size is changed to 0.0005, the objective function and gradient change as shown in the figures below:

Figure 17    Figure 18
From the plots above, the best step size appears to be around 0.0005.
(3)

For m=2, $\gamma$ is estimated by fitting $\log(\mathrm{error}_k)$ against $k$; the figures below show this fit.
Figures 19-21 plot, against the number of iterations, the error, the $\frac{1}{k}$ rate, and the $\gamma^k$ rate, respectively. We can see that the convergence rate is $\frac{1}{k}$, with fitted constant $\gamma = 0.999998$.

Figure 19    Figure 20    Figure 21
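A minimal sketch of this fit is given below; it assumes an array `errors` holding the per-iteration error $e_k$ from the gradient-descent run (the synthetic sequence used here is only for demonstration).

```python
import numpy as np

def fit_gamma(errors):
    """Fit log(e_k) ~ k * log(gamma) + const and return the estimated gamma."""
    k = np.arange(1, len(errors) + 1)
    slope, _ = np.polyfit(k, np.log(errors), 1)   # least-squares line through (k, log e_k)
    return np.exp(slope)

# Demonstration with a synthetic linearly convergent sequence e_k = 0.999998**k
demo_errors = 0.999998 ** np.arange(1, 10001)
print(fit_gamma(demo_errors))                      # ~0.999998
```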

For m=1, $\gamma$ is again estimated by fitting $\log(\mathrm{error}_k)$ against $k$; the figures below show this fit.
Figures 22-24 plot, against the number of iterations, the error, the $\frac{1}{k}$ rate, and the $\gamma^k$ rate, respectively. The convergence rate is $\frac{1}{k}$, with fitted constant $\gamma = 0.999679$.

Figure 22    Figure 23    Figure 24

(4)
Using different initial points has a great influence on gradient descent. Some initial points make the descent very slow, so the iteration may fail to converge or may even diverge. The figures below show gradient-descent runs that fail to converge because of the choice of initial point, for m=2 and m=1 respectively.

Figure 25    Figure 26

Figure 27    Figure 28
If the initial points are different, the local minima reached may also be different. For m=2, with step size 0.05 and different initial points, Figure 33 reaches the minimum faster than Figures 29 and 31, so it can be inferred that different local minima are reached.

Figure 29    Figure 30

Figure 31    Figure 32

Figure 33    Figure 34
