Gradient Descent for a One-Hidden-Layer Neural Network

  • Problem description
  • Answers to questions

Problem description

This second computer programming assignment is to train a one-hidden-layer neural network with one-dimensional input, one-dimensional output, and m=1 or 2 nodes in the hidden layer:

to fit the Runge function on the given eleven grid points

where

  • set $\sigma(x)=\frac{1}{1+e^{-x}}$ if m=2 is used
  • set $\sigma(x)=e^{-x^2}$ if m=1 is used

Note that the variables to minimize are $c_i,\ w_i,\ b_i$.
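The original formula images are not reproduced here; a standard formulation consistent with this setup is sketched below (the exact network form, grid, and objective are assumptions based on the usual statement of this assignment):

$$
N(x;\,c,w,b)=\sum_{i=1}^{m} c_i\,\sigma(w_i x + b_i),
\qquad
f(x)=\frac{1}{1+25x^{2}},
$$

$$
\min_{c,\,w,\,b}\; F(c,w,b)=\frac{1}{2}\sum_{j=1}^{11}\bigl(N(x_j;\,c,w,b)-f(x_j)\bigr)^{2},
$$

where $x_1,\dots,x_{11}$ are the given grid points (e.g. equally spaced on $[-1,1]$).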

Apply gradient descent to this example and, for a fixed step size, test the convergence rate of your gradient descent and identify the constant $\gamma$ in Convergence Theorem 2 for Gradient Descent.

  • Is this a convex optimization problem? Why?
  • The plots of the objective function and its gradient magnitude
  • What is the convergence rate? Verify it by numerical examples.
  • What happens if you use different initial points?
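A minimal NumPy sketch of plain gradient descent on this objective for m=2 is given below; the grid on $[-1,1]$, the initial point, the step size, and the iteration count are assumptions chosen to match the experiments reported in the answers.

```python
import numpy as np
import matplotlib.pyplot as plt

# Eleven grid points and Runge-function targets (grid on [-1, 1] is an assumption)
x = np.linspace(-1.0, 1.0, 11)
y = 1.0 / (1.0 + 25.0 * x**2)

m = 2                                            # hidden nodes; sigmoid activation for m = 2

def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))

def loss_and_grad(theta):
    """Objective F(c, w, b) = 1/2 * sum_j (N(x_j) - y_j)^2 and its gradient."""
    c, w, b = theta[:m], theta[m:2*m], theta[2*m:]
    z = np.outer(w, x) + b[:, None]              # pre-activations, shape (m, 11)
    s = sigma(z)
    r = (c[:, None] * s).sum(axis=0) - y         # residuals N(x_j) - y_j
    ds = s * (1.0 - s)                           # sigma'(z)
    grad = np.concatenate([
        s @ r,                                   # dF/dc_i = sum_j r_j * sigma(z_ij)
        (c[:, None] * ds * x) @ r,               # dF/dw_i = sum_j r_j * c_i * sigma'(z_ij) * x_j
        (c[:, None] * ds) @ r,                   # dF/db_i = sum_j r_j * c_i * sigma'(z_ij)
    ])
    return 0.5 * np.dot(r, r), grad

step, iters = 0.05, 10000                        # fixed step size, as in the experiments below
theta = np.array([0.5, -0.5, 1.0, -1.0, 0.0, 0.0])   # illustrative initial point (c, w, b)
obj_hist, grad_hist = [], []
for _ in range(iters):
    f, g = loss_and_grad(theta)
    obj_hist.append(f)
    grad_hist.append(np.linalg.norm(g))
    theta -= step * g

# Plots of the objective and of the gradient magnitude versus iteration
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(obj_hist);  ax1.set_xlabel("iteration"); ax1.set_ylabel("objective F")
ax2.plot(grad_hist); ax2.set_xlabel("iteration"); ax2.set_ylabel("||grad F||")
plt.show()
```

For m=1, the same loop applies with $\sigma(x)=e^{-x^2}$ and its derivative $\sigma'(x)=-2xe^{-x^2}$.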

Answers to questions

(1) This is not a convex optimization problem. We can check convexity by fixing two of the three groups of variables w, b, c and looking at the objective along the direction of the third. For m=2, Figures 1-3 below are three-dimensional plots of the objective with b, c fixed (varying w), with w, c fixed (varying b), and with w, b fixed (varying c), respectively.

Figure 1

Figure 2

Figure 3

We can see that Figures 1 and 2 are nonconvex, while Figure 3 may be convex; but a convex function must be convex along every direction, so the objective is nonconvex.
For m=1, we again fix two directions and observe the third. Figures 4-6 show the objective with b, c fixed (varying w), with w, c fixed (varying b), and with w, b fixed (varying c), respectively.

Figure 4    Figure 5    Figure 6
We again see that the function is nonconvex in the w and b directions and convex in the c direction, so overall it is still a nonconvex function. Hence the problem is nonconvex for both m=1 and m=2.
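The exact slices used for Figures 1-6 are not specified; the following is a minimal sketch of how one such surface can be generated for m=2 (the fixed values of c and b and the plotting ranges are illustrative assumptions).

```python
import numpy as np
import matplotlib.pyplot as plt

# Eleven grid points and Runge-function targets (grid on [-1, 1] is an assumption)
x = np.linspace(-1.0, 1.0, 11)
y = 1.0 / (1.0 + 25.0 * x**2)

def sigma(t):                                   # sigmoid activation (m = 2 case)
    return 1.0 / (1.0 + np.exp(-t))

def loss(c, w, b):
    """Least-squares objective for the m-node one-hidden-layer network."""
    pred = (c[:, None] * sigma(np.outer(w, x) + b[:, None])).sum(axis=0)
    return 0.5 * np.sum((pred - y) ** 2)

# Fix c and b (illustrative values) and plot the surface over (w1, w2)
c0 = np.array([1.0, -1.0])
b0 = np.array([0.5, -0.5])
W1, W2 = np.meshgrid(np.linspace(-10, 10, 80), np.linspace(-10, 10, 80))
F = np.array([[loss(c0, np.array([w1, w2]), b0) for w1, w2 in zip(r1, r2)]
              for r1, r2 in zip(W1, W2)])

ax = plt.figure().add_subplot(projection="3d")
ax.plot_surface(W1, W2, F, cmap="viridis")
ax.set_xlabel("w1"); ax.set_ylabel("w2"); ax.set_zlabel("F")
plt.show()
```

Repeating this with (b1, b2) or (c1, c2) as the free pair gives the surfaces along the other directions.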
(2)
Figures 7-8 show the evolution of the objective function and the gradient magnitude for m=2, using a step size of 0.05 and 10000 iterations.

Figure 7    Figure 8
If the step size is changed to 0.005, the objective function and gradient change as shown in the figures below:

Figure 9    Figure 10

If the step size is changed to 0.0005, the objective function and gradient change as shown in the figures below:

Figure 11    Figure 12

From the plots above, the best step size appears to be around 0.05.
Figures 13-14 show the evolution of the objective function and the gradient magnitude for m=1, using a step size of 0.05 and 10000 iterations.

Figure 13    Figure 14
If the step size is changed to 0.005, the objective function and gradient change as shown in the figures below:

Figure 15    Figure 16
If the step size is changed to 0.0005, the objective function and gradient change as shown in the figures below:

Figure 17    Figure 18
From the plots above, the best step size appears to be around 0.0005.
(3)

For m=2, $\gamma$ is estimated by fitting $\log(\mathrm{error}_k)$ against $k$; the figures below show this fit.
Figures 19-21 plot, against the number of iterations, the error, the $\frac{1}{k}$ rate, and the $\gamma^k$ rate, respectively. We can see that the convergence rate is $\frac{1}{k}$, with fitted constant $\gamma = 0.999998$.

Figure 19    Figure 20    Figure 21
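A minimal sketch of this fit is given below; it assumes an array `errors` holding the per-iteration error $e_k$ from the gradient-descent run (the synthetic sequence used here is only for demonstration).

```python
import numpy as np

def fit_gamma(errors):
    """Fit log(e_k) ~ k * log(gamma) + const and return the estimated gamma."""
    k = np.arange(1, len(errors) + 1)
    slope, _ = np.polyfit(k, np.log(errors), 1)   # least-squares line through (k, log e_k)
    return np.exp(slope)

# Demonstration with a synthetic linearly convergent sequence e_k = 0.999998**k
demo_errors = 0.999998 ** np.arange(1, 10001)
print(fit_gamma(demo_errors))                      # ~0.999998
```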

For m=1, $\gamma$ is again estimated by fitting $\log(\mathrm{error}_k)$ against $k$; the figures below show this fit.
Figures 22-24 plot, against the number of iterations, the error, the $\frac{1}{k}$ rate, and the $\gamma^k$ rate, respectively. The convergence rate is $\frac{1}{k}$, with fitted constant $\gamma = 0.999679$.

Figure 22    Figure 23    Figure 24

(4)
Using different initial points has a great influence on gradient descent. Some initial points make the descent very slow, so the iteration may fail to converge or may even diverge. The figures below show gradient-descent runs that fail to converge because of the choice of initial point, for m=2 and m=1 respectively.

Figure 25    Figure 26

Figure 27    Figure 28
If the initial points are different, the local minima reached may also be different. For m=2, with step size 0.05 and different initial points, Figure 33 reaches the minimum faster than Figures 29 and 31, so it can be inferred that different local minima are reached.

Figure 29    Figure 30

Figure 31    Figure 32

Figure 33    Figure 34
