Andrew Ng's Machine Learning Ex3: Python implementation

1. Multi-class classification

This exercise uses logistic regression and neural networks to recognize handwritten digits (0 to 9). The first part uses logistic regression for one-vs-all classification.

1.1 Dataset

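The dataset ex3data1.mat contains 5000 training examples of handwritten digits. Each example is a 20 * 20 pixel grayscale image, with every pixel stored as a floating-point intensity value, and the image is unrolled into a 400-dimensional vector. Each row of the matrix X therefore represents one training example.

ex3data1.mat also contains the vector y, which holds the labels of the 5000 examples. The .mat file is loaded with the scipy module.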

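import numpy as np
import scipy.io as scio
from scipy.optimize import minimize

# Load the MATLAB-format dataset
data = scio.loadmat('E:/2018/ML/work/machine-learning-ex3/ex3/ex3data1.mat')
data1 = data.get('X')   # pixel matrix, one example per row
label = data.get('y')   # labels, one per example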

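1.2 Visualizing the data

To visualize one handwritten digit, pick a row and reshape it back into a 20 * 20 matrix. randint() from the random module selects a random row index (0 to 4999, since there are 5000 rows), and matplotlib draws the image.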

import random
import matplotlib.pyplot as plt

def plot_an_image(x):
    # randint() includes both endpoints, so stop at the last valid row index
    pick_one = random.randint(0, x.shape[0] - 1)
    image = x[pick_one, :]
    fig, ax = plt.subplots(figsize=(1, 1))
    ax.matshow(image.reshape((20, 20)), cmap='gray_r')
    plt.xticks([])
    plt.yticks([])
    plt.show()
    print('this should be {}'.format(label[pick_one]))

plot_an_image(data1)
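
One detail worth checking when plotting: the .mat file comes from MATLAB, which unrolls matrices column by column, while NumPy's reshape fills row by row by default. Assuming the rows of X were unrolled that way, a plain reshape((20, 20)) shows each digit transposed. The sketch below uses a hypothetical helper, row_to_image (not part of the original code), and a synthetic 20 * 20 array to show how order='F' undoes a column-major unroll.

```python
import numpy as np

def row_to_image(row, side=20):
    """Reshape an unrolled row into a side x side image, assuming column-major order."""
    return np.asarray(row).reshape((side, side), order='F')

# Synthetic stand-in for one digit: a 20 x 20 grid with distinct values
original = np.arange(400, dtype=float).reshape(20, 20)
flattened = original.flatten(order='F')   # column-by-column unroll, as MATLAB does

restored = row_to_image(flattened)        # recovers the original grid
default = flattened.reshape(20, 20)       # default row-major reshape gives the transpose
```

If the plotted digits look rotated or mirrored, passing the reshaped image through this helper (or adding .T) is a reasonable thing to try.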

Next, randomly select 100 rows of data1 and plot them in a grid.

def plot_100_image(x):
    # Draw 100 random row indices (sampling with replacement)
    sample_idx = np.random.choice(np.arange(x.shape[0]), 100)
    sample_images = x[sample_idx, :]
    fig, ax_array = plt.subplots(nrows=10, ncols=10, sharey=True, sharex=True, figsize=(8, 8))
    for row in range(10):
        for col in range(10):
            ax = ax_array[row, col]
            ax.matshow(sample_images[10 * row + col].reshape((20, 20)), cmap='gray_r')
            # Hide the tick marks on this panel
            ax.set_xticks([])
            ax.set_yticks([])
    plt.show()

plot_100_image(data1)

1.3 Vectorizing regularized Logistic Regression

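A one-vs-all logistic regression model is used to build the multi-class classifier. With 10 classes, 10 separate logistic regression classifiers have to be trained. To keep training efficient, the code is vectorized instead of using explicit loops.

The regularized cost function is defined as:

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log h_\theta(x^{(i)}) - \left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

where $h_\theta(x) = g(\theta^T x)$. Note that the bias term theta0 is not included in the regularization term.

First define the sigmoid function, whose expression is:

$$g(z) = \frac{1}{1+e^{-z}}$$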

# Define the sigmoid function
def sigmoid(x):
    return 1/(1+np.exp(-x))

def costFunction(theta, x, y, Lambda):
    m = np.shape(x)[0]
    thetaReg = theta[1:]          # skip theta0: the bias term is not regularized
    y = y.transpose()
    hypothesis = sigmoid(np.dot(x, theta))
    cost = y * np.log(hypothesis) + (1 - y) * np.log(1 - hypothesis)
    reg = np.sum(thetaReg * thetaReg) * Lambda/(2 * m)
    costAll = np.mean(-cost) + reg
    return costAll
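
A quick sanity check for a cost function of this form: with theta set to all zeros, every prediction is 0.5 and the penalty is zero, so the cost should equal ln 2 (about 0.693) whatever the data or Lambda. The sketch below re-declares the two functions under different names so that it runs on its own; the toy data are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_function(theta, x, y, lam):
    """Regularized logistic regression cost; theta[0] is not penalized."""
    m = x.shape[0]
    h = sigmoid(x @ theta)
    data_term = np.mean(-(y * np.log(h) + (1 - y) * np.log(1 - h)))
    reg_term = lam / (2 * m) * np.sum(theta[1:] ** 2)
    return data_term + reg_term

# Toy data (made up): 4 samples, a bias column plus 2 features
x_toy = np.array([[1.0, 0.5, 1.2],
                  [1.0, -1.0, 0.3],
                  [1.0, 2.0, -0.7],
                  [1.0, 0.1, 0.1]])
y_toy = np.array([1, 0, 1, 0])

cost_at_zero = cost_function(np.zeros(3), x_toy, y_toy, lam=1.0)
print(cost_at_zero)  # ln(2), about 0.693
```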

Next, take the partial derivatives of the regularized logistic regression cost function:
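
For reference, these are the partial derivatives that the function that follows computes, written out under the usual conventions (m examples, $h_\theta(x) = g(\theta^T x)$ with g the sigmoid, and $x_0 = 1$ for the bias column). The notation is a reconstruction from the code, not a formula given in the original post.

```latex
\begin{align}
\frac{\partial J(\theta)}{\partial \theta_0}
  &= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)} \\
\frac{\partial J(\theta)}{\partial \theta_j}
  &= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}
     + \frac{\lambda}{m}\theta_j \qquad (j \ge 1)
\end{align}
```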

def Gradient(theta, x, y, Lambda):
    m = np.shape(x)[0]
    thetaReg = theta[1:]          # skip theta0: the bias term is not regularized
    y = y.transpose()
    hypothesis = sigmoid(np.dot(x, theta))
    loss = hypothesis - y
    cost_1 = np.dot(loss, x)/m
    # Prepend 0 so the regularization gradient lines up with theta0
    reg = np.concatenate([np.array([0]), (Lambda / m) * thetaReg])
    gradient = cost_1 + reg
    return gradient
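
Before handing an analytic gradient to an optimizer, it can help to compare it against a finite-difference approximation of the cost. The following is a minimal sketch of such a check, not part of the original exercise code: the functions are re-declared so the block runs on its own, numerical_gradient is a hypothetical helper, and the data are random.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, x, y, lam):
    m = x.shape[0]
    h = sigmoid(x @ theta)
    data_term = np.mean(-(y * np.log(h) + (1 - y) * np.log(1 - h)))
    return data_term + lam / (2 * m) * np.sum(theta[1:] ** 2)

def gradient(theta, x, y, lam):
    m = x.shape[0]
    grad = (sigmoid(x @ theta) - y) @ x / m
    grad[1:] += lam / m * theta[1:]   # the bias term is not regularized
    return grad

def numerical_gradient(f, theta, eps=1e-6):
    """Central-difference approximation of the gradient of f at theta."""
    approx = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        approx[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return approx

# Random toy problem: 6 samples, a bias column plus 3 features
rng = np.random.default_rng(0)
x_toy = np.hstack([np.ones((6, 1)), rng.normal(size=(6, 3))])
y_toy = np.array([0, 1, 1, 0, 1, 0])
theta = rng.normal(size=4)

analytic = gradient(theta, x_toy, y_toy, lam=1.0)
numeric = numerical_gradient(lambda t: cost(t, x_toy, y_toy, 1.0), theta)
max_diff = np.max(np.abs(analytic - numeric))
```

A max_diff that is many orders of magnitude smaller than the gradient entries themselves suggests the analytic gradient matches the cost.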

1.4 One-vs-all Classification

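The handwritten digit dataset has 10 classes, and the code has to be able to recognize each of them. It should return the parameters of all classifiers as a K * (N + 1) matrix, here 10 * 401, in which each row holds the parameters of one classifier.

When training classifier k (k = 1, ..., K), the labels have to be converted first: examples of class k are marked as positive (y = 1) and all other classes as negative (y = 0). minimize() is then used to fit the parameters of the k-th classifier. Looping over k from 1 to K yields the final parameter matrix.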

def oneVsAll(x, y, Lambda, k):
    all_theta = np.zeros((k, np.shape(x)[1]))
    for i in range(k):
        theta = np.zeros(np.shape(x)[1])
        # Binary targets for class i+1: 1 for that class, 0 for all others
        y_i = (y.ravel() == i + 1).astype(int)
        ret = minimize(fun=costFunction, x0=theta, args=(x, y_i, Lambda), method='TNC', jac=Gradient, options={'disp': True})
        all_theta[i, :] = ret.x
    return all_theta
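
The relabeling step can also be done for all classes at once with broadcasting. The sketch below is only an illustration with made-up labels; one_vs_all_targets is a hypothetical helper, not something the original code uses.

```python
import numpy as np

def one_vs_all_targets(y, k):
    """Return a (k, m) 0/1 matrix whose row i marks the samples with label i + 1."""
    classes = np.arange(1, k + 1).reshape(-1, 1)   # column of labels 1..k
    return (y.ravel() == classes).astype(int)

y_toy = np.array([[1], [3], [2], [3]])   # made-up labels, shaped (m, 1) like y
targets = one_vs_all_targets(y_toy, k=3)
print(targets)
# [[1 0 0 0]
#  [0 0 1 0]
#  [0 1 0 1]]
```

Row i of the result is the target vector that the i-th classifier would be trained on.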

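The resulting parameter matrix is then used for prediction. For each observation, compute the sigmoid() output of every classifier, so that one observation has 10 sigmoid() outputs, and take the class with the largest output as the prediction.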

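def predictOneVsAll(x, all_theta):
    thetaT = np.transpose(all_theta)
    probMat = sigmoid(np.dot(x, thetaT))   # one row per observation, one column per classifier
    maxProb = np.argmax(probMat, axis=1)
    label = maxProb+1                      # column indices start at 0, class labels at 1
    return label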
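
The index shift in the last step is easy to get wrong, so here is a tiny worked example with made-up probabilities. argmax returns a 0-based column index, while the labels start at 1, hence the + 1. In the course's dataset the digit 0 is stored under label 10, which is what column index 9 maps to.

```python
import numpy as np

# Made-up probabilities: 3 samples (rows) scored by 4 classifiers (columns)
prob_mat = np.array([[0.10, 0.80, 0.05, 0.20],
                     [0.60, 0.30, 0.20, 0.10],
                     [0.05, 0.10, 0.15, 0.90]])

# argmax gives 0-based column indices; shift by 1 to get class labels
predicted = np.argmax(prob_mat, axis=1) + 1
print(predicted)  # [2 1 4]
```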

A bias term (a column of ones) has to be added to the loaded data data1:

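dataFix = np.insert(data1, 0, 1, axis=1)   # prepend the bias column
thetaAll = oneVsAll(dataFix, label, 1, 10)
pred = predictOneVsAll(dataFix, thetaAll)
# Flatten the (5000, 1) label array so it compares element-wise with pred
accuracy = np.mean(pred == label.ravel())
print('accuracy = {0}%'.format(accuracy * 100))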

The final accuracy is 94.5%.

2. Neural Networks

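The previous part used multi-class logistic regression to recognize handwritten digits, but logistic regression is only a linear classifier and cannot form more complex hypotheses. A neural network can represent a more complex, non-linear model. This part runs the feedforward propagation algorithm with weights that are already given.

The network consists of an input layer, a hidden layer, and an output layer. The input layer has 400 input units plus a bias unit, the hidden layer has 25 units plus a bias unit, and the output layer has 10 output units (one for each of the 10 digit classes).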

# Load the provided network weights
parameters = scio.loadmat('E:/2018/ML/work/machine-learning-ex3/ex3/ex3weights.mat')
theta1 = parameters.get('Theta1')
theta2 = parameters.get('Theta2')

data_Neur = np.insert(data1, 0, 1, axis=1)            # input layer plus bias
hidden_layer = sigmoid(np.dot(data_Neur, theta1.T))   # hidden-layer activations
z_2 = np.insert(hidden_layer, 0, 1, axis=1)           # hidden activations plus bias
output_layer = sigmoid(np.dot(z_2, theta2.T))
max_prob = np.argmax(output_layer, axis=1)
out = max_prob + 1                                    # class labels start at 1
accuracy = np.mean(out == label.T)
print('accuracy = {0}%'.format(accuracy * 100))

The resulting accuracy is 97.5%.
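
The forward pass can be wrapped in a single function, which makes the layer shapes easier to follow. The sketch below is illustrative only: feedforward_predict is a hypothetical name, and the inputs and weights are random stand-ins with the shapes described above (25 * 401 and 10 * 26), not the weights from ex3weights.mat.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def feedforward_predict(x, theta1, theta2):
    """Forward pass through one hidden layer; returns 1-based class labels."""
    a1 = np.insert(x, 0, 1, axis=1)      # (m, 401): inputs plus bias
    a2 = sigmoid(a1 @ theta1.T)          # (m, 25): hidden activations
    a2 = np.insert(a2, 0, 1, axis=1)     # (m, 26): hidden activations plus bias
    a3 = sigmoid(a2 @ theta2.T)          # (m, 10): one output per class
    return np.argmax(a3, axis=1) + 1

# Random stand-ins with the same shapes as the exercise data and weights
rng = np.random.default_rng(0)
x_fake = rng.random((5, 400))
theta1_fake = rng.normal(size=(25, 401))
theta2_fake = rng.normal(size=(10, 26))

labels = feedforward_predict(x_fake, theta1_fake, theta2_fake)
```

With the real data, the call would presumably be feedforward_predict(data1, theta1, theta2), since the function adds the bias column itself.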
