Andrew Ng Machine Learning ex7: Python Implementation
This project covers K-means clustering and PCA (Principal Component Analysis).
1 K-means Clustering
1.1 Loading the Data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
from scipy.io import loadmat
Define a function that returns the index of the closest centroid for each sample:
def find_closest_center(X, center):
    m = X.shape[0]
    k = center.shape[0]
    idx = np.zeros(m)
    for i in range(m):
        min_dist = 1e6
        for j in range(k):
            dist = np.sum((X[i, :] - center[j, :]) ** 2)
            if dist < min_dist:
                min_dist = dist
                idx[i] = j
    return idx
Load the data and test the closest-centroid function:
data=loadmat(r'C:\Users\xxx\Desktop\机器学习\machine-learning-ex7\ex7data2.mat')
X=data['X']
initial_center=np.array([[3,3],[6,2],[8,5]])
idx=find_closest_center(X,initial_center)
idx[:3]
array([0., 2., 1.])
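The double loop above can also be written with NumPy broadcasting; a minimal vectorized sketch (the name `find_closest_center_vec` is my own, not from the exercise):

```python
import numpy as np

def find_closest_center_vec(X, center):
    # broadcast (m, 1, n) against (k, n) -> (m, k, n), then sum squared
    # differences over the feature axis to get an (m, k) distance matrix
    dist = np.sum((X[:, None, :] - center[None, :, :]) ** 2, axis=2)
    return np.argmin(dist, axis=1)  # index of the nearest centroid per sample
```

This returns integer indices directly, avoiding the `astype(int)` conversion needed later.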
Convert the data into a DataFrame:
data2=pd.DataFrame(data.get('X'),columns=['X1','X2'])
data2.head()
|   | X1       | X2       |
|---|----------|----------|
| 0 | 1.842080 | 4.607572 |
| 1 | 5.658583 | 4.799964 |
| 2 | 6.352579 | 3.290854 |
| 3 | 2.904017 | 4.612204 |
| 4 | 3.231979 | 4.939894 |
Visualize the data:
sb.set(context='notebook',style='white')  # style sets the theme; context (paper, notebook, talk, poster) mainly scales element sizes
sb.lmplot(x='X1',y='X2',data=data2,fit_reg=False)  # fit_reg=False: scatter only, no fitted regression line
plt.show()
1.2 Defining the Centroid Update and Training Functions
Compute the centroids:
def compute_center(X, idx, k):
    # compute the k cluster centroids
    m, n = X.shape
    center = np.zeros((k, n))
    for i in range(k):
        indices = np.where(idx == i)
        # np.sum with no axis adds all elements; here axis=1 sums over the samples selected by the index array
        center[i, :] = (np.sum(X[indices, :], axis=1) / len(indices[0])).ravel()
    return center
compute_center(X,idx,3)
array([[2.42830111, 3.15792418],
       [5.81350331, 2.63365645],
       [7.11938687, 3.6166844 ]])
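Equivalently, the per-cluster mean can be written with a boolean mask; a short alternative sketch (`compute_center_mask` is a made-up name, not part of the exercise code):

```python
import numpy as np

def compute_center_mask(X, idx, k):
    # mean of the rows assigned to each cluster i
    return np.array([X[idx == i].mean(axis=0) for i in range(k)])
```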
K-means iteration:
def run_kmeans(X, initial_center, max_iters):
    m, n = X.shape
    k = initial_center.shape[0]
    idx = np.zeros(m)
    center = initial_center
    for i in range(max_iters):
        idx = find_closest_center(X, center)
        center = compute_center(X, idx, k)
    idx = find_closest_center(X, center)
    return idx, center
1.3 Training the Model and Visualizing the Result
idx,center=run_kmeans(X,initial_center,10)
cluster1=X[np.where(idx==0)[0],:]
cluster2=X[np.where(idx==1)[0],:]
cluster3=X[np.where(idx==2)[0],:]
fig, ax = plt.subplots(figsize=(12,8))
ax.scatter(cluster1[:,0], cluster1[:,1], s=30, color='r', label='Cluster 1')
ax.scatter(cluster2[:,0], cluster2[:,1], s=30, color='g', label='Cluster 2')
ax.scatter(cluster3[:,0], cluster3[:,1], s=30, color='b', label='Cluster 3')
ax.legend()
plt.show()
Randomly initialize the cluster centroids by picking random samples as the initial centers:
def init_center(X, k):
    m, n = X.shape
    center = np.zeros((k, n))
    idx = np.random.randint(0, m, k)
    for i in range(k):
        center[i, :] = X[idx[i], :]
    return center
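Note that `np.random.randint` can draw the same index twice, which would give two identical initial centroids. Sampling without replacement avoids that; a sketch (`init_center_unique` is my own name for this variant):

```python
import numpy as np

def init_center_unique(X, k):
    # choose k distinct sample indices so no two initial centroids coincide
    idx = np.random.choice(X.shape[0], k, replace=False)
    return X[idx, :]
```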
Result with randomly initialized centroids:
initial_center=init_center(X, 3)
print(initial_center)
idx,center=run_kmeans(X,initial_center,10)
cluster1=X[np.where(idx==0)[0],:]
cluster2=X[np.where(idx==1)[0],:]
cluster3=X[np.where(idx==2)[0],:]
fig, ax = plt.subplots(figsize=(12,8))
ax.scatter(cluster1[:,0], cluster1[:,1], s=30, color='r', label='Cluster 1')
ax.scatter(cluster2[:,0], cluster2[:,1], s=30, color='g', label='Cluster 2')
ax.scatter(cluster3[:,0], cluster3[:,1], s=30, color='b', label='Cluster 3')
ax.legend()
plt.show()
[[4.72372078 0.62044136]
 [3.85384314 0.7920479 ]
 [2.61036396 0.88027602]]
2 Image Compression with K-means
2.1 Loading and Visualizing the Data
from IPython.display import Image
Image(filename=r'C:\Users\xxx\Desktop\机器学习\machine-learning-ex7\bird_small.png')
image_data=loadmat(r'C:\Users\xxx\Desktop\机器学习\machine-learning-ex7\bird_small.mat')
image_data
{'__header__': b'MATLAB 5.0 MAT-file, Platform: GLNXA64, Created on: Tue Jun 5 04:06:24 2012','__version__': '1.0','__globals__': [],'A': array([[[219, 180, 103],[230, 185, 116],[226, 186, 110],...,[ 14, 15, 13],[ 13, 15, 12],[ 12, 14, 12]],[[230, 193, 119],[224, 192, 120],[226, 192, 124],...,[ 16, 16, 13],[ 14, 15, 10],[ 11, 14, 9]],[[228, 191, 123],[228, 191, 121],[220, 185, 118],...,[ 14, 16, 13],[ 13, 13, 11],[ 11, 15, 10]],...,[[ 15, 18, 16],[ 18, 21, 18],[ 18, 19, 16],...,[ 81, 45, 45],[ 70, 43, 35],[ 72, 51, 43]],[[ 16, 17, 17],[ 17, 18, 19],[ 20, 19, 20],...,[ 80, 38, 40],[ 68, 39, 40],[ 59, 43, 42]],[[ 15, 19, 19],[ 20, 20, 18],[ 18, 19, 17],...,[ 65, 43, 39],[ 58, 37, 38],[ 52, 39, 34]]], dtype=uint8)}
A=image_data['A']
A.shape
(128, 128, 3)
2.2 Compressing the Data
Reshape the data:
A=A/255
X=np.reshape(A,(A.shape[0]*A.shape[1],A.shape[2]))
X.shape
(16384, 3)
Fit the model with K-means:
initial_center=init_center(X,16)
idx,center=run_kmeans(X,initial_center,10)
# run_kmeans already returns the assignments for the final centroids, so no extra call is needed
X_recovered=center[idx.astype(int),:]
X_recovered.shape
(16384, 3)
2.3 Reconstructing the Image
X_recovered=np.reshape(X_recovered,(A.shape[0],A.shape[1],A.shape[2]))
X_recovered.shape
(128, 128, 3)
plt.imshow(X_recovered)
plt.show()
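As a rough sanity check on why this compresses the image: each pixel now stores a 4-bit index into a 16-color palette instead of 24 bits of raw RGB. A back-of-the-envelope estimate (the accounting here is an illustration, not the actual PNG file sizes):

```python
# rough storage estimate for a 128x128 RGB image quantized to 16 colors
pixels = 128 * 128
raw_bits = pixels * 24                 # 8 bits per channel, 3 channels
palette_bits = 16 * 24                 # the 16 centroid colors
index_bits = pixels * 4                # 4-bit palette index per pixel
compressed_bits = index_bits + palette_bits
print(raw_bits // 8, compressed_bits // 8)  # bytes before and after
```

That is roughly a factor-of-six reduction in raw storage.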
2.4 Image Compression with scikit-learn
from skimage import io
pic=io.imread(r'C:\Users\xxx\Desktop\机器学习\machine-learning-ex7\bird_small.png')/255
io.imshow(pic)
plt.show()
pic.shape
(128, 128, 3)
Reshape the data:
data=pic.reshape(128*128,3)
Build and fit the model:
from sklearn.cluster import KMeans
model=KMeans(n_clusters=16,n_init=100,n_jobs=-1)  # note: n_jobs was removed from KMeans in scikit-learn 1.0; drop it on newer versions
model.fit(data)
KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,n_clusters=16, n_init=100, n_jobs=-1, precompute_distances='auto',random_state=None, tol=0.0001, verbose=0)
center=model.cluster_centers_
print(center.shape)
C=model.predict(data)
print(C.shape)
(16, 3)
(16384,)
center[C].shape
(16384, 3)
Reconstruct the image:
compressed_pic=center[C].reshape((128,128,3))
fig,ax=plt.subplots(1,2)
ax[0].imshow(pic)
ax[1].imshow(compressed_pic)
plt.show()
3 Principal Component Analysis
PCA is a linear transformation that finds the "principal components", i.e. the directions of maximum variance in a dataset. It can be used for dimensionality reduction.
3.1 Loading the Data
# principal component analysis
data=loadmat(r'C:\Users\xxx\Desktop\机器学习\machine-learning-ex7\ex7data1.mat')
data
{'__header__': b'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Mon Nov 14 22:41:44 2011','__version__': '1.0','__globals__': [],'X': array([[3.38156267, 3.38911268],[4.52787538, 5.8541781 ],[2.65568187, 4.41199472],[2.76523467, 3.71541365],[2.84656011, 4.17550645],[3.89067196, 6.48838087],[3.47580524, 3.63284876],[5.91129845, 6.68076853],[3.92889397, 5.09844661],[4.56183537, 5.62329929],[4.57407171, 5.39765069],[4.37173356, 5.46116549],[4.19169388, 4.95469359],[5.24408518, 4.66148767],[2.8358402 , 3.76801716],[5.63526969, 6.31211438],[4.68632968, 5.6652411 ],[2.85051337, 4.62645627],[5.1101573 , 7.36319662],[5.18256377, 4.64650909],[5.70732809, 6.68103995],[3.57968458, 4.80278074],[5.63937773, 6.12043594],[4.26346851, 4.68942896],[2.53651693, 3.88449078],[3.22382902, 4.94255585],[4.92948801, 5.95501971],[5.79295774, 5.10839305],[2.81684824, 4.81895769],[3.88882414, 5.10036564],[3.34323419, 5.89301345],[5.87973414, 5.52141664],[3.10391912, 3.85710242],[5.33150572, 4.68074235],[3.37542687, 4.56537852],[4.77667888, 6.25435039],[2.6757463 , 3.73096988],[5.50027665, 5.67948113],[1.79709714, 3.24753885],[4.3225147 , 5.11110472],[4.42100445, 6.02563978],[3.17929886, 4.43686032],[3.03354125, 3.97879278],[4.6093482 , 5.879792 ],[2.96378859, 3.30024835],[3.97176248, 5.40773735],[1.18023321, 2.87869409],[1.91895045, 5.07107848],[3.95524687, 4.5053271 ],[5.11795499, 6.08507386]])}
X=data['X']
fig,ax=plt.subplots(figsize=(12,8))
ax.scatter(X[:,0],X[:,1])
plt.show()
3.2 Building the Model
Singular value decomposition of the covariance matrix:
def pca(X):
    # note: X.mean()/X.std() normalize by the global mean and std;
    # per-feature normalization would use X.mean(axis=0)/X.std(axis=0)
    X = (X - X.mean()) / X.std()
    X = np.matrix(X)
    cov = np.dot(X.T, X) / X.shape[0]
    U, S, V = np.linalg.svd(cov)
    return U, S, V
U,S,V=pca(X)
U,S,V
(matrix([[-0.79241747, -0.60997914],[-0.60997914, 0.79241747]]),array([1.43584536, 0.56415464]),matrix([[-0.79241747, -0.60997914],[-0.60997914, 0.79241747]]))
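The singular values in S indicate how much variance each direction captures; the share retained by the first component can be estimated from the values printed above:

```python
import numpy as np

S = np.array([1.43584536, 0.56415464])  # singular values from the run above
retained = S[0] / S.sum()               # variance share of the first component
print(round(retained, 4))               # prints 0.7179
```

So projecting onto one component keeps roughly 72% of the variance, which is consistent with the visible information loss in the recovered data below.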
Projection function:
def predict(X, U, k):
    U_reduced = U[:, :k]
    return np.dot(X, U_reduced)
Z=predict(X,U,1)
print(Z.shape)
(50, 1)
Recover the data:
def recover_data(Z, U, k):
    U_reduced = U[:, :k]
    return np.dot(Z, U_reduced.T)
X_recovered=recover_data(Z,U,1)
X_recovered
matrix([[3.76152442, 2.89550838],[5.67283275, 4.36677606],[3.80014373, 2.92523637],[3.53223661, 2.71900952],[3.80569251, 2.92950765],[5.57926356, 4.29474931],[3.93851354, 3.03174929],[6.94105849, 5.3430181 ],[4.93142811, 3.79606507],[5.58255993, 4.29728676],[5.48117436, 4.21924319],[5.38482148, 4.14507365],[5.02696267, 3.8696047 ],[5.54606249, 4.26919213],[3.60199795, 2.77270971],[6.58954104, 5.07243054],[5.681006 , 4.37306758],[4.02614513, 3.09920545],[6.76785875, 5.20969415],[5.50019161, 4.2338821 ],[6.81311151, 5.24452836],[4.56923815, 3.51726213],[6.49947125, 5.00309752],[4.94381398, 3.80559934],[3.47034372, 2.67136624],[4.41334883, 3.39726321],[5.97375815, 4.59841938],[6.10672889, 4.70077626],[4.09805306, 3.15455801],[4.90719483, 3.77741101],[4.94773778, 3.80861976],[6.36085631, 4.8963959 ],[3.81339161, 2.93543419],[5.61026298, 4.31861173],[4.32622924, 3.33020118],[6.02248932, 4.63593118],[3.48356381, 2.68154267],[6.19898705, 4.77179382],[2.69816733, 2.07696807],[5.18471099, 3.99103461],[5.68860316, 4.37891565],[4.14095516, 3.18758276],[3.82801958, 2.94669436],[5.73637229, 4.41568689],[3.45624014, 2.66050973],[5.10784454, 3.93186513],[2.13253865, 1.64156413],[3.65610482, 2.81435955],[4.66128664, 3.58811828],[6.1549641 , 4.73790627]])
fig,ax=plt.subplots(figsize=(12,8))
ax.scatter(list(X_recovered[:, 0]), list(X_recovered[:, 1]))
plt.show()
3.3 Applying PCA to Face Images
Load the data:
faces=loadmat(r'C:\Users\xxx\Desktop\机器学习\machine-learning-ex7\ex7faces.mat')
X=faces['X']
X.shape
(5000, 1024)
Visualize the data:
def plot_n_image(X, n):
    pic_size = int(np.sqrt(X.shape[1]))
    grid_size = int(np.sqrt(n))
    first_n_image = X[:n, :]
    fig, ax = plt.subplots(nrows=grid_size, ncols=grid_size,
                           sharey=True, sharex=True, figsize=(8, 8))
    for r in range(grid_size):
        for c in range(grid_size):
            ax[r, c].imshow(first_n_image[grid_size * r + c].reshape((pic_size, pic_size)))
            plt.xticks(np.array([]))
            plt.yticks(np.array([]))
face=np.reshape(X[3,:],(32,32))
plt.imshow(face)
plt.show()
U,S,V=pca(X)
Z=predict(X,U,100)
Reconstruct the data:
X_recovered =recover_data(Z,U,100)
face=np.reshape(X_recovered [3,:],(32,32))
plt.imshow(face)
plt.show()
Summary
- np.sum with no axis argument adds up all the elements; axis=0 sums down each column, while axis=1 sums across each row vector;
- PCA loses some information when reducing dimensionality; K-means represents an image's many original colors with a small palette, which shrinks the image size.
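The axis behavior of np.sum can be demonstrated in a few lines:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
print(a.sum())        # 10: no axis, all elements
print(a.sum(axis=0))  # [4 6]: sum down each column
print(a.sum(axis=1))  # [3 7]: sum across each row
```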