A Detailed Summary of NumPy Usage: One Article Is All You Need to Learn NumPy

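A Detailed Summary of NumPy Usage

I. Creating ndarrays and checking data types
- 1. Creating from a Python list with np.array()
- The relationship between images and arrays
- 2. Creating with common np functions
II. Common ndarray attributes
III. Basic ndarray operations
- 1. Indexing
- 2. Slicing
- Puzzle game: putting the girl on the tiger's back
- 3. Reshaping
- 4. Concatenation
  - Generalization
- 5. Splitting
- 6. Copies
IV. ndarray aggregation
- 1. Sum
  - Generalization
  - Exercise: given a 4-dimensional array, how do you sum over the last two dimensions?
- 2. Max and min
- 3. Other aggregations
  - Thought exercise: how do you sort a 5*5 matrix by column 3?
  - Sorting
V. ndarray matrix operations
- 1. Basic matrix operations
  - 1) Arithmetic (add, subtract, multiply, divide)
  - 2) Matrix product
- 2. Broadcasting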

Import NumPy and check the version

import numpy as np

np.__version__

'1.13.1'

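What is NumPy?

NumPy (Numerical Python) extends Python with array and matrix types, together with a large collection of functions for computing on them.

NumPy is the foundation for the machine learning and data mining work that comes later; pandas, SciPy, matplotlib and others are all built on top of NumPy.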

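I. Creating ndarrays and checking data types

The most basic data structure in NumPy is the ndarray, i.e. an n-dimensional array.

1. Creating from a Python list with np.array()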

data = [1,2,3]
nd = np.array(data)
nd

array([1, 2, 3])

type(data),type(nd)

(list, numpy.ndarray)

# Check the type of the elements in nd
nd.dtype

dtype('int32')

nd2 = np.array([1,3,4.6,"fdsaf",True])
nd2

array(['1', '3', '4.6', 'fdsaf', 'True'],dtype='<U32')

nd2.dtype

dtype('<U32')

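[Note]
1. All elements of an array have the same type.
2. When an array is created from a list whose elements have different types, they are all converted to one common type (priority: str > float > int).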
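The priority rule above can be checked directly. The following is a minimal sketch; the variable names are made up for illustration, and it inspects `dtype.kind` rather than the full dtype because the exact integer and string widths depend on the platform.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_float = np.array([1, 2, 3.5])     # int + float -> everything becomes float
mixed_str = np.array([1, 2.5, "a"])     # anything + str -> everything becomes str

int_kind = ints.dtype.kind              # 'i' (signed integer)
float_kind = mixed_float.dtype.kind     # 'f' (floating point)
str_kind = mixed_str.dtype.kind         # 'U' (unicode string)
print(int_kind, float_kind, str_kind)
```

Note that the numbers in `mixed_str` are stored as text, so arithmetic on that array no longer works as it would on numbers.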

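The relationship between images and arrays

# Note: in NumPy an image is just an array
# Load an image
import matplotlib.pyplot as plt
# matplotlib is a data visualization library; here it is only used to load the image

girl = plt.imread("./source/girl.jpg")

type(girl) # a loaded image is a numpy.ndarray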

numpy.ndarray

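# Check the shape of the array
girl.shape
# shape is a tuple; each element is the number of entries girl has along that dimension
# (here: rows, columns, colour channels)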

(900, 1440, 3)

girl

array([[[225, 231, 231],[229, 235, 235],[222, 228, 228],..., [206, 213, 162],[211, 213, 166],[217, 220, 173]],[[224, 230, 230],[229, 235, 235],[223, 229, 229],..., [206, 213, 162],[211, 213, 166],[217, 220, 173]],[[224, 230, 230],[229, 235, 235],[223, 229, 229],..., [206, 213, 162],[211, 213, 166],[219, 221, 174]],..., [[175, 187, 213],[180, 192, 218],[175, 187, 213],..., [155, 162, 180],[153, 160, 178],[156, 163, 181]],[[175, 187, 213],[180, 192, 218],[174, 186, 212],..., [155, 162, 180],[153, 160, 178],[155, 162, 180]],[[177, 189, 215],[181, 193, 219],[174, 186, 212],..., [155, 162, 180],[153, 160, 178],[156, 163, 181]]], dtype=uint8)

# Display the image with plt
plt.imshow(girl)
plt.show()

Creating an image

# Build an image by hand: a 4x3 grid of RGB pixels with float values between 0 and 1
boy = np.array([[[0.4,0.5,0.6],[0.8,0.8,0.2],[0.6,0.9,0.5]],[[0.12,0.32,0.435],[0.22,0.45,0.9],[0.1,0.2,0.3]],[[0.12,0.32,0.435],[0.12,0.32,0.435],[0.12,0.32,0.435]],[[0.12,0.32,0.435],[0.12,0.32,0.435],[0.12,0.32,0.435]]])
boy

array([[[ 0.4  ,  0.5  ,  0.6  ],[ 0.8  ,  0.8  ,  0.2  ],[ 0.6  ,  0.9  ,  0.5  ]],[[ 0.12 ,  0.32 ,  0.435],[ 0.22 ,  0.45 ,  0.9  ],[ 0.1  ,  0.2  ,  0.3  ]],[[ 0.12 ,  0.32 ,  0.435],[ 0.12 ,  0.32 ,  0.435],[ 0.12 ,  0.32 ,  0.435]],[[ 0.12 ,  0.32 ,  0.435],[ 0.12 ,  0.32 ,  0.435],[ 0.12 ,  0.32 ,  0.435]]])

plt.imshow(boy)
plt.show()

A two-dimensional array can also represent an image; a 2-D image is grayscale.

# A 3x4 array of intensities, displayed as a grayscale image
boy2 = np.array([[0.1,0.2,0.3,0.4],[0.6,0.3,0.2,0.5],[0.9,0.8,0.3,0.2]])
boy2

array([[ 0.1,  0.2,  0.3,  0.4],[ 0.6,  0.3,  0.2,  0.5],[ 0.9,  0.8,  0.3,  0.2]])

plt.imshow(boy2,cmap="gray")
plt.show()

Cropping an image: taking out part of it

# Crop the image: keep the first 200 rows and the first 300 columns
g = girl[:200,:300]

plt.imshow(g)
plt.show()

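2. Creating with common np functions

1) np.ones(shape, dtype=None, order='C')

np.ones((2,3,3,4,5))
# shape is the shape of the array, passed as a tuple or list; each element
# is the number of entries along that dimension of the new array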

array([[[[[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.]],[[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.]],[[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.]]],[[[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.]],[[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.]],[[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.]]],[[[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.]],[[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.]],[[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.]]]],[[[[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.]],[[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.]],[[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.]]],[[[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.]],[[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.]],[[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.]]],[[[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.]],[[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.]],[[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.],[ 1.,  1.,  1.,  1.,  1.]]]]])

ones = np.ones((168,233,3))

plt.imshow(ones)
plt.show()

2) np.zeros(shape, dtype=float, order='C')

np.zeros((1,2,3))

array([[[ 0.,  0.,  0.],[ 0.,  0.,  0.]]])

3) np.full(shape, fill_value, dtype=None)

np.full((2,3),12)

array([[12, 12, 12],[12, 12, 12]])

4) np.eye(N, M=None, k=0, dtype=float)

np.eye(6)

array([[ 1.,  0.,  0.,  0.,  0.,  0.],[ 0.,  1.,  0.,  0.,  0.,  0.],[ 0.,  0.,  1.,  0.,  0.,  0.],[ 0.,  0.,  0.,  1.,  0.,  0.],[ 0.,  0.,  0.,  0.,  1.,  0.],[ 0.,  0.,  0.,  0.,  0.,  1.]])

np.eye(3,4)

array([[ 1.,  0.,  0.,  0.],[ 0.,  1.,  0.,  0.],[ 0.,  0.,  1.,  0.]])

np.eye(5,4)

array([[ 1.,  0.,  0.,  0.],[ 0.,  1.,  0.,  0.],[ 0.,  0.,  1.,  0.],[ 0.,  0.,  0.,  1.],[ 0.,  0.,  0.,  0.]])
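The examples above leave `k` at its default. As a small sketch of what that parameter does (the variable names are illustrative), `k` shifts the diagonal of ones above or below the main diagonal:

```python
import numpy as np

upper = np.eye(3, k=1)    # ones on the first diagonal above the main one
lower = np.eye(3, k=-1)   # ones on the first diagonal below the main one
print(upper)
print(lower)
```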

5) np.linspace(start, stop, num=50)

np.linspace(1,10,num=100)
# Returns num evenly spaced points from start to stop, with both endpoints included

array([  1.        ,   1.09090909,   1.18181818,   1.27272727,1.36363636,   1.45454545,   1.54545455,   1.63636364,1.72727273,   1.81818182,   1.90909091,   2.        ,2.09090909,   2.18181818,   2.27272727,   2.36363636,2.45454545,   2.54545455,   2.63636364,   2.72727273,2.81818182,   2.90909091,   3.        ,   3.09090909,3.18181818,   3.27272727,   3.36363636,   3.45454545,3.54545455,   3.63636364,   3.72727273,   3.81818182,3.90909091,   4.        ,   4.09090909,   4.18181818,4.27272727,   4.36363636,   4.45454545,   4.54545455,4.63636364,   4.72727273,   4.81818182,   4.90909091,5.        ,   5.09090909,   5.18181818,   5.27272727,5.36363636,   5.45454545,   5.54545455,   5.63636364,5.72727273,   5.81818182,   5.90909091,   6.        ,6.09090909,   6.18181818,   6.27272727,   6.36363636,6.45454545,   6.54545455,   6.63636364,   6.72727273,6.81818182,   6.90909091,   7.        ,   7.09090909,7.18181818,   7.27272727,   7.36363636,   7.45454545,7.54545455,   7.63636364,   7.72727273,   7.81818182,7.90909091,   8.        ,   8.09090909,   8.18181818,8.27272727,   8.36363636,   8.45454545,   8.54545455,8.63636364,   8.72727273,   8.81818182,   8.90909091,9.        ,   9.09090909,   9.18181818,   9.27272727,9.36363636,   9.45454545,   9.54545455,   9.63636364,9.72727273,   9.81818182,   9.90909091,  10.        ])
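Because both endpoints are included, 100 points span 99 intervals, which is why the spacing in the output is 9/99 = 0.0909... rather than 0.09. A minimal sketch that checks this (variable names are illustrative):

```python
import numpy as np

points = np.linspace(1, 10, num=100)
step = points[1] - points[0]             # spacing between neighbouring points
expected_step = (10 - 1) / (100 - 1)     # (stop - start) / (num - 1)
print(len(points), step, expected_step)

# endpoint=False leaves stop out, so the spacing becomes (stop - start) / num
open_points = np.linspace(1, 10, num=100, endpoint=False)
open_step = open_points[1] - open_points[0]
print(open_step)
```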

np.logspace(1,10,num=10)
# Takes 10 evenly spaced exponents from 1 to 10 (i.e. 1, 2, 3, ..., 10)
# and returns 10 raised to each of them: 10^1, 10^2, ..., 10^10

array([  1.00000000e+01,   1.00000000e+02,   1.00000000e+03,1.00000000e+04,   1.00000000e+05,   1.00000000e+06,1.00000000e+07,   1.00000000e+08,   1.00000000e+09,1.00000000e+10])
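Put differently, `np.logspace` behaves like raising the base to the values that `np.linspace` would produce. A short sketch of that equivalence, with illustrative variable names:

```python
import numpy as np

a = np.logspace(1, 10, num=10)
b = 10.0 ** np.linspace(1, 10, num=10)
same = np.allclose(a, b)
print(same)

# base selects a different base, for example powers of two
powers_of_two = np.logspace(0, 4, num=5, base=2)
print(powers_of_two)
```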

6) np.arange([start,] stop[, step], dtype=None); the parts in "[]" are optional

np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

np.arange(2,12)

array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

np.arange(2,12,2)

array([ 2,  4,  6,  8, 10])

7) np.random.randint(low, high=None, size=None, dtype='l')

np.random.randint(3,10,size=(10,10,3))
# Generates an array of random integers from low (inclusive) to high (exclusive)

array([[[4, 6, 6],[5, 9, 4],[5, 9, 6],[4, 6, 4],[7, 4, 9],[5, 9, 4],[8, 6, 3],[7, 5, 8],[8, 3, 4],[5, 4, 8]],[[6, 5, 8],[9, 3, 5],[8, 4, 4],[5, 9, 8],[8, 5, 6],[9, 4, 6],[5, 8, 8],[5, 7, 6],[3, 7, 9],[5, 5, 7]],[[4, 7, 5],[9, 4, 9],[3, 3, 4],[8, 4, 8],[3, 6, 3],[4, 4, 3],[4, 4, 5],[5, 5, 4],[5, 7, 9],[4, 4, 9]],[[6, 3, 8],[5, 9, 6],[5, 6, 7],[3, 8, 6],[3, 7, 8],[6, 9, 7],[6, 7, 3],[7, 5, 4],[3, 3, 6],[9, 9, 7]],[[3, 5, 6],[7, 4, 6],[5, 3, 7],[3, 6, 3],[8, 3, 8],[7, 9, 7],[8, 7, 9],[4, 7, 5],[8, 8, 6],[4, 5, 4]],[[4, 4, 9],[9, 8, 7],[6, 6, 6],[4, 9, 5],[6, 9, 6],[9, 4, 8],[4, 7, 9],[9, 4, 9],[6, 9, 3],[8, 5, 9]],[[7, 6, 3],[4, 5, 4],[5, 6, 7],[7, 3, 4],[7, 4, 8],[7, 5, 6],[4, 9, 9],[4, 4, 8],[9, 3, 6],[3, 6, 9]],[[7, 7, 4],[8, 6, 3],[3, 8, 7],[5, 6, 9],[5, 8, 4],[9, 4, 4],[3, 6, 6],[6, 7, 4],[4, 8, 8],[4, 6, 3]],[[7, 4, 9],[5, 3, 7],[5, 9, 4],[5, 7, 9],[7, 6, 6],[6, 3, 3],[9, 4, 4],[5, 3, 4],[5, 7, 9],[3, 3, 5]],[[7, 3, 8],[7, 6, 8],[5, 7, 4],[4, 4, 7],[4, 5, 9],[8, 3, 5],[5, 9, 9],[6, 3, 7],[9, 5, 7],[8, 5, 9]]])

8) np.random.randn(d0, d1, ..., dn)
Generates an array of shape (d0, d1, ..., dn) whose values are drawn from the standard normal distribution

np.random.randn(2,3,10)
# N(0,1)

array([[[-0.03414751, -1.01771263,  1.12067965, -0.43953023, -1.82364645,-0.0971702 , -0.65734554, -0.10303229,  1.52904104, -0.48624526],[-0.29295679, -1.09430988,  0.07499788,  0.31664607,  0.3500672 ,-0.18508775,  1.75620537,  0.71531162,  0.6161491 , -1.22053836],[ 0.7323965 ,  0.20671506, -0.58314419, -0.16540522, -0.23903187,1.27785655,  0.26691062, -1.45973265, -0.27273178, -1.02878312]],[[ 0.07655004, -0.35616184, -0.46353849, -1.8515281 , -0.26543777,0.76412627,  0.83337437,  0.04521198, -2.10686009,  0.84883742],[ 0.22188875,  0.63737544,  0.26173337, -0.11475485, -1.30431707,1.25062924,  2.03032414,  0.13742253, -0.98713219,  1.19711129],[ 0.69212245,  0.70550039, -1.15995398, -0.95507681, -0.39439139,2.76551965,  0.56088858,  0.54709151,  1.17615801,  0.17744971]]])

9) np.random.normal(loc=0.0, scale=1.0, size=None)

np.random.normal(175,20,size=100)
# Draws 100 samples from a normal distribution with mean 175 and standard deviation 20

array([ 174.44281329,  177.66402876,  162.76426831,  210.11244283,161.26671985,  209.52372115,  159.92703726,  197.83048917,190.60230978,  170.27114821,  202.67422923,  203.04492988,171.13235245,  175.64710565,  200.40533303,  207.930948  ,141.09792492,  158.87495159,  176.74197674,  164.57884322,181.22386631,  156.26287142,  133.37408465,  178.07588597,187.50842048,  186.35236779,  153.61560634,  145.53831704,232.55949685,  142.01340562,  195.22465693,  188.922162  ,170.02159668,  167.74728882,  173.27258287,  187.68132279,217.7260755 ,  158.28833839,  155.11568289,  200.26945864,178.91552559,  149.21007505,  200.6454259 ,  169.37529856,201.18878627,  184.37773296,  196.67909536,  144.10223051,184.63682023,  167.86858875,  191.08394709,  169.98017168,204.05198975,  199.65286793,  176.22452948,  181.17515804,178.81440955,  176.79845708,  189.50950157,  136.05787608,199.35198398,  162.43654974,  155.61396415,  172.22147069,181.91161368,  192.82571507,  203.70689642,  190.79312957,204.48924027,  180.48880551,  176.81359193,  145.87844077,190.13853094,  160.22281705,  200.04783678,  165.19927728,184.10218694,  178.27524256,  191.58148162,  141.4792985 ,208.4723939 ,  163.70082179,  142.70675324,  189.25398816,183.53849685,  150.86998696,  172.04187127,  207.12343336,190.10648007,  188.18995666,  175.43040298,  183.79396855,172.60260342,  195.1083776 ,  194.70719705,  163.10904061,146.78089275,  195.2271401 ,  201.60339544,  164.91176955])
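To see that `loc` is the mean and `scale` is the standard deviation (not the variance), one can draw a larger sample and measure it. This is only a sketch: the seed and the sample size of 100000 are arbitrary choices made here so the check is repeatable.

```python
import numpy as np

rng = np.random.RandomState(0)               # fixed seed, chosen arbitrarily
heights = rng.normal(175, 20, size=100000)
sample_mean = heights.mean()
sample_std = heights.std()
print(sample_mean, sample_std)

# The same distribution, built by shifting and scaling standard normal draws
rescaled = 175 + 20 * rng.randn(100000)
print(rescaled.mean(), rescaled.std())
```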

10) np.random.random(size=None)

np.random.random(size=(12,1)) # floats in the interval [0, 1)

array([[ 0.54080763],[ 0.95618258],[ 0.19457156],[ 0.12198452],[ 0.3423529 ],[ 0.01716331],[ 0.28061005],[ 0.51960339],[ 0.60122982],[ 0.26462352],[ 0.85645091],[ 0.32352418]])

Exercise: generate an image from random numbers

boy = np.random.random(size=(667,568,3))

plt.imshow(boy)
plt.show()

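II. Common ndarray attributes

Common attributes of an array:

number of dimensions ndim, number of elements size, shape shape, element type dtype, bytes per element itemsize, underlying buffer data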

tigger = plt.imread("./source/tigger.jpg")

# 1. Number of dimensions
tigger.ndim

# 2. Size: the total number of elements in the array
tigger.size

# 3. Shape
tigger.shape

(786, 1200, 3)

# 4. Element type
tigger.dtype

dtype('uint8')

# 5. Size of each element in bytes
tigger.itemsize

t = tigger / 255.0

t.dtype

dtype('float64')

t.itemsize
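These attributes are related: `size` is the product of the entries of `shape`, and the memory taken by the data is `size * itemsize`, which NumPy also exposes as `nbytes`. The sketch below uses an all-zero array with the same shape and dtype as the tiger image as a stand-in, since the image file itself is not needed for the arithmetic.

```python
import numpy as np

img = np.zeros((786, 1200, 3), dtype=np.uint8)   # stand-in for the tiger image
as_float = img / 255.0                           # division promotes uint8 to float64

n_elements = img.size                # 786 * 1200 * 3
n_bytes_uint8 = img.nbytes           # 1 byte per element
n_bytes_float = as_float.nbytes      # 8 bytes per element
print(n_elements, n_bytes_uint8, n_bytes_float)
```

Converting an image to float64 therefore multiplies its memory footprint by eight.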

# 6. data: the memory buffer that holds the array's raw bytes
tigger.data

<memory at 0x000001AA3A0D8138>

III. Basic ndarray operations

1. Indexing

l = [1,2,3,4,5,6]
l[5]
l[-1]
l[0]
l[-6]
# Counting from the front starts at 0; counting from the back starts at -1

nd = np.random.randint(0,10,size=(4))
nd

array([9, 6, 1, 7])

nd[0]
nd[1]
nd[-3]

lp = [[1,2,3],[4,5,6],[7,8]]
lp[1][2]

np.array(lp)
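# If the sublists of a nested list have different lengths, NumPy cannot build a 2-D array and stores each sublist as a Python object instead
# (NumPy 1.24 and later raise a ValueError here unless dtype=object is passed)
# [Note] In a regular array every dimension must have a consistent number of elements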

array([list([1, 2, 3]), list([4, 5, 6]), list([7, 8])], dtype=object)

nd = np.random.randint(0,10,size=(4,4))
nd
#[[2,2,1],[1,2,1]]

array([[7, 9, 2, 3],[0, 2, 7, 3],[1, 9, 0, 1],[4, 1, 2, 8]])

nd[1][3]
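# Chained indexing: first index the leading dimension to get a sub-array, then index into that sub-array

How this differs from a list

nd[1,3]
# A single indexing operation: look up position (1, 3) directly

lp[1,3] # a list cannot be indexed this way

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-64-8b65614beafa> in <module>()
----> 1 lp[1,3] # a list cannot be indexed this way

TypeError: list indices must be integers or slices, not tuple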

nd[[1,1,2,3,1,2]]
# Indexing with a list: picks rows in the order given by the list (repeats are allowed)

array([[0, 2, 7, 3],[0, 2, 7, 3],[1, 9, 0, 1],[4, 1, 2, 8],[0, 2, 7, 3],[1, 9, 0, 1]])

lp[[1,1]] # a list cannot be indexed with a list

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-66-e9ca25f0b661> in <module>()
----> 1 lp[[1,1]] # a list cannot be indexed with a list

TypeError: list indices must be integers or slices, not list

nd[[1,2,2,2]][[0,1,2]]

array([[0, 2, 7, 3],[1, 9, 0, 1],[1, 9, 0, 1]])

nd[[2,2,1]]

array([[1, 9, 0, 1],[1, 9, 0, 1],[0, 2, 7, 3]])

nd[[2,2,1,1],[1,2,1,1]]

array([9, 0, 2, 2])
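The last example passes two lists at once. In that case NumPy pairs them up element by element, so the result holds nd[2,1], nd[2,2], nd[1,1] and nd[1,1]. A minimal sketch using the same values as the array shown above (the names `rows`, `cols`, `paired` and `by_hand` are illustrative):

```python
import numpy as np

nd = np.array([[7, 9, 2, 3], [0, 2, 7, 3], [1, 9, 0, 1], [4, 1, 2, 8]])
rows = [2, 2, 1, 1]
cols = [1, 2, 1, 1]

paired = nd[rows, cols]                                       # one element per (row, col) pair
by_hand = np.array([nd[r, c] for r, c in zip(rows, cols)])    # the same lookup, spelled out
print(paired, by_hand)

# Chained and single-step indexing reach the same element
same_element = nd[1][3] == nd[1, 3]
print(same_element)
```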

2. Slicing

nd

array([[7, 9, 2, 3],[0, 2, 7, 3],[1, 9, 0, 1],[4, 1, 2, 8]])

nd[0:100] # half-open interval [start, stop); a stop past the end is clipped to the array length

array([[7, 9, 2, 3],[0, 2, 7, 3],[1, 9, 0, 1],[4, 1, 2, 8]])

lp[0:100]

[[1, 2, 3], [4, 5, 6], [7, 8]]

nd[:2]

array([[7, 9, 2, 3],[0, 2, 7, 3]])

nd[1:]

array([[0, 2, 7, 3],[1, 9, 0, 1],[4, 1, 2, 8]])

nd[3:0:-1]
# A negative step walks backwards, so start must be greater than stop

array([[4, 1, 2, 8],[1, 9, 0, 1],[0, 2, 7, 3]])

nd

array([[7, 9, 2, 3],[0, 2, 7, 3],[1, 9, 0, 1],[4, 1, 2, 8]])

nd[:,0::2]

array([[7, 2],[0, 7],[1, 0],[4, 2]])

nd[1:3,0:2] # slice rows and columns at the same time

array([[0, 2],[1, 9]])

Flipping girl around (a step of -2 reverses both axes and keeps every second pixel)

plt.imshow(girl[::-2,::-2])
plt.show()

Puzzle game: putting the girl on the tiger's back

t = tigger.copy() # keep a copy of the original image

plt.imshow(tigger)
plt.show()

girl2 = plt.imread("./source/girl2.jpg")
plt.imshow(girl2)
plt.show()

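# Cut a hole in the tiger and fill it with the girl; girl2 must have shape (300, 300, 3) to fit this region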
tigger[150:450,300:600] = girl2

plt.imshow(tigger)
plt.show()
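The same paste can be tried without the image files. The sketch below is an illustration under stated assumptions: `background` and `patch` are synthetic stand-ins for the tiger and the girl, and the target region is derived from the patch's own shape so the two always match.

```python
import numpy as np

background = np.zeros((786, 1200, 3), dtype=np.uint8)    # stand-in for the tiger image
patch = np.full((300, 300, 3), 255, dtype=np.uint8)      # stand-in for the girl image

result = background.copy()             # work on a copy so the original stays intact
top, left = 150, 300
height, width = patch.shape[:2]
result[top:top + height, left:left + width] = patch      # slice assignment pastes the patch
print(result[150, 300], result[149, 300])
```

Deriving the slice bounds from `patch.shape` avoids the shape-mismatch error that occurs when the region and the pasted image differ in size.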

3. Reshaping

reshape()

resize()

tigger.shape

(786, 1200, 3)

nd = np.random.randint(0,10,size=12)
nd

array([4, 0, 1, 1, 8, 7, 7, 5, 3, 0, 7, 3])

nd.shape

(12,)

nd.reshape((3,2,2,1)) # the argument is a tuple giving the shape nd should be turned into

array([[[[4],[0]],[[1],[1]]],[[[8],[7]],[[7],[5]]],[[[3],[0]],[[7],[3]]]])

nd

array([4, 0, 1, 1, 8, 7, 7, 5, 3, 0, 7, 3])

nd.reshape((3,2)) # cannot reshape array of size 12 into shape (3,2)
# The total size must stay the same when reshaping

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-94-dda3397392b8> in <module>()
----> 1 nd.reshape((3,2)) # cannot reshape array of size 12 into shape (3,2)

ValueError: cannot reshape array of size 12 into shape (3,2)

nd.resize((2,6))

nd

array([[4, 0, 1, 1, 8, 7],[7, 5, 3, 0, 7, 3]])

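[Note]

  1) With reshape(), the array must have the same size before and after; otherwise the reshape fails.
  2) reshape() leaves the original array's shape unchanged and returns the result as a new array object (a view on the same data when possible, otherwise a copy).
  3) resize() reshapes the array in place and returns nothing.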
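Two details are worth trying out: passing -1 lets NumPy work out one dimension from the size, and the array returned by reshape() usually shares its data with the original. A minimal sketch with illustrative variable names:

```python
import numpy as np

nd = np.arange(12)
r = nd.reshape((3, -1))             # -1 asks NumPy to infer the missing length (4 here)
shares = np.shares_memory(nd, r)    # True: r is a view on nd's data
r[0, 0] = 100                       # so this write is visible through nd as well
print(r.shape, nd.shape, shares, nd[0])
```

If an independent array is needed, call `.copy()` on the reshaped result.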

4. Concatenation

Concatenation joins two arrays together along a specified dimension.

nd1 = np.random.randint(0,10,size=(4,4))
nd2 = np.random.randint(20,40,size=(3,4))

print(nd1)
print(nd2)

[[2 5 6 1][4 8 0 5][9 4 7 8][4 3 0 8]]
[[38 22 25 38][22 38 30 21][23 34 28 26]]

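# Concatenate the two arrays
np.concatenate([nd1,nd2],axis=0)
# The first argument is a list (or tuple) of the arrays to concatenate
# axis defaults to 0, which joins along dimension 0 (stacking rows); 1 joins along dimension 1 (appending columns)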

array([[ 2,  5,  6,  1],[ 4,  8,  0,  5],[ 9,  4,  7,  8],[ 4,  3,  0,  8],[38, 22, 25, 38],[22, 38, 30, 21],[23, 34, 28, 26]])

np.concatenate([nd1,nd2],axis=1)
# Concatenating along columns requires the same number of rows

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-102-0a76346b819d> in <module>()
----> 1 np.concatenate([nd1,nd2],axis=1)

ValueError: all the input array dimensions except for the concatenation axis must match exactly

nd3 = np.random.randint(0,10,size=(4,3))
nd3

array([[1, 3, 7],[9, 5, 3],[9, 0, 2],[0, 7, 4]])

nd1

array([[2, 5, 6, 1],[4, 8, 0, 5],[9, 4, 7, 8],[4, 3, 0, 8]])

np.concatenate([nd1,nd3])
# The column counts differ, so the arrays cannot be concatenated along rows

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-106-871caaeeb895> in <module>()
----> 1 np.concatenate([nd1,nd3])

ValueError: all the input array dimensions except for the concatenation axis must match exactly

np.concatenate([nd1,nd3],axis=1)

array([[2, 5, 6, 1, 1, 3, 7],[4, 8, 0, 5, 9, 5, 3],[9, 4, 7, 8, 9, 0, 2],[4, 3, 0, 8, 0, 7, 4]])

Generalization

1) Arrays can be concatenated only when their shapes are compatible

nd4 = np.random.randint(0,10,size=(1,2,3))
nd5 = np.random.randint(0,10,size=(1,4,3))
print(nd4)
print(nd5)

[[[2 9 8][9 5 6]]]
[[[9 9 6][8 3 4][8 7 7][0 6 6]]]

np.concatenate([nd4,nd5],axis=1)

array([[[2, 9, 8],[9, 5, 6],[9, 9, 6],[8, 3, 4],[8, 7, 7],[0, 6, 6]]])

nd6 = np.random.randint(0,10,size=4)
nd6

array([3, 5, 3, 6])

2) Arrays with different numbers of dimensions cannot be concatenated

np.concatenate([nd1,nd6])

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-124-6dd6213f71bc> in <module>()
----> 1 np.concatenate([nd1,nd6])

ValueError: all the input arrays must have same number of dimensions

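Points to note when concatenating:

1) The arrays must have the same number of dimensions.
2) The shapes must be compatible: after dropping the dimension named by axis, the remaining shapes must be identical.
3) The direction is set by axis, which defaults to 0.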
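These rules can be written down as a small check. `can_concatenate` below is a hypothetical helper written for this article, not a NumPy function, and it assumes a non-negative axis.

```python
import numpy as np

def can_concatenate(shape_a, shape_b, axis=0):
    """Hypothetical helper: True if two shapes can be joined along a non-negative axis."""
    if len(shape_a) != len(shape_b):            # rule 1: same number of dimensions
        return False
    rest_a = shape_a[:axis] + shape_a[axis + 1:]
    rest_b = shape_b[:axis] + shape_b[axis + 1:]
    return rest_a == rest_b                     # rule 2: all other dimensions match

a = np.zeros((4, 4))
b = np.zeros((3, 4))
ok_rows = can_concatenate(a.shape, b.shape, axis=0)   # True: both have 4 columns
ok_cols = can_concatenate(a.shape, b.shape, axis=1)   # False: 4 rows versus 3 rows
joined = np.concatenate([a, b], axis=0)
print(ok_rows, ok_cols, joined.shape)
```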

For two-dimensional arrays there are also hstack and vstack

nd = np.random.randint(0,10,size=(10,1))
nd

array([[1],[7],[6],[9],[0],[4],[6],[2],[0],[8]])

np.hstack(nd)

array([1, 7, 6, 9, 0, 4, 6, 2, 0, 8])

nd1 = np.random.randint(0,10,size=(10,2))
nd1

array([[4, 4],[3, 1],[3, 3],[9, 6],[5, 1],[4, 7],[3, 3],[4, 3],[7, 9],[6, 5]])

np.hstack(nd1)

array([4, 4, 3, 1, 3, 3, 9, 6, 5, 1, 4, 7, 3, 3, 4, 3, 7, 9, 6, 5])

np.vstack(nd1)

array([[4, 4],[3, 1],[3, 3],[9, 6],[5, 1],[4, 7],[3, 3],[4, 3],[7, 9],[6, 5]])

nd2 = np.random.randint(0,10,size=10)
nd2

array([1, 7, 4, 3, 9, 0, 3, 3, 2, 5])

np.vstack(nd2)

array([[1],[7],[4],[3],[9],[0],[3],[3],[2],[5]])

np.hstack(nd2)

array([1, 7, 4, 3, 9, 0, 3, 3, 2, 5])

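hstack() applied to a single array treats it as a sequence of rows and joins them side by side, so a column array becomes a row array and a 2-D array becomes 1-D.
vstack() applied to a single 1-D array stacks its elements as rows, so a row array becomes a column array and a 1-D array becomes 2-D (each element of the 1-D array becomes one row).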
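The examples above pass a single array, but the more common use is to pass a list of arrays to join. A minimal sketch with two small made-up arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

side_by_side = np.hstack([a, b])   # for 2-D input, same as np.concatenate([a, b], axis=1)
stacked = np.vstack([a, b])        # for 2-D input, same as np.concatenate([a, b], axis=0)
print(side_by_side)
print(stacked)
```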

5. Splitting

Splitting cuts one array into several

vsplit()

hsplit()

split()

nd = np.random.randint(0,100,size=(5,6))
nd

array([[17, 47, 83, 33, 69, 24],[60,  4, 34, 29, 75, 60],[33, 55, 67,  1, 76, 82],[31, 92,  1, 14, 83, 95],[59, 88, 81, 49, 70, 11]])

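# Split horizontally (across columns)
np.hsplit(nd,[1,4,5,8,9])
# The first argument is the array to split; the second is a list of cut positions (positions past the last column produce empty arrays)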

[array([[17],[60],[33],[31],[59]]), array([[47, 83, 33],[ 4, 34, 29],[55, 67,  1],[92,  1, 14],[88, 81, 49]]), array([[69],[75],[76],[83],[70]]), array([[24],[60],[82],[95],[11]]), array([], shape=(5, 0), dtype=int32), array([], shape=(5, 0), dtype=int32)]

# Split vertically (across rows)
np.vsplit(nd,[1,3,5])

[array([[17, 47, 83, 33, 69, 24]]), array([[60,  4, 34, 29, 75, 60],[33, 55, 67,  1, 76, 82]]), array([[31, 92,  1, 14, 83, 95],[59, 88, 81, 49, 70, 11]]), array([], shape=(0, 6), dtype=int32)]

The split() function

nd

array([[17, 47, 83, 33, 69, 24],[60,  4, 34, 29, 75, 60],[33, 55, 67,  1, 76, 82],[31, 92,  1, 14, 83, 95],[59, 88, 81, 49, 70, 11]])

np.split(nd,[1,2],axis=0)
# axis defaults to 0, which splits along dimension 0; 1 splits along dimension 1

[array([[17, 47, 83, 33, 69, 24]]),array([[60,  4, 34, 29, 75, 60]]),array([[33, 55, 67,  1, 76, 82],[31, 92,  1, 14, 83, 95],[59, 88, 81, 49, 70, 11]])]
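Besides a list of cut positions, the second argument can also be a single integer, which asks for that many equal pieces. The sketch below uses a made-up 5x6 array; `np.array_split` is shown as the variant that tolerates a count that does not divide the length evenly.

```python
import numpy as np

nd = np.arange(30).reshape(5, 6)

thirds = np.split(nd, 3, axis=1)              # three equal pieces of 2 columns each
shapes = [part.shape for part in thirds]
restored = np.concatenate(thirds, axis=1)     # concatenating the pieces undoes the split

uneven = np.array_split(nd, 4, axis=1)        # 6 columns into 4 pieces: widths 2, 2, 1, 1
widths = [part.shape[1] for part in uneven]
print(shapes, widths)
```

With `np.split`, an integer that does not divide the axis length evenly raises a ValueError.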

Generalization

nd1 = np.random.randint(0,10,size=(3,4,5))
nd1

array([[[5, 7, 8, 7, 9],[3, 6, 1, 9, 0],[6, 0, 2, 6, 9],[4, 5, 5, 3, 9]],[[6, 7, 6, 2, 3],[3, 0, 0, 5, 3],[9, 9, 0, 6, 2],[5, 4, 5, 4, 4]],[[8, 7, 4, 8, 9],[2, 2, 1, 7, 3],[2, 2, 9, 4, 7],[7, 3, 9, 4, 1]]])

np.split(nd1,[2],axis=2)

[array([[[5, 7],[3, 6],[6, 0],[4, 5]],[[6, 7],[3, 0],[9, 9],[5, 4]],[[8, 7],[2, 2],[2, 2],[7, 3]]]), array([[[8, 7, 9],[1, 9, 0],[2, 6, 9],[5, 3, 9]],[[6, 2, 3],[0, 5, 3],[0, 6, 2],[5, 4, 4]],[[4, 8, 9],[1, 7, 3],[9, 4, 7],[9, 4, 1]]])]

6. Copies

nd = np.random.randint(0,100,size=6)
nd

array([34, 69, 14,  2, 48, 74])

nd1 = nd
# Assigning an array to another name only copies the reference; the array object itself is not copied

nd1

array([34, 69, 14,  2, 48, 74])

nd1[0] = 100

nd1

array([100,  69,  14,   2,  48,  74])

nd

array([100,  69,  14,   2,  48,  74])

nd2 = nd.copy()
# copy() duplicates the array that nd refers to and binds nd2 to the new array

nd2[0] = 200000

nd

array([100,  69,  14,   2,  48,  74])

nd1

array([100,  69,  14,   2,  48,  74])

nd2

array([200000,     69,     14,      2,     48,     74])

Discussion: does creating an array from a list make a copy?

l = [1,2,3]
l

[1, 2, 3]

nd = np.array(l)
nd

array([1, 2, 3])

nd[0] = 1000
l # the original list is unchanged

[1, 2, 3]

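Explanation: creating an array from a list copies the list's contents, converts the elements to one common type, and stores them in the array object, so later changes to the array do not affect the list.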
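A related case that is easy to overlook is slicing: a slice of an array is a view on the same data, not a copy. The following sketch, with illustrative variable names, contrasts a slice with an explicit copy.

```python
import numpy as np

nd = np.array([34, 69, 14, 2, 48, 74])
view = nd[:3]              # a slice is a view on nd's data
copied = nd[:3].copy()     # an explicit, independent copy

view[0] = 100              # changes nd as well
copied[1] = -1             # does not touch nd
print(nd)
print(np.shares_memory(nd, view), np.shares_memory(nd, copied))
```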

IV. ndarray aggregation

An aggregation computes a summary value (sum, maximum, mean, and so on) from the data in an array

1. Sum

nd = np.random.randint(0,10,size=(3,4))
nd

array([[5, 9, 6, 8],[3, 7, 1, 9],[5, 7, 6, 3]])

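nd.sum() # full aggregation over all elements

nd.sum(axis=0) # aggregate along dimension 0: adds the rows together, giving one sum per column

array([13, 23, 13, 20])

nd.sum(axis=1) # aggregate along dimension 1: adds the columns together, giving one sum per row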

array([28, 20, 21])

Generalization

nd = np.random.randint(0,10,size=(2,3,4))
nd

array([[[1, 0, 0, 3],[9, 6, 1, 8],[4, 9, 3, 9]],[[8, 0, 4, 3],[3, 0, 1, 8],[8, 0, 7, 4]]])

nd.sum()

nd.sum(axis=0)

array([[ 9,  0,  4,  6],[12,  6,  2, 16],[12,  9, 10, 13]])

nd.sum(axis=2)

array([[ 4, 24, 25],[15, 12, 19]])

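The rule for aggregation: axis selects the dimension to aggregate over. With axis=x, dimension x disappears, and the elements along it are combined into a single value.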
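The rule is easiest to see from the shapes alone. A minimal sketch with a made-up (2, 3, 4) array; `keepdims=True` is included to show the alternative where the aggregated dimension is kept with length 1 instead of disappearing.

```python
import numpy as np

nd = np.arange(24).reshape(2, 3, 4)

shape_axis0 = nd.sum(axis=0).shape                 # (3, 4): dimension 0 is gone
shape_axis1 = nd.sum(axis=1).shape                 # (2, 4): dimension 1 is gone
shape_axis2 = nd.sum(axis=2).shape                 # (2, 3): dimension 2 is gone
shape_kept = nd.sum(axis=1, keepdims=True).shape   # (2, 1, 4): kept as length 1
print(shape_axis0, shape_axis1, shape_axis2, shape_kept)
```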

Exercise: given a 4-dimensional array, how do you sum over the last two dimensions?

nd1 = np.random.randint(0,10,size=(2,3,4,5))
nd1

array([[[[3, 2, 9, 4, 0],[1, 0, 2, 3, 7],[4, 8, 6, 6, 5],[2, 3, 4, 1, 5]],[[3, 2, 0, 1, 3],[7, 3, 3, 4, 1],[0, 4, 0, 6, 9],[3, 8, 6, 0, 5]],[[5, 1, 3, 5, 0],[1, 4, 1, 8, 0],[9, 1, 9, 6, 5],[6, 1, 8, 5, 1]]],[[[7, 5, 3, 4, 5],[7, 8, 6, 7, 2],[9, 9, 5, 3, 4],[9, 2, 9, 7, 2]],[[3, 2, 9, 7, 7],[0, 8, 1, 3, 0],[1, 5, 5, 6, 5],[4, 8, 7, 2, 9]],[[1, 3, 5, 0, 6],[6, 0, 3, 5, 6],[2, 4, 6, 9, 0],[8, 7, 4, 0, 6]]]])

Approach 1

nd1.sum(axis=2).sum(axis=2)

array([[ 75,  68,  79],[113,  92,  81]])

Approach 2

nd1.sum(axis=-1).sum(axis=-1)

array([[ 75,  68,  79],[113,  92,  81]])

Approach 3

nd1.sum(axis=(-1,-2))

array([[ 75,  68,  79],[113,  92,  81]])

2. Max and min

nd

array([[[1, 0, 0, 3],[9, 6, 1, 8],[4, 9, 3, 9]],[[8, 0, 4, 3],[3, 0, 1, 8],[8, 0, 7, 4]]])

nd.sum(axis=-1)

array([[ 4, 24, 25],[15, 12, 19]])

nd.max()

nd.max(axis=-1)

array([[3, 9, 9],[8, 8, 8]])

nd.max(axis=1)

array([[9, 9, 3, 9],[8, 0, 7, 8]])

nd.min(axis=0)

array([[1, 0, 0, 3],[3, 0, 1, 8],[4, 0, 3, 4]])

3. Other aggregations

Function Name    NaN-safe Version    Description
np.sum           np.nansum           Compute sum of elements
np.prod          np.nanprod          Compute product of elements
np.mean          np.nanmean          Compute mean of elements
np.std           np.nanstd           Compute standard deviation
np.var           np.nanvar           Compute variance
np.min           np.nanmin           Find minimum value
np.max           np.nanmax           Find maximum value
np.argmin        np.nanargmin        Find index of minimum value
np.argmax        np.nanargmax        Find index of maximum value
np.median        np.nanmedian        Compute median of elements
np.percentile    np.nanpercentile    Compute rank-based statistics of elements
np.any           N/A                 Evaluate whether any elements are true
np.all           N/A                 Evaluate whether all elements are true
np.power         N/A                 Element-wise power (not an aggregation)

np.nan
# This value represents missing data; it is a float
type(np.nan) # any arithmetic involving nan yields nan

float

np.nan + 10

nan

np.nan*10

nan

nd2 = np.array([12,23,np.nan,34,np.nan,90])
nd2

array([ 12.,  23.,  nan,  34.,  nan,  90.])

# Aggregate nd2
nd2.sum(axis=0)

nan

nd2.max()

nan

Ordinary aggregations are thrown off by missing values, so use the NaN-safe versions instead

np.nansum(nd2)

159.0

np.nanmean(nd2)

39.75

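Aggregation summary:

1) axis specifies which dimension to aggregate. By default it is unset, which means full aggregation (all elements are combined into a single scalar). If axis names a dimension, that dimension disappears and is replaced by the aggregated result.
2) NumPy's aggregation functions come in two versions, with and without the nan prefix; the nan versions skip the missing entries when aggregating.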
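One way to picture what the nan versions do is to filter out the missing entries by hand with `np.isnan` and aggregate what is left. This is only a sketch of the idea with illustrative variable names, not a description of how NumPy implements these functions internally.

```python
import numpy as np

nd2 = np.array([12, 23, np.nan, 34, np.nan, 90])

mask = np.isnan(nd2)            # True where a value is missing
valid = nd2[~mask]              # keep only the real numbers
manual_sum = valid.sum()        # 12 + 23 + 34 + 90
manual_mean = valid.mean()
n_missing = int(mask.sum())     # how many entries were dropped
print(manual_sum, manual_mean, n_missing)
```

The mask is also useful on its own, for instance to count or locate the missing entries.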

Thought exercise: how do you sort a 5*5 matrix by column 3?

nd = np.random.randint(0,100,size=(5,5))
nd

array([[70, 76, 87, 23, 68],[34,  3, 59, 93, 71],[71, 64, 98, 31, 70],[59, 17, 71, 99, 50],[86, 58, 91, 22, 18]])

Sorting

np.sort(nd,axis=0) # sorts every column independently, so the original rows are broken up

array([[34,  3, 59, 22, 18],[59, 17, 71, 23, 50],[70, 58, 87, 31, 68],[71, 64, 91, 93, 70],[86, 76, 98, 99, 71]])

np.sort(nd[:,3])

array([22, 23, 31, 93, 99])

nd[[4,0,2,1,3]]

array([[86, 58, 91, 22, 18],[70, 76, 87, 23, 68],[71, 64, 98, 31, 70],[34,  3, 59, 93, 71],[59, 17, 71, 99, 50]])

ind = np.argsort(nd[:,3]) # the indices that would put column 3 in ascending order
ind

array([4, 0, 2, 1, 3], dtype=int64)

nd[ind]

array([[86, 58, 91, 22, 18],[70, 76, 87, 23, 68],[71, 64, 98, 31, 70],[34,  3, 59, 93, 71],[59, 17, 71, 99, 50]])
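The same idea gives a descending sort: reverse the index array before using it. A minimal sketch using the values of the matrix shown above (the names `order_desc` and `sorted_desc` are illustrative):

```python
import numpy as np

nd = np.array([[70, 76, 87, 23, 68],
               [34,  3, 59, 93, 71],
               [71, 64, 98, 31, 70],
               [59, 17, 71, 99, 50],
               [86, 58, 91, 22, 18]])

order_desc = np.argsort(nd[:, 3])[::-1]   # ascending order of column 3, reversed
sorted_desc = nd[order_desc]              # whole rows move together
print(order_desc)
print(sorted_desc)
```

Unlike np.sort(nd, axis=0), this keeps each row intact and only changes the order of the rows.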

V. ndarray matrix operations

1. Basic matrix operations

1) Arithmetic (add, subtract, multiply, divide)

nd = np.random.randint(0,10,size=(3,3))
nd

array([[7, 4, 6],[4, 5, 1],[0, 2, 5]])

nd + nd

array([[14,  8, 12],[ 8, 10,  2],[ 0,  4, 10]])

nd + 2 # the scalar 2 is broadcast to a 3*3 matrix filled with 2

array([[9, 6, 8],[6, 7, 3],[2, 4, 7]])

nd - 2

array([[ 5,  2,  4],[ 2,  3, -1],[-2,  0,  3]])

In mathematics, a matrix can be multiplied or divided by a scalar

nd * 4

array([[28, 16, 24],[16, 20,  4],[ 0,  8, 20]])

nd / 4

array([[ 1.75,  1.  ,  1.5 ],[ 1.  ,  1.25,  0.25],[ 0.  ,  0.5 ,  1.25]])

1/nd

C:\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in true_divide
  """Entry point for launching an IPython kernel.

array([[ 0.14285714,  0.25      ,  0.16666667],[ 0.25      ,  0.2       ,  1.        ],[        inf,  0.5       ,  0.2       ]])

2) Matrix product

nd1 = np.random.randint(0,10,size=(2,3))
nd2 = np.random.randint(0,10,size=(3,3))
print(nd1)
print(nd2)

[[8 3 5][3 3 5]]
[[4 1 0][1 3 0][7 6 7]]

np.dot(nd1,nd2)

array([[70, 47, 35],[50, 42, 35]])

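When two matrices A and B are multiplied as AB, mathematics requires the number of columns of A to equal the number of rows of B (because each entry of the result is a row of A multiplied with a column of B). Note that in NumPy the * operator multiplies element by element; the matrix product is np.dot (or the @ operator).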
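A short sketch of the shape rule, using the same values as nd1 and nd2 above under the illustrative names `a` and `b`. It also shows the `@` operator, which for 2-D arrays gives the same result as np.dot.

```python
import numpy as np

a = np.array([[8, 3, 5], [3, 3, 5]])              # shape (2, 3)
b = np.array([[4, 1, 0], [1, 3, 0], [7, 6, 7]])   # shape (3, 3)

product = a @ b            # (2, 3) @ (3, 3) -> (2, 3)
elementwise = a * a        # * multiplies matching entries; it is not a matrix product
print(product)

try:
    b @ a                  # (3, 3) @ (2, 3): inner sizes 3 and 2 do not match
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```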

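2. Broadcasting

The two rules of ndarray broadcasting:

1. Missing dimensions are filled in with length 1 (added on the left of the shape)
2. Dimensions of length 1 are stretched by repeating the existing values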

nd + nd1

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-243-1efd3ade59a4> in <module>()
----> 1 nd + nd1

ValueError: operands could not be broadcast together with shapes (3,3) (2,3)

nd

array([[7, 4, 6],[4, 5, 1],[0, 2, 5]])

nd1 = np.random.randint(0,10,size=3)
nd1

array([1, 8, 6])

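Adding or subtracting a matrix and a vector, a matrix and a scalar, or a vector and a scalar is not defined in mathematics.

In code these operations work because of broadcasting, which expands the lower-dimensional operand to match the shape of the higher-dimensional one.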

nd + nd1

array([[ 8, 12, 12],[ 5, 13,  7],[ 1, 10, 11]])

nd1 + 3

array([ 4, 11,  9])

nd2 = np.random.randint(0,10,size=4)
nd2

array([8, 5, 1, 7])

nd1+nd2

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-249-99c1f2f85312> in <module>()
----> 1 nd1+nd2

ValueError: operands could not be broadcast together with shapes (3,) (4,)

nd + nd2

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-250-434995cd4e14> in <module>()
----> 1 nd + nd2

ValueError: operands could not be broadcast together with shapes (3,3) (4,)

nd3 = np.random.randint(0,10,size=(3,1))
nd3

array([[6],[8],[6]])

nd +nd3  # nd3 is a column vector; a vector can be broadcast to a matrix

array([[13, 10, 12],[12, 13,  9],[ 6,  8, 11]])

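Principles of broadcasting:

1) The goal is to fill in the missing rows or columns.
2) A scalar can be broadcast to any matrix or vector; the scalar fills the whole expanded array.
3) A vector can be broadcast to a matrix of compatible shape (for example, a row vector to a matrix with the same number of columns); the vector's row (or column) is repeated to fill the expanded matrix.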
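The principles can be turned into a small shape calculation. `broadcast_shape` below is a hypothetical helper written for this article to apply the two rules by hand; it is not part of NumPy. The result is compared against the shape NumPy itself reports through `np.broadcast`.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: the broadcast result shape, or None if incompatible."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with 1s on the left
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            result.append(x)        # equal lengths, or y is stretched to x
        elif x == 1:
            result.append(y)        # Rule 2: x is stretched to y
        else:
            return None             # lengths differ and neither is 1
    return tuple(result)

matrix_row = broadcast_shape((3, 3), (3,))      # matrix + row vector
matrix_col = broadcast_shape((3, 3), (3, 1))    # matrix + column vector
bad = broadcast_shape((3, 3), (4,))             # incompatible shapes
numpy_shape = np.broadcast(np.empty((3, 3)), np.empty((3, 1))).shape
print(matrix_row, matrix_col, bad, numpy_shape)
```

The incompatible case corresponds to the "operands could not be broadcast together" errors shown earlier.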