Python Scientific Computing Libraries: Summary of Core Knowledge Points, Code Edition (syntax that ML/DL depends on) 20190812 田超凡

Development environment: Python 3.6.8, Anaconda 3.8, Jupyter Notebook 1.0

1. The time module

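import time

# 1. Offset of the local DST timezone in seconds west of UTC (negative when the zone is east of UTC)
print(time.altzone)
# 2. Current timestamp (seconds since the epoch)
print(time.time())
# 3. asctime accepts a time tuple and returns a date string;
#    localtime returns the time tuple (struct_time) for the current system time
localtime = time.localtime(time.time())
print(time.asctime(localtime))

-32400
1560517727.8200786
Fri Jun 14 21:08:47 2019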

In [38]:

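# 4. Elapsed time (measured relative to the first call of clock)
# Note: time.clock() was deprecated in Python 3.3 and removed in Python 3.8
clock_1 = time.clock()
print(clock_1)
clock_2 = time.clock()
print(clock_2)
clock_3 = time.clock()
print(clock_3)
clock_4 = time.clock()
print(clock_4)

129.9167088774698
129.916821925841
129.91688063805958
129.91693752691737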
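Because `time.clock()` no longer exists on Python 3.8 and later, the cell above only runs on older interpreters such as the 3.6.8 used in these notes. The following is a minimal sketch of the same kind of interval measurement with `time.perf_counter()`; the helper name `measure` is hypothetical and not part of the original notes.

```python
import time

def measure(func, *args):
    """Return (result, elapsed_seconds) for a single call of func(*args)."""
    start = time.perf_counter()            # monotonic, high-resolution counter
    result = func(*args)
    elapsed = time.perf_counter() - start  # difference of two readings, in seconds
    return result, elapsed

total, seconds = measure(sum, range(100000))
print(total, seconds)
```

Only the difference between two `perf_counter()` readings is meaningful; the absolute value has no defined reference point.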

In [57]:

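# 5. ctime: if an argument is given, it is a timestamp in seconds since the epoch (not a time tuple)
#    With no argument, the current system time is used
#    Returns a date string
sc_time = time.ctime()
print(sc_time)

Tue Apr 9 10:12:41 2019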

In [72]:

# 6. Date string to time tuple
struct_time_tup = time.strptime("2019-04-09 08:05:02", "%Y-%m-%d %H:%M:%S")
print(struct_time_tup)
# 7. Time tuple to date string
date_str = time.strftime("%Y-%m-%d %H:%M:%S", struct_time_tup)
print(date_str)

time.struct_time(tm_year=2019, tm_mon=4, tm_mday=9, tm_hour=8, tm_min=5, tm_sec=2, tm_wday=1, tm_yday=99, tm_isdst=-1)
2019-04-09 08:05:02

In [74]:

# 8. Time tuple to timestamp (the tuple is interpreted as local time)
sys_time_million = time.mktime(struct_time_tup)
print(sys_time_million)

1554768302.0

In [76]:

# 9. Time tuple to a date string in the default (unformatted) layout
timestr_noneformat = time.asctime(struct_time_tup)
print(timestr_noneformat)

Tue Apr 9 08:05:02 2019
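The conversions in items 6 to 9 can be chained into a round trip: string, time tuple, timestamp, and back to a string. The sketch below uses only functions already shown in this section; the constant name `FMT` and the variable names are illustrative.

```python
import time

FMT = "%Y-%m-%d %H:%M:%S"
text = "2019-04-09 08:05:02"

as_tuple = time.strptime(text, FMT)          # string -> struct_time
as_stamp = time.mktime(as_tuple)             # struct_time (local time) -> timestamp
back_tuple = time.localtime(as_stamp)        # timestamp -> struct_time (local time)
back_text = time.strftime(FMT, back_tuple)   # struct_time -> string

print(as_stamp)    # value depends on the machine's timezone
print(back_text)
```

The timestamp itself depends on the local timezone, but the string that comes back should match the one that went in.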

In [2]:

# 10. Timer: pause for X seconds before continuing
for i in range(1, 11):
    print(i)
    time.sleep(0.3)

1
2
3
4
5
6
7
8
9
10

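import time

# 1. Convert the time string "2017-10-10 23:40:00" to a time tuple and a timestamp
time_str = "2017-10-10 23:40:00"
# Time tuple
struct_timetup = time.strptime(time_str, "%Y-%m-%d %H:%M:%S")
print(struct_timetup)
# Timestamp: mktime (not asctime) returns seconds since the epoch;
# the tuple is interpreted as local time, so the value below is for a UTC+8 machine
timestamp = time.mktime(struct_timetup)
print(timestamp)

time.struct_time(tm_year=2017, tm_mon=10, tm_mday=10, tm_hour=23, tm_min=40, tm_sec=0, tm_wday=1, tm_yday=283, tm_isdst=-1)
1507650000.0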

In [29]:

import time

# 2. Change the format of a time string, e.g. turn "2017-10-10 23:40:00" into "2017/10/10 23:40:00"
time_str = "2017-10-10 23:40:00"
time_tup = time.strptime(time_str, "%Y-%m-%d %H:%M:%S")
time_format_str = time.strftime("%Y/%m/%d %H:%M:%S", time_tup)
print(time_format_str)

2017/10/10 23:40:00

In [36]:

# 3. Convert the timestamp 1542088432 to a formatted date such as 2018-10-10 10:51:45
time_million = "1542088432"
time_tup = time.localtime(int(time_million))
time_format_str = time.strftime("%Y-%m-%d %H:%M:%S", time_tup)
print(time_format_str)

2018-11-13 13:53:52

In [59]:

# 4. Get the date and time exactly three days ago
import datetime

# result_minu is a datetime object
result_minu = datetime.datetime.now() - datetime.timedelta(days=3)
print(type(result_minu))
format_str = result_minu.strftime("%Y-%m-%d %H:%M:%S")
print(format_str)

<class 'datetime.datetime'>
2019-04-06 10:43:55

2. The numpy module

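import numpy as np

# 1. Create an ndarray (multi-dimensional array)
# e.g. a three-dimensional array with axes x, y, z:
# data = [
#     [
#         [1, 1, 1],
#         [2, 2, 2]
#     ],
#     [
#         [3, 3, 3],
#         [4, 4, 4]
#     ]
# ]
data = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]
ndarray_one = np.array(data)
ndarray_one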

Out[15]:

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [29]:

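# 2. Use zeros to create an all-zero array of a given length
# zeros(shape[, dtype, order]) returns a new array of the given shape and type, filled with zeros.
# order: {'C', 'F'}, optional; whether multi-dimensional data is stored in memory in C (row-major) or Fortran (column-major) order.
full_arr = np.zeros(shape=10, dtype=np.int32)
full_arr
# 3. Use ones to create an all-one array of a given length
full_one_arr = np.ones(shape=10, dtype=np.int32)
full_one_arr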

Out[29]:

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [46]:

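# 4. Use empty to create an array of a given length without initializing its elements;
#    the values are whatever happens to be in the allocated memory
empty_array = np.empty(shape=10)
empty_array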

Out[46]:

array([6.23042070e-307, 1.89146896e-307, 2.22518251e-306, 1.33511969e-306, 1.78022342e-306, 6.02764687e+278, 3.89250200e+147, 1.78019625e-306, 1.42410974e-306, 1.33504432e-306])
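The arbitrary values in the output above are why an array from `np.empty` should be written before it is read. A minimal sketch of the usual pattern (allocate first, then fill every element); the variable names are illustrative.

```python
import numpy as np

buf = np.empty(shape=10)        # uninitialized: contents are arbitrary
buf.fill(0.5)                   # overwrite every element before reading it

squares = np.empty(5, dtype=np.int64)
for i in range(5):
    squares[i] = i * i          # each slot is assigned exactly once

print(buf)
print(squares)
```

When a defined starting value is needed, `np.zeros` or `np.ones` from items 2 and 3 are the safer choice.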

ufunc: universal functions

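ufunc: a function in the numpy module that performs fast element-wise operations on the data in an ndarray. It can also be seen as a vectorized wrapper around a simple function (one that accepts one or more scalar values and produces one or more scalar values). ufuncs are mainly unary functions and binary functions.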
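As a small illustration of the "vectorized wrapper" idea, the sketch below compares a unary ufunc with an explicit Python loop over scalars and then applies a binary ufunc. The array values are made up for the example.

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0, 16.0])

# np.sqrt and np.add are ufunc objects
print(isinstance(np.sqrt, np.ufunc), isinstance(np.add, np.ufunc))

# Unary ufunc: one input array, element-wise result
roots = np.sqrt(values)
# The same computation written as a Python loop over scalar values
roots_loop = np.array([v ** 0.5 for v in values])

# Binary ufunc: two inputs combined element by element
sums = np.add(values, roots)
print(roots)
print(sums)
```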

In [55]:

import numpy as np

# Common unary functions
arr = np.arange(-36, 0, dtype=np.int32)
arr.shape = (4, 9)
arr_test = arr[0]
arr_relative = np.abs(arr[0])
# abs computes the absolute value of integers, floats, or complex numbers; for non-complex input the faster fabs can be used
print("original array===>" + str(arr_test))
print("abs (absolute value)===>" + str(np.abs(arr_test)))
print("fabs (fast absolute value)===>" + str(np.fabs(arr_test)))
print("sqrt (square root)===>" + str(np.sqrt(arr_relative)))
print("square===>" + str(np.square(arr_test)))
print("exp (e to the power x)===>" + str(np.exp(arr_test)))
print("natural log===>" + str(np.log(arr_relative)))
print("log base 10===>" + str(np.log10(arr_relative)))
print("log base 2===>" + str(np.log2(arr_relative)))
print("sign of each element (1: positive, 0: zero, -1: negative)===>" + str(np.sign(arr_test)))
print("ceil (round up)===>" + str(np.ceil(arr_test)))
print("floor (round down)===>" + str(np.floor(arr_test)))
# rint rounds to the nearest integer; the integer input comes back as floating point here
print("rint (round to nearest integer)===>" + str(np.rint(arr_test)))
print("isnan===>" + str(np.isnan(arr_test)))
print("isfinite===>" + str(np.isfinite(arr_test)))
print("isinf===>" + str(np.isinf(arr_test)))
print("====================trigonometric functions=====================")
print("f(sin)==>" + str(np.sin(arr_test)))
print("f(cos)==>" + str(np.cos(arr_test)))
print("f(cosh)==>" + str(np.cosh(arr_test)))
print("f(sinh)==>" + str(np.sinh(arr_test)))
print("f(tan)==>" + str(np.tan(arr_test)))
print("f(tanh)==>" + str(np.tanh(arr_test)))
print("====================inverse trigonometric functions=====================")
# Left commented out: arccos and arcsin are defined only on [-1, 1], arctanh on (-1, 1)
# and arccosh on [1, inf), so most of these calls would return nan for this data
# print("f(arccos)==>" + str(np.arccos(arr_relative)))
# print("f(arccosh)==>" + str(np.arccosh(arr_test)))
# print("f(arcsin)==>" + str(np.arcsin(arr_relative)))
# print("f(arcsinh)==>" + str(np.arcsinh(arr_test)))
# print("f(arctan)==>" + str(np.arctan(arr_relative)))
# print("f(arctanh)==>" + str(np.arctanh(arr_relative)))

original array===>[-36 -35 -34 -33 -32 -31 -30 -29 -28]
abs (absolute value)===>[36 35 34 33 32 31 30 29 28]
fabs (fast absolute value)===>[36. 35. 34. 33. 32. 31. 30. 29. 28.]
sqrt (square root)===>[6. 5.91607978 5.83095189 5.74456265 5.65685425 5.56776436 5.47722558 5.38516481 5.29150262]
square===>[1296 1225 1156 1089 1024 961 900 841 784]
exp (e to the power x)===>[2.31952283e-16 6.30511676e-16 1.71390843e-15 4.65888615e-15 1.26641655e-14 3.44247711e-14 9.35762297e-14 2.54366565e-13 6.91440011e-13]
natural log===>[3.58351894 3.55534806 3.52636052 3.49650756 3.4657359 3.4339872 3.40119738 3.36729583 3.33220451]
log base 10===>[1.5563025 1.54406804 1.53147892 1.51851394 1.50514998 1.49136169 1.47712125 1.462398 1.44715803]
log base 2===>[5.169925 5.12928302 5.08746284 5.04439412 5. 4.95419631 4.9068906 4.857981 4.80735492]
sign of each element (1: positive, 0: zero, -1: negative)===>[-1 -1 -1 -1 -1 -1 -1 -1 -1]
ceil (round up)===>[-36. -35. -34. -33. -32. -31. -30. -29. -28.]
floor (round down)===>[-36. -35. -34. -33. -32. -31. -30. -29. -28.]
rint (round to nearest integer)===>[-36. -35. -34. -33. -32. -31. -30. -29. -28.]
isnan===>[False False False False False False False False False]
isfinite===>[ True True True True True True True True True]
isinf===>[False False False False False False False False False]
====================trigonometric functions=====================
f(sin)==>[ 0.99177885 0.42818267 -0.52908269 -0.99991186 -0.55142668 0.40403765 0.98803162 0.66363388 -0.27090579]
f(cos)==>[-0.12796369 -0.90369221 -0.84857027 -0.01327675 0.83422336 0.91474236 0.15425145 -0.74805753 -0.96260587]
f(cosh)==>[2.15561577e+15 7.93006726e+14 2.91730871e+14 1.07321790e+14 3.94814801e+13 1.45244248e+13 5.34323729e+12 1.96566715e+12 7.23128532e+11]
f(sinh)==>[-2.15561577e+15 -7.93006726e+14 -2.91730871e+14 -1.07321790e+14 -3.94814801e+13 -1.45244248e+13 -5.34323729e+12 -1.96566715e+12 -7.23128532e+11]
f(tan)==>[-7.75047091 -0.47381472 0.62349896 75.3130148 -0.66100604 0.44169557 6.4053312 -0.88714284 0.2814296 ]
f(tanh)==>[-1. -1. -1. -1. -1. -1. -1. -1. -1.]
====================inverse trigonometric functions=====================

In [89]:

# Common binary functions (element-wise, except np.dot)
arr1 = np.arange(1, 17).reshape((4, 4))
arr2 = np.arange(18, 34).reshape((4, 4))
print("mod===>" + str(np.mod(arr2, arr1)))
# np.dot computes the matrix product; it is not an element-wise operation
print("matrix product (dot)===>" + str(np.dot(arr1, arr2)))
print("=======element-wise comparisons, returning boolean arrays========")
print("greater==>" + str(np.greater(arr1, arr2)))
print("greater_equal==>" + str(np.greater_equal(arr1, arr2)))
print("less==>" + str(np.less(arr1, arr2)))
print("less_equal==>" + str(np.less_equal(arr1, arr2)))
print("not_equal==>" + str(np.not_equal(arr1, arr2)))
print("equal==>" + str(np.equal(arr1, arr2)))
print("logical and==>" + str(np.logical_and(arr1, arr2)))
print("logical or==>" + str(np.logical_or(arr1, arr2)))
# logical_xor is exclusive or, not logical negation
print("logical xor==>" + str(np.logical_xor(arr1, arr2)))
print("power==>" + str(np.power(arr1, 3)))

mod===>[[0 1 2 1]
 [2 5 3 1]
 [8 7 6 5]
 [4 3 2 1]]
matrix product (dot)===>[[ 260  270  280  290]
 [ 644  670  696  722]
 [1028 1070 1112 1154]
 [1412 1470 1528 1586]]
=======element-wise comparisons, returning boolean arrays========
greater==>[[False False False False]
 [False False False False]
 [False False False False]
 [False False False False]]
greater_equal==>[[False False False False]
 [False False False False]
 [False False False False]
 [False False False False]]
less==>[[ True True True True]
 [ True True True True]
 [ True True True True]
 [ True True True True]]
less_equal==>[[ True True True True]
 [ True True True True]
 [ True True True True]
 [ True True True True]]
not_equal==>[[ True True True True]
 [ True True True True]
 [ True True True True]
 [ True True True True]]
equal==>[[False False False False]
 [False False False False]
 [False False False False]
 [False False False False]]
logical and==>[[ True True True True]
 [ True True True True]
 [ True True True True]
 [ True True True True]]
logical or==>[[ True True True True]
 [ True True True True]
 [ True True True True]
 [ True True True True]]
logical xor==>[[False False False False]
 [False False False False]
 [False False False False]
 [False False False False]]
power==>[[   1    8   27   64]
 [ 125  216  343  512]
 [ 729 1000 1331 1728]
 [2197 2744 3375 4096]]

numpy ufunc: aggregate functions

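An aggregate function operates on a set of values (e.g. an array) and returns a single value as the result. An aggregate function can also be applied along one specific axis. Common aggregations include the mean, maximum, minimum, and population standard deviation.

In [21]: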

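import numpy as np

# Compute the maximum, minimum, mean, variance, deviations, and standard deviation
arr = np.arange(1, 13).reshape(4, 3)
print("max==>" + str(np.max(arr)))
print("min==>" + str(np.min(arr)))
print("sum==>" + str(np.sum(arr)))
print("mean==>" + str(np.mean(arr)))
print("")
# Variance = mean of the squared deviations
print("variance (mean of squared deviations)==>" + str(np.mean((arr - arr.mean()) ** 2)))
# Standard deviation = square root of the mean of the squared deviations
print("square root of the mean of squared deviations==>" + str(np.sqrt(np.mean((arr - arr.mean()) ** 2))))
print("standard deviation==>" + str(np.std(arr)))
# axis=0 aggregates down the rows (one result per column); axis=1 aggregates across the columns (one result per row)
# For this 2-D array the valid axis values are -2 <= axis <= 1; axis=2 only exists for arrays with three or more dimensions
print("===============aggregation along an axis========================")
print("max==>" + str(arr.max(axis=1)))
print("min==>" + str(arr.min(axis=1)))
print("sum==>" + str(arr.sum(axis=1)))
print("mean==>" + str(arr.mean(axis=1)))
print("standard deviation==>" + str(arr.std(axis=1)))

max==>12
min==>1
sum==>78
mean==>6.5

variance (mean of squared deviations)==>11.916666666666666
square root of the mean of squared deviations==>3.452052529534663
standard deviation==>3.452052529534663
===============aggregation along an axis========================
max==>[ 3 6 9 12]
min==>[ 1 4 7 10]
sum==>[ 6 15 24 33]
mean==>[ 2. 5. 8. 11.]
standard deviation==>[0.81649658 0.81649658 0.81649658 0.81649658]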
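The cell above only aggregates along `axis=1`. The sketch below puts both axes side by side on the same 4 x 3 array and uses `keepdims=True` so that the per-row means can be subtracted from the original array; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(1, 13).reshape(4, 3)

col_sums = arr.sum(axis=0)   # collapse the rows: one value per column
row_sums = arr.sum(axis=1)   # collapse the columns: one value per row

# keepdims=True keeps the reduced axis with length 1, giving shape (4, 1)
row_means = arr.mean(axis=1, keepdims=True)
centered = arr - row_means   # each row minus its own mean

print(col_sums)
print(row_sums)
print(centered)
```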

numpy ufunc: other functions
import numpy as np
# 1. np.where is the vectorized version of the ternary expression x if condition else y
# where(condition, [x, y])
# Replace every abnormal number in the array with 0, e.g. replace NaN with 0
data = [
    [1, 2, np.nan, 4],
    [4, 5, 6, np.nan],
    [7, 8, 9, np.inf],
    [np.inf, np.e, np.pi, 4]
]
arr = np.array(data)
arr

condition = np.isnan(arr) | np.isinf(arr)
new_arr = np.where(condition, 0, arr)
new_arr

# 2. np.unique removes duplicate values
arr = np.random.randint(1, 5, (3, 3))
print(arr)
print("===========================")
print(np.unique(arr))

[[1 1 4]
 [4 2 3]
 [2 4 4]]
===========================
[1 2 3 4]

ndarray matrix product

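Matrix: a two-dimensional array (rows and columns) can be treated as a matrix.
Matrix product: two two-dimensional matrices can be multiplied when the number of columns of the first equals the number of rows of the second; the result is the matrix product. The matrix product is not an element-wise operation. For one-dimensional vectors the same operation is the dot product (scalar product).
How the matrix product is computed: the entries of a row of the first matrix are multiplied with the entries at the corresponding positions of a column of the second matrix, and the products are summed. Row 1 with column 1 gives the first element of the result; in general, row i with column j gives the element at position (i, j).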
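The row-times-column rule can be written out with explicit loops and compared against `np.dot`. This is a sketch for illustration only; the function name `matmul_loops` is hypothetical, and the loops are far slower than NumPy's own implementation.

```python
import numpy as np

def matmul_loops(a, b):
    """Matrix product of two 2-D arrays using the row-times-column rule."""
    rows, inner = a.shape
    inner_b, cols = b.shape
    if inner != inner_b:
        raise ValueError("columns of a must equal rows of b")
    out = np.zeros((rows, cols), dtype=np.result_type(a, b))
    for i in range(rows):
        for j in range(cols):
            # element (i, j) = sum over k of a[i, k] * b[k, j]
            out[i, j] = sum(a[i, k] * b[k, j] for k in range(inner))
    return out

a = np.array([[1, 2, 3], [4, 5, 6]])        # 2 x 3
b = np.array([[7, 8], [9, 10], [11, 12]])   # 3 x 2
print(matmul_loops(a, b))
print(np.dot(a, b))
```

A 2 x 3 matrix times a 3 x 2 matrix gives a 2 x 2 result: the inner dimensions must agree and the outer dimensions give the result's shape.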

In [93]:

import numpy as np

# Slicing an ndarray (multi-dimensional array)
list = [
    [
        [2, 3, 4, 5, 7, 8, 9],
        [6, 8, 9, 1, 10, 12, 15]
    ],
    [
        [8, 8, 8, 5, 7, 8, 9],
        [6, 8, 9, 1, 15, 16, 18]
    ],
    [
        [6, 8, 9, 1, 10, 12, 15],
        [2, 3, 4, 5, 7, 8, 9]
    ]
]
for i in range(0, 3):
    temp_arr = []
    for j in range(1, 8):
        temp_arr.append(j * 2)
    list[i].append(temp_arr)

# Create the multi-dimensional array
more_batch_arr = np.array(list)
more_batch_arr

# 1. Slice along each dimension to get the data after the third column (index 3 onward)
# print(more_batch_arr[1:2, 1:2, 3:])    # [[[ 1 15 16 18]]]
# print(more_batch_arr[1][1][3:])        # [ 1 15 16 18]

# 2. Slice along each dimension to get the element at index 5 (the sixth element) of the second row of every block
# print(more_batch_arr[:, 1:2, 5:6])

# 3. A new array produced by slicing is a view: changing its elements also changes the corresponding elements of the original array
temp_arr = more_batch_arr[2][0][1:3]
temp_arr[0] = 999
print(more_batch_arr)

[[[  2   3   4   5   7   8   9]
  [  6   8   9   1  10  12  15]
  [  2   4   6   8  10  12  14]]

 [[  8   8   8   5   7   8   9]
  [  6   8   9   1  15  16  18]
  [  2   4   6   8  10  12  14]]

 [[  6 999   9   1  10  12  15]
  [  2   3   4   5   7   8   9]
  [  2   4   6   8  10  12  14]]]
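Item 3 above relies on the fact that a basic slice shares memory with the array it came from. The sketch below makes that visible with `np.shares_memory` and shows `.copy()` as the way to get an independent array; the variable names are illustrative.

```python
import numpy as np

base = np.arange(10)

view = base[2:5]                  # basic slice: a view on base's memory
view[0] = 999                     # also changes base[2]

independent = base[2:5].copy()    # explicit copy with its own memory
independent[1] = -1               # base is left untouched

print(base)
print(np.shares_memory(base, view), np.shares_memory(base, independent))
```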

In [41]:

# 3. ndarray fancy indexing: indexing with lists or arrays of integers
arr = np.array(list)
arr
# Selects the rows at positions (0, 1) and (2, 2)
new_arr = arr[[0, 2], [1, 2]]
print(arr)
print("===================================")
print(new_arr)

[[[ 2  3  4  5  7  8  9]
  [ 6  8  9  1 10 12 15]
  [ 2  4  6  8 10 12 14]]

 [[ 8  8  8  5  7  8  9]
  [ 6  8  9  1 15 16 18]
  [ 2  4  6  8 10 12 14]]

 [[ 6  8  9  1 10 12 15]
  [ 2  3  4  5  7  8  9]
  [ 2  4  6  8 10 12 14]]]
===================================
[[ 6  8  9  1 10 12 15]
 [ 2  4  6  8 10 12 14]]
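With two index lists, fancy indexing pairs the entries position by position rather than taking every combination. The sketch below checks that reading against a manual construction and also shows that, unlike a slice, the result is a copy. The array contents are made up for the example.

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)

# The index lists are paired up: (0, 1) and (1, 2)
picked = arr[[0, 1], [1, 2]]
manual = np.array([arr[0, 1], arr[1, 2]])
same = np.array_equal(picked, manual)

# Fancy indexing returns a copy, so writing to it leaves arr unchanged
picked[0, 0] = -1

print(same)
print(picked)
print(arr[0, 1])
```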

In [61]:

# 4. ndarray boolean indexing
A = np.random.random((5, 5))
A
B = A < 0.5
B
print(A.shape)
print(B.shape)
C = A[B]
C
print(C)
print("=================================================================================================")
print(str(C.size) + "," + str(C.shape))

(5, 5)
(5, 5)
[0.43295127 0.25033959 0.46266342 0.21866253 0.25209698 0.39495449 0.21935515 0.11646636 0.47662111 0.1685607 0.32221494 0.25938553 0.31796039 0.03912522 0.19376713 0.04493688]
=================================================================================================
16,(16,)

In [94]:

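# 5. Fancy indexing exercise:
# (1) Print joe's scores
# (2) Print joe's math score
# (3) Print joe's and anne's scores
# (4) Print the scores of everyone except joe and anne
names = np.array(["joe", "tom", "anne"])
scores = np.array([
    [70, 80, 90],
    [77, 88, 91],
    [80, 90, 70]
])
classes = np.array(["Chinese", "Math", "English"])
is_joe = names == "joe"
is_joe_or_anne = is_joe | (names == "anne")
print("joe's scores: " + str(scores[is_joe][0]))
print("joe's math score: " + str(scores[is_joe, classes == "Math"][0]))
print("joe's and anne's scores:")
print(scores[is_joe_or_anne])
print("scores of everyone except joe and anne: " + str(scores[~is_joe_or_anne]))

joe's scores: [70 80 90]
joe's math score: 80
joe's and anne's scores:
[[70 80 90]
 [80 90 70]]
scores of everyone except joe and anne: [[77 88 91]]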

In [101]:

# 6. Matrix product with the dot function
arr_1 = np.arange(40).reshape(5, -1)
arr_1
arr_2 = np.random.randint(1, 50, (8, 3))
arr_2
print(arr_1)
print("============================================================================")
print(arr_2)
print("============================================================================")
result = np.dot(arr_1, arr_2)
print(result)

[[ 0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15]
 [16 17 18 19 20 21 22 23]
 [24 25 26 27 28 29 30 31]
 [32 33 34 35 36 37 38 39]]
============================================================================
[[30 37 31]
 [35 36 19]
 [36 25 39]
 [12 37 47]
 [41  4 34]
 [ 8  6 15]
 [32 18 15]
 [42 14 10]]
============================================================================
[[ 833  449  609]
 [2721 1865 2289]
 [4609 3281 3969]
 [6497 4697 5649]
 [8385 6113 7329]]

import numpy as np

# 1. Vectorized ndarray arithmetic
example_arr = np.arange(1, 50, 2)
example_arr.shape = (5, 5)
example_arr
# Array with scalar
# example_arr + 5
# example_arr - 5
# example_arr * 2
# example_arr / 2
# example_arr ** 2
# Array with array
arr_2 = np.arange(10, 35)
arr_2.shape = (5, -1)
arr_2
# example_arr + arr_2
# example_arr - arr_2
# example_arr * arr_2
# example_arr / arr_2
# example_arr ** arr_2

# 2. Broadcasting (matrix and vector arithmetic); shapes are written m*n ==> rows*cols
# (1) One array is m*n and the other is 1*n: the column counts must match, and the single row is applied to every row
# (2) One array is m*n and the other is m*1: the row counts must match, and the single column is applied to every column
# In general, two dimensions are compatible when they are equal or when one of them is 1
# 1*3
example_arr = np.random.randint(1, 9, (1, 3))
example_arr
# 3*3
example_arr_2 = np.random.randint(1, 9, (3, 3))
print(example_arr)
print()
print(example_arr_2)
print()
print(example_arr + example_arr_2)
print("======================================")
# 3*1
example_arr = np.random.randint(1, 9, (3, 1))
# 3*5
example_arr_2 = np.random.randint(1, 9, (3, 5))
print(example_arr)
print()
print(example_arr_2)
print()
print(example_arr + example_arr_2)

[[6 5 4]]

[[2 5 3]
 [4 7 4]
 [4 4 2]]

[[ 8 10  7]
 [10 12  8]
 [10  9  6]]
======================================
[[1]
 [8]
 [7]]

[[7 2 5 3 5]
 [5 5 4 6 6]
 [2 2 7 2 5]]

[[ 8  3  6  4  6]
 [13 13 12 14 14]
 [ 9  9 14  9 12]]

import numpy as np

# ndarray transpose and axis swapping
arr = np.arange(40).reshape((5, -1))
arr
# Transpose with the transpose function
arr_t = arr.transpose()
arr_t
print(arr)
print("==========================================")
print(arr_t)
# Transpose with the T attribute
arr_t = arr.T
print("==========================================")
print(arr_t)

[[ 0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15]
 [16 17 18 19 20 21 22 23]
 [24 25 26 27 28 29 30 31]
 [32 33 34 35 36 37 38 39]]
==========================================
[[ 0  8 16 24 32]
 [ 1  9 17 25 33]
 [ 2 10 18 26 34]
 [ 3 11 19 27 35]
 [ 4 12 20 28 36]
 [ 5 13 21 29 37]
 [ 6 14 22 30 38]
 [ 7 15 23 31 39]]
==========================================
[[ 0  8 16 24 32]
 [ 1  9 17 25 33]
 [ 2 10 18 26 34]
 [ 3 11 19 27 35]
 [ 4 12 20 28 36]
 [ 5 13 21 29 37]
 [ 6 14 22 30 38]
 [ 7 15 23 31 39]]
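Returning to the broadcasting item in the arithmetic cell above: NumPy compares shapes dimension by dimension, and two dimensions are compatible when they are equal or when one of them is 1. A minimal sketch with fixed values in place of random ones; the variable names are illustrative.

```python
import numpy as np

row = np.array([[6, 5, 4]])          # shape (1, 3)
col = np.array([[1], [8], [7]])      # shape (3, 1)
grid = np.arange(9).reshape(3, 3)    # shape (3, 3)

row_plus_grid = row + grid           # the row is applied to every row of grid
col_plus_grid = col + grid           # the column is applied to every column of grid
row_plus_col = row + col             # both are stretched to (3, 3)

# Shapes (3, 5) and (1, 3) are incompatible: 5 and 3 differ and neither is 1
failed = False
try:
    np.ones((3, 5)) + np.ones((1, 3))
except ValueError:
    failed = True

print(row_plus_grid.shape, col_plus_grid.shape, row_plus_col.shape)
print(row_plus_col)
print(failed)
```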

In [23]:

# The argument of transpose, e.g. (0, 1, 2), is a permutation of the axes: the result's shape is the original shape reordered accordingly
ex_arr_1 = np.arange(1, 41).reshape(2, 5, 4)
ex_arr_1
print(ex_arr_1)
# The shape becomes (2, 4, 5)
ex_arr_2 = ex_arr_1.transpose((0, 2, 1))
ex_arr_2
print(ex_arr_2)
# Transpose back to the original layout
ex_arr_1 = ex_arr_2.transpose((0, 2, 1))
ex_arr_1
print(ex_arr_1)

[[[ 1  2  3  4]
  [ 5  6  7  8]
  [ 9 10 11 12]
  [13 14 15 16]
  [17 18 19 20]]

 [[21 22 23 24]
  [25 26 27 28]
  [29 30 31 32]
  [33 34 35 36]
  [37 38 39 40]]]
[[[ 1  5  9 13 17]
  [ 2  6 10 14 18]
  [ 3  7 11 15 19]
  [ 4  8 12 16 20]]

 [[21 25 29 33 37]
  [22 26 30 34 38]
  [23 27 31 35 39]
  [24 28 32 36 40]]]
[[[ 1  2  3  4]
  [ 5  6  7  8]
  [ 9 10 11 12]
  [13 14 15 16]
  [17 18 19 20]]

 [[21 22 23 24]
  [25 26 27 28]
  [29 30 31 32]
  [33 34 35 36]
  [37 38 39 40]]]

import numpy as np

# Tiling and merging arrays
# np.tile(A, reps) repeats array A along each axis the number of times given in reps.
# For a 2-D array, reps=(x, y) repeats the whole array x times vertically and y times horizontally;
# if reps is shorter than the number of dimensions of A, it is padded with leading 1s
arr_1 = np.arange(1, 31).reshape((2, 3, 5))
arr_1
# Tile arr_1: reps (2, 1) is treated as (1, 2, 1), so the shape becomes (2, 6, 5)
arr_2 = np.tile(arr_1, (2, 1))
arr_2
print(arr_2)

[[[ 1  2  3  4  5]
  [ 6  7  8  9 10]
  [11 12 13 14 15]
  [ 1  2  3  4  5]
  [ 6  7  8  9 10]
  [11 12 13 14 15]]

 [[16 17 18 19 20]
  [21 22 23 24 25]
  [26 27 28 29 30]
  [16 17 18 19 20]
  [21 22 23 24 25]
  [26 27 28 29 30]]]
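The way `np.tile` lines up `reps` with the array's axes is easiest to see from the resulting shapes. A short sketch on small arrays; the variable names are illustrative.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

shape_scalar = np.tile(a, 2).shape          # reps 2 is treated as (1, 2)
shape_rows = np.tile(a, (2, 1)).shape       # repeat twice vertically
shape_extra = np.tile(a, (2, 1, 1)).shape   # reps longer than a.ndim adds a leading axis

b = np.arange(30).reshape(2, 3, 5)
shape_padded = np.tile(b, (2, 1)).shape     # reps padded to (1, 2, 1)

print(shape_scalar, shape_rows, shape_extra, shape_padded)
```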

In [50]:

# Stacking with a single ndarray as input
# stack(arrays, axis=0, out=None): arrays is the sequence of arrays to join, axis is the position of the new axis in the result
# arr_2 = np.arange(1, 26).reshape((5, 5))
# A single 3-D array is treated as the sequence of its sub-arrays along axis 0, here two (3, 5) arrays.
# axis=0 rebuilds the original (2, 3, 5) array; axis=1 gives shape (3, 2, 5); axis=2 gives shape (3, 5, 2)
# axis=-1 is the same as axis=2; valid values here are -3 <= axis <= 2, and anything outside that range raises an exception
arr_1 = np.arange(1, 31).reshape((2, 3, 5))
print(arr_1)
print("============================================")
arr_3 = np.stack(arr_1, axis=0)
arr_4 = np.stack(arr_1, axis=1)
arr_5 = np.stack(arr_1, axis=2)
print(arr_3)
print("============================================")
print(arr_4)
print("============================================")
print(arr_5)

[[[ 1  2  3  4  5]
  [ 6  7  8  9 10]
  [11 12 13 14 15]]

 [[16 17 18 19 20]
  [21 22 23 24 25]
  [26 27 28 29 30]]]
============================================
[[[ 1  2  3  4  5]
  [ 6  7  8  9 10]
  [11 12 13 14 15]]

 [[16 17 18 19 20]
  [21 22 23 24 25]
  [26 27 28 29 30]]]
============================================
[[[ 1  2  3  4  5]
  [16 17 18 19 20]]

 [[ 6  7  8  9 10]
  [21 22 23 24 25]]

 [[11 12 13 14 15]
  [26 27 28 29 30]]]
============================================
[[[ 1 16]
  [ 2 17]
  [ 3 18]
  [ 4 19]
  [ 5 20]]

 [[ 6 21]
  [ 7 22]
  [ 8 23]
  [ 9 24]
  [10 25]]

 [[11 26]
  [12 27]
  [13 28]
  [14 29]
  [15 30]]]

In [61]:

# Merging several ndarrays
arr_1 = np.arange(1, 25).reshape((4, 6))
arr_1
arr_2 = np.arange(25, 49).reshape((4, 6))
arr_2
axis_one_arr = np.stack((arr_1, arr_2), axis=0)
axis_one_arr      # shape (2, 4, 6)
axis_two_arr = np.stack((arr_1, arr_2), axis=1)
axis_two_arr      # shape (4, 2, 6)
axis_three_arr = np.stack((arr_1, arr_2), axis=2)
axis_three_arr    # shape (4, 6, 2)

Out[61]:

array([[[ 1, 25],
        [ 2, 26],
        [ 3, 27],
        [ 4, 28],
        [ 5, 29],
        [ 6, 30]],

       [[ 7, 31],
        [ 8, 32],
        [ 9, 33],
        [10, 34],
        [11, 35],
        [12, 36]],

       [[13, 37],
        [14, 38],
        [15, 39],
        [16, 40],
        [17, 41],
        [18, 42]],

       [[19, 43],
        [20, 44],
        [21, 45],
        [22, 46],
        [23, 47],
        [24, 48]]])
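`np.stack` always adds a new axis, whereas `np.concatenate` joins arrays along an axis that already exists. The sketch below compares the resulting shapes for the same two (4, 6) arrays; the variable names are illustrative.

```python
import numpy as np

a = np.arange(1, 25).reshape(4, 6)
b = np.arange(25, 49).reshape(4, 6)

# stack: the new axis of length 2 is inserted at the given position
stacked = {axis: np.stack((a, b), axis=axis).shape for axis in (0, 1, 2)}

# concatenate: the existing axis grows, the number of dimensions stays at 2
joined_rows = np.concatenate((a, b), axis=0).shape
joined_cols = np.concatenate((a, b), axis=1).shape

print(stacked)
print(joined_rows, joined_cols)
```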

In [87]:

# Joining ndarrays horizontally and vertically
arr_1 = np.arange(30, 62).reshape((8, 4))
arr_1
arr_2 = np.arange(45, 77).reshape((8, 4))
arr_2
# Horizontal join
arr_h = np.hstack((arr_1, arr_2))
# [[30 31 32 33 45 46 47 48]
#  [34 35 36 37 49 50 51 52]
#  [38 39 40 41 53 54 55 56]
#  [42 43 44 45 57 58 59 60]
#  [46 47 48 49 61 62 63 64]
#  [50 51 52 53 65 66 67 68]
#  [54 55 56 57 69 70 71 72]
#  [58 59 60 61 73 74 75 76]]
# print(arr_h)
# Vertical join
arr_v = np.vstack((arr_1, arr_2))
print(arr_v)

[[30 31 32 33]
 [34 35 36 37]
 [38 39 40 41]
 [42 43 44 45]
 [46 47 48 49]
 [50 51 52 53]
 [54 55 56 57]
 [58 59 60 61]
 [45 46 47 48]
 [49 50 51 52]
 [53 54 55 56]
 [57 58 59 60]
 [61 62 63 64]
 [65 66 67 68]
 [69 70 71 72]
 [73 74 75 76]]

import numpy as np

# Creating ndarray multi-dimensional arrays
help(np.ndarray)

Help on class ndarray in module numpy:
class ndarray(builtins.object)
 | ndarray(shape, dtype=float, buffer=None, offset=0, | strides=None, order=None) | | An array object represents a multidimensional, homogeneous array | of fixed-size items. An associated data-type object describes the | format of each element in the array (its byte-order, how many bytes it | occupies in memory, whether it is an integer, a floating point number, | or something else, etc.) | | Arrays should be constructed using `array`, `zeros` or `empty` (refer | to the See Also section below). The parameters given here refer to | a low-level method (`ndarray(...)`) for instantiating an array. | | For more information, refer to the `numpy` module and examine the | methods and attributes of an array. | | Parameters | ---------- | (for the __new__ method; see Notes below) | | shape : tuple of ints | Shape of created array. | dtype : data-type, optional | Any object that can be interpreted as a numpy data type. 
| buffer : object exposing buffer interface, optional | Used to fill the array with data. | offset : int, optional | Offset of array data in buffer. | strides : tuple of ints, optional | Strides of data in memory. | order : {'C', 'F'}, optional | Row-major (C-style) or column-major (Fortran-style) order. | | Attributes | ---------- | T : ndarray | Transpose of the array. | data : buffer | The array's elements, in memory. | dtype : dtype object | Describes the format of the elements in the array. | flags : dict | Dictionary containing information related to memory use, e.g., | 'C_CONTIGUOUS', 'OWNDATA', 'WRITEABLE', etc. | flat : numpy.flatiter object | Flattened version of the array as an iterator. The iterator | allows assignments, e.g., ``x.flat = 3`` (See `ndarray.flat` for | assignment examples; TODO). | imag : ndarray | Imaginary part of the array. | real : ndarray | Real part of the array. | size : int | Number of elements in the array. | itemsize : int | The memory use of each array element in bytes. | nbytes : int | The total number of bytes required to store the array data, | i.e., ``itemsize * size``. | ndim : int | The array's number of dimensions. | shape : tuple of ints | Shape of the array. | strides : tuple of ints | The step-size required to move from one element to the next in | memory. For example, a contiguous ``(3, 4)`` array of type | ``int16`` in C-order has strides ``(8, 2)``. This implies that | to move from element to element in memory requires jumps of 2 bytes. | To move from row-to-row, one needs to jump 8 bytes at a time | (``2 * 4``). | ctypes : ctypes object | Class containing properties of the array needed for interaction | with ctypes. | base : ndarray | If the array is a view into another array, that array is its `base` | (unless that array is also a view). The `base` array is where the | array data is actually stored. | | See Also | -------- | array : Construct an array. | zeros : Create an array, each element of which is zero. 
| empty : Create an array, but leave its allocated memory unchanged (i.e., | it contains "garbage"). | dtype : Create a data-type. | | Notes | ----- | There are two modes of creating an array using ``__new__``: | | 1. If `buffer` is None, then only `shape`, `dtype`, and `order` | are used. | 2. If `buffer` is an object exposing the buffer interface, then | all keywords are interpreted. | | No ``__init__`` method is needed because the array is fully initialized | after the ``__new__`` method. | | Examples | -------- | These examples illustrate the low-level `ndarray` constructor. Refer | to the `See Also` section above for easier ways of constructing an | ndarray. | | First mode, `buffer` is None: | | >>> np.ndarray(shape=(2,2), dtype=float, order='F') | array([[ -1.13698227e+002, 4.25087011e-303], | [ 2.88528414e-306, 3.27025015e-309]]) #random | | Second mode: | | >>> np.ndarray((2,), buffer=np.array([1,2,3]), | ... offset=np.int_().itemsize, | ... dtype=int) # offset = 1*itemsize, i.e. skip first element | array([2, 3]) | | Methods defined here: | | __abs__(self, /) | abs(self) | | __add__(self, value, /) | Return self+value. | | __and__(self, value, /) | Return self&value. | | __array__(...) | a.__array__(|dtype) -> reference if type unchanged, copy otherwise. | | Returns either a new reference to self if dtype is not given or a new array | of provided data type if dtype is different from the current dtype of the | array. | | __array_prepare__(...) | a.__array_prepare__(obj) -> Object of same type as ndarray object obj. | | __array_ufunc__(...) | | __array_wrap__(...) | a.__array_wrap__(obj) -> Object of same type as ndarray object a. | | __bool__(self, /) | self != 0 | | __complex__(...) | | __contains__(self, key, /) | Return key in self. | | __copy__(...) | a.__copy__() | | Used if :func:`copy.copy` is called on an array. Returns a copy of the array. | | Equivalent to ``a.copy(order='K')``. | | __deepcopy__(...) 
| a.__deepcopy__(memo, /) -> Deep copy of array. | | Used if :func:`copy.deepcopy` is called on an array. | | __delitem__(self, key, /) | Delete self[key]. | | __divmod__(self, value, /) | Return divmod(self, value). | | __eq__(self, value, /) | Return self==value. | | __float__(self, /) | float(self) | | __floordiv__(self, value, /) | Return self//value. | | __format__(...) | default object formatter | | __ge__(self, value, /) | Return self>=value. | | __getitem__(self, key, /) | Return self[key]. | | __gt__(self, value, /) | Return self>value. | | __iadd__(self, value, /) | Return self+=value. | | __iand__(self, value, /) | Return self&=value. | | __ifloordiv__(self, value, /) | Return self//=value. | | __ilshift__(self, value, /) | Return self<<=value. | | __imatmul__(self, value, /) | Return self@=value. | | __imod__(self, value, /) | Return self%=value. | | __imul__(self, value, /) | Return self*=value. | | __index__(self, /) | Return self converted to an integer, if self is suitable for use as an index into a list. | | __int__(self, /) | int(self) | | __invert__(self, /) | ~self | | __ior__(self, value, /) | Return self|=value. | | __ipow__(self, value, /) | Return self**=value. | | __irshift__(self, value, /) | Return self>>=value. | | __isub__(self, value, /) | Return self-=value. | | __iter__(self, /) | Implement iter(self). | | __itruediv__(self, value, /) | Return self/=value. | | __ixor__(self, value, /) | Return self^=value. | | __le__(self, value, /) | Return self<=value. | | __len__(self, /) | Return len(self). | | __lshift__(self, value, /) | Return self<<value. | | __lt__(self, value, /) | Return self<value. | | __matmul__(self, value, /) | Return self@value. | | __mod__(self, value, /) | Return self%value. | | __mul__(self, value, /) | Return self*value. | | __ne__(self, value, /) | Return self!=value. | | __neg__(self, /) | -self | | __new__(*args, **kwargs) from builtins.type | Create and return a new object. 
See help(type) for accurate signature. | | __or__(self, value, /) | Return self|value. | | __pos__(self, /) | +self | | __pow__(self, value, mod=None, /) | Return pow(self, value, mod). | | __radd__(self, value, /) | Return value+self. | | __rand__(self, value, /) | Return value&self. | | __rdivmod__(self, value, /) | Return divmod(value, self). | | __reduce__(...) | a.__reduce__() | | For pickling. | | __repr__(self, /) | Return repr(self). | | __rfloordiv__(self, value, /) | Return value//self. | | __rlshift__(self, value, /) | Return value<<self. | | __rmatmul__(self, value, /) | Return value@self. | | __rmod__(self, value, /) | Return value%self. | | __rmul__(self, value, /) | Return value*self. | | __ror__(self, value, /) | Return value|self. | | __rpow__(self, value, mod=None, /) | Return pow(value, self, mod). | | __rrshift__(self, value, /) | Return value>>self. | | __rshift__(self, value, /) | Return self>>value. | | __rsub__(self, value, /) | Return value-self. | | __rtruediv__(self, value, /) | Return value/self. | | __rxor__(self, value, /) | Return value^self. | | __setitem__(self, key, value, /) | Set self[key] to value. | | __setstate__(...) | a.__setstate__(state, /) | | For unpickling. | | The `state` argument must be a sequence that contains the following | elements: | | Parameters | ---------- | version : int | optional pickle version. If omitted defaults to 0. | shape : tuple | dtype : data-type | isFortran : bool | rawdata : string or list | a binary string with the data (or a list if 'a' is an object array) | | __sizeof__(...) | __sizeof__() -> int | size of object in memory, in bytes | | __str__(self, /) | Return str(self). | | __sub__(self, value, /) | Return self-value. | | __truediv__(self, value, /) | Return self/value. | | __xor__(self, value, /) | Return self^value. | | all(...) | a.all(axis=None, out=None, keepdims=False) | | Returns True if all elements evaluate to True. | | Refer to `numpy.all` for full documentation. 
| | See Also | -------- | numpy.all : equivalent function | | any(...) | a.any(axis=None, out=None, keepdims=False) | | Returns True if any of the elements of `a` evaluate to True. | | Refer to `numpy.any` for full documentation. | | See Also | -------- | numpy.any : equivalent function | | argmax(...) | a.argmax(axis=None, out=None) | | Return indices of the maximum values along the given axis. | | Refer to `numpy.argmax` for full documentation. | | See Also | -------- | numpy.argmax : equivalent function | | argmin(...) | a.argmin(axis=None, out=None) | | Return indices of the minimum values along the given axis of `a`. | | Refer to `numpy.argmin` for detailed documentation. | | See Also | -------- | numpy.argmin : equivalent function | | argpartition(...) | a.argpartition(kth, axis=-1, kind='introselect', order=None) | | Returns the indices that would partition this array. | | Refer to `numpy.argpartition` for full documentation. | | .. versionadded:: 1.8.0 | | See Also | -------- | numpy.argpartition : equivalent function | | argsort(...) | a.argsort(axis=-1, kind='quicksort', order=None) | | Returns the indices that would sort this array. | | Refer to `numpy.argsort` for full documentation. | | See Also | -------- | numpy.argsort : equivalent function | | astype(...) | a.astype(dtype, order='K', casting='unsafe', subok=True, copy=True) | | Copy of the array, cast to a specified type. | | Parameters | ---------- | dtype : str or dtype | Typecode or data-type to which the array is cast. | order : {'C', 'F', 'A', 'K'}, optional | Controls the memory layout order of the result. | 'C' means C order, 'F' means Fortran order, 'A' | means 'F' order if all the arrays are Fortran contiguous, | 'C' order otherwise, and 'K' means as close to the | order the array elements appear in memory as possible. | Default is 'K'. | casting : {'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional | Controls what kind of data casting may occur. 
Defaults to 'unsafe' | for backwards compatibility. | | * 'no' means the data types should not be cast at all. | * 'equiv' means only byte-order changes are allowed. | * 'safe' means only casts which can preserve values are allowed. | * 'same_kind' means only safe casts or casts within a kind, | like float64 to float32, are allowed. | * 'unsafe' means any data conversions may be done. | subok : bool, optional | If True, then sub-classes will be passed-through (default), otherwise | the returned array will be forced to be a base-class array. | copy : bool, optional | By default, astype always returns a newly allocated array. If this | is set to false, and the `dtype`, `order`, and `subok` | requirements are satisfied, the input array is returned instead | of a copy. | | Returns | ------- | arr_t : ndarray | Unless `copy` is False and the other conditions for returning the input | array are satisfied (see description for `copy` input parameter), `arr_t` | is a new array of the same shape as the input array, with dtype, order | given by `dtype`, `order`. | | Notes | ----- | Starting in NumPy 1.9, astype method now returns an error if the string | dtype to cast to is not long enough in 'safe' casting mode to hold the max | value of integer/float array that is being casted. Previously the casting | was allowed even if the result was truncated. | | Raises | ------ | ComplexWarning | When casting from complex to float or int. To avoid this, | one should use ``a.real.astype(t)``. | | Examples | -------- | >>> x = np.array([1, 2, 2.5]) | >>> x | array([ 1. , 2. , 2.5]) | | >>> x.astype(int) | array([1, 2, 2]) | | byteswap(...) | a.byteswap(inplace=False) | | Swap the bytes of the array elements | | Toggle between low-endian and big-endian data representation by | returning a byteswapped array, optionally swapped in-place. | | Parameters | ---------- | inplace : bool, optional | If ``True``, swap bytes in-place, default is ``False``. 
| | Returns | ------- | out : ndarray | The byteswapped array. If `inplace` is ``True``, this is | a view to self. | | Examples | -------- | >>> A = np.array([1, 256, 8755], dtype=np.int16) | >>> map(hex, A) | ['0x1', '0x100', '0x2233'] | >>> A.byteswap(inplace=True) | array([ 256, 1, 13090], dtype=int16) | >>> map(hex, A) | ['0x100', '0x1', '0x3322'] | | Arrays of strings are not swapped | | >>> A = np.array(['ceg', 'fac']) | >>> A.byteswap() | array(['ceg', 'fac'], | dtype='|S3') | | choose(...) | a.choose(choices, out=None, mode='raise') | | Use an index array to construct a new array from a set of choices. | | Refer to `numpy.choose` for full documentation. | | See Also | -------- | numpy.choose : equivalent function | | clip(...) | a.clip(min=None, max=None, out=None) | | Return an array whose values are limited to ``[min, max]``. | One of max or min must be given. | | Refer to `numpy.clip` for full documentation. | | See Also | -------- | numpy.clip : equivalent function | | compress(...) | a.compress(condition, axis=None, out=None) | | Return selected slices of this array along given axis. | | Refer to `numpy.compress` for full documentation. | | See Also | -------- | numpy.compress : equivalent function | | conj(...) | a.conj() | | Complex-conjugate all elements. | | Refer to `numpy.conjugate` for full documentation. | | See Also | -------- | numpy.conjugate : equivalent function | | conjugate(...) | a.conjugate() | | Return the complex conjugate, element-wise. | | Refer to `numpy.conjugate` for full documentation. | | See Also | -------- | numpy.conjugate : equivalent function | | copy(...) | a.copy(order='C') | | Return a copy of the array. | | Parameters | ---------- | order : {'C', 'F', 'A', 'K'}, optional | Controls the memory layout of the copy. 'C' means C-order, | 'F' means F-order, 'A' means 'F' if `a` is Fortran contiguous, | 'C' otherwise. 'K' means match the layout of `a` as closely | as possible. 
(Note that this function and :func:`numpy.copy` are very | similar, but have different default values for their order= | arguments.) | | See also | -------- | numpy.copy | numpy.copyto | | Examples | -------- | >>> x = np.array([[1,2,3],[4,5,6]], order='F') | | >>> y = x.copy() | | >>> x.fill(0) | | >>> x | array([[0, 0, 0], | [0, 0, 0]]) | | >>> y | array([[1, 2, 3], | [4, 5, 6]]) | | >>> y.flags['C_CONTIGUOUS'] | True | | cumprod(...) | a.cumprod(axis=None, dtype=None, out=None) | | Return the cumulative product of the elements along the given axis. | | Refer to `numpy.cumprod` for full documentation. | | See Also | -------- | numpy.cumprod : equivalent function | | cumsum(...) | a.cumsum(axis=None, dtype=None, out=None) | | Return the cumulative sum of the elements along the given axis. | | Refer to `numpy.cumsum` for full documentation. | | See Also | -------- | numpy.cumsum : equivalent function | | diagonal(...) | a.diagonal(offset=0, axis1=0, axis2=1) | | Return specified diagonals. In NumPy 1.9 the returned array is a | read-only view instead of a copy as in previous NumPy versions. In | a future version the read-only restriction will be removed. | | Refer to :func:`numpy.diagonal` for full documentation. | | See Also | -------- | numpy.diagonal : equivalent function | | dot(...) | a.dot(b, out=None) | | Dot product of two arrays. | | Refer to `numpy.dot` for full documentation. | | See Also | -------- | numpy.dot : equivalent function | | Examples | -------- | >>> a = np.eye(2) | >>> b = np.ones((2, 2)) * 2 | >>> a.dot(b) | array([[ 2., 2.], | [ 2., 2.]]) | | This array method can be conveniently chained: | | >>> a.dot(b).dot(b) | array([[ 8., 8.], | [ 8., 8.]]) | | dump(...) | a.dump(file) | | Dump a pickle of the array to the specified file. | The array can be read back with pickle.load or numpy.load. | | Parameters | ---------- | file : str | A string naming the dump file. | | dumps(...) | a.dumps() | | Returns the pickle of the array as a string. 
| pickle.loads or numpy.loads will convert the string back to an array. | | Parameters | ---------- | None | | fill(...) | a.fill(value) | | Fill the array with a scalar value. | | Parameters | ---------- | value : scalar | All elements of `a` will be assigned this value. | | Examples | -------- | >>> a = np.array([1, 2]) | >>> a.fill(0) | >>> a | array([0, 0]) | >>> a = np.empty(2) | >>> a.fill(1) | >>> a | array([ 1., 1.]) | | flatten(...) | a.flatten(order='C') | | Return a copy of the array collapsed into one dimension. | | Parameters | ---------- | order : {'C', 'F', 'A', 'K'}, optional | 'C' means to flatten in row-major (C-style) order. | 'F' means to flatten in column-major (Fortran- | style) order. 'A' means to flatten in column-major | order if `a` is Fortran *contiguous* in memory, | row-major order otherwise. 'K' means to flatten | `a` in the order the elements occur in memory. | The default is 'C'. | | Returns | ------- | y : ndarray | A copy of the input array, flattened to one dimension. | | See Also | -------- | ravel : Return a flattened array. | flat : A 1-D flat iterator over the array. | | Examples | -------- | >>> a = np.array([[1,2], [3,4]]) | >>> a.flatten() | array([1, 2, 3, 4]) | >>> a.flatten('F') | array([1, 3, 2, 4]) | | getfield(...) | a.getfield(dtype, offset=0) | | Returns a field of the given array as a certain type. | | A field is a view of the array data with a given data-type. The values in | the view are determined by the given type and the offset into the current | array in bytes. The offset needs to be such that the view dtype fits in the | array dtype; for example an array of dtype complex128 has 16-byte elements. | If taking a view with a 32-bit integer (4 bytes), the offset needs to be | between 0 and 12 bytes. | | Parameters | ---------- | dtype : str or dtype | The data type of the view. The dtype size of the view can not be larger | than that of the array itself. 
| offset : int | Number of bytes to skip before beginning the element view. | | Examples | -------- | >>> x = np.diag([1.+1.j]*2) | >>> x[1, 1] = 2 + 4.j | >>> x | array([[ 1.+1.j, 0.+0.j], | [ 0.+0.j, 2.+4.j]]) | >>> x.getfield(np.float64) | array([[ 1., 0.], | [ 0., 2.]]) | | By choosing an offset of 8 bytes we can select the complex part of the | array for our view: | | >>> x.getfield(np.float64, offset=8) | array([[ 1., 0.], | [ 0., 4.]]) | | item(...) | a.item(*args) | | Copy an element of an array to a standard Python scalar and return it. | | Parameters | ---------- | \*args : Arguments (variable number and type) | | * none: in this case, the method only works for arrays | with one element (`a.size == 1`), which element is | copied into a standard Python scalar object and returned. | | * int_type: this argument is interpreted as a flat index into | the array, specifying which element to copy and return. | | * tuple of int_types: functions as does a single int_type argument, | except that the argument is interpreted as an nd-index into the | array. | | Returns | ------- | z : Standard Python scalar object | A copy of the specified element of the array as a suitable | Python scalar | | Notes | ----- | When the data type of `a` is longdouble or clongdouble, item() returns | a scalar array object because there is no available Python scalar that | would not lose information. Void arrays return a buffer object for item(), | unless fields are defined, in which case a tuple is returned. | | `item` is very similar to a[args], except, instead of an array scalar, | a standard Python scalar is returned. This can be useful for speeding up | access to elements of the array and doing arithmetic on elements of the | array using Python's optimized math. 
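The difference between `item` and ordinary indexing that the Notes describe can be seen in a minimal sketch. The array values are fixed here (rather than random) so the results are reproducible; the variable names are illustrative.

```python
import numpy as np

x = np.array([[3, 1, 7],
              [2, 8, 3],
              [8, 5, 3]])

flat_elem = x.item(3)      # flat index 3 is row 1, column 0
nd_elem = x.item((0, 1))   # tuple is an nd-index: row 0, column 1
indexed = x[0, 1]          # plain indexing yields a NumPy scalar instead

print(flat_elem, type(flat_elem))   # 2 <class 'int'>
print(nd_elem, type(indexed))

# With no argument, item() only works on single-element arrays.
single = np.array([4.5]).item()
print(single, type(single))         # 4.5 <class 'float'>
```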
| | Examples | -------- | >>> x = np.random.randint(9, size=(3, 3)) | >>> x | array([[3, 1, 7], | [2, 8, 3], | [8, 5, 3]]) | >>> x.item(3) | 2 | >>> x.item(7) | 5 | >>> x.item((0, 1)) | 1 | >>> x.item((2, 2)) | 3 | | itemset(...) | a.itemset(*args) | | Insert scalar into an array (scalar is cast to array's dtype, if possible) | | There must be at least 1 argument, and define the last argument | as *item*. Then, ``a.itemset(*args)`` is equivalent to but faster | than ``a[args] = item``. The item should be a scalar value and `args` | must select a single item in the array `a`. | | Parameters | ---------- | \*args : Arguments | If one argument: a scalar, only used in case `a` is of size 1. | If two arguments: the last argument is the value to be set | and must be a scalar, the first argument specifies a single array | element location. It is either an int or a tuple. | | Notes | ----- | Compared to indexing syntax, `itemset` provides some speed increase | for placing a scalar into a particular location in an `ndarray`, | if you must do this. However, generally this is discouraged: | among other problems, it complicates the appearance of the code. | Also, when using `itemset` (and `item`) inside a loop, be sure | to assign the methods to a local variable to avoid the attribute | look-up at each loop iteration. | | Examples | -------- | >>> x = np.random.randint(9, size=(3, 3)) | >>> x | array([[3, 1, 7], | [2, 8, 3], | [8, 5, 3]]) | >>> x.itemset(4, 0) | >>> x.itemset((2, 2), 9) | >>> x | array([[3, 1, 7], | [2, 0, 3], | [8, 5, 9]]) | | max(...) | a.max(axis=None, out=None, keepdims=False) | | Return the maximum along a given axis. | | Refer to `numpy.amax` for full documentation. | | See Also | -------- | numpy.amax : equivalent function | | mean(...) | a.mean(axis=None, dtype=None, out=None, keepdims=False) | | Returns the average of the array elements along given axis. | | Refer to `numpy.mean` for full documentation. 
| | See Also | -------- | numpy.mean : equivalent function | | min(...) | a.min(axis=None, out=None, keepdims=False) | | Return the minimum along a given axis. | | Refer to `numpy.amin` for full documentation. | | See Also | -------- | numpy.amin : equivalent function | | newbyteorder(...) | arr.newbyteorder(new_order='S') | | Return the array with the same data viewed with a different byte order. | | Equivalent to:: | | arr.view(arr.dtype.newbyteorder(new_order)) | | Changes are also made in all fields and sub-arrays of the array data | type. | | Parameters | ---------- | new_order : string, optional | Byte order to force; a value from the byte order specifications | below. `new_order` codes can be any of: | | * 'S' - swap dtype from current to opposite endian | * {'<', 'L'} - little endian | * {'>', 'B'} - big endian | * {'=', 'N'} - native order | * {'|', 'I'} - ignore (no change to byte order) | | The default value ('S') results in swapping the current | byte order. The code does a case-insensitive check on the first | letter of `new_order` for the alternatives above. For example, | any of 'B' or 'b' or 'biggish' are valid to specify big-endian. | | Returns | ------- | new_arr : array | New array object with the dtype reflecting given change to the | byte order. | | nonzero(...) | a.nonzero() | | Return the indices of the elements that are non-zero. | | Refer to `numpy.nonzero` for full documentation. | | See Also | -------- | numpy.nonzero : equivalent function | | partition(...) | a.partition(kth, axis=-1, kind='introselect', order=None) | | Rearranges the elements in the array in such a way that the value of the | element in kth position is in the position it would be in a sorted array. | All elements smaller than the kth element are moved before this element and | all equal or greater are moved behind it. The ordering of the elements in | the two partitions is undefined. | | .. 
versionadded:: 1.8.0 | | Parameters | ---------- | kth : int or sequence of ints | Element index to partition by. The kth element value will be in its | final sorted position and all smaller elements will be moved before it | and all equal or greater elements behind it. | The order of all elements in the partitions is undefined. | If provided with a sequence of kth it will partition all elements | indexed by kth of them into their sorted position at once. | axis : int, optional | Axis along which to sort. Default is -1, which means sort along the | last axis. | kind : {'introselect'}, optional | Selection algorithm. Default is 'introselect'. | order : str or list of str, optional | When `a` is an array with fields defined, this argument specifies | which fields to compare first, second, etc. A single field can | be specified as a string, and not all fields need to be specified, | but unspecified fields will still be used, in the order in which | they come up in the dtype, to break ties. | | See Also | -------- | numpy.partition : Return a partitioned copy of an array. | argpartition : Indirect partition. | sort : Full sort. | | Notes | ----- | See ``np.partition`` for notes on the different algorithms. | | Examples | -------- | >>> a = np.array([3, 4, 2, 1]) | >>> a.partition(3) | >>> a | array([2, 1, 3, 4]) | | >>> a.partition((1, 3)) | >>> a | array([1, 2, 3, 4]) | | prod(...) | a.prod(axis=None, dtype=None, out=None, keepdims=False) | | Return the product of the array elements over the given axis. | | Refer to `numpy.prod` for full documentation. | | See Also | -------- | numpy.prod : equivalent function | | ptp(...) | a.ptp(axis=None, out=None, keepdims=False) | | Peak to peak (maximum - minimum) value along a given axis. | | Refer to `numpy.ptp` for full documentation. | | See Also | -------- | numpy.ptp : equivalent function | | put(...) | a.put(indices, values, mode='raise') | | Set ``a.flat[n] = values[n]`` for all `n` in indices. 
| | Refer to `numpy.put` for full documentation. | | See Also | -------- | numpy.put : equivalent function | | ravel(...) | a.ravel([order]) | | Return a flattened array. | | Refer to `numpy.ravel` for full documentation. | | See Also | -------- | numpy.ravel : equivalent function | | ndarray.flat : a flat iterator on the array. | | repeat(...) | a.repeat(repeats, axis=None) | | Repeat elements of an array. | | Refer to `numpy.repeat` for full documentation. | | See Also | -------- | numpy.repeat : equivalent function | | reshape(...) | a.reshape(shape, order='C') | | Returns an array containing the same data with a new shape. | | Refer to `numpy.reshape` for full documentation. | | See Also | -------- | numpy.reshape : equivalent function | | Notes | ----- | Unlike the free function `numpy.reshape`, this method on `ndarray` allows | the elements of the shape parameter to be passed in as separate arguments. | For example, ``a.reshape(10, 11)`` is equivalent to | ``a.reshape((10, 11))``. | | resize(...) | a.resize(new_shape, refcheck=True) | | Change shape and size of array in-place. | | Parameters | ---------- | new_shape : tuple of ints, or `n` ints | Shape of resized array. | refcheck : bool, optional | If False, reference count will not be checked. Default is True. | | Returns | ------- | None | | Raises | ------ | ValueError | If `a` does not own its own data or references or views to it exist, | and the data memory must be changed. | PyPy only: will always raise if the data memory must be changed, since | there is no reliable way to determine if references or views to it | exist. | | SystemError | If the `order` keyword argument is specified. This behaviour is a | bug in NumPy. | | See Also | -------- | resize : Return a new array with the specified shape. | | Notes | ----- | This reallocates space for the data area if necessary. | | Only contiguous arrays (data elements consecutive in memory) can be | resized. 
| | The purpose of the reference count check is to make sure you | do not use this array as a buffer for another Python object and then | reallocate the memory. However, reference counts can increase in | other ways so if you are sure that you have not shared the memory | for this array with another Python object, then you may safely set | `refcheck` to False. | | Examples | -------- | Shrinking an array: array is flattened (in the order that the data are | stored in memory), resized, and reshaped: | | >>> a = np.array([[0, 1], [2, 3]], order='C') | >>> a.resize((2, 1)) | >>> a | array([[0], | [1]]) | | >>> a = np.array([[0, 1], [2, 3]], order='F') | >>> a.resize((2, 1)) | >>> a | array([[0], | [2]]) | | Enlarging an array: as above, but missing entries are filled with zeros: | | >>> b = np.array([[0, 1], [2, 3]]) | >>> b.resize(2, 3) # new_shape parameter doesn't have to be a tuple | >>> b | array([[0, 1, 2], | [3, 0, 0]]) | | Referencing an array prevents resizing... | | >>> c = a | >>> a.resize((1, 1)) | Traceback (most recent call last): | ... | ValueError: cannot resize an array that has been referenced ... | | Unless `refcheck` is False: | | >>> a.resize((1, 1), refcheck=False) | >>> a | array([[0]]) | >>> c | array([[0]]) | | round(...) | a.round(decimals=0, out=None) | | Return `a` with each element rounded to the given number of decimals. | | Refer to `numpy.around` for full documentation. | | See Also | -------- | numpy.around : equivalent function | | searchsorted(...) | a.searchsorted(v, side='left', sorter=None) | | Find indices where elements of v should be inserted in a to maintain order. | | For full documentation, see `numpy.searchsorted` | | See Also | -------- | numpy.searchsorted : equivalent function | | setfield(...) | a.setfield(val, dtype, offset=0) | | Put a value into a specified place in a field defined by a data-type. | | Place `val` into `a`'s field defined by `dtype` and beginning `offset` | bytes into the field. 
| | Parameters | ---------- | val : object | Value to be placed in field. | dtype : dtype object | Data-type of the field in which to place `val`. | offset : int, optional | The number of bytes into the field at which to place `val`. | | Returns | ------- | None | | See Also | -------- | getfield | | Examples | -------- | >>> x = np.eye(3) | >>> x.getfield(np.float64) | array([[ 1., 0., 0.], | [ 0., 1., 0.], | [ 0., 0., 1.]]) | >>> x.setfield(3, np.int32) | >>> x.getfield(np.int32) | array([[3, 3, 3], | [3, 3, 3], | [3, 3, 3]]) | >>> x | array([[ 1.00000000e+000, 1.48219694e-323, 1.48219694e-323], | [ 1.48219694e-323, 1.00000000e+000, 1.48219694e-323], | [ 1.48219694e-323, 1.48219694e-323, 1.00000000e+000]]) | >>> x.setfield(np.eye(3), np.int32) | >>> x | array([[ 1., 0., 0.], | [ 0., 1., 0.], | [ 0., 0., 1.]]) | | setflags(...) | a.setflags(write=None, align=None, uic=None) | | Set array flags WRITEABLE, ALIGNED, (WRITEBACKIFCOPY and UPDATEIFCOPY), | respectively. | | These Boolean-valued flags affect how numpy interprets the memory | area used by `a` (see Notes below). The ALIGNED flag can only | be set to True if the data is actually aligned according to the type. | The WRITEBACKIFCOPY and (deprecated) UPDATEIFCOPY flags can never be set | to True. The flag WRITEABLE can only be set to True if the array owns its | own memory, or the ultimate owner of the memory exposes a writeable buffer | interface, or is a string. (The exception for string is made so that | unpickling can be done without copying memory.) | | Parameters | ---------- | write : bool, optional | Describes whether or not `a` can be written to. | align : bool, optional | Describes whether or not `a` is aligned properly for its type. | uic : bool, optional | Describes whether or not `a` is a copy of another "base" array. | | Notes | ----- | Array flags provide information about how the memory area used | for the array is to be interpreted. 
There are 7 Boolean flags | in use, only four of which can be changed by the user: | WRITEBACKIFCOPY, UPDATEIFCOPY, WRITEABLE, and ALIGNED. | | WRITEABLE (W) the data area can be written to; | | ALIGNED (A) the data and strides are aligned appropriately for the hardware | (as determined by the compiler); | | UPDATEIFCOPY (U) (deprecated), replaced by WRITEBACKIFCOPY; | | WRITEBACKIFCOPY (X) this array is a copy of some other array (referenced | by .base). When the C-API function PyArray_ResolveWritebackIfCopy is | called, the base array will be updated with the contents of this array. | | All flags can be accessed using the single (upper case) letter as well | as the full name. | | Examples | -------- | >>> y | array([[3, 1, 7], | [2, 0, 0], | [8, 5, 9]]) | >>> y.flags | C_CONTIGUOUS : True | F_CONTIGUOUS : False | OWNDATA : True | WRITEABLE : True | ALIGNED : True | WRITEBACKIFCOPY : False | UPDATEIFCOPY : False | >>> y.setflags(write=0, align=0) | >>> y.flags | C_CONTIGUOUS : True | F_CONTIGUOUS : False | OWNDATA : True | WRITEABLE : False | ALIGNED : False | WRITEBACKIFCOPY : False | UPDATEIFCOPY : False | >>> y.setflags(uic=1) | Traceback (most recent call last): | File "<stdin>", line 1, in <module> | ValueError: cannot set WRITEBACKIFCOPY flag to True | | sort(...) | a.sort(axis=-1, kind='quicksort', order=None) | | Sort an array, in-place. | | Parameters | ---------- | axis : int, optional | Axis along which to sort. Default is -1, which means sort along the | last axis. | kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, optional | Sorting algorithm. Default is 'quicksort'. | order : str or list of str, optional | When `a` is an array with fields defined, this argument specifies | which fields to compare first, second, etc. A single field can | be specified as a string, and not all fields need be specified, | but unspecified fields will still be used, in the order in which | they come up in the dtype, to break ties. 
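As a rough illustration of the `sort` parameters just listed, the sketch below contrasts the in-place method with the free function `numpy.sort`, which returns a copy, and then uses `order` on a small structured array. The field names `"x"` and `"y"` and the sample values are assumptions chosen for the example.

```python
import numpy as np

a = np.array([[1, 4], [3, 1]])

sorted_copy = np.sort(a, axis=1)   # free function: returns a sorted copy
unchanged = a.tolist()             # [[1, 4], [3, 1]], `a` is not modified

result = a.sort(axis=1)            # method: sorts in place and returns None
print(result, a.tolist())          # None [[1, 4], [1, 3]]

# Structured array: `order` picks the field used as the sort key.
# The field names "x" and "y" are illustrative.
rec = np.array([(b"a", 2), (b"c", 1)], dtype=[("x", "S1"), ("y", "i4")])
rec.sort(order="y")
print(rec["x"].tolist(), rec["y"].tolist())   # [b'c', b'a'] [1, 2]
```

Because the method returns `None`, writing `a = a.sort()` discards the array; call `a.sort()` on its own line or use `np.sort(a)` when a new array is wanted.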
| | See Also | -------- | numpy.sort : Return a sorted copy of an array. | argsort : Indirect sort. | lexsort : Indirect stable sort on multiple keys. | searchsorted : Find elements in sorted array. | partition: Partial sort. | | Notes | ----- | See ``sort`` for notes on the different sorting algorithms. | | Examples | -------- | >>> a = np.array([[1,4], [3,1]]) | >>> a.sort(axis=1) | >>> a | array([[1, 4], | [1, 3]]) | >>> a.sort(axis=0) | >>> a | array([[1, 3], | [1, 4]]) | | Use the `order` keyword to specify a field to use when sorting a | structured array: | | >>> a = np.array([('a', 2), ('c', 1)], dtype=[('x', 'S1'), ('y', int)]) | >>> a.sort(order='y') | >>> a | array([('c', 1), ('a', 2)], | dtype=[('x', '|S1'), ('y', '<i4')]) | | squeeze(...) | a.squeeze(axis=None) | | Remove single-dimensional entries from the shape of `a`. | | Refer to `numpy.squeeze` for full documentation. | | See Also | -------- | numpy.squeeze : equivalent function | | std(...) | a.std(axis=None, dtype=None, out=None, ddof=0, keepdims=False) | | Returns the standard deviation of the array elements along given axis. | | Refer to `numpy.std` for full documentation. | | See Also | -------- | numpy.std : equivalent function | | sum(...) | a.sum(axis=None, dtype=None, out=None, keepdims=False) | | Return the sum of the array elements over the given axis. | | Refer to `numpy.sum` for full documentation. | | See Also | -------- | numpy.sum : equivalent function | | swapaxes(...) | a.swapaxes(axis1, axis2) | | Return a view of the array with `axis1` and `axis2` interchanged. | | Refer to `numpy.swapaxes` for full documentation. | | See Also | -------- | numpy.swapaxes : equivalent function | | take(...) | a.take(indices, axis=None, out=None, mode='raise') | | Return an array formed from the elements of `a` at the given indices. | | Refer to `numpy.take` for full documentation. | | See Also | -------- | numpy.take : equivalent function | | tobytes(...) 
| a.tobytes(order='C') | | Construct Python bytes containing the raw data bytes in the array. | | Constructs Python bytes showing a copy of the raw contents of | data memory. The bytes object can be produced in either 'C' or 'Fortran', | or 'Any' order (the default is 'C'-order). 'Any' order means C-order | unless the F_CONTIGUOUS flag in the array is set, in which case it | means 'Fortran' order. | | .. versionadded:: 1.9.0 | | Parameters | ---------- | order : {'C', 'F', None}, optional | Order of the data for multidimensional arrays: | C, Fortran, or the same as for the original array. | | Returns | ------- | s : bytes | Python bytes exhibiting a copy of `a`'s raw data. | | Examples | -------- | >>> x = np.array([[0, 1], [2, 3]]) | >>> x.tobytes() | b'\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00' | >>> x.tobytes('C') == x.tobytes() | True | >>> x.tobytes('F') | b'\x00\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00\x03\x00\x00\x00' | | tofile(...) | a.tofile(fid, sep="", format="%s") | | Write array to a file as text or binary (default). | | Data is always written in 'C' order, independent of the order of `a`. | The data produced by this method can be recovered using the function | fromfile(). | | Parameters | ---------- | fid : file or str | An open file object, or a string containing a filename. | sep : str | Separator between array items for text output. | If "" (empty), a binary file is written, equivalent to | ``file.write(a.tobytes())``. | format : str | Format string for text file output. | Each entry in the array is formatted to text by first converting | it to the closest Python type, and then using "format" % item. | | Notes | ----- | This is a convenience function for quick storage of array data. | Information on endianness and precision is lost, so this method is not a | good choice for files intended to archive data or transport data between | machines with different endianness. 
Some of these problems can be overcome | by outputting the data as text files, at the expense of speed and file | size. | | When fid is a file object, array contents are directly written to the | file, bypassing the file object's ``write`` method. As a result, tofile | cannot be used with files objects supporting compression (e.g., GzipFile) | or file-like objects that do not support ``fileno()`` (e.g., BytesIO). | | tolist(...) | a.tolist() | | Return the array as a (possibly nested) list. | | Return a copy of the array data as a (nested) Python list. | Data items are converted to the nearest compatible Python type. | | Parameters | ---------- | none | | Returns | ------- | y : list | The possibly nested list of array elements. | | Notes | ----- | The array may be recreated, ``a = np.array(a.tolist())``. | | Examples | -------- | >>> a = np.array([1, 2]) | >>> a.tolist() | [1, 2] | >>> a = np.array([[1, 2], [3, 4]]) | >>> list(a) | [array([1, 2]), array([3, 4])] | >>> a.tolist() | [[1, 2], [3, 4]] | | tostring(...) | a.tostring(order='C') | | Construct Python bytes containing the raw data bytes in the array. | | Constructs Python bytes showing a copy of the raw contents of | data memory. The bytes object can be produced in either 'C' or 'Fortran', | or 'Any' order (the default is 'C'-order). 'Any' order means C-order | unless the F_CONTIGUOUS flag in the array is set, in which case it | means 'Fortran' order. | | This function is a compatibility alias for tobytes. Despite its name it returns bytes not strings. | | Parameters | ---------- | order : {'C', 'F', None}, optional | Order of the data for multidimensional arrays: | C, Fortran, or the same as for the original array. | | Returns | ------- | s : bytes | Python bytes exhibiting a copy of `a`'s raw data. 
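Since the bytes returned by `tobytes` (and its alias `tostring`, which later NumPy releases deprecate) carry no dtype or shape information, a round trip has to supply both. The following is a minimal sketch that assumes a fixed little-endian 32-bit dtype (`"<i4"`) so the byte strings are the same on every platform; `numpy.frombuffer` is used here as one way to read the bytes back.

```python
import numpy as np

x = np.array([[0, 1], [2, 3]], dtype="<i4")

raw_c = x.tobytes()       # C (row-major) order is the default
raw_f = x.tobytes("F")    # Fortran (column-major) order

print(len(raw_c))         # 16 bytes: 4 elements * 4 bytes each

# The raw bytes do not record dtype, shape, or order; pass them explicitly.
restored_c = np.frombuffer(raw_c, dtype="<i4").reshape(x.shape)
restored_f = np.frombuffer(raw_f, dtype="<i4").reshape(x.shape, order="F")

print(restored_c.tolist())   # [[0, 1], [2, 3]]
print(restored_f.tolist())   # [[0, 1], [2, 3]]
```

Arrays returned by `np.frombuffer` on a `bytes` object are read-only; call `.copy()` on the result if it needs to be modified.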
| | Examples | -------- | >>> x = np.array([[0, 1], [2, 3]]) | >>> x.tobytes() | b'\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00' | >>> x.tobytes('C') == x.tobytes() | True | >>> x.tobytes('F') | b'\x00\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00\x03\x00\x00\x00' | | trace(...) | a.trace(offset=0, axis1=0, axis2=1, dtype=None, out=None) | | Return the sum along diagonals of the array. | | Refer to `numpy.trace` for full documentation. | | See Also | -------- | numpy.trace : equivalent function | | transpose(...) | a.transpose(*axes) | | Returns a view of the array with axes transposed. | | For a 1-D array, this has no effect. (To change between column and | row vectors, first cast the 1-D array into a matrix object.) | For a 2-D array, this is the usual matrix transpose. | For an n-D array, if axes are given, their order indicates how the | axes are permuted (see Examples). If axes are not provided and | ``a.shape = (i[0], i[1], ... i[n-2], i[n-1])``, then | ``a.transpose().shape = (i[n-1], i[n-2], ... i[1], i[0])``. | | Parameters | ---------- | axes : None, tuple of ints, or `n` ints | | * None or no argument: reverses the order of the axes. | | * tuple of ints: `i` in the `j`-th place in the tuple means `a`'s | `i`-th axis becomes `a.transpose()`'s `j`-th axis. | | * `n` ints: same as an n-tuple of the same ints (this form is | intended simply as a "convenience" alternative to the tuple form) | | Returns | ------- | out : ndarray | View of `a`, with axes suitably permuted. | | See Also | -------- | ndarray.T : Array property returning the array transposed. | | Examples | -------- | >>> a = np.array([[1, 2], [3, 4]]) | >>> a | array([[1, 2], | [3, 4]]) | >>> a.transpose() | array([[1, 3], | [2, 4]]) | >>> a.transpose((1, 0)) | array([[1, 3], | [2, 4]]) | >>> a.transpose(1, 0) | array([[1, 3], | [2, 4]]) | | var(...) | a.var(axis=None, dtype=None, out=None, ddof=0, keepdims=False) | | Returns the variance of the array elements, along given axis. 
| | Refer to `numpy.var` for full documentation. | | See Also | -------- | numpy.var : equivalent function | | view(...) | a.view(dtype=None, type=None) | | New view of array with the same data. | | Parameters | ---------- | dtype : data-type or ndarray sub-class, optional | Data-type descriptor of the returned view, e.g., float32 or int16. The | default, None, results in the view having the same data-type as `a`. | This argument can also be specified as an ndarray sub-class, which | then specifies the type of the returned object (this is equivalent to | setting the ``type`` parameter). | type : Python type, optional | Type of the returned view, e.g., ndarray or matrix. Again, the | default None results in type preservation. | | Notes | ----- | ``a.view()`` is used two different ways: | | ``a.view(some_dtype)`` or ``a.view(dtype=some_dtype)`` constructs a view | of the array's memory with a different data-type. This can cause a | reinterpretation of the bytes of memory. | | ``a.view(ndarray_subclass)`` or ``a.view(type=ndarray_subclass)`` just | returns an instance of `ndarray_subclass` that looks at the same array | (same shape, dtype, etc.) This does not cause a reinterpretation of the | memory. | | For ``a.view(some_dtype)``, if ``some_dtype`` has a different number of | bytes per entry than the previous dtype (for example, converting a | regular array to a structured array), then the behavior of the view | cannot be predicted just from the superficial appearance of ``a`` (shown | by ``print(a)``). It also depends on exactly how ``a`` is stored in | memory. Therefore if ``a`` is C-ordered versus fortran-ordered, versus | defined as a slice or transpose, etc., the view may give different | results. 
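The two uses of `view` described in the Notes can be sketched as follows. The example assumes an explicit little-endian target dtype (`"<i2"`) so the reinterpreted value is platform independent, and it reuses the field names `a` and `b` from the docstring.

```python
import numpy as np

# Field names "a" and "b" follow the docstring example.
pair = np.array([(1, 2)], dtype=[("a", np.int8), ("b", np.int8)])

# Same 2 bytes per entry, reinterpreted as one little-endian int16:
# 0x01 + 0x02 * 256 = 513.
as_int16 = pair.view("<i2")
print(as_int16.tolist())          # [513]

x = np.array([(1, 2), (3, 4)], dtype=[("a", np.int8), ("b", np.int8)])
xv = x.view(np.int8).reshape(-1, 2)   # plain 2-D int8 view of the same memory
xv[0, 1] = 20                         # writes through to x

z = x.view(np.recarray)               # subclass view: fields become attributes
print(x["b"].tolist(), z.a.tolist())  # [20, 4] [1, 3]
print(np.shares_memory(x, xv))        # True
```

Note that `z.a` returns one value per record, so a two-record array yields two elements.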
| | Examples | -------- | >>> x = np.array([(1, 2)], dtype=[('a', np.int8), ('b', np.int8)]) | | Viewing array data using a different type and dtype: | | >>> y = x.view(dtype=np.int16, type=np.matrix) | >>> y | matrix([[513]], dtype=int16) | >>> print(type(y)) | <class 'numpy.matrixlib.defmatrix.matrix'> | | Creating a view on a structured array so it can be used in calculations: | | >>> x = np.array([(1, 2),(3,4)], dtype=[('a', np.int8), ('b', np.int8)]) | >>> xv = x.view(dtype=np.int8).reshape(-1,2) | >>> xv | array([[1, 2], | [3, 4]], dtype=int8) | >>> xv.mean(0) | array([ 2., 3.]) | | Making changes to the view changes the underlying array: | | >>> xv[0,1] = 20 | >>> print(x) | [(1, 20) (3, 4)] | | Using a view to convert an array to a recarray: | | >>> z = x.view(np.recarray) | >>> z.a | array([1, 3], dtype=int8) | | Views share data: | | >>> x[0] = (9, 10) | >>> z[0] | (9, 10) | | Views that change the dtype size (bytes per entry) should normally be | avoided on arrays defined by slices, transposes, fortran-ordering, etc.: | | >>> x = np.array([[1,2,3],[4,5,6]], dtype=np.int16) | >>> y = x[:, 0:2] | >>> y | array([[1, 2], | [4, 5]], dtype=int16) | >>> y.view(dtype=[('width', np.int16), ('length', np.int16)]) | Traceback (most recent call last): | File "<stdin>", line 1, in <module> | ValueError: new type not compatible with array. | >>> z = y.copy() | >>> z.view(dtype=[('width', np.int16), ('length', np.int16)]) | array([[(1, 2)], | [(4, 5)]], dtype=[('width', '<i2'), ('length', '<i2')]) | | ---------------------------------------------------------------------- | Data descriptors defined here: | | T | Same as self.transpose(), except that self is returned if | self.ndim < 2. | | Examples | -------- | >>> x = np.array([[1.,2.],[3.,4.]]) | >>> x | array([[ 1., 2.], | [ 3., 4.]]) | >>> x.T | array([[ 1., 3.], | [ 2., 4.]]) | >>> x = np.array([1.,2.,3.,4.]) | >>> x | array([ 1., 2., 3., 4.]) | >>> x.T | array([ 1., 2., 3., 4.]) | | __array_finalize__ | None. 
| | __array_interface__ | Array protocol: Python side. | | __array_priority__ | Array priority. | | __array_struct__ | Array protocol: C-struct side. | | base | Base object if memory is from some other object. | | Examples | -------- | The base of an array that owns its memory is None: | | >>> x = np.array([1,2,3,4]) | >>> x.base is None | True | | Slicing creates a view, whose memory is shared with x: | | >>> y = x[2:] | >>> y.base is x | True | | ctypes | An object to simplify the interaction of the array with the ctypes | module. | | This attribute creates an object that makes it easier to use arrays | when calling shared libraries with the ctypes module. The returned | object has, among others, data, shape, and strides attributes (see | Notes below) which themselves return ctypes objects that can be used | as arguments to a shared library. | | Parameters | ---------- | None | | Returns | ------- | c : Python object | Possessing attributes data, shape, strides, etc. | | See Also | -------- | numpy.ctypeslib | | Notes | ----- | Below are the public attributes of this object which were documented | in "Guide to NumPy" (we have omitted undocumented public attributes, | as well as documented private attributes): | | * data: A pointer to the memory area of the array as a Python integer. | This memory area may contain data that is not aligned, or not in correct | byte-order. The memory area may not even be writeable. The array | flags and data-type of this array should be respected when passing this | attribute to arbitrary C-code to avoid trouble that can include Python | crashing. User Beware! The value of this attribute is exactly the same | as self._array_interface_['data'][0]. | | * shape (c_intp*self.ndim): A ctypes array of length self.ndim where | the basetype is the C-integer corresponding to dtype('p') on this | platform. This base-type could be c_int, c_long, or c_longlong | depending on the platform. 
The c_intp type is defined accordingly in | numpy.ctypeslib. The ctypes array contains the shape of the underlying | array. | | * strides (c_intp*self.ndim): A ctypes array of length self.ndim where | the basetype is the same as for the shape attribute. This ctypes array | contains the strides information from the underlying array. This strides | information is important for showing how many bytes must be jumped to | get to the next element in the array. | | * data_as(obj): Return the data pointer cast to a particular c-types object. | For example, calling self._as_parameter_ is equivalent to | self.data_as(ctypes.c_void_p). Perhaps you want to use the data as a | pointer to a ctypes array of floating-point data: | self.data_as(ctypes.POINTER(ctypes.c_double)). | | * shape_as(obj): Return the shape tuple as an array of some other c-types | type. For example: self.shape_as(ctypes.c_short). | | * strides_as(obj): Return the strides tuple as an array of some other | c-types type. For example: self.strides_as(ctypes.c_longlong). | | Be careful using the ctypes attribute - especially on temporary | arrays or arrays constructed on the fly. For example, calling | ``(a+b).ctypes.data_as(ctypes.c_void_p)`` returns a pointer to memory | that is invalid because the array created as (a+b) is deallocated | before the next Python statement. You can avoid this problem using | either ``c=a+b`` or ``ct=(a+b).ctypes``. In the latter case, ct will | hold a reference to the array until ct is deleted or re-assigned. | | If the ctypes module is not available, then the ctypes attribute | of array objects still returns something useful, but ctypes objects | are not returned and errors may be raised instead. In particular, | the object will still have the as parameter attribute which will | return an integer equal to the data attribute. 
| | Examples | -------- | >>> import ctypes | >>> x | array([[0, 1], | [2, 3]]) | >>> x.ctypes.data | 30439712 | >>> x.ctypes.data_as(ctypes.POINTER(ctypes.c_long)) | <ctypes.LP_c_long object at 0x01F01300> | >>> x.ctypes.data_as(ctypes.POINTER(ctypes.c_long)).contents | c_long(0) | >>> x.ctypes.data_as(ctypes.POINTER(ctypes.c_longlong)).contents | c_longlong(4294967296L) | >>> x.ctypes.shape | <numpy.core._internal.c_long_Array_2 object at 0x01FFD580> | >>> x.ctypes.shape_as(ctypes.c_long) | <numpy.core._internal.c_long_Array_2 object at 0x01FCE620> | >>> x.ctypes.strides | <numpy.core._internal.c_long_Array_2 object at 0x01FCE620> | >>> x.ctypes.strides_as(ctypes.c_longlong) | <numpy.core._internal.c_longlong_Array_2 object at 0x01F01300> | | data | Python buffer object pointing to the start of the array's data. | | dtype | Data-type of the array's elements. | | Parameters | ---------- | None | | Returns | ------- | d : numpy dtype object | | See Also | -------- | numpy.dtype | | Examples | -------- | >>> x | array([[0, 1], | [2, 3]]) | >>> x.dtype | dtype('int32') | >>> type(x.dtype) | <type 'numpy.dtype'> | | flags | Information about the memory layout of the array. | | Attributes | ---------- | C_CONTIGUOUS (C) | The data is in a single, C-style contiguous segment. | F_CONTIGUOUS (F) | The data is in a single, Fortran-style contiguous segment. | OWNDATA (O) | The array owns the memory it uses or borrows it from another object. | WRITEABLE (W) | The data area can be written to. Setting this to False locks | the data, making it read-only. A view (slice, etc.) inherits WRITEABLE | from its base array at creation time, but a view of a writeable | array may be subsequently locked while the base array remains writeable. | (The opposite is not true, in that a view of a locked array may not | be made writeable. 
However, currently, locking a base object does not | lock any views that already reference it, so under that circumstance it | is possible to alter the contents of a locked array via a previously | created writeable view onto it.) Attempting to change a non-writeable | array raises a RuntimeError exception. | ALIGNED (A) | The data and all elements are aligned appropriately for the hardware. | WRITEBACKIFCOPY (X) | This array is a copy of some other array. The C-API function | PyArray_ResolveWritebackIfCopy must be called before deallocating | to the base array will be updated with the contents of this array. | UPDATEIFCOPY (U) | (Deprecated, use WRITEBACKIFCOPY) This array is a copy of some other array. | When this array is | deallocated, the base array will be updated with the contents of | this array. | FNC | F_CONTIGUOUS and not C_CONTIGUOUS. | FORC | F_CONTIGUOUS or C_CONTIGUOUS (one-segment test). | BEHAVED (B) | ALIGNED and WRITEABLE. | CARRAY (CA) | BEHAVED and C_CONTIGUOUS. | FARRAY (FA) | BEHAVED and F_CONTIGUOUS and not C_CONTIGUOUS. | | Notes | ----- | The `flags` object can be accessed dictionary-like (as in ``a.flags['WRITEABLE']``), | or by using lowercased attribute names (as in ``a.flags.writeable``). Short flag | names are only supported in dictionary access. | | Only the WRITEBACKIFCOPY, UPDATEIFCOPY, WRITEABLE, and ALIGNED flags can be | changed by the user, via direct assignment to the attribute or dictionary | entry, or by calling `ndarray.setflags`. | | The array flags cannot be set arbitrarily: | | - UPDATEIFCOPY can only be set ``False``. | - WRITEBACKIFCOPY can only be set ``False``. | - ALIGNED can only be set ``True`` if the data is truly aligned. | - WRITEABLE can only be set ``True`` if the array owns its own memory | or the ultimate owner of the memory exposes a writeable buffer | interface or is a string. | | Arrays can be both C-style and Fortran-style contiguous simultaneously. 
| This is clear for 1-dimensional arrays, but can also be true for higher | dimensional arrays. | | Even for contiguous arrays a stride for a given dimension | ``arr.strides[dim]`` may be *arbitrary* if ``arr.shape[dim] == 1`` | or the array has no elements. | It does *not* generally hold that ``self.strides[-1] == self.itemsize`` | for C-style contiguous arrays or ``self.strides[0] == self.itemsize`` for | Fortran-style contiguous arrays is true. | | flat | A 1-D iterator over the array. | | This is a `numpy.flatiter` instance, which acts similarly to, but is not | a subclass of, Python's built-in iterator object. | | See Also | -------- | flatten : Return a copy of the array collapsed into one dimension. | | flatiter | | Examples | -------- | >>> x = np.arange(1, 7).reshape(2, 3) | >>> x | array([[1, 2, 3], | [4, 5, 6]]) | >>> x.flat[3] | 4 | >>> x.T | array([[1, 4], | [2, 5], | [3, 6]]) | >>> x.T.flat[3] | 5 | >>> type(x.flat) | <type 'numpy.flatiter'> | | An assignment example: | | >>> x.flat = 3; x | array([[3, 3, 3], | [3, 3, 3]]) | >>> x.flat[[1,4]] = 1; x | array([[3, 1, 3], | [3, 1, 3]]) | | imag | The imaginary part of the array. | | Examples | -------- | >>> x = np.sqrt([1+0j, 0+1j]) | >>> x.imag | array([ 0. , 0.70710678]) | >>> x.imag.dtype | dtype('float64') | | itemsize | Length of one array element in bytes. | | Examples | -------- | >>> x = np.array([1,2,3], dtype=np.float64) | >>> x.itemsize | 8 | >>> x = np.array([1,2,3], dtype=np.complex128) | >>> x.itemsize | 16 | | nbytes | Total bytes consumed by the elements of the array. | | Notes | ----- | Does not include memory consumed by non-element attributes of the | array object. | | Examples | -------- | >>> x = np.zeros((3,5,2), dtype=np.complex128) | >>> x.nbytes | 480 | >>> np.prod(x.shape) * x.itemsize | 480 | | ndim | Number of array dimensions. 
| | Examples | -------- | >>> x = np.array([1, 2, 3]) | >>> x.ndim | 1 | >>> y = np.zeros((2, 3, 4)) | >>> y.ndim | 3 | | real | The real part of the array. | | Examples | -------- | >>> x = np.sqrt([1+0j, 0+1j]) | >>> x.real | array([ 1. , 0.70710678]) | >>> x.real.dtype | dtype('float64') | | See Also | -------- | numpy.real : equivalent function | | shape | Tuple of array dimensions. | | The shape property is usually used to get the current shape of an array, | but may also be used to reshape the array in-place by assigning a tuple of | array dimensions to it. As with `numpy.reshape`, one of the new shape | dimensions can be -1, in which case its value is inferred from the size of | the array and the remaining dimensions. Reshaping an array in-place will | fail if a copy is required. | | Examples | -------- | >>> x = np.array([1, 2, 3, 4]) | >>> x.shape | (4,) | >>> y = np.zeros((2, 3, 4)) | >>> y.shape | (2, 3, 4) | >>> y.shape = (3, 8) | >>> y | array([[ 0., 0., 0., 0., 0., 0., 0., 0.], | [ 0., 0., 0., 0., 0., 0., 0., 0.], | [ 0., 0., 0., 0., 0., 0., 0., 0.]]) | >>> y.shape = (3, 6) | Traceback (most recent call last): | File "<stdin>", line 1, in <module> | ValueError: total size of new array must be unchanged | >>> np.zeros((4,2))[::2].shape = (-1,) | Traceback (most recent call last): | File "<stdin>", line 1, in <module> | AttributeError: incompatible shape for a non-contiguous array | | See Also | -------- | numpy.reshape : similar function | ndarray.reshape : similar method | | size | Number of elements in the array. | | Equal to ``np.prod(a.shape)``, i.e., the product of the array's | dimensions. | | Notes | ----- | `a.size` returns a standard arbitrary precision Python integer. This | may not be the case with other methods of obtaining the same value | (like the suggested ``np.prod(a.shape)``, which returns an instance | of ``np.int_``), and may be relevant if the value is used further in | calculations that may overflow a fixed size integer type. 
| | Examples | -------- | >>> x = np.zeros((3, 5, 2), dtype=np.complex128) | >>> x.size | 30 | >>> np.prod(x.shape) | 30 | | strides | Tuple of bytes to step in each dimension when traversing an array. | | The byte offset of element ``(i[0], i[1], ..., i[n])`` in an array `a` | is:: | | offset = sum(np.array(i) * a.strides) | | A more detailed explanation of strides can be found in the | "ndarray.rst" file in the NumPy reference guide. | | Notes | ----- | Imagine an array of 32-bit integers (each 4 bytes):: | | x = np.array([[0, 1, 2, 3, 4], | [5, 6, 7, 8, 9]], dtype=np.int32) | | This array is stored in memory as 40 bytes, one after the other | (known as a contiguous block of memory). The strides of an array tell | us how many bytes we have to skip in memory to move to the next position | along a certain axis. For example, we have to skip 4 bytes (1 value) to | move to the next column, but 20 bytes (5 values) to get to the same | position in the next row. As such, the strides for the array `x` will be | ``(20, 4)``. | | See Also | -------- | numpy.lib.stride_tricks.as_strided | | Examples | -------- | >>> y = np.reshape(np.arange(2*3*4), (2,3,4)) | >>> y | array([[[ 0, 1, 2, 3], | [ 4, 5, 6, 7], | [ 8, 9, 10, 11]], | [[12, 13, 14, 15], | [16, 17, 18, 19], | [20, 21, 22, 23]]]) | >>> y.strides | (48, 16, 4) | >>> y[1,1,1] | 17 | >>> offset=sum(y.strides * np.array((1,1,1))) | >>> offset/y.itemsize | 17 | | >>> x = np.reshape(np.arange(5*6*7*8), (5,6,7,8)).transpose(2,3,1,0) | >>> x.strides | (32, 4, 224, 1344) | >>> i = np.array([3,5,2,2]) | >>> offset = sum(i * x.strides) | >>> x[3,5,2,2] | 813 | >>> offset / x.itemsize | 813 | | ---------------------------------------------------------------------- | Data and other attributes defined here: | | __hash__ = None

In [10]:

# 1. Create all-zeros and all-ones arrays; the shape tuple is (rows, columns)
ndarray_zero = np.zeros((5, 5), dtype=np.int32)
ndarray_zero
ndarray_one = np.ones((5, 5), dtype=np.int32)
ndarray_one

Out[10]:

array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

In [63]:

# 2. arange builds a 1-D array from a start value, a stop value (exclusive) and a step (default 1)
# arange([start,] stop[, step,], dtype=None)
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
range_arr = np.arange(10)
range_arr
# array([1, 2, 3, 4, 5, 6, 7, 8, 9])
range_arr = np.arange(1, 10)
range_arr
# array([1, 3, 5, 7, 9])
range_arr = np.arange(1, 10, 2)
range_arr
# array([10, 8, 6, 4, 2])
range_arr = np.arange(10, 1, -2)
range_arr

Out[63]:

array([10, 8, 6, 4, 2])

In [112]:

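# 3. linspace generates an arithmetic sequence: num evenly spaced values from start to stop
# linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
# With dtype=np.int32 the evenly spaced floats (1, 5.9, 10.8, ...) are truncated to integers
# array([ 1,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])
lin_array = np.linspace(1, 50, 11, endpoint=True, dtype=np.int32)
lin_array
# 4. logspace generates a geometric sequence: from the second term on, the ratio of each term to the previous one is the same constant
# logspace(start, stop, num=50, endpoint=True, base=10.0, dtype=None)
# The exponents are num evenly spaced values from start to stop; each element is base raised to that exponent (base defaults to 10)
# array([    3,     9,    27,    81,   243,   729,  2187,  6561, 19683, 59049])
log_array = np.logspace(1, 10, 10, base=3, dtype=np.int32)
log_array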

Out[112]:

array([ 3, 9, 27, 81, 243, 729, 2187, 6561, 19683, 59049])
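The relationship between the two functions can be checked directly. The sketch below is only an illustration (the variable names are made up for this example): it builds the exponents with linspace, raises the base to them by hand, and compares the result with logspace. It also uses retstep=True to read back the common difference of an arithmetic sequence.

```python
import numpy as np

# logspace(start, stop, num, base=b) raises b to num evenly spaced exponents
exponents = np.linspace(1, 10, 10)            # 1.0, 2.0, ..., 10.0
powers_manual = 3.0 ** exponents
powers_logspace = np.logspace(1, 10, 10, base=3)

# In a geometric sequence the ratio of consecutive terms is constant (here 3)
ratios = powers_logspace[1:] / powers_logspace[:-1]

# retstep=True makes linspace also return the common difference
values, step = np.linspace(1, 50, 11, retstep=True)
print(step)
```

Without dtype=np.int32 the linspace values stay floats, so the differences between neighbours really are identical.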

In [118]:

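# 4. Fill arrays with random numbers
# random(size) takes the output shape (rows, columns) and draws floats from [0, 1)
rand_arr = np.random.random((4, 5))
rand_arr
# randint(low, high=None, size=None, dtype='l') draws integers from [low, high) with the given shape
# Example result (the values differ on every run):
# array([[9, 6, 8, 3],
#        [9, 8, 6, 2],
#        [2, 3, 7, 2],
#        [4, 8, 6, 4],
#        [4, 6, 8, 3]])
rand_arr = np.random.randint(1, 10, size=(5, 4))
rand_arr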

Out[118]:

array([[6, 3, 5, 6],
       [8, 1, 4, 4],
       [3, 4, 6, 7],
       [1, 3, 3, 8],
       [9, 1, 2, 9]])

In [134]:

# 5. astype converts the data type and returns a new array; the original array is left unchanged
before_arr = np.array([1, 2, 3, 4])
before_arr.dtype
before_arr
new_arr = before_arr.astype("float")
new_arr.dtype
new_arr

Out[134]:

array([1., 2., 3., 4.])

In [180]:

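# 6. Change the shape of an ndarray with the shape attribute and the reshape method
example_arr = np.arange(2, 22, 2)
example_arr
# Assigning to shape reshapes the array in place; -1 lets NumPy infer that dimension
# array([[ 2,  4,  6,  8, 10],
#        [12, 14, 16, 18, 20]])
example_arr.shape = (-1, 5)
example_arr
# reshape returns a new array object with the requested shape; here it is a view that shares memory with the original
# array([[ 2,  4],
#        [ 6,  8],
#        [10, 12],
#        [14, 16],
#        [18, 20]])
new_arr = example_arr.reshape((5, -1))
new_arr
# Writing through the view therefore changes both arrays
# example_arr:
# array([[ 2, 88,  6,  8, 10],
#        [12, 14, 16, 18, 20]])
# new_arr:
# array([[ 2, 88],
#        [ 6,  8],
#        [10, 12],
#        [14, 16],
#        [18, 20]])
new_arr[0][1] = 88
new_arr
# Assigning to shape changes only the shape of example_arr; new_arr keeps its own shape
# array([[ 2, 88],
#        [ 6,  8],
#        [10, 12],
#        [14, 16],
#        [18, 20]])
example_arr.shape = (5, -1)
example_arr
# reshape on new_arr again returns another array and leaves the shape of new_arr alone
# array([[ 2, 88,  6,  8, 10],
#        [12, 14, 16, 18, 20]])
arr = new_arr.reshape((2, -1))
arr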

Out[180]:

array([[100, 88], [ 6, 8], [ 10, 12], [ 14, 16], [ 18, 20]])
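Whether two arrays share memory can be tested rather than inferred from printed values. The following is a minimal sketch with illustrative variable names: it uses np.shares_memory and the base attribute to confirm that reshape on a contiguous array returns a view, and shows that an explicit copy() is independent.

```python
import numpy as np

original = np.arange(2, 22, 2)
view = original.reshape((5, 2))      # contiguous input: reshape returns a view
view[0, 1] = 88                      # the write is visible through `original`

independent = original.reshape((5, 2)).copy()   # an explicit copy owns its data
independent[0, 0] = -1               # does not touch `original`

shares = np.shares_memory(original, view)
print(shares, original[:3])
```

When a later write must not leak back into the source array, calling copy() on the reshaped result is the safe choice.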

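3.Pandas module

Both Series and DataFrame objects carry index objects. (A DataFrame is built on a two-dimensional array, so it has a row index and a column index.) An index object manages the axis labels and other metadata such as the axis name. Through the index you can read values from a Series or DataFrame, or assign a new value at a given label. The automatic alignment of Series and DataFrame objects is also implemented through the index.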
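The roles of the index listed above can be seen on a small Series. This is only a sketch; the labels and values are invented for the illustration.

```python
import pandas as pd

prices = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])
prices.index.name = "item"          # the axis name is metadata kept on the index

value_b = prices["b"]               # read a value through its label
prices["b"] = 25.0                  # assign a new value through the same label

extra = pd.Series([1.0, 2.0], index=["c", "a"])   # other order, no "b"
total = prices + extra              # values are matched by label, not by position
print(total)
```

The label "b" exists only in prices, so its entry in total is NaN.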

In [12]:

import numpy as np
from pandas import Series, DataFrame
import pandas as pd

# 1. Create a DataFrame from a two-dimensional array
# DataFrame(data, index, columns, dtype, copy)
# data accepts several formats, such as an ndarray or a Series
# Each one-dimensional array inside the two-dimensional array is one record (row)
# In the sample data "男" means male and "女" means female
arr = np.array([
    ["Tom", "男", "18"],
    ["Jim", "女", "19"],
    ["Cindy", "女", "17"],
])
data_frame = DataFrame(arr, index=[1, 2, 3], columns=["name", "sex", "age"])
data_frame
print("=============================================")
print("row index==>" + str(data_frame.index))
print("column index==>" + str(data_frame.columns))
print("data==>" + str(data_frame.values))
data_frame

=============================================
row index==>Int64Index([1, 2, 3], dtype='int64')
column index==>Index(['name', 'sex', 'age'], dtype='object')
data==>[['Tom' '男' '18']
 ['Jim' '女' '19']
 ['Cindy' '女' '17']]

Out[12]:

	name	sex	age
1	Tom	男	18
2	Jim	女	19
3	Cindy	女	17

In [20]:

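# Create a DataFrame from a dictionary
# All values must have the same length; each key becomes a column label and each value becomes that column
# (the variable is named data_dict so that it does not shadow the built-in dict)
data_dict = {
    "name": ["Tom", "Jim", "Cindy"],
    "sex": ["男", "女", "女"],
    "age": [18, 19, 17],
}
data_frame = DataFrame(data_dict, index=[1, 2, 3])
data_frame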

Out[20]:

	name	sex	age
1	Tom	男	18
2	Jim	女	19
3	Cindy	女	17

In [41]:

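# Create a DataFrame from zipped sequences
# zip takes iterables and packs the elements at the same position into tuples;
# in Python 3 it returns an iterator, so list() is needed to materialise the tuples
a = [1, 2, 3]
b = [4, 5, 6]
zipped = zip(a, b)
data = list(zipped)          # [(1, 4), (2, 5), (3, 6)]
# zip(*data) unzips the pairs again, giving the rows (1, 2, 3) and (4, 5, 6)
data_frame = DataFrame(np.array(list(zip(*data))))
data_frame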

Out[41]:

	0	1	2
0	1	2	3
1	4	5	6

In [2]:

# The original run of this cell failed with "NameError: name 'DataFrame' is not defined"
# because the imports had not been executed in the fresh kernel, so they are repeated here
import numpy as np
from pandas import DataFrame

# set_index: hierarchical index
data_frame = DataFrame(np.random.randint(1, 9, (3, 4)))
data_frame
# Example result (the values are random):
#    0  1  2  3
# 0  4  7  3  3
# 1  6  8  8  3
# 2  8  7  7  6
print(data_frame)
print("==============================")
# The argument names the column that becomes the row index; a list of columns gives a composite index
#    0  1  3
# 2
# 3  4  7  3
# 8  6  8  3
# 7  8  7  6
df2 = data_frame.set_index(2)
print(df2)
print("==============================")
#      0  1
# 2 3
# 3 3  4  7
# 8 3  6  8
# 7 6  8  7
df3 = data_frame.set_index([2, 3])
print(df3)
print("==============================")
# reset_index restores the default integer index
# drop=True discards the old index instead of inserting it as a column
#    0  1  2  3
# 0  4  7  3  3
# 1  6  8  8  3
# 2  8  7  7  6
df4 = data_frame.reset_index(drop=True)
print(df4)

In [1]:

import numpy as np
from pandas import Series, DataFrame
import pandas as pd

# Retrieving data from a DataFrame
# 1. Select by column label
arr = np.arange(10, 34).reshape((4, 6))
df1 = DataFrame(arr, columns=list("ABCDEF"), index=["O", "P", "Q", "R"])
#     A   B   C   D   E   F
# O  10  11  12  13  14  15
# P  16  17  18  19  20  21
# Q  22  23  24  25  26  27
# R  28  29  30  31  32  33
# O    12
# P    18
# Q    24
# R    30
# Name: C, dtype: int32
# print(df1,end="\n\n");
# print(df1["C"],end="\n\n");
# Select several rows through the row index:
# loc[start_label:end_label] selects by label and includes both end points;
# iloc[start:stop] selects by integer position, whatever the labels are, and excludes stop
df1.loc["Q":"R"]
# Pass a list of labels to select several columns
df1[["A", "C", "E"]]
# Modify column data
df1["C"] = np.arange(1, 5).reshape(4)
df1
del df1["F"]
#     A   B  C   D   E
# O  10  11  1  13  14
# P  16  17  2  19  20
# Q  22  23  3  25  26
# R  28  29  4  31  32
df1
#     A   B  C   D   E  F
# O  10  11  1  13  14  5
# P  16  17  2  19  20  6
# Q  22  23  3  25  26  7
# R  28  29  4  31  32  8
df1["F"] = np.arange(5, 9).reshape(4)
df1
# Select rows and columns at the same time with loc
df2 = df1.loc["O":"P", "A":"C"]
df2
# Modify row data
df1.loc["P"] = np.arange(6, 12).reshape(6)
df1
# Modify several rows at once (the assigned array must have the same rows x columns shape as the selection)
#     A   B  C   D   E   F
# O  10  11  1  13  14   5
# P   1   2  3   4   5   6
# Q   7   8  9  10  11  12
# R  28  29  4  31  32   8
df1.loc["P":"Q"] = np.arange(1, 13).reshape(2, 6)
df1
# Delete a row
#     A   B  C   D   E   F
# P   1   2  3   4   5   6
# Q   7   8  9  10  11  12
# R  28  29  4  31  32   8
df1 = df1.drop("O")
df1

Out[140]:

	A	B	C	D	E	F
P	1	2	3	4	5	6
Q	7	8	9	10	11	12
R	28	29	4	31	32	8
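The difference between label-based and position-based selection is easy to get wrong, so here is a small side-by-side sketch on the same 4x6 frame (the variable names are illustrative). A loc slice includes its end label, while an iloc slice excludes its stop position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10, 34).reshape(4, 6),
                  columns=list("ABCDEF"), index=["O", "P", "Q", "R"])

by_label = df.loc["Q":"R", "A":"C"]    # label slices include both end points
by_position = df.iloc[2:4, 0:3]        # position slices exclude the stop

single = df.iloc[1, 2]                 # row position 1 ("P"), column position 2 ("C")
print(by_label.equals(by_position), single)
```

iloc works here even though the row labels are strings, because it only looks at positions.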

In [192]:

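# Add Series data to a DataFrame by row and by column
# Two ways to add a row: 1. df.loc[row_label] = series   2. df.append(series)
# With append, the name of the Series becomes the row label and its values become the row
# Note: DataFrame.append was removed in pandas 2.0; pd.concat is the replacement there
# Example result (values from an earlier run):
#           A   B   C   D   E   F
# P         6   7   8   9  10  11
# Q        22  23   3  25  26   7
# R        28  29   4  31  32   8
# index_1  50  51  52  53  54  55
series1 = Series(np.arange(50, 56).reshape(6), index=["A", "B", "C", "D", "E", "F"])
series1.name = "index_1"
series1
# df1.loc["index_1"]=series1
df2 = df1.append(series1)
df2
df3 = df2.drop(series1.name)
df3
# Add a Series as a column
# The Series is aligned on the row index; rows without a matching label are filled with NaN
series2 = Series(np.arange(13, 16).reshape(3), index=["P", "Q", "R"])
series2.name = "index_2"
df1[series2.name] = series2
df1
del df1[series2.name]
df1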

Out[192]:

	A	B	C	D	E	F
P	1	2	3	4	5	6
Q	7	8	9	10	11	12

In [209]:

# Add DataFrame data to a DataFrame by row and by column
# Add a DataFrame by row
# Note: the column labels of the two frames should match; columns missing on either side are filled with NaN
# Note: DataFrame.append was removed in pandas 2.0; pd.concat([df1, new_df]) is the equivalent there
df1
new_df = DataFrame(np.arange(13, 31).reshape(3, 6), index=["R", "S", "T"], columns=list("ABCDEF"))
new_df
df2 = df1.append(new_df)
#     A   B   C   D   E   F
# P   1   2   3   4   5   6
# Q   7   8   9  10  11  12
# R  13  14  15  16  17  18
# S  19  20  21  22  23  24
# T  25  26  27  28  29  30
df2
# Add a DataFrame by column
# Note: the frame being added must have a single column, and its row index must match the rows of the target frame
df3 = df2.drop(["R", "S", "T"])
df3
new_df = DataFrame(np.arange(7, 9).reshape(2), index=["P", "Q"], columns=["G"])
df3["G"] = new_df
new_df = DataFrame(np.arange(13, 15).reshape(2), index=["P", "Q"], columns=["H"])
df3["H"] = new_df
#    A  B  C   D   E   F  G   H
# P  1  2  3   4   5   6  7  13
# Q  7  8  9  10  11  12  8  14
df3

Out[209]:

	A	B	C	D	E	F	G	H
P	1	2	3	4	5	6	7	13
Q	7	8	9	10	11	12	8	14
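The cells above rely on DataFrame.append, which no longer exists in pandas 2.0 and later. A possible equivalent with pd.concat is sketched below; the frame and Series mirror the example data, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 13).reshape(2, 6),
                  index=["P", "Q"], columns=list("ABCDEF"))
row = pd.Series(np.arange(50, 56), index=list("ABCDEF"), name="index_1")

# Row-wise: turn the Series into a one-row frame, then stack along axis 0
with_row = pd.concat([df, row.to_frame().T])

# Column-wise: frames are aligned on the row index and joined along axis 1
extra_cols = pd.DataFrame({"G": [7, 8], "H": [13, 14]}, index=["P", "Q"])
with_cols = pd.concat([df, extra_cols], axis=1)
print(with_row)
print(with_cols)
```

As with append, the name of the Series ends up as the label of the new row.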

from pandas import Series, DataFrame
import pandas as pd
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
# 1. Create a Series from a one-dimensional array
series1 = Series(arr)
series1
print("series value dtype==>" + str(series1.dtype))
print("series index==>" + str(series1.index))
print("series values==>" + str(series1.values))
# Create a Series from a one-dimensional array plus an explicit index
series1 = Series(arr, index=["A", "B", "C", "D", "E"], dtype=np.int32)
series1

series value dtype==>int32
series index==>RangeIndex(start=0, stop=5, step=1)
series values==>[1 2 3 4 5]

Out[10]:

A    1
B    2
C    3
D    4
E    5
dtype: int32

In [12]:

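# 2. Create a Series from a dictionary: the keys become the index, the values become the data
# (the variable is named data_dict so that it does not shadow the built-in dict)
data_dict = {
    "A": 1,
    "B": 2,
    "C": 3,
}
series1 = Series(data_dict)
series1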

Out[12]:

A    1
B    2
C    3
dtype: int64

In [15]:

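# 3. The name attribute
# Both the Series and its index have a name attribute, which is empty by default
series1.name = "一维数组对象"        # "one-dimensional array object"
series1.index.name = "索引名称"     # "index name"
series1.name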

Out[15]:

'一维数组对象'

In [31]:

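# 4. Commonly used Series attributes and methods
# List of the row axis labels
series1.axes
# Data type of the values
series1.dtype
# Whether the Series is empty
series1.empty
# Number of dimensions of the underlying data
series1.ndim
# Number of elements in the underlying data
series1.size
# The values as an array
series1.values
# First n rows
series1.head(2)
# Last n rows
series1.tail(2)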

In [43]:

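# Select values from a Series by label and by slice
series1
# Single label
series1["A"]
# Label slice: both end points are included
series1["A":"C"]
# Positional slice: the stop position is excluded
series1[0:2]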

Out[43]:

索引名称
A    1
B    2
Name: 一维数组对象, dtype: int64

Series computation: when array operations are applied to a Series, the mapping between index labels and values does not change. In practice a Series can largely be treated like a NumPy ndarray, and the great majority of ndarray operations also work on a Series.

In [13]:

from pandas import Series, DataFrame
import pandas as pd
import numpy as np

# Basic Series arithmetic
arr = np.array([3, 8, 7, 5, 56, 72.5, 48.5, 120])
series_index = ["A", "B", "C", "D", "E", "F", "G", "H"]
series = Series(arr, index=series_index, dtype=np.float64)
series
print(series + 3)
print("======================")
print(series - 3)
print("======================")
print(series * 3)
print("======================")
print(series / 3)
print("======================")
print(series ** 3)
print("======================")
print(np.fabs(series))
print("======================")
print(np.square(series))

A      6.0
B     11.0
C     10.0
D      8.0
E     59.0
F     75.5
G     51.5
H    123.0
dtype: float64
======================
A      0.0
B      5.0
C      4.0
D      2.0
E     53.0
F     69.5
G     45.5
H    117.0
dtype: float64
======================
A      9.0
B     24.0
C     21.0
D     15.0
E    168.0
F    217.5
G    145.5
H    360.0
dtype: float64
======================
A     1.000000
B     2.666667
C     2.333333
D     1.666667
E    18.666667
F    24.166667
G    16.166667
H    40.000000
dtype: float64
======================
A         27.000
B        512.000
C        343.000
D        125.000
E     175616.000
F     381078.125
G     114084.125
H    1728000.000
dtype: float64
======================
A      3.0
B      8.0
C      7.0
D      5.0
E     56.0
F     72.5
G     48.5
H    120.0
dtype: float64
======================
A        9.00
B       64.00
C       49.00
D       25.00
E     3136.00
F     5256.25
G     2352.25
H    14400.00
dtype: float64

In [27]:

# Automatic alignment of Series
# When several Series are combined, values are matched by index label; a label that is missing from one of the operands yields NaN in the result
# The indexes need not be identical, but only labels present in both operands produce a number
arr1 = np.random.randint(1, 9, 5)
arr2 = np.random.randint(1, 9, 4)
series1 = Series(arr1, index=list("ABCDE"))
series2 = Series(arr2, index=list("ACDE"))
print(arr1)
print("======================")
print(arr2)
print("======================")
print(series1 + series2)

[8 2 5 8 6]
======================
[5 8 3 6]
======================
A    13.0
B     NaN
C    13.0
D    11.0
E    12.0
dtype: float64
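When a NaN for unmatched labels is not wanted, pandas offers alternatives to the plain + operator. The sketch below reuses the numbers printed above as fixed data (an assumption made so that the result is reproducible) and shows Series.add with fill_value, plus dropna to keep only the labels present in both operands.

```python
import pandas as pd

s1 = pd.Series([8, 2, 5, 8, 6], index=list("ABCDE"))
s2 = pd.Series([5, 8, 3, 6], index=list("ACDE"))

plain = s1 + s2                      # "B" has no partner, so the result is NaN
filled = s1.add(s2, fill_value=0)    # treat the missing side as 0 instead
common = (s1 + s2).dropna()          # or keep only labels present in both
print(filled)
```

Which variant is appropriate depends on whether a missing label should count as zero or as unknown.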

In [37]:

# Detecting missing values in a Series
# pandas isnull and notnull detect missing values: isnull returns True where the value is NaN, notnull returns False there
add_series = series1 + series2
add_series
pd.isnull(add_series)
pd.notnull(add_series)
# Select the missing values
print(add_series[pd.isnull(add_series)])
print("=============================================")
# Select the non-missing values
print(add_series[pd.notnull(add_series)])

B   NaN
dtype: float64
=============================================
A    13.0
C    13.0
D    11.0
E    12.0
dtype: float64

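import numpy as np
from pandas import Series, DataFrame
import pandas as pd

# 1. Unique values: Series.unique() (or pd.unique) returns the distinct values of a Series
# Every row and every column of a DataFrame is a Series
df = DataFrame(np.random.randint(1, 9, (5, 5)), index=list("ABCDE"), columns=list("abcde"))
series = Series(np.arange(11, 16).reshape(5), index=list("ABCDE"))
series.name = "index_one"
# pd.unique(df)
print(df)
print(pd.unique(df.loc["D"]))
print(df.loc["D"].unique())
# 2. Value counts: value_counts returns a Series with the number of occurrences of each value, sorted from most to least frequent
# In the result the distinct values are the index and the counts are the values
# value_counts as used here applies to a single Series (one row or one column of a DataFrame);
# the method form df.loc["A"].value_counts() is equivalent to the function form pd.value_counts(...)
pd.value_counts(df.loc["D"])
pd.value_counts(df["a"])
df.loc["A"].value_counts()
# 3. Membership: isin() checks every element of a DataFrame/Series against the given values and returns
# a new boolean object of the same shape, True where the element matches and False otherwise
df.isin([1, 2, 3])
series.isin([14, 15])

   a  b  c  d  e
A  5  7  1  1  2
B  6  1  2  7  8
C  3  4  6  6  1
D  7  2  8  3  2
E  7  5  2  1  5
[7 2 8 3]
[7 2 8 3]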

Out[109]:

A    False
B    False
C    False
D     True
E     True
Name: index_one, dtype: bool

In [110]:

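# Hierarchical indexing in pandas
# 1. Hierarchical index on a Series
series = Series(np.arange(11, 16).reshape(5), index=[
    ["2011", "2012", "2013", "2014", "2015"],  # level 1
    list("ABCDE"),                             # level 2
    list("abcde"),                             # level 3
    list("12345"),                             # level 4
])
series
# Select by the first level
series["2011"]
# Select by the first and second levels
series["2011"]["A"]
# Select down to the innermost (fourth) level, by chained lookups or with all labels at once
series["2011"]["A"]["a"]["1"]
series["2011", "A", "a", "1"]
# swaplevel() swaps two index levels, by default the two innermost ones
# series.swaplevel()
df = DataFrame({
    "name": ["zhangsan", "lisi", "wangwu"],
    "sex": ["男", "女", "男"],
    "age": [18, 20, 25],
})
df
# Series ==> unstack() turns the innermost row index level into column labels and returns a DataFrame
df2 = series.unstack()
df2
# DataFrame ==> stack() turns the innermost column level into a row index level and returns a Series
series = df.stack()
series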

Out[110]:

0  name    zhangsan
   sex            男
   age           18
1  name        lisi
   sex            女
   age           20
2  name      wangwu
   sex            男
   age           25
dtype: object

In [132]:

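# Hierarchical index on a DataFrame
# set_index([...]) builds a hierarchical row index from the listed columns; several columns can be combined
# reset_index() first moves any existing index levels back into columns, which makes the cell safe to re-run;
# on a DataFrame that still has its default index it adds an "index" column instead
df = df.reset_index()
df = df.set_index(["name", "sex"])
df
# Aggregate by index level
# Note: the level= argument of sum/mean/min was removed in pandas 2.0; df.groupby(level="sex").sum() is the equivalent there
df.sum(level="sex")
df["age"].mean(level="sex")
df["age"].min(level="sex")
df
df3 = df.reset_index()
df3
df3 = df3.set_index(["name", "sex"])
df3
df3.mean(level="sex")
df3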

Out[132]:

		age
name	sex
zhangsan	男	18
lisi	女	20
wangwu	男	25
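On pandas 2.0 and later the level= argument used above is gone, and grouping on the index level does the same job. A minimal sketch on the same three records (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["zhangsan", "lisi", "wangwu"],
    "sex": ["男", "女", "男"],
    "age": [18, 20, 25],
}).set_index(["name", "sex"])

# Group on one level of the hierarchical index, then aggregate
age_sum = df.groupby(level="sex")["age"].sum()
age_mean = df.groupby(level="sex")["age"].mean()
age_min = df.groupby(level="sex")["age"].min()
print(age_sum)
```

groupby(level=...) also works on older pandas versions, so it is the more portable spelling.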

In [170]:

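# Reshaping and sorting a Series with a hierarchical index
series = Series(np.arange(15, 18).reshape(3), index=[
    ["A", "B", "C"],        # level 1
    ["A2", "B2", "C2"],     # level 2
    ["A3", "B3", "C3"],     # level 3
])
series
# unstack turns one row index level into column labels; level selects which one
# (if it is not given, the innermost row index level is used)
df = series.unstack()
df
# sort_index() sorts a Series or DataFrame by its index labels
# For a DataFrame, axis=0 sorts by the row labels and axis=1 sorts by the column labels
# ascending selects ascending or descending order; the default is ascending (True)
series.sort_index(ascending=True)
df.sort_index(axis=1, ascending=False)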

In [226]:

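# Sorting a Series or DataFrame by value
# sort_values() sorts by the values
# Missing values are placed at the end by default
df5 = DataFrame(np.arange(300, 321).reshape(3, 7), index=["A", "B", "C"])
df5
# by names the row(s) or column(s) whose values determine the order; a list gives several sort keys
# With axis=0, by refers to column labels and the rows are reordered; with axis=1, by refers to row labels and the columns are reordered
df6 = df5.sort_values(by="A", ascending=True, axis=1)
df6
df6 = df6.sort_values(by=["B", "C"], ascending=False, axis=1)
df6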

Out[226]:

0    15
1    15
2    17
3    14
4    17
dtype: int64

In [286]:

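# Ranking with rank: assigns each value a rank, from 1 up to the number of valid values
# By default ties are broken by giving every member of a tied group the average of the ranks they span
# Equal values therefore get equal ranks
# Ranking a Series
series2 = Series(np.arange(15, 17).reshape(2), index=[
    ["A", "B"],       # level 1
    ["A2", "B2"],     # level 2
    ["A3", "B3"],     # level 3
])
series2
series3 = Series(np.arange(17, 21).reshape(4), index=[
    ["C", "D", "E", "F"],         # level 1
    ["C2", "D2", "E2", "F2"],     # level 2
    ["C3", "D3", "E3", "F3"],     # level 3
])
series3
# Note: Series.append was removed in pandas 2.0; pd.concat([series2, series3]) is the equivalent there
series4 = series2.append(series3)
series4
# Set the elements at positions 2 and 4 to 14 so that they tie for the lowest rank
# (one .loc call with the full label tuple; chained indexing may assign to a temporary copy)
series4.loc[("C", "C2", "C3")] = 14
series4.loc[("E", "E2", "E3")] = 14
series4
# Series ranks
series4.rank()              # tied values get 1.5 by default
series4.rank(method="min")  # tied values get 1.0
series4.rank(method="max")  # tied values get 2.0
# Ranking a DataFrame
df8 = DataFrame(np.arange(21, 36).reshape(3, 5))
df8
# Set columns 2 to 4 of the row labelled 1 to 25
df8.loc[1, 2:4] = 25
df8
# By default each column is ranked from top to bottom; axis selects the direction:
# axis=0 ranks within each column, axis=1 ranks within each row
# For the tied values when ranking within columns (the default):
# method="average" (default) gives 1.5, method="min" gives 1.0, method="max" gives 2.0
df8.rank()
df8.loc[0, 3] = 25
df8.rank()
df8.rank(method="average")
df8.rank(method="min")
df8.rank(method="max")
df8.rank(axis=1)
df8.rank(axis=1, method="min")
df8.rank(axis=1, method="max")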

Out[286]:

	0	1	2	3	4
0	1.0	2.0	3.0	5.0	5.0
1	4.0	5.0	3.0	3.0	3.0
2	1.0	2.0	3.0	4.0	5.0
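The tie-breaking rules can be compared on one small Series. The values below are the ones series4 is meant to hold after the two assignments (15, 16, 14, 18, 14, 20), written out directly so that the sketch is self-contained. method="first" is an extra option not used in the cell above.

```python
import pandas as pd

s = pd.Series([15, 16, 14, 18, 14, 20])   # the two 14s tie for the lowest rank

avg = s.rank()                   # ties share the mean of their positions: 1.5
low = s.rank(method="min")       # ties share the lowest position: 1.0
high = s.rank(method="max")      # ties share the highest position: 2.0
first = s.rank(method="first")   # ties are ordered by appearance: 1.0, then 2.0
print(pd.DataFrame({"average": avg, "min": low, "max": high, "first": first}))
```

Only the tied entries differ between the methods; every other value keeps the same rank.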

import numpy as np
from pandas import Series, DataFrame
import pandas as pd

# Common basic statistics functions in pandas
# On a Series they are computed over its elements; on a DataFrame they are computed column by column
# Missing values (NaN) are skipped by default
df = DataFrame(np.arange(3, 27).reshape(4, 6))
df
series = Series(np.arange(1, 11).reshape(10))
series
# np.nan is the spelling that works on every NumPy version (the alias np.NaN was removed in NumPy 2.0)
df.loc[0, 1] = np.nan
df.loc[0, 3] = np.nan
df.loc[1, 1] = np.nan
df.loc[3, 3:5] = np.nan
df
print("count: number of non-missing values per column==>\n" + str(df.count()))
df
print("=============================")
print("describe: summary statistics==>\n" + str(df.describe()))
print("=============================")
print("max: maximum of each column==>\n" + str(df.max()))
print("=============================")
print("min: minimum of each column==>\n" + str(df.min()))
print("=============================")
print("idxmax: row label of each column's maximum==>\n" + str(df.idxmax()))
print("=============================")
print("idxmin: row label of each column's minimum==>\n" + str(df.idxmin()))
print("=============================")
print("quantile: quantile of each column (default 0.5)==>\n" + str(df.quantile()))
print("=============================")
print("sum: sum of each column==>\n" + str(df.sum()))
print("=============================")
print("mean: mean of each column==>\n" + str(df.mean()))
print("=============================")
print("median: median of each column==>\n" + str(df.median()))
df

count: number of non-missing values per column==>
0    4
1    2
2    4
3    2
4    3
5    3
dtype: int64
=============================
describe: summary statistics==>
               0          1          2          3     4     5
count   4.000000   2.000000   4.000000   2.000000   3.0   3.0
mean   12.000000  19.000000  14.000000  15.000000  13.0  14.0
std     7.745967   4.242641   7.745967   4.242641   6.0   6.0
min     3.000000  16.000000   5.000000  12.000000   7.0   8.0
25%     7.500000  17.500000   9.500000  13.500000  10.0  11.0
50%    12.000000  19.000000  14.000000  15.000000  13.0  14.0
75%    16.500000  20.500000  18.500000  16.500000  16.0  17.0
max    21.000000  22.000000  23.000000  18.000000  19.0  20.0
=============================
max: maximum of each column==>
0    21.0
1    22.0
2    23.0
3    18.0
4    19.0
5    20.0
dtype: float64
=============================
min: minimum of each column==>
0     3.0
1    16.0
2     5.0
3    12.0
4     7.0
5     8.0
dtype: float64
=============================
idxmax: row label of each column's maximum==>
0    3
1    3
2    3
3    2
4    2
5    2
dtype: int64
=============================
idxmin: row label of each column's minimum==>
0    0
1    2
2    0
3    1
4    0
5    0
dtype: int64
=============================
quantile: quantile of each column (default 0.5)==>
0    12.0
1    19.0
2    14.0
3    15.0
4    13.0
5    14.0
Name: 0.5, dtype: float64
=============================
sum: sum of each column==>
0    48.0
1    38.0
2    56.0
3    30.0
4    39.0
5    42.0
dtype: float64
=============================
mean: mean of each column==>
0    12.0
1    19.0
2    14.0
3    15.0
4    13.0
5    14.0
dtype: float64
=============================
median: median of each column==>
0    12.0
1    19.0
2    14.0
3    15.0
4    13.0
5    14.0
dtype: float64

Out[34]:

	0	1	2	3	4	5
0	3	NaN	5	NaN	7.0	8.0
1	9	NaN	11	12.0	13.0	14.0
2	15	16.0	17	18.0	19.0	20.0
3	21	22.0	23	NaN	NaN	NaN

In [45]:

# More advanced statistics functions in pandas
# Missing values are replaced with 0 first, so they now take part in every calculation
df = df.replace(np.nan, 0)
print("===================================================================")
# Note: DataFrame.mad was removed in pandas 2.0; (df - df.mean()).abs().mean() gives the same result
print("mad: mean absolute deviation from the mean==>\n" + str(df.mad()))
print("=============================")
print("var: sample variance==>\n" + str(df.var()))
print("=============================")
print("std: sample standard deviation==>\n" + str(df.std()))
print("=============================")
print("cumsum: cumulative sum==>\n" + str(df.cumsum()))
print("=============================")
print("cummin: cumulative minimum==>\n" + str(df.cummin()))
print("=============================")
print("cummax: cumulative maximum==>\n" + str(df.cummax()))
print("=============================")
print("cumprod: cumulative product==>\n" + str(df.cumprod()))
print("=============================")
# pct_change returns the relative change from the previous row as a fraction (2.0 means +200%)
print("pct_change: relative change==>\n" + str(df.pct_change()))
df

===================================================================
mad: mean absolute deviation from the mean==>
0    6.00
1    9.50
2    6.00
3    7.50
4    6.25
5    6.50
dtype: float64
=============================
var: sample variance==>
0     60.000000
1    126.333333
2     60.000000
3     81.000000
4     66.250000
5     73.000000
dtype: float64
=============================
std: sample standard deviation==>
0     7.745967
1    11.239810
2     7.745967
3     9.000000
4     8.139410
5     8.544004
dtype: float64
=============================
cumsum: cumulative sum==>
      0     1     2     3     4     5
0   3.0   0.0   5.0   0.0   7.0   8.0
1  12.0   0.0  16.0  12.0  20.0  22.0
2  27.0  16.0  33.0  30.0  39.0  42.0
3  48.0  38.0  56.0  30.0  39.0  42.0
=============================
cummin: cumulative minimum==>
     0    1    2    3    4    5
0  3.0  0.0  5.0  0.0  7.0  8.0
1  3.0  0.0  5.0  0.0  7.0  8.0
2  3.0  0.0  5.0  0.0  7.0  8.0
3  3.0  0.0  5.0  0.0  0.0  0.0
=============================
cummax: cumulative maximum==>
      0     1     2     3     4     5
0   3.0   0.0   5.0   0.0   7.0   8.0
1   9.0   0.0  11.0  12.0  13.0  14.0
2  15.0  16.0  17.0  18.0  19.0  20.0
3  21.0  22.0  23.0  18.0  19.0  20.0
=============================
cumprod: cumulative product==>
        0    1        2    3       4       5
0     3.0  0.0      5.0  0.0     7.0     8.0
1    27.0  0.0     55.0  0.0    91.0   112.0
2   405.0  0.0    935.0  0.0  1729.0  2240.0
3  8505.0  0.0  21505.0  0.0     0.0     0.0
=============================
pct_change: relative change==>
          0         1         2         3         4         5
0       NaN       NaN       NaN       NaN       NaN       NaN
1  2.000000       NaN  1.200000       inf  0.857143  0.750000
2  0.666667       inf  0.545455  0.500000  0.461538  0.428571
3  0.400000  0.375000  0.352941 -1.000000 -1.000000 -1.000000

Out[45]:

	0	1	2	3	4	5
0	3	0.0	5	0.0	7.0	8.0
1	9	0.0	11	12.0	13.0	14.0
2	15	16.0	17	18.0	19.0	20.0
3	21	22.0	23	0.0	0.0	0.0
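Because DataFrame.mad() no longer exists in pandas 2.0 and later, the same statistic can be computed from its definition. The sketch below rebuilds the values shown in Out[45] by hand (the variable name mad is only illustrative) and applies mean(|x - mean(x)|) column by column.

```python
import pandas as pd

# Same values as the DataFrame shown in Out[45]
df = pd.DataFrame([
    [3, 0.0, 5, 0.0, 7.0, 8.0],
    [9, 0.0, 11, 12.0, 13.0, 14.0],
    [15, 16.0, 17, 18.0, 19.0, 20.0],
    [21, 22.0, 23, 0.0, 0.0, 0.0],
])

# Mean absolute deviation per column: mean(|x - mean(x)|)
mad = (df - df.mean()).abs().mean()
print(mad)
```

The result should match the mad output recorded above (6.00, 9.50, 6.00, 7.50, 6.25, 6.50).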

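import json
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

'''
The pandas read_xxx functions read data from a file into a DataFrame.
The encoding argument must match the encoding the file was saved in. The default is utf-8,
so a file with Chinese text that was saved as GBK has to be read with encoding="gbk".
'''
# 1. read_csv reads delimited text data
# pd.read_csv(file_path, sep, header, encoding): file path, field separator, header row (None means the file has no header row), character encoding
df = pd.read_csv("data_test.csv", sep=",", header=None, encoding="gbk")
df

'''Export a DataFrame to CSV'''
df2 = DataFrame(np.array([
    ["Tom", "100", "男"],
    ["Gina", "100", "女"],
    ["Cindy", "100", "女"]
]), index=[0, 1, 2], columns=["name", "score", "sex"])
df2
df2.to_csv("data01.csv", sep=",", encoding="gbk")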

In [3]:

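# Export a DataFrame to Excel and read it back
# (no encoding argument is passed: .xlsx files store text as Unicode, and recent pandas versions do not accept one here)
df2.to_excel("data.xlsx")
df3 = pd.read_excel("data.xlsx")
df3
# Export a DataFrame to JSON and read it back
df2.to_json("data.json")
print("OK")
df4 = pd.read_json("data.json")
df4

OK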

Out[3]:

	name	score	sex
0	Tom	100	男
1	Gina	100	女
2	Cindy	100	女
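To make the encoding point concrete, here is a minimal round-trip sketch. It assumes a temporary directory is acceptable as the output location and uses the same three rows as the table above; the names path and restored are illustrative. The file is written as GBK and read back with the same encoding, which is the condition that matters.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "name": ["Tom", "Gina", "Cindy"],
    "score": [100, 100, 100],
    "sex": ["男", "女", "女"],
})

# Write to a temporary file so the sketch leaves no files in the working directory
path = os.path.join(tempfile.mkdtemp(), "data01.csv")
df.to_csv(path, sep=",", encoding="gbk", index=False)

# Reading succeeds because the encoding matches the one used for writing
restored = pd.read_csv(path, sep=",", encoding="gbk")
print(restored)
```

Reading the same file with the default utf-8 would typically fail with a UnicodeDecodeError, because the GBK bytes for the Chinese characters are not valid UTF-8.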

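Quoted from: Advanced Mathematics Fundamentals, Probability and Mathematical Statistics
Correlation coefficient: measures how two samples/variables relate to each other and how strongly. It is the covariance made dimensionless, that is, standardized. Covariance (COV): measures how two samples/variables relate to each other and how strongly, in the units of the data.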

In [1]:

import numpy as np
import pandas as pd
from pandas import Series, DataFrame

# pandas arithmetic and data alignment
# Correlation coefficient and covariance
# Covariance: how two variables (two Series or two DataFrame columns) vary together, in the units of the data
# Correlation coefficient: the covariance divided by both standard deviations, so it is unit-free
# Covariance formula: over N observations, average the products [Xi-MEAN(X)]*[Yi-MEAN(Y)]
# i.e. MEAN([X1-MEAN(X)]*[Y1-MEAN(Y)], [X2-MEAN(X)]*[Y2-MEAN(Y)], ..., [Xn-MEAN(X)]*[Yn-MEAN(Y)])

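Summary:
Covariance formula: P = ([X1-MEAN(X)]*[Y1-MEAN(Y)] + [X2-MEAN(X)]*[Y2-MEAN(Y)] + ... + [Xn-MEAN(X)]*[Yn-MEAN(Y)]) / n, where MEAN(X) and MEAN(Y) are the means of all X values and all Y values. The pandas cov() functions compute the sample covariance, which divides by n-1 instead of n.

If P>0, X and Y tend to change in the same direction; for data on the same scale, a larger P means a stronger tendency.

If P<0, X and Y tend to change in opposite directions; for data on the same scale, a smaller (more negative) P means a stronger tendency.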
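The formula can be checked against pandas directly. The sketch below uses two short illustrative Series (the names x, y, cov_population and cov_sample are assumptions made for this example) and computes the covariance by hand with both denominators.

```python
import numpy as np
import pandas as pd

x = pd.Series([21, 26, 31, 36, 33])
y = pd.Series([22, 27, 32, 37, 34])

# Products of the deviations from each variable's own mean
products = (x - x.mean()) * (y - y.mean())

cov_population = products.sum() / len(x)    # divide by n
cov_sample = products.sum() / (len(x) - 1)  # divide by n-1, as pandas does

print(cov_population, cov_sample, x.cov(y))
```

Only the n-1 version agrees with Series.cov(), which is worth remembering when comparing pandas results with a textbook formula.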

In [80]:

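df = DataFrame(np.arange(21, 41).reshape(4, 5))
series = Series(np.arange(33, 38).reshape(5))
series.name = 4
# Add the Series as row 4 (DataFrame.append was removed in pandas 2.0; pd.concat is the replacement)
df2 = pd.concat([df, series.to_frame().T])
df2
# Use cov() to compute covariance
# With an argument: call it on one Series and pass the other, e.g. A.cov(B) is the covariance of A and B
# Without an argument: DataFrame.cov() returns the covariance matrix of all column pairs
# Covariance of columns 3 and 4
p = df2[3].cov(df2[4])
p
df2.cov()
# Covariance of rows 0 and 1
p = df2.loc[0].cov(df2.loc[1])
p
df2
np.cov(df, df2)  # NumPy treats each row as one variable and stacks the rows of df and df2
# df.cov()
df2.cov()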

Out[80]:

	0	1	2	3	4
0	35.3	35.3	35.3	35.3	35.3
1	35.3	35.3	35.3	35.3	35.3
2	35.3	35.3	35.3	35.3	35.3
3	35.3	35.3	35.3	35.3	35.3
4	35.3	35.3	35.3	35.3	35.3

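Correlation coefficient: corr() is the covariance of X and Y divided by the product of the standard deviation of X and the standard deviation of Y. The correlation coefficient can therefore be seen as a special covariance: one that has been standardized to remove the effect of the two variables' units.
1. It also shows whether the two variables change in the same or in opposite directions: positive for the same direction, negative for opposite directions.
2. Because it is a standardized covariance, its more important property is that it removes the effect of how widely each variable varies and only reflects how similarly the two variables change per unit.
Note: unlike the covariance, which can take any value from -∞ to +∞, the correlation coefficient only ranges from -1 to +1.
When the correlation coefficient is 1, the similarity is at its maximum: perfect positive correlation.
When it is 0, there is no linear relationship between the two variables.
When it is -1, the two variables change in exactly opposite directions: perfect negative correlation.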

Summary:
Correlation coefficient formula:
P = ([X1-MEAN(X)]*[Y1-MEAN(Y)] + [X2-MEAN(X)]*[Y2-MEAN(Y)] + ... + [Xn-MEAN(X)]*[Yn-MEAN(Y)]) / n

R = P / (STD(X) * STD(Y))
-1 <= R <= 1

R=1 perfect positive correlation (same direction)
R=0 no linear correlation
R=-1 perfect negative correlation (opposite direction)
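As a quick check of R = P / (STD(X) * STD(Y)), the sketch below compares a hand-computed value with Series.corr(). The data and the names r_manual and r_pandas are illustrative; cov() and std() both use the n-1 denominator, so the ratio is consistent.

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([2, 1, 4, 3, 5])

# Covariance divided by the product of the two standard deviations
r_manual = x.cov(y) / (x.std() * y.std())
r_pandas = x.corr(y)  # Pearson correlation by default

# A variable is perfectly correlated with any increasing linear function of itself
r_linear = x.corr(3 * x + 7)

print(r_manual, r_pandas, r_linear)
```

This also explains the all-ones matrix in the corr() example of this section: every column there is the first column plus a constant.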

In [84]:

# Use df.corr() to compute the correlation coefficients
df = DataFrame(np.arange(151, 176).reshape(5, 5))
df
df.corr()

Out[84]:

	0	1	2	3	4
0	1.0	1.0	1.0	1.0	1.0
1	1.0	1.0	1.0	1.0	1.0
2	1.0	1.0	1.0	1.0	1.0
3	1.0	1.0	1.0	1.0	1.0
4	1.0	1.0	1.0	1.0	1.0

import numpy as np
import pandas as pd
from pandas import Series, DataFrame

'''Selecting part of a DataFrame; each selection returns a new DataFrame'''
df = DataFrame(np.arange(100, 150).reshape(5, 10))
df
# Select columns 1-3
#      1    2    3
# 0  101  102  103
# 1  111  112  113
# 2  121  122  123
# 3  131  132  133
# 4  141  142  143
df2 = df[[1, 2, 3]]
df2
# Select rows from label 2 onward and columns 4-6 (loc slices by label and includes both endpoints)
#      4    5    6
# 2  124  125  126
# 3  134  135  136
# 4  144  145  146
df3 = df.loc[2:, 4:6]
df3
# RangeIndex(start=2, stop=10, step=1)
df.columns[2:]

Out[14]:

RangeIndex(start=2, stop=10, step=1)
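The difference between label-based and position-based slicing is easy to miss, so here is a small sketch on the same 5x10 frame. The names by_label and by_position are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(100, 150).reshape(5, 10))

# loc slices by label and includes the end label: columns 4, 5 and 6
by_label = df.loc[2:, 4:6]

# iloc slices by position and excludes the end position: columns 4 and 5
by_position = df.iloc[2:, 4:6]

print(by_label.shape, by_position.shape)
```

With the default integer labels the two look alike, which is exactly why the off-by-one in the column count is a common surprise.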

In [34]:

'''Handling missing values in pandas (Series and DataFrame): drop the affected rows/columns or fill in a default value'''
#      0      1    2      3    4    5    6    7    8      9
# 0  100  101.0  102  103.0  104  105  106  107  108  109.0
# 1  110  111.0  112  113.0  114  115  116  117  118  119.0
# 2  120    NaN  122  123.0  124  125  126  127  128  129.0
# 3  130  131.0  132    NaN  134  135  136  137  138  139.0
# 4  140  141.0  142  143.0  144  145  146  147  148    NaN
df.loc[2, 1] = np.nan
df.loc[3, 3] = np.nan
df.loc[4, 9] = np.nan
df
# 1. Use dropna to drop rows/columns that contain missing values
#      0      1    2      3    4    5    6    7    8      9
# 0  100  101.0  102  103.0  104  105  106  107  108  109.0
# 1  110  111.0  112  113.0  114  115  116  117  118  119.0
# df2 = df.dropna()  # drop the rows that contain missing values
# df2
#      0    2    4    5    6    7    8
# 0  100  102  104  105  106  107  108
# 1  110  112  114  115  116  117  118
# 2  120  122  124  125  126  127  128
# 3  130  132  134  135  136  137  138
# 4  140  142  144  145  146  147  148
# df2 = df.dropna(axis=1)  # drop the columns that contain missing values
# df2
df2 = df.dropna(how="all", axis=1)  # drop only the columns in which every value is missing (none here, so df2 equals df)
df2

Out[34]:

	0	1	2	3	4	5	6	7	8	9
0	100	101.0	102	103.0	104	105	106	107	108	109.0
1	110	111.0	112	113.0	114	115	116	117	118	119.0
2	120	NaN	122	123.0	124	125	126	127	128	129.0
3	130	131.0	132	NaN	134	135	136	137	138	139.0
4	140	141.0	142	143.0	144	145	146	147	148	NaN
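The effect of the dropna arguments is easiest to see from the resulting shapes. The sketch below rebuilds the frame from Out[34] (as floats, an assumption made so that NaN can be assigned without a dtype change) and compares the variants; subset is an additional dropna parameter not used in the original cell.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(100, 150, dtype=float).reshape(5, 10))
df.loc[2, 1] = np.nan
df.loc[3, 3] = np.nan
df.loc[4, 9] = np.nan

rows_any = df.dropna()                    # drop rows with at least one NaN
cols_any = df.dropna(axis=1)              # drop columns with at least one NaN
cols_all = df.dropna(how="all", axis=1)   # drop columns that are entirely NaN
rows_subset = df.dropna(subset=[1])       # only look at column 1 when deciding

print(rows_any.shape, cols_any.shape, cols_all.shape, rows_subset.shape)
```

None of these calls modifies df; each returns a new DataFrame.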

In [93]:

# 2. Use fillna to fill in missing values
df2
# Fill missing values with 0
df3 = df2.fillna(0)
df3
# Forward fill and backward fill
# ffill (forward fill): use the previous valid value in the same column; the value stays NaN if there is none
# bfill (backward fill): use the next valid value in the same column; the value stays NaN if there is none
#      0      1    2      3    4    5    6    7    8      9
# 0  100  101.0  102  103.0  104  105  106  107  108  109.0
# 1  110  111.0  112  113.0  114  115  116  117  118  119.0
# 2  120  111.0  122  123.0  124  125  126  127  128  129.0
# 3  130  131.0  132  123.0  134  135  136  137  138  139.0
# 4  140  141.0  142  143.0  144  145  146  147  148  139.0
df3 = df2.fillna(method="ffill")  # pandas 2.x prefers the equivalent df2.ffill()
df3
#      0      1    2      3    4    5    6    7    8      9
# 0  100  101.0  102  103.0  104  105  106  107  108  109.0
# 1  110  111.0  112  113.0  114  115  116  117  118  119.0
# 2  120  131.0  122  123.0  124  125  126  127  128  129.0
# 3  130  131.0  132  143.0  134  135  136  137  138  139.0
# 4  140  141.0  142  143.0  144  145  146  147  148    NaN
df3 = df2.fillna(method="bfill")  # pandas 2.x prefers the equivalent df2.bfill()
df3
df2
# Fill missing values in specific columns only: {column label: replacement value}
#      0      1    2      3    4    5    6    7    8      9
# 0  100  101.0  102    103  104  105  106  107  108    109
# 1  110  111.0  112    113  114  115  116  117  118    119
# 2  120    NaN  122    123  124  125  126  127  128    129
# 3  130  131.0  132  33333  134  135  136  137  138    139
# 4  140  141.0  142    143  144  145  146  147  148  99999
df3 = df2.fillna({3: "33333", 9: "99999"})
df3
# df.replace substitutes a given value everywhere in the DataFrame
#      0    1    2    3    4    5    6    7    8    9
# 0  100  101  102  103  104  105  106  107  108  109
# 1  110  111  112  113  114  115  116  117  118  119
# 2  120    0  122  123  124  125  126  127  128  129
# 3  130  131  132    0  134  135  136  137  138  139
# 4  140  141  142  143  144  145  146  147  148    0
df3 = df.replace(np.nan, "0")
df3
# pd.isnull(df) returns a boolean DataFrame of the same shape: True where a value is missing
#        0      1      2      3      4      5      6      7      8      9
# 0  False  False  False  False  False  False  False  False  False  False
# 1  False  False  False  False  False  False  False  False  False  False
# 2  False   True  False  False  False  False  False  False  False  False
# 3  False  False  False   True  False  False  False  False  False  False
# 4  False  False  False  False  False  False  False  False  False   True
df3 = pd.isnull(df)
df3
# pd.notnull(df) is the opposite of isnull: True where a value is present
#       0      1     2      3     4     5     6     7     8      9
# 0  True   True  True   True  True  True  True  True  True   True
# 1  True   True  True   True  True  True  True  True  True   True
# 2  True  False  True   True  True  True  True  True  True   True
# 3  True   True  True  False  True  True  True  True  True   True
# 4  True   True  True   True  True  True  True  True  True  False
df3 = pd.notnull(df2)
df3

Out[93]:

	0	1	2	3	4	5	6	7	8	9
0	True	True	True	True	True	True	True	True	True	True
1	True	True	True	True	True	True	True	True	True	True
2	True	False	True	True	True	True	True	True	True	True
3	True	True	True	False	True	True	True	True	True	True
4	True	True	True	True	True	True	True	True	True	False
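A compact way to combine these tools is to count the gaps first and then pick a fill strategy. The sketch below rebuilds the same frame as floats (an assumption for this example) and uses the ffill()/bfill() methods; filling with the column mean is an extra strategy that the original cell does not cover.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(100, 150, dtype=float).reshape(5, 10))
df.loc[2, 1] = np.nan
df.loc[3, 3] = np.nan
df.loc[4, 9] = np.nan

# Number of missing values per column
missing_per_column = df.isna().sum()

forward = df.ffill()            # previous valid value in the same column
backward = df.bfill()           # next valid value in the same column
mean_filled = df.fillna(df.mean())  # column mean of the valid values

print(missing_per_column[missing_per_column > 0])
print(forward.loc[2, 1], backward.loc[2, 1], mean_filled.loc[2, 1])
```

Note that backward fill leaves the last row of column 9 as NaN, because there is no later value to copy from.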

# Merging tables in pandas
# The DataFrames here are built from dictionaries
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

# pd.merge joins the rows of two DataFrames on one or more keys
# on=None names the key column(s) explicitly. If the key column has different names in the two objects, use left_on and right_on instead. To join on the row index, set left_index=True / right_index=True (both default to False).
# how='inner' decides what happens to keys that do not appear on both sides: inner keeps the intersection, outer keeps the union, left and right keep the keys of one side.
# suffixes=('_x','_y') are appended to same-named non-key columns of the left and right objects so they can be told apart in the result.
# For many-to-many joins the result is the Cartesian product of the matching rows.
# Build several DataFrames
df1 = pd.DataFrame({
    "A": np.arange(1, 8, 3),
    "B": np.arange(2, 9, 3),
    "C": np.arange(3, 10, 3)
})
df2 = pd.DataFrame({
    "D": np.arange(10, 17, 3),
    "A": np.arange(11, 18, 3),
    "E": np.arange(12, 19, 3)
})
df3 = pd.DataFrame(np.arange(26, 51).reshape(5, 5), index=list("abcde"), columns=["A", "B", "C", "D", "E"])
df1
df2
df3
# 1. Use pd.merge to join the rows of df1 and df2 on the key column A
# how: inner (intersection), outer (union), left / right (keep the keys of the left / right table)
# Here how="right" keeps every key of df2; no key matches df1, so B and C are NaN
#     A   B   C   D   E
# 0  11 NaN NaN  10  12
# 1  14 NaN NaN  13  15
# 2  17 NaN NaN  16  18
df4 = pd.merge(df1, df2, on="A", how="right")
df4

Out[3]:

	A	B	C	D	E
0	11	NaN	NaN	10	12
1	14	NaN	NaN	13	15
2	17	NaN	NaN	16	18
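To see what each value of how does with these two tables, the sketch below runs all four joins and counts the rows. It also uses indicator=True, a pd.merge option not shown in the cell above, which adds a _merge column recording where each row came from.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})
df2 = pd.DataFrame({"D": [10, 13, 16], "A": [11, 14, 17], "E": [12, 15, 18]})

# Row counts for each join type; the two tables share no value of A
row_counts = {
    how: len(pd.merge(df1, df2, on="A", how=how))
    for how in ["inner", "left", "right", "outer"]
}

# indicator=True labels each row as left_only, right_only or both
outer = pd.merge(df1, df2, on="A", how="outer", indicator=True)

print(row_counts)
print(outer["_merge"].value_counts())
```

Since no key matches, the inner join is empty and the outer join is simply all six rows padded with NaN.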

In [25]:

# Use pd.concat() to combine several Series or DataFrames along one axis
# axis=0 (the default) stacks the objects vertically, appending rows; axis=1 places them side by side, appending columns
# objs: a sequence (list) of Series or DataFrame objects
# axis: the axis to concatenate along, 0 for rows (index), 1 for columns
# join: how to handle the other axis, inner or outer
#    A  B  C   D   A   E
# 0  1  2  3  10  11  12
# 1  4  5  6  13  14  15
# 2  7  8  9  16  17  18
# pd.concat(objs=[df1,df2],sort=True,join="inner",axis=1)
df5 = DataFrame({
    "A": range(3, 6),
    "B": range(6, 9),
    "C": range(9, 12)
})
df5
df6 = DataFrame({
    "D": range(12, 15),
    "B": range(15, 18),
    "C": range(18, 21)
})
df6
#    A  B   C   D   B   C
# 0  3  6   9  12  15  18
# 1  4  7  10  13  16  19
# 2  5  8  11  14  17  20
# pd.concat(objs=[df5,df6],axis=1)
# join="inner" keeps only the columns present in both frames (B and C)
#     B   C
# 0   6   9
# 1   7  10
# 2   8  11
# 0  15  18
# 1  16  19
# 2  17  20
pd.concat(objs=[df5, df6], axis=0, sort=True, join="inner")
# pd.concat(objs=[df5,df6])
# join="outer" keeps the union of the columns and fills the gaps with NaN
#      A   B   C     D
# 0  3.0   6   9   NaN
# 1  4.0   7  10   NaN
# 2  5.0   8  11   NaN
# 0  NaN  15  18  12.0
# 1  NaN  16  19  13.0
# 2  NaN  17  20  14.0
pd.concat(objs=[df5, df6], join="inner", sort=True)
pd.concat(objs=[df5, df6], join="outer", sort=True)

Out[25]:

	A	B	C	D
0	3.0	6	9	NaN
1	4.0	7	10	NaN
2	5.0	8	11	NaN
0	NaN	15	18	12.0
1	NaN	16	19	13.0
2	NaN	17	20	14.0

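Grouping data in pandas

groupby groups a DataFrame by a list of column names or by a Series and returns a GroupBy object. At that point no computation has actually been performed.

A GroupBy object is iterable, and each iteration yields a tuple.

The first element of the tuple is the group name (the value of the grouping column for that group).

The second element is the data of that group, as a DataFrame.

Its index is the index those rows had in the original DataFrame.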

In [49]:

# One and Two are random integers, so the values differ from run to run
df7 = DataFrame({
    "One": np.random.randint(1, 10, 5),
    "Two": np.random.randint(1, 10, 5),
    "A": ["a", "a", "c", "c", "e"],
    "B": range(2, 12, 2)
})
print(df7)
# groupby returns a GroupBy object that represents the grouping
# df7.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)
groupby = df7.groupby(by=["A"], sort=True)
# <pandas.core.groupby.groupby.DataFrameGroupBy object at 0x000001B44E24C5C0>
print(groupby)
# Iterating yields three tuples: the group name a/c/e as the key and that group's rows as the value
for i in groupby:
    print(i)
# Aggregations applied to each group
print("==========================")
print(groupby.sum())
print("==========================")
print(groupby.mean())
print("==========================")
print(groupby.max())
print("==========================")
print(groupby.min())
# After grouping, select columns by name
# (use a list of names; the tuple form groupby["One","Two"] was removed in pandas 2.0)
print("==========================")
for i in groupby[["One", "Two"]]:
    print(i)
print("==========================")
print(groupby[["One", "Two"]].min())

   One  Two  A   B
0    3    5  a   2
1    6    2  a   4
2    8    9  c   6
3    9    1  c   8
4    2    5  e  10
<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x000001B44E373C88>
('a',    One  Two  A  B
0    3    5  a  2
1    6    2  a  4)
('c',    One  Two  A  B
2    8    9  c  6
3    9    1  c  8)
('e',    One  Two  A   B
4    2    5  e  10)
==========================
   One  Two   B
A
a    9    7   6
c   17   10  14
e    2    5  10
==========================
   One  Two   B
A
a  4.5  3.5   3
c  8.5  5.0   7
e  2.0  5.0  10
==========================
   One  Two   B
A
a    6    5   4
c    9    9   8
e    2    5  10
==========================
   One  Two   B
A
a    3    2   2
c    8    1   6
e    2    5  10
==========================
('a',    One  Two  A  B
0    3    5  a  2
1    6    2  a  4)
('c',    One  Two  A  B
2    8    9  c  6
3    9    1  c  8)
('e',    One  Two  A   B
4    2    5  e  10)
==========================
   One  Two
A
a    3    2
c    8    1
e    2    5
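The separate sum/mean/max/min calls above can also be written as one pass with agg. The sketch below fixes the random columns to the values printed in the recorded output (an assumption made so the result is reproducible) and also shows the usual way to unpack the (name, group) tuples while iterating.

```python
import pandas as pd

df7 = pd.DataFrame({
    "One": [3, 6, 8, 9, 2],
    "Two": [5, 2, 9, 1, 5],
    "A": ["a", "a", "c", "c", "e"],
    "B": range(2, 12, 2),
})

grouped = df7.groupby("A")

# Each iteration yields (group name, DataFrame holding that group's rows)
group_rows = {name: list(group.index) for name, group in grouped}

# Several aggregations in one call; the result has two column levels
summary = grouped[["One", "Two"]].agg(["sum", "mean"])

print(group_rows)
print(summary)
```

The group DataFrames keep the row labels of df7, which is what the mapping group_rows makes visible.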

In [83]:

# Use apply to transform pandas data with a function
df8 = DataFrame(np.arange(100, 125).reshape(5, 5))
df8
df8.apply(lambda x: x * 2)  # on a DataFrame the function is called once per column (x is a Series)
df8[0]
df8[0].apply(lambda x: x - 5 if x > 100 else x)  # on a Series the function is called once per element
df8.loc[3:, 2].apply(lambda x: x * 3)
df8
df9 = DataFrame({
    "A": range(1, 7),
    "B": range(7, 13),
    "C": pd.date_range(start="2019-04-01", end="2019-04-06")
})
df9
# Split every date in column C into year, month and day strings
#       0   1   2
# 0  2019  04  01
# 1  2019  04  02
# 2  2019  04  03
# 3  2019  04  04
# 4  2019  04  05
# 5  2019  04  06
years = df9["C"].apply(lambda x: str(x)[0:4])
months = df9["C"].apply(lambda y: str(y)[5:7])
days = df9["C"].apply(lambda z: str(z)[8:10])
df10 = DataFrame(np.array(list(years)))
df10[1] = list(months)
df10[2] = list(days)
df10

Out[83]:

	0	1	2
0	2019	04	01
1	2019	04	02
2	2019	04	03
3	2019	04	04
4	2019	04	05
5	2019	04	06
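Slicing the string form of each timestamp works, but for datetime columns pandas also offers the .dt accessor, which avoids string positions altogether. The sketch below is an alternative to the apply-based cell, not the original author's method; the names parts and as_text are illustrative.

```python
import pandas as pd

df9 = pd.DataFrame({"C": pd.date_range(start="2019-04-01", end="2019-04-06")})

# Numeric components taken directly from the datetime values
parts = pd.DataFrame({
    "year": df9["C"].dt.year,
    "month": df9["C"].dt.month,
    "day": df9["C"].dt.day,
})

# Formatted strings, if zero-padded text is what is needed
as_text = df9["C"].dt.strftime("%Y/%m/%d")

print(parts)
print(as_text.tolist())
```

The numeric columns can be used directly in arithmetic or grouping, while strftime gives the zero-padded text that the string slicing produced.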

In [107]:

# pandas pivot tables: pivot_table
# Aggregates the data by one or more keys and arranges the results in a grid whose rows and columns are the group keys.
# pd.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All')
# By default the mean is used as the aggregation function
df11 = DataFrame({
    "A": np.arange(15, 20).reshape(5),
    "B": np.arange(20, 25).reshape(5),
    "C": np.arange(25, 30).reshape(5)
})
df11
df11["B"] = [20, 21, 21, 22, 22]
df11["C"] = [25, 25, 26, 27, 28]
pd.pivot_table(data=df11, index=["B"], columns=["C"], aggfunc="mean")
# df11

Out[107]:

	A
C	25	26	27	28
B
20	15.0	NaN	NaN	NaN
21	16.0	17.0	NaN	NaN
22	NaN	NaN	18.0	19.0
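A pivot table is essentially a groupby on the row and column keys followed by a reshape. The sketch below shows that equivalence on the same df11 data; passing values="A" is an assumption made here so that both results have a single column level.

```python
import numpy as np
import pandas as pd

df11 = pd.DataFrame({
    "A": [15, 16, 17, 18, 19],
    "B": [20, 21, 21, 22, 22],
    "C": [25, 25, 26, 27, 28],
})

pivot = pd.pivot_table(df11, values="A", index="B", columns="C", aggfunc="mean")

# Same result: group by both keys, average A, then move C into the columns
by_groupby = df11.groupby(["B", "C"])["A"].mean().unstack("C")

print(pivot)
print(by_groupby)
```

Combinations of B and C that never occur in the data, such as B=20 with C=26, appear as NaN in both results.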

4. The Matplotlib module

import matplotlib.pyplot as plt
import numpy as np

# plot() takes the coordinates to draw
# plot([x], y, [fmt], data=None, **kwargs)
# plot([x], y, [fmt], [x2], y2, [fmt2], ..., **kwargs)
# Pass the x coordinates and y coordinates of the function as sequences
# %matplotlib tk and %matplotlib inline control where figures are shown: tk opens a GUI window, inline renders inside the notebook
%matplotlib inline
# Build a parabola
x = np.arange(-7.8, 8, 0.1)
y = x**2 + 1
plt.plot(x, y)
plt.show()

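The basic Matplotlib container: figure
Everything Matplotlib draws lives inside a Figure object, and each plot belongs to exactly one Figure.
A Matplotlib figure is a separate window (or canvas), and it can in turn contain several smaller plots (subplots).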
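As a minimal sketch of one figure holding several smaller plots, the example below uses plt.subplots to create a Figure with two Axes. The Agg backend is selected only as an assumption so that the code runs without a display; in a notebook that line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no GUI is required
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-5, 5, 0.1)

# One Figure containing a 1x2 grid of Axes (the "smaller plots")
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 4))
axes[0].plot(x, np.sin(x))
axes[0].set_title("sin(x)")
axes[1].plot(x, np.cos(x))
axes[1].set_title("cos(x)")

print(len(fig.axes))
```

Each Axes has its own coordinate system and title, while figsize applies to the Figure as a whole.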

In [4]:

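arr = np.arange(-5, 5, 0.1)
arr
# Compute the sine; the resulting ndarray is used as the x coordinates
# (the sample values below were printed with a step of 0.5)
# array([ 0.95892427,  0.97753012,  0.7568025 ,  0.35078323, -0.14112001,
#        -0.59847214, -0.90929743, -0.99749499, -0.84147098, -0.47942554,
#         0.        ,  0.47942554,  0.84147098,  0.99749499,  0.90929743,
#         0.59847214,  0.14112001, -0.35078323, -0.7568025 , -0.97753012])
x = np.sin(arr)
x
# Compute the cosine; the resulting ndarray is used as the y coordinates
# array([ 0.28366219, -0.2107958 , -0.65364362, -0.93645669, -0.9899925 ,
#        -0.80114362, -0.41614684,  0.0707372 ,  0.54030231,  0.87758256,
#         1.        ,  0.87758256,  0.54030231,  0.0707372 , -0.41614684,
#        -0.80114362, -0.9899925 , -0.93645669, -0.65364362, -0.2107958 ])
y = np.cos(arr)
y
# Start building the figures
# Create the first figure
plt.figure(figsize=(5, 5))
# Draw the first shape
plt.plot(x, y)
# Create a second figure
plt.figure(num=3, figsize=(8, 8))
# Draw the second shape
plt.plot(x, y)
# Show the figures
plt.show()
# figure(num=None, figsize=None, dpi=None, facecolor=None, edgecolor=None, frameon=True)
# num: figure number or name; an integer is a number, a string is a name
# figsize: width and height of the figure, in inches
# dpi: resolution of the figure in pixels per inch; the default comes from rcParams["figure.dpi"] (100 in Matplotlib 2.0 and later). 1 inch is 2.54 cm, and an A4 sheet is 21 x 29.7 cm
# facecolor: background color
# edgecolor: border color
# frameon: whether to draw the figure frame (background)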

In [5]:

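# Passing several arrays to plot
arr = np.arange(-2, 2, 0.1)
x = np.sin(arr)
y1 = np.cos(arr)
z1 = np.cos(arr)
z2 = np.tan(arr)
plt.plot(x, y1, z1, z2)  # positional arrays are read as (x, y) pairs: this draws (x, y1) and (z1, z2)
plt.show()
# With a single array, the values are used as y and x defaults to 0 .. n-1, with the default line style and color
plt.plot(np.arange(2, 12, 2))
plt.plot([1, 2, 4, 8, 16])
plt.show()
plt.plot(y1, z1)
plt.show()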
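How plot pairs its positional arguments can be verified from the Line2D objects it returns. The sketch below is illustrative only: it uses the object-oriented ax.plot form and the Agg backend (an assumption, so it runs without a display) and inspects the number of lines and the default x values.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no GUI is required
import matplotlib.pyplot as plt
import numpy as np

arr = np.arange(-2, 2, 0.1)
x, y1 = np.sin(arr), np.cos(arr)
z1, z2 = np.cos(arr), np.tan(arr)

fig, ax = plt.subplots()

# Four arrays are consumed as two (x, y) pairs, so two lines are created
pair_lines = ax.plot(x, y1, z1, z2)

# A single sequence is used as y; x defaults to 0, 1, ..., n-1
only_y = ax.plot([1, 2, 4, 8, 16])[0]

print(len(pair_lines), list(only_y.get_xdata()))
```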

In [132]:

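# Colors, markers and line styles in Matplotlib
x = np.arange(-5, 5, 0.1)
y = np.sin(x)
z = np.cos(x)
plt.plot(x, y, x, z)
plt.show()
# plot(x, y, color='red', linestyle='dashed', marker='o', ...)
# Line properties used when plotting:
# (1) linestyle: line style
# (2) linewidth: line width
# (3) color: color
# (4) marker: shape of the data-point markers
# (5) label: label shown in the legend
plt.plot(x, y, x, z, color='purple', linestyle='solid', linewidth=3)  # the keyword arguments apply to both lines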

Out[132]:

[<matplotlib.lines.Line2D at 0x21c8a07a278>, <matplotlib.lines.Line2D at 0x21c8a07ad30>]
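Keyword arguments passed to a single plot call style every line it draws, so giving each curve its own look usually means one call per curve. The sketch below shows that pattern with a format string ("r--") for one line and keyword arguments for the other; the Agg backend and the markevery spacing are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no GUI is required
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-5, 5, 0.1)
fig, ax = plt.subplots()

# Format string: red ("r") dashed ("--") line
sin_line, = ax.plot(x, np.sin(x), "r--", linewidth=2, label="sin(x)")

# Keyword form: purple dotted line with a circle marker on every 10th point
cos_line, = ax.plot(x, np.cos(x), color="purple", linestyle=":",
                    marker="o", markevery=10, label="cos(x)")

ax.legend()  # uses the label of each line
print(sin_line.get_linestyle(), cos_line.get_marker())
```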
