pandas.DataFrame usage summary: when a DataFrame is returned and when a Series is returned

Summary of pandas.DataFrame usage:

1 df[df.Datatype=='train'] returns a DataFrame. The == comparison inside the brackets returns a boolean Series, which has both an index and values.
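The distinction in point 1 can be checked on a small frame. The following is a minimal sketch; the toy data is hypothetical and only the column names (Datatype, Class, c_1) are borrowed from this post.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the post.
df = pd.DataFrame({
    "Datatype": ["train", "test", "train", "orig"],
    "Class": [0, 1, 1, -1],
    "c_1": [0.40, 0.87, 0.59, 0.00],
})

mask = df.Datatype == "train"   # element-wise comparison -> boolean Series
tr = df[mask]                   # boolean indexing -> DataFrame of matching rows

print(type(mask))        # <class 'pandas.core.series.Series'>
print(type(tr))          # <class 'pandas.core.frame.DataFrame'>
print(list(tr.index))    # [0, 2]  (the original row labels are kept)
```

The mask shares its index with df, which is how pandas aligns each True/False value with a row when the mask is used for selection.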

2 df['Class'] returns a Series: type(tr['Class']) gives <class 'pandas.core.series.Series'>
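A related case worth comparing with point 2 is selection with a list of column names, which is what tr[selected_features] does in the code below. This sketch uses hypothetical toy data to show that a single label gives a Series, while a list of labels gives a DataFrame even when the list has one element.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the post.
df = pd.DataFrame({
    "Class": [0, 1, 1],
    "c_1": [0.40, 0.87, 0.59],
    "c_2": [0.08, 0.72, 0.27],
})

s = df["Class"]            # single label -> Series
one_col = df[["Class"]]    # list with one label -> DataFrame with one column
sub = df[["c_1", "c_2"]]   # list of labels -> DataFrame

print(type(s).__name__, s.shape)               # Series (3,)
print(type(one_col).__name__, one_col.shape)   # DataFrame (3, 1)
print(type(sub).__name__, sub.shape)           # DataFrame (3, 2)

# .values converts to NumPy: 1-D for a Series, 2-D for a DataFrame
print(s.values.shape, sub.values.shape)        # (3,) (3, 2)
```

This is why train_x comes out as a 2-D array and train_y as a 1-D array in the results further down.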

####source code###############################################################################

import pandas as pd

# Takes in dataframes and a list of selected features (column names)
# and returns (train_x, train_y), (test_x, test_y)
def train_test_data(complete_df, features_df, selected_features):
    '''Gets selected training and test features from given dataframes, and
       returns tuples for training and test features and their corresponding class labels.
       :param complete_df: A dataframe with all of our processed text data, datatypes, and labels
       :param features_df: A dataframe of all computed, similarity features
       :param selected_features: An array of selected features that correspond to certain columns in `features_df`
       :return: training and test features and labels: (train_x, train_y), (test_x, test_y)'''

    # join labels and features column-wise, then keep the training rows
    df = pd.concat([complete_df, features_df], axis=1)
    tr = df[df.Datatype == 'train']  # boolean Series inside [] -> DataFrame

    # debug prints: show the type each expression returns
    print("type(sf=",type(tr))
    print("tr=",tr)
    print("type(df.Datatype=='train')=",type(df.Datatype == 'train'))
    # list of column names inside [] -> DataFrame; .values -> 2-D NumPy array
    train_x = tr[selected_features].values
    print("train_x=",train_x)
    # And training class labels (0 or 1)
    # debug only: compares the Datatype values with the string "Class";
    # the result is a boolean Series and is not used afterwards
    t = df.Datatype == "Class"
    print("type(t)=",type(t),"t=",t)
    # single column name inside [] -> Series; .values -> 1-D NumPy array
    train_y = tr['Class'].values

    # get the test features and labels
    test = df[df.Datatype == 'test']
    test_x = test[selected_features].values
    test_y = test['Class'].values

    return (train_x, train_y), (test_x, test_y)

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
test_selection = list(features_df)[:2] # first couple columns as a test
print("test_selection=",test_selection)
# test that the correct train/test data is created
(train_x, train_y), (test_x, test_y) = train_test_data(complete_df, features_df, test_selection)

# params: generated train/test data
tests.test_data_split(train_x, train_y, test_x, test_y)

#result##############################################################################################

test_selection= ['c_1', 'c_2']
type(sf= <class 'pandas.core.frame.DataFrame'>
tr=               File Task  Category  Class  \
0   g0pA_taska.txt    a         0      0
2   g0pA_taskc.txt    c         2      1
3   g0pA_taskd.txt    d         1      1
4   g0pA_taske.txt    e         0      0
5   g0pB_taska.txt    a         0      0
..             ...  ...       ...    ...
89  g4pD_taske.txt    e         1      1
90  g4pE_taska.txt    a         1      1
91  g4pE_taskb.txt    b         2      1
92  g4pE_taskc.txt    c         3      1
93  g4pE_taskd.txt    d         0      0

                                                 Text Datatype       c_1  \
0   inheritance is a basic concept of object orien...    train  0.398148
2   the vector space model also called term vector...    train  0.869369
3   bayes theorem was names after rev thomas bayes...    train  0.593583
4   dynamic programming is an algorithm design tec...    train  0.544503
5   inheritance is a basic concept in object orien...    train  0.329502
..                                                ...      ...       ...
89  dynamic programming is a method of providing s...    train  0.845188
90  object oriented programming is a style of prog...    train  0.485000
91  pagerankalgorithm is also known as link analys...    train  0.950673
92  the definition of term depends on the applicat...    train  0.551220
93   bayes theorem or bayes rule  or something cal...    train  0.361257

         c_2       c_3       c_4       c_5       c_6  lcs_word
0   0.079070  0.009346  0.000000  0.000000  0.000000  0.191781
2   0.719457  0.613636  0.515982  0.449541  0.382488  0.846491
3   0.268817  0.156757  0.108696  0.081967  0.060440  0.316062
4   0.115789  0.031746  0.005319  0.000000  0.000000  0.242574
5   0.053846  0.007722  0.003876  0.000000  0.000000  0.161172
..       ...       ...       ...       ...       ...       ...
89  0.546218  0.400844  0.347458  0.302128  0.273504  0.643725
90  0.105528  0.025253  0.005076  0.000000  0.000000  0.242718
91  0.878378  0.823529  0.800000  0.780822  0.761468  0.839506
92  0.328431  0.285714  0.252475  0.233831  0.220000  0.283019
93  0.031579  0.000000  0.000000  0.000000  0.000000  0.161765

[70 rows x 13 columns]
type(df.Datatype=='train')= <class 'pandas.core.series.Series'>
train_x= [[0.39814815 0.07906977]
 [0.86936937 0.71945701]
 [0.59358289 0.2688172 ]
 [0.54450262 0.11578947]
 [0.32950192 0.05384615]
 [0.59030837 0.15044248]
 [0.75977654 0.50561798]
 [0.51612903 0.07027027]
 [0.44086022 0.11891892]
 [0.97945205 0.91724138]
 [0.95138889 0.7972028 ]
 [0.97647059 0.85798817]
 [0.81176471 0.55621302]
 [0.44117647 0.03030303]
 [0.48888889 0.06741573]
 [0.81395349 0.67058824]
 [0.61111111 0.15492958]
 [1.         1.        ]
 [0.63402062 0.20207254]
 [0.58293839 0.29047619]
 [0.63793103 0.42857143]
 [0.42038217 0.07692308]
 [0.68776371 0.40677966]
 [0.67664671 0.31927711]
 [0.76923077 0.53355705]
 [0.71226415 0.37914692]
 [0.62992126 0.33992095]
 [0.71573604 0.26020408]
 [0.33206107 0.03065134]
 [0.71721311 0.36213992]
 [0.87826087 0.71179039]
 [0.52980132 0.35548173]
 [0.57211538 0.14009662]
 [0.31967213 0.04115226]
 [0.53       0.13567839]
 [0.78       0.65829146]
 [0.65269461 0.18674699]
 [0.44394619 0.15315315]
 [0.66502463 0.39108911]
 [0.72815534 0.30731707]
 [0.76204819 0.54984894]
 [0.94701987 0.67333333]
 [0.36842105 0.0619469 ]
 [0.53289474 0.09933775]
 [0.61849711 0.16860465]
 [0.51030928 0.09326425]
 [0.57983193 0.11814346]
 [0.40703518 0.06565657]
 [0.51546392 0.09310345]
 [0.58454106 0.27669903]
 [0.6171875  0.33858268]
 [1.         0.96153846]
 [0.99166667 0.96638655]
 [0.5505618  0.15819209]
 [0.41935484 0.07608696]
 [0.83516484 0.45555556]
 [0.92708333 0.69473684]
 [0.492891   0.05714286]
 [0.70873786 0.52682927]
 [0.86338798 0.66483516]
 [0.96060606 0.92097264]
 [0.43801653 0.08333333]
 [0.73366834 0.35353535]
 [0.51388889 0.09302326]
 [0.48611111 0.07906977]
 [0.84518828 0.54621849]
 [0.485      0.10552764]
 [0.95067265 0.87837838]
 [0.55121951 0.32843137]
 [0.36125654 0.03157895]]
type(t)= <class 'pandas.core.series.Series'> t= 0     False
1     False
2     False
3     False
4     False
      ...
95    False
96    False
97    False
98    False
99    False
Name: Datatype, Length: 100, dtype: bool
Tests Passed!
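
The printed t above shows only False values. That appears to be because t compares the values of the Datatype column with the string "Class", which is a column name rather than a value stored in that column. The sketch below reproduces the situation on hypothetical toy data: the comparison still returns a boolean Series, and indexing with it still returns a DataFrame, just an empty one.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the post.
df = pd.DataFrame({
    "Datatype": ["train", "test", "train"],
    "Class": [0, 1, 1],
})

t = df.Datatype == "Class"   # no row has Datatype equal to "Class"
empty = df[t]                # boolean indexing with an all-False mask

print(type(t).__name__, bool(t.any()))     # Series False
print(type(empty).__name__, empty.shape)   # DataFrame (0, 2)
```

The return type depends on the form of the indexing expression, not on how many rows happen to match.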

#https://notebookinstance.notebook.us-east-2.sagemaker.aws/notebooks/CN-ML_SageMaker_Studies/Project_Plagiarism_Detection/2_Plagiarism_Feature_Engineering.ipynb
