In particular, pandas offers data structures and operations for manipulating numerical tables and time series:

  1. pd.Series
// create a Series
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: s = pd.Series([1, 1, 12, 6, np.nan])
In [4]: s
Out[4]:
0     1.0
1     1.0
2    12.0
3     6.0
4     NaN
dtype: float64

Slice operations on a Series:
1. To print elements from the beginning up to a position, use [:Index] (the element at Index is not included)
2. To print all elements except the last Index elements, use [:-Index]
3. To print elements from a specific Index through the end, use [Index:] (the element at Index is included)
4. To print elements within a range, use [Start Index:End Index] (the element at End Index is not included)
5. To print the whole Series with a slicing operation, use [:]
6. To print the whole Series in reverse order, use [::-1]
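
The slicing rules above can be checked with a short sketch. It reuses the Series from the first example; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 1, 12, 6, np.nan])

first_two = s[:2]      # positions 0 and 1; position 2 is excluded
all_but_last = s[:-1]  # everything except the last element
from_two = s[2:]       # position 2 through the end; position 2 is included
middle = s[1:3]        # positions 1 and 2; position 3 is excluded
whole = s[:]           # the whole Series
reversed_s = s[::-1]   # the whole Series in reverse order

print(first_two.tolist())         # [1.0, 1.0]
print(middle.tolist())            # [1.0, 12.0]
print(reversed_s.index.tolist())  # [4, 3, 2, 1, 0]
```

Integer slices with [] are positional here. Writing s.iloc[1:3] makes that explicit and behaves the same way for this Series.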

  1. pd.DataFrame:
// create a DataFrame from the Series
In [5]: df = s.to_frame()
In [6]: df.columns = ['Rate']

// selecting rows and columns:
1. Use .loc with a boolean condition to do conditional selection.
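
A minimal sketch of conditional selection with .loc, assuming a small DataFrame with a single 'Rate' column like the one built above (the threshold 5 is an arbitrary illustration):

```python
import pandas as pd

df = pd.DataFrame({"Rate": [1.0, 1.0, 12.0, 6.0]})

# Rows where Rate is greater than 5, with all columns kept
high = df.loc[df["Rate"] > 5]

# Same condition, but only the "Rate" column (this returns a Series)
high_rates = df.loc[df["Rate"] > 5, "Rate"]

print(high.index.tolist())  # [2, 3]
print(high_rates.tolist())  # [12.0, 6.0]
```

The first argument to .loc picks rows and the optional second argument picks columns.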

  1. dictionary
// create a dictionary (avoid naming it dict, which would shadow the built-in type)
scores = {'Pheobe': [95, 93, 11, 65], 'luis': [99, 97, 66, 70]}
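
A dictionary of equal-length lists is a common way to build a DataFrame. The sketch below assumes that is the intended use here. The names scores and df_scores are illustrative and not from the original notes.

```python
import pandas as pd

# Each key becomes a column name, each list becomes that column's values
scores = {"Pheobe": [95, 93, 11, 65], "luis": [99, 97, 66, 70]}
df_scores = pd.DataFrame(scores)

print(df_scores.shape)          # (4, 2)
print(list(df_scores.columns))  # ['Pheobe', 'luis']
```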
  1. How to join two tables
df_off = (
    df_off.set_index('trans_dt')
    .join(total_gmb.set_index('trans_dt'), how='left', lsuffix='', rsuffix='_total', sort=False)
    .reset_index()
)
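
To see what the join produces, here is a small sketch with made-up data. The two frames are hypothetical stand-ins for df_off and total_gmb, and the gmb column and its values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-ins for the two tables, sharing the key column trans_dt
df_off = pd.DataFrame({
    "trans_dt": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "gmb": [10, 20, 30],
})
total_gmb = pd.DataFrame({
    "trans_dt": ["2020-01-01", "2020-01-02"],
    "gmb": [100, 200],
})

joined = (
    df_off.set_index("trans_dt")
    .join(total_gmb.set_index("trans_dt"), how="left", lsuffix="", rsuffix="_total", sort=False)
    .reset_index()
)

print(list(joined.columns))  # ['trans_dt', 'gmb', 'gmb_total']
```

With how='left', every row of the left table is kept. The date that is missing from the right table gets NaN in gmb_total.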

Encountered errors and solutions:

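1. datetime package:

AttributeError: 'datetime' module has no attribute 'strptime'

This error appears when the module is imported with "import datetime" and strptime is called on the module itself. strptime is a method of the datetime class inside the datetime module, so either call it through the module and the class:

import datetime

#module  class    method
datetime.datetime.strptime(date, "%Y-%m-%d")

or import the class directly and call the method on it:

from datetime import datetime

#class   method
datetime.strptime(date, "%Y-%m-%d")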
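
The sketch below runs both import styles side by side and reproduces the error. The alias dt and the sample date string are illustrative assumptions.

```python
import datetime                      # the module
from datetime import datetime as dt  # the class, aliased to avoid a name clash

date = "2021-03-15"

parsed_via_module = datetime.datetime.strptime(date, "%Y-%m-%d")
parsed_via_class = dt.strptime(date, "%Y-%m-%d")

# Calling strptime on the module itself reproduces the AttributeError
try:
    datetime.strptime(date, "%Y-%m-%d")
except AttributeError as err:
    message = str(err)

print(parsed_via_module == parsed_via_class)  # True
```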

2. Non-breaking space in a string

In [5]: validation_data1['VERTICAL'].unique()
Out[5]: array(['Home\xa0&\xa0Garden', 'Other', 'Electronics'], dtype=object)

\xa0 is the non-breaking space character in Latin-1 (ISO 8859-1), the same as chr(160). Here is how to replace it with a normal space:

string = string.replace(u'\xa0', u' ')
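
For a whole DataFrame column, the same replacement can be done with the .str accessor. This is a sketch that rebuilds a small validation_data1 from the values shown above. The real table presumably has more columns.

```python
import pandas as pd

validation_data1 = pd.DataFrame(
    {"VERTICAL": ["Home\xa0&\xa0Garden", "Other", "Electronics"]}
)

# regex=False treats the pattern as a literal string
validation_data1["VERTICAL"] = validation_data1["VERTICAL"].str.replace(
    "\xa0", " ", regex=False
)

print(validation_data1["VERTICAL"].tolist())  # ['Home & Garden', 'Other', 'Electronics']
```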

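3. 'pandas' has no attribute 'ewma'

pd.ewma is no longer available in current pandas versions. Use the .ewm() method followed by .mean() instead.

    # exponentially weighted moving average over a span of size points
    # old: rol_weighted_mean = pd.ewma(timeSeries, span=size)
    # new:
    rol_weighted_mean = timeSeries.ewm(span=size).mean()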
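
A small runnable sketch of the replacement call. The three-point series and span of 3 are made-up values chosen so the result is easy to verify by hand.

```python
import pandas as pd

timeSeries = pd.Series([1.0, 2.0, 3.0])
size = 3

# span=3 corresponds to a smoothing factor alpha = 2 / (span + 1) = 0.5
rol_weighted_mean = timeSeries.ewm(span=size).mean()

print(rol_weighted_mean.round(4).tolist())  # [1.0, 1.6667, 2.4286]
```

With the default adjust=True, each value is a weighted average of all earlier points with weights 1, 0.5, 0.25, and so on. For example, the second value is (2 + 0.5 * 1) / (1 + 0.5).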

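4. Fixing overlapping plt plots:

Call plt.figure() before each plot so that it is drawn on a new figure instead of on top of the previous one:

fig = plt.figure()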
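
A sketch of the idea with two separate plots. The Agg backend is an assumption so that the example runs without a display, and the plotted numbers are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

fig1 = plt.figure()             # first, empty figure
plt.plot([1, 2, 3], [1, 4, 9])  # drawn on fig1

fig2 = plt.figure()             # new figure, so the next plot does not overlap
plt.plot([1, 2, 3], [9, 4, 1])  # drawn on fig2

print(len(plt.get_fignums()))   # 2
```

Without the second plt.figure() call, both lines would be drawn on the same axes.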

5. How to select entries by specific text in a string

original array:
array(['Evergreen, Core, GEO1',
       'PL_TS_Control, Evergreen, Promoted Listings',
       'Evergreen, GEO4, Core', 'Evergreen, GEO2, Core',
       'Evergreen, Core, GEO 5', 'Evergreen, GEO3, Core',
       'Evergreen, PL_TS_Treatment, Promoted Listings',
       'Trading Cards, Strategic', 'CR, Strategic',
       'Evergreen, FBK, Core', 'Evergreen, Core, CTRL',
       'Watches, Strategic', 'Watches_SSC_Treatment_GEOB, Strategic',
       'Sneakers, Strategic', 'Watches_SSC_Control_GEOA, Strategic',
       'Evergreen, Core', 'Evergreen, Core, C2C',
       'Strategic, Sneaker_Showcase'], dtype=object)

// keep rows where 'Strategic' is the first label or the only thing after the first comma
data_test = data.loc[data['Labels on Campaign'].apply(
    lambda x: (x.split(',', 1)[1] == ' Strategic') or (x.split(',', 1)[0] == 'Strategic'))]

filtered array:
array(['Trading Cards, Strategic', 'CR, Strategic', 'Watches, Strategic',
       'Watches_SSC_Treatment_GEOB, Strategic', 'Sneakers, Strategic',
       'Watches_SSC_Control_GEOA, Strategic',
       'Strategic, Sneaker_Showcase'], dtype=object)
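
The lambda above raises an IndexError for a value without a comma, and it depends on the exact spacing after the comma. A slightly more forgiving variant is sketched below. The helper has_label is hypothetical, and unlike the original filter it matches the label in any position. The four rows are a subset of the array above.

```python
import pandas as pd

data = pd.DataFrame({"Labels on Campaign": [
    "Evergreen, Core, GEO1",
    "Trading Cards, Strategic",
    "Strategic, Sneaker_Showcase",
    "Evergreen, Core",
]})

def has_label(text, label="Strategic"):
    """Return True if label is one of the comma-separated parts of text."""
    return label in [part.strip() for part in text.split(",")]

data_test = data.loc[data["Labels on Campaign"].apply(has_label)]

print(data_test["Labels on Campaign"].tolist())
# ['Trading Cards, Strategic', 'Strategic, Sneaker_Showcase']
```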
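  1. Shallow copy and deep copy
dict2 = dict1          # assignment: no copy is made, dict2 references the same object
dict3 = dict1.copy()   # shallow copy: the parent (top-level) object is copied, but child (nested) objects are not copied and are still shared references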
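
The difference becomes visible once the original is modified. The sketch below adds copy.deepcopy from the standard library for a true deep copy. The keys and values are made up for illustration.

```python
import copy

dict1 = {"name": "a", "nums": [1, 2, 3]}

dict2 = dict1                 # same object, no copy
dict3 = dict1.copy()          # shallow copy: new dict, shared nested list
dict4 = copy.deepcopy(dict1)  # deep copy: nested list is copied as well

dict1["name"] = "b"           # rebinds a top-level key
dict1["nums"].append(4)       # mutates the nested list in place

print(dict2)  # {'name': 'b', 'nums': [1, 2, 3, 4]}
print(dict3)  # {'name': 'a', 'nums': [1, 2, 3, 4]}
print(dict4)  # {'name': 'a', 'nums': [1, 2, 3]}
```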
  1. Difference between is and ==
    'b is a' returns True when a and b point to the same object
    'b == a' returns True when a and b have the same value
a = [1,2,3]
b = a                      # b references the same list as a (no copy is made)
c = a[:]                   # c is a new list copied from a with the slice operator
if b == a: print('True1')
if b is a: print('True2')
if c == a: print('True3')
if c is a: print('True4')
[out]:
True1
True2
True3

