This article continues from the previous one, 《用python对android APP进行分析1》 (Analyzing Android apps with Python, part 1).

Converting the data types of other columns

data['Reviews']=data['Reviews'].astype('int32')  # astype returns a new Series and has no inplace option; np.int no longer exists in current NumPy

data.Reviews.head()

0 159

1 967

2 87510

3 215644

4 967

Name: Reviews, dtype: int32
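astype raises an error as soon as one value cannot be parsed. If the column might still contain stray non-numeric strings (an assumption here, since the data was cleaned in the previous article), a more defensive conversion could look like the sketch below. The sample values, including the malformed '3.0M', are made up for illustration.

```python
import pandas as pd

# Hypothetical sample; '3.0M' stands in for a malformed entry.
reviews = pd.Series(['159', '967', '3.0M', '87510'])

# errors='coerce' turns unparseable strings into NaN instead of raising.
converted = pd.to_numeric(reviews, errors='coerce')

# Inspect the rows that failed to parse before deciding how to handle them.
bad_rows = reviews[converted.isna()]
print(bad_rows)

# Drop the failures and cast the rest to a plain integer dtype.
clean = converted.dropna().astype(int)
print(clean.tolist())
```

Looking at bad_rows first makes it a deliberate choice whether to drop, fix, or keep the problem values.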

print(data[~data.Size.str.contains('M')].head())

App Category Rating Reviews \

37 Floor Plan Creator ART_AND_DESIGN 4.1 36639

42 Textgram - write on photos ART_AND_DESIGN 4.4 295221

52 Used Cars and Trucks for Sale AUTO_AND_VEHICLES 4.6 17057

58 Restart Navigator AUTO_AND_VEHICLES 4.0 1403

67 Ulysse Speedometer AUTO_AND_VEHICLES 4.3 40211

Size Installs Type Price Content Rating Genres \

37 Varies with device 5000000 Free 0 Everyone Art & Design

42 Varies with device 10000000 Free 0 Everyone Art & Design

52 Varies with device 1000000 Free 0 Everyone Auto & Vehicles

58 201k 100000 Free 0 Everyone Auto & Vehicles

67 Varies with device 5000000 Free 0 Everyone Auto & Vehicles

Last Updated Current Ver Android Ver installs_range

37 July 14, 2018 Varies with device 2.3.3 and up 百万+

42 July 30, 2018 Varies with device Varies with device 百万+

52 July 30, 2018 Varies with device Varies with device 十万+

58 August 26, 2014 1.0.1 2.2 and up 万+

67 July 30, 2018 Varies with device Varies with device 百万+

Broadly, there are three kinds of size values: sizes in k, sizes in M, and undetermined sizes ("Varies with device").

# Define a function that normalizes every size to a single unit (k)
def size_normal(x):
    if 'M' in x.upper():
        return float(x.upper().replace('M',''))*1000
    elif 'k' in x.lower():
        return float(x.lower().replace('k',''))
    else:
        return np.nan

data.Size.map(size_normal)[[1,146,10595]]  # check that the conversion worked

1 14000.0

146 NaN

10595 470.0

Name: Size, dtype: float64
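The same normalization can also be written without a row-by-row Python function. The sketch below is an alternative to size_normal, not the method used in this article, and the sample values are hypothetical. It assumes every valid size is a number followed by a single M or k.

```python
import pandas as pd

# Hypothetical sample covering the three kinds of values seen above.
size = pd.Series(['14M', 'Varies with device', '470k', '8.7M'])

# Split each value into a numeric part and a unit letter; rows that do not
# match (such as 'Varies with device') become NaN in both columns.
parts = size.str.extract(r'^([\d.]+)([Mk])$')
number = parts[0].astype(float)

# Map the unit to a multiplier so that everything ends up in k.
factor = parts[1].map({'M': 1000.0, 'k': 1.0})
size_k = number * factor
print(size_k)
```

Because the pattern is anchored at both ends, any value with an unexpected shape ends up as NaN rather than being partly parsed.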

data['size_k']=data.Size.map(size_normal)

print(data.head())

App Category Rating \

0 Photo Editor & Candy Camera & Grid & ScrapBook ART_AND_DESIGN 4.1

1 Coloring book moana ART_AND_DESIGN 3.9

2 U Launcher Lite – FREE Live Cool Themes, Hide ... ART_AND_DESIGN 4.7

3 Sketch - Draw & Paint ART_AND_DESIGN 4.5

4 Pixel Draw - Number Art Coloring Book ART_AND_DESIGN 4.3

Reviews Size Installs Type Price Content Rating \

0 159 19M 10000 Free 0 Everyone

1 967 14M 500000 Free 0 Everyone

2 87510 8.7M 5000000 Free 0 Everyone

3 215644 25M 50000000 Free 0 Teen

4 967 2.8M 100000 Free 0 Everyone

Genres Last Updated Current Ver \

0 Art & Design January 7, 2018 1.0.0

1 Art & Design;Pretend Play January 15, 2018 2.0.0

2 Art & Design August 1, 2018 1.2.4

3 Art & Design June 8, 2018 Varies with device

4 Art & Design;Creativity June 20, 2018 1.1

Android Ver installs_range size_k

0 4.0.3 and up 千+ 19000.0

1 4.0.3 and up 十万+ 14000.0

2 4.0.3 and up 百万+ 8700.0

3 4.2 and up 千万+ 25000.0

4 4.4 and up 万+ 2800.0

Converting the last-updated time

from dateutil.parser import parse

def time_normal(time):
    return parse(time)

data['Last Updated']=data['Last Updated'].map(time_normal)

print(data.head())

App Category Rating \

0 Photo Editor & Candy Camera & Grid & ScrapBook ART_AND_DESIGN 4.1

1 Coloring book moana ART_AND_DESIGN 3.9

2 U Launcher Lite – FREE Live Cool Themes, Hide ... ART_AND_DESIGN 4.7

3 Sketch - Draw & Paint ART_AND_DESIGN 4.5

4 Pixel Draw - Number Art Coloring Book ART_AND_DESIGN 4.3

Reviews Size Installs Type Price Content Rating \

0 159 19M 10000 Free 0 Everyone

1 967 14M 500000 Free 0 Everyone

2 87510 8.7M 5000000 Free 0 Everyone

3 215644 25M 50000000 Free 0 Teen

4 967 2.8M 100000 Free 0 Everyone

Genres Last Updated Current Ver Android Ver \

0 Art & Design 2018-01-07 1.0.0 4.0.3 and up

1 Art & Design;Pretend Play 2018-01-15 2.0.0 4.0.3 and up

2 Art & Design 2018-08-01 1.2.4 4.0.3 and up

3 Art & Design 2018-06-08 Varies with device 4.2 and up

4 Art & Design;Creativity 2018-06-20 1.1 4.4 and up

installs_range size_k

0 千+ 19000.0

1 十万+ 14000.0

2 百万+ 8700.0

3 千万+ 25000.0

4 万+ 2800.0

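The last-updated column is now a datetime type. If this column were set as the index, time-series methods could be applied to the data, but that is outside the scope of this analysis.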
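For readers who do want to try the time-series route, a minimal sketch follows. It uses pd.to_datetime with an explicit format instead of dateutil, and the four rows are hypothetical sample data in the same "Month day, year" style as the Last Updated column.

```python
import pandas as pd

# Hypothetical sample in the same 'Month day, year' format as Last Updated.
df = pd.DataFrame({
    'App': ['a', 'b', 'c', 'd'],
    'Last Updated': ['January 7, 2018', 'January 15, 2018',
                     'June 8, 2018', 'August 1, 2018'],
})

# An explicit format avoids guessing the format for every row.
df['Last Updated'] = pd.to_datetime(df['Last Updated'], format='%B %d, %Y')

# With the dates as a sorted index, time-based selection and grouping work.
ts = df.set_index('Last Updated').sort_index()
january = ts.loc['2018-01']                       # partial-string selection
per_month = ts.groupby(ts.index.to_period('M')).size()
print(january['App'].tolist())
print(per_month)
```

The per_month series counts how many apps were last updated in each calendar month, which is one simple way to see how fresh the catalogue is.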

Checking for outliers

print(data.describe())

Rating Reviews Installs size_k

count 10841.000000 1.084100e+04 1.084100e+04 9146.000000

mean 4.190739 4.441119e+05 1.546291e+07 21514.504975

std 0.479738 2.927629e+06 8.502557e+07 22588.342683

min 1.000000 0.000000e+00 0.000000e+00 8.500000

25% 4.100000 3.800000e+01 1.000000e+03 4900.000000

50% 4.200000 2.094000e+03 1.000000e+05 13000.000000

75% 4.500000 5.476800e+04 5.000000e+06 30000.000000

max 5.000000 7.815831e+07 1.000000e+09 100000.000000

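The numeric columns show no obvious outliers: ratings lie between 1 and 5 and none of the counts is negative. Price is still a string column and will be converted later.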
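Reading describe() by eye works for a handful of columns. The same checks can be made explicit, as in the sketch below. The sample frame is hypothetical, with a rating of 19.0 planted as a bad value, and the 1.5 × IQR rule is a common convention rather than something this article uses.

```python
import pandas as pd

# Hypothetical sample; the 19.0 rating is a deliberately planted bad value.
df = pd.DataFrame({'Rating': [4.1, 3.9, 19.0, 4.7, 4.5],
                   'Reviews': [159, 967, 3, 87510, 215644]})

# Rule-based check: ratings are expected to lie between 1 and 5.
out_of_range = df[~df['Rating'].between(1, 5)]
print(out_of_range)

# Distribution-based check: flag values more than 1.5 * IQR beyond the quartiles.
q1, q3 = df['Rating'].quantile([0.25, 0.75])
iqr = q3 - q1
iqr_outliers = df[(df['Rating'] < q1 - 1.5 * iqr) |
                  (df['Rating'] > q3 + 1.5 * iqr)]
print(iqr_outliers)
```

The IQR rule suits bounded columns such as Rating. Heavily skewed columns such as Reviews or Installs would flag many legitimate apps, so a domain rule is usually the better fit there.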

Removing duplicates

data.duplicated().sum()

483

data.drop_duplicates(inplace=True)

data.info()

Int64Index: 10358 entries, 0 to 10840

Data columns (total 15 columns):

App 10358 non-null object

Category 10358 non-null object

Rating 10358 non-null float64

Reviews 10358 non-null int32

Size 10358 non-null object

Installs 10358 non-null int32

Type 10358 non-null object

Price 10358 non-null object

Content Rating 10358 non-null object

Genres 10358 non-null object

Last Updated 10358 non-null datetime64[ns]

Current Ver 10350 non-null object

Android Ver 10356 non-null object

installs_range 10358 non-null category

size_k 8832 non-null float64

dtypes: category(1), datetime64[ns](1), float64(2), int32(2), object(9)

memory usage: 1.1+ MB
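drop_duplicates() with no arguments only removes rows that are identical in every column. Whether the same app still appears more than once with slightly different values is not checked in this article. The sketch below shows how such a check could look on hypothetical sample data; the policy of keeping the row with the most reviews is an assumption for illustration.

```python
import pandas as pd

# Hypothetical sample: the same app listed several times.
df = pd.DataFrame({'App': ['A', 'A', 'B', 'A'],
                   'Reviews': [100, 100, 50, 120]})

# Fully identical rows, which is what drop_duplicates() removes by default.
print(df.duplicated().sum())

# Rows that share an app name even though other columns differ.
print(df.duplicated(subset='App').sum())

# One possible policy: keep the row with the most reviews for each app.
deduped = (df.sort_values('Reviews', ascending=False)
             .drop_duplicates(subset='App')
             .sort_index())
print(deduped)
```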

data.to_csv(r'C:\Users\19078\Desktop\中级\第三关\android_data.csv',sep=',',encoding='utf_8_sig')  # save the data as CSV
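A CSV file stores only text, so the datetime dtype and the index have to be restored when the file is read back. The round trip below is a sketch on a small hypothetical frame and writes to an in-memory buffer instead of a real path.

```python
import io
import pandas as pd

# Hypothetical frame with a datetime column and a missing size.
df = pd.DataFrame({'App': ['a', 'b'],
                   'Last Updated': pd.to_datetime(['2018-01-07', '2018-08-01']),
                   'size_k': [19000.0, None]})

# Write to an in-memory buffer instead of a real file for this sketch.
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)

# index_col=0 restores the saved index; parse_dates restores the datetime dtype.
restored = pd.read_csv(buffer, index_col=0, parse_dates=['Last Updated'])
print(restored.dtypes)
```

Other dtypes, such as the category type of installs_range, also come back as plain strings and would need to be converted again after loading.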

Data analysis

Effect of category on review count

a=pd.pivot_table(data,columns='Type',index='Category',values='Reviews',aggfunc='mean').sort_values(by='Free',ascending=False)[:10]

b=pd.pivot_table(data,columns='Type',index='Category',values='Reviews',aggfunc='mean').sort_values(by='Paid',ascending=False)[:10]

a['Free'].plot(kind='bar',rot=60)

b['Paid'].plot(kind='bar',rot=60)

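Comparing the two charts, the average review count differs widely between categories. Among free apps, games, social, and chat apps have the most reviews on average; among paid apps, family, game, and weather apps lead. Both the app category and whether the app is free or paid therefore have some influence on review count.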
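To make the pivot step easier to follow, the sketch below builds the Category by Type table once on hypothetical sample data and then ranks each column separately. nlargest is used here in place of sort_values with a slice; that is a stylistic choice, not the article's code.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    'Category': ['GAME', 'GAME', 'GAME', 'SOCIAL', 'SOCIAL', 'TOOLS'],
    'Type':     ['Free', 'Free', 'Paid', 'Free', 'Paid', 'Free'],
    'Reviews':  [100, 300, 40, 500, 10, 20],
})

# Build the Category x Type table of mean review counts once.
table = pd.pivot_table(df, index='Category', columns='Type',
                       values='Reviews', aggfunc='mean')

# Rank each column separately; a category with no apps of a given type
# is NaN in that column and is dropped before ranking.
top_free = table['Free'].dropna().nlargest(2)
top_paid = table['Paid'].dropna().nlargest(2)
print(top_free)
print(top_paid)
```

Each cell holds the mean of Reviews for one Category and Type pair, so a single table serves both rankings.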

Relationship between category and app size

a=pd.pivot_table(data,index='Category',values='size_k',aggfunc='mean').sort_values(by='size_k',ascending=False)[:15]

print(a)

size_k

Category

GAME 44126.850000

FAMILY 27930.435770

TRAVEL_AND_LOCAL 24515.994413

SPORTS 24181.192568

ENTERTAINMENT 22638.805970

PARENTING 22512.962963

FOOD_AND_DRINK 22056.122449

HEALTH_AND_FITNESS 21643.216667

EDUCATION 20076.895833

AUTO_AND_VEHICLES 20037.146667

MEDICAL 19383.681579

FINANCE 17937.730263

SOCIAL 16875.827586

PHOTOGRAPHY 16832.045267

MAPS_AND_NAVIGATION 16614.712963

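Different categories have different typical sizes, with games being the largest. The averages listed here range from roughly 17 MB to 44 MB (size_k is in k, so 44126 is about 44 MB), which suggests that a size of ten-odd to a few tens of megabytes is typical for an app.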
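A mean can be pulled upward by a few very large apps, so it may help to look at the median and the number of apps behind each figure as well. The sketch below uses hypothetical sample data and converts the result to M with the same factor of 1000 as size_normal.

```python
import pandas as pd

# Hypothetical sample; None stands for a 'Varies with device' size.
df = pd.DataFrame({
    'Category': ['GAME', 'GAME', 'TOOLS', 'TOOLS', 'TOOLS'],
    'size_k':   [40000.0, 60000.0, 2000.0, 4000.0, None],
})

# NaN sizes are skipped by mean, median and count.
summary = df.groupby('Category')['size_k'].agg(['mean', 'median', 'count'])

# Convert from k to M with the same factor of 1000 used in size_normal.
summary[['mean', 'median']] = summary[['mean', 'median']] / 1000
print(summary)
```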

Which categories of paid apps are priced higher

data_paid=data[data.Type.isin(['Paid'])]

print(data_paid.head())

App Category Rating \

234 TurboScan: scan documents and receipts in PDF BUSINESS 4.7

235 Tiny Scanner Pro: PDF Doc Scan BUSINESS 4.8

427 Puffin Browser Pro COMMUNICATION 4.0

476 Moco+ - Chat, Meet People DATING 4.2

477 Calculator DATING 2.6

Reviews Size Installs Type Price Content Rating \

234 11442 6.8M 100000 Paid $4.99 Everyone

235 10295 39M 100000 Paid $4.99 Everyone

427 18247 Varies with device 100000 Paid $3.99 Everyone

476 1545 Varies with device 10000 Paid $3.99 Mature 17+

477 57 6.2M 1000 Paid $6.99 Everyone

Genres Last Updated Current Ver Android Ver installs_range \

234 Business 2018-03-25 1.5.2 4.0 and up 万+

235 Business 2017-04-11 3.4.6 3.0 and up 万+

427 Communication 2018-07-05 7.5.3.20547 4.1 and up 万+

476 Dating 2018-06-19 2.6.139 4.1 and up 千+

477 Dating 2017-10-25 1.1.6 4.0 and up 百+

size_k

234 6800.0

235 39000.0

427 NaN

476 NaN

477 6200.0

data_paid.Price=data_paid.Price.str.replace('$','',regex=False).astype('float')  # regex=False treats '$' as a literal character, not a regex anchor

a=data_paid.groupby('Category')['Price'].agg(['mean','count']).sort_values(by='mean',ascending=False)[:15]

print(a)

mean count

Category

FINANCE 170.637059 17

LIFESTYLE 124.256316 19

EVENTS 109.990000 1

BUSINESS 14.607500 12

FAMILY 12.945561 187

MEDICAL 12.151071 84

PRODUCTIVITY 8.961786 28

PHOTOGRAPHY 6.111500 20

MAPS_AND_NAVIGATION 5.390000 5

SOCIAL 5.323333 3

PARENTING 4.790000 2

DATING 4.490000 7

EDUCATION 4.490000 4

AUTO_AND_VEHICLES 4.490000 3

HEALTH_AND_FITNESS 4.290000 15

C:\Users\19078\Anaconda3\envs\py\lib\site-packages\pandas\core\generic.py:4405: SettingWithCopyWarning:

A value is trying to be set on a copy of a slice from a DataFrame.

Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

self[name] = value

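The results show that finance, lifestyle, and events apps have the highest average prices, although the events figure rests on a single app.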
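The SettingWithCopyWarning in the output above appears because data_paid is a filtered slice of data and a column is then assigned on it. One common way to avoid the warning is to take an explicit copy when creating the subset. The sketch below shows this on hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'Type': ['Paid', 'Free', 'Paid'],
                   'Price': ['$4.99', '0', '$6.99']})

# .copy() makes the subset an independent DataFrame, so later column
# assignments do not target a slice of the original.
paid = df[df['Type'] == 'Paid'].copy()

# regex=False treats '$' as a literal character rather than a regex anchor.
paid['Price'] = paid['Price'].str.replace('$', '', regex=False).astype(float)
print(paid)
```

The original frame is left untouched, which also makes it clear that only the paid subset carries numeric prices.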

Paid-to-free ratio by category

data.size

155370

def p_f_rate(group):
    # Ratio of paid apps to free apps within one category
    paid=(group['Type']=='Paid').sum()
    free=(group['Type']=='Free').sum()
    return round(paid/free,2)

data.groupby('Category').apply(p_f_rate).sort_values(ascending=False)[:15]

Category

PERSONALIZATION 0.27

MEDICAL 0.26

BOOKS_AND_REFERENCE 0.14

WEATHER 0.11

FAMILY 0.11

TOOLS 0.10

COMMUNICATION 0.08

GAME 0.08

SPORTS 0.07

PRODUCTIVITY 0.07

PHOTOGRAPHY 0.07

LIFESTYLE 0.05

FINANCE 0.05

HEALTH_AND_FITNESS 0.05

ART_AND_DESIGN 0.05

dtype: float64

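Personalization and medical apps have the highest paid-to-free ratios. Across all categories, however, the large majority of apps are free, so the internet's free-first mindset is key to operating an app.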
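The figures above are paid-to-free ratios, not the share of paid apps among all apps. The sketch below computes both from a single table of counts so the difference is visible. It uses pd.crosstab as an alternative to groupby with apply, and the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    'Category': ['MEDICAL', 'MEDICAL', 'MEDICAL',
                 'GAME', 'GAME', 'GAME', 'GAME', 'GAME'],
    'Type':     ['Paid', 'Free', 'Free',
                 'Free', 'Free', 'Free', 'Free', 'Paid'],
})

# Count apps per Category and Type in one table.
counts = pd.crosstab(df['Category'], df['Type'])

# Paid-to-free ratio, the quantity reported in this section.
ratio = (counts['Paid'] / counts['Free']).round(2)

# Share of paid apps among all apps in the category.
share = (counts['Paid'] / counts.sum(axis=1)).round(2)
print(ratio.sort_values(ascending=False))
print(share.sort_values(ascending=False))
```

A ratio of 0.27 corresponds to a paid share of about 21% (0.27 / 1.27), so the two measures should not be mixed when comparing categories.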
