Analyzing, Processing, and Filtering an Event Sign-up Sheet with Python

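Things have been quiet lately. My department won the bid to run a social mixer, co-hosted with two other colleges that have a relatively high share of female students, and I am in charge of the sign-up form. Sign-ups opened on the 13th, and as of now (1 a.m. on the 19th) more than 670 students have registered, with the male-to-female ratio approaching 1:2.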

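Enough background. Because this is real school data, the data itself will not be published. The form was built with 问卷星 (Wenjuanxing), and the raw data was downloaded from there as a spreadsheet. The form has 8 questions: (1) name, required; (2) gender, required single choice; (3) education level, required single choice; (4) college, required; (5) mobile number, required; (6) to (8) photo, self-introduction, and requirements for a partner, all optional.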

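The Excel file downloaded from 问卷星 has the following columns (index, header):

0  序号 (serial number)
1  提交答卷时间 (submission time)
2  所用时间 (time taken)
3  来源 (source)
4  来源详情 (source details)
5  来自IP (IP address)
6  姓名 (name)
7  性别 (gender)
8  学历 (education level)
9  所在学院 (college)
10  手机号码 (mobile number)
11  个人照片 (photo)
12  自我介绍 (self-introduction)
13  对另一半的要求 (requirements for a partner)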

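Of the 8 answers, name, college, and mobile number are non-empty strings. Gender and education level are numbers giving the index of the selected option. The photo is an attachment: if one was uploaded, the cell holds the photo's address, otherwise the string '(空)' ("empty"). The introduction and requirements hold the text that was entered, or '(空)' if left blank.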

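First, the college field. Some people write an abbreviation, others the full name. For 文化与新闻传播学院, for example, the entries include "文新", "文新学院", "文化与新闻学院", and so on. To make counting possible, these wildly varying names have to be normalized, so I compiled a lookup table by hand, collegename.xls:

The first column is the full college name. The second column holds the keywords that identify that college among the various spellings, separated by spaces. Keywords of different colleges must not be identical, and none may contain another.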

import xlrd
college_name = xlrd.open_workbook(r'collegename.xls')  # open the lookup table
cn = college_name.sheet_by_index(0)  # first sheet
dictionary = dict()  # keyword -> full college name
mapping = dict()  # full college name -> index
map_number = 0  # next college index
start_row = 1  # row 0 is the header, so start at row 1
for row in range(start_row, cn.nrows):  # cn.nrows: number of rows in cn
    full_name = cn.cell_value(row, 0)
    # split() breaks on any run of whitespace, so single or double
    # spaces between keywords both work
    for word in cn.cell_value(row, 1).split():
        dictionary[word] = full_name
    # give every college an index the first time it is seen
    if full_name not in mapping:
        mapping[full_name] = map_number
        map_number += 1
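The rule that no keyword may contain a keyword of another college can be checked mechanically before the table is used. Below is a minimal sketch, assuming the table has already been read into a dict shaped like `dictionary` above. The helper name `find_keyword_conflicts` and the sample entries are made up for illustration. Identical keywords cannot be detected this way, because a dict silently keeps only the last one.

```python
def find_keyword_conflicts(dictionary):
    """Return (shorter, longer) keyword pairs that map to different
    colleges while the shorter keyword is a substring of the longer one."""
    conflicts = []
    keywords = sorted(dictionary, key=len)
    for i, short in enumerate(keywords):
        for longer in keywords[i + 1:]:
            if short in longer and dictionary[short] != dictionary[longer]:
                conflicts.append((short, longer))
    return conflicts


# Hypothetical sample data, not taken from the real table.
sample = {
    '文新': '文化与新闻传播学院',
    '新闻': '文化与新闻传播学院',
    '计算机': '计算机学院',
    '算机': '数学学院',  # deliberately wrong: contained in '计算机'
}
print(find_keyword_conflicts(sample))
```

An empty result means every raw entry can match keywords of at most one college through substring containment alone.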

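This yields one mapping from keyword to full college name and another from full college name to index. Next, count the male and female sign-ups for each college: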

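import numpy as np
signtable = xlrd.open_workbook(r'联谊活动报名表.xls')  # raw sign-up sheet
st = signtable.sheet_by_index(0)
# One row per college: male sign-ups, female sign-ups, share of all sign-ups,
# share of all males, share of all females, male-to-female ratio.
# Only the first two columns are filled in here.
statistic = np.zeros([len(mapping), 6])
for row in range(start_row, st.nrows):
    college = st.cell_value(row, 9)  # college name as typed
    found = False  # whether a keyword matched
    for keyword in dictionary:
        if keyword in college:  # a keyword is a substring of the typed name
            found = True
            # gender is 1 or 2, so it selects column 0 (male) or 1 (female)
            statistic[mapping[dictionary[keyword]], int(st.cell_value(row, 7) - 1)] += 1
            break
    if not found:
        print(college + ' can not find')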

Then draw a pie chart with plt.pie:

import matplotlib.pyplot as plt
labels = [str(i) for i in range(len(mapping))]
sizes = [i[0] + i[1] for i in statistic]
plt.pie(sizes, labels=labels, autopct='%1.1f%%', shadow=True, startangle=90)
plt.axis('equal')
plt.show()

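In the resulting chart, each slice is labeled with the college's index from mapping, because matplotlib's default font does not render Chinese.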
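If the slices should carry college names instead of indices, matplotlib can usually render Chinese once a font with CJK glyphs is selected. The sketch below assumes a font named SimHei is installed (common on Windows; on other systems substitute any installed CJK font). `pie_inputs` and `draw_pie` are illustrative helper names, and the counts are passed as plain (boys, girls) pairs, which is also how the rows of `statistic` can be read.

```python
def pie_inputs(mapping, counts):
    """Build parallel label/size lists, skipping colleges with no sign-ups.

    mapping: full college name -> row index
    counts:  sequence of rows whose first two entries are (boys, girls)
    """
    labels, sizes = [], []
    for college, idx in mapping.items():
        total = counts[idx][0] + counts[idx][1]
        if total > 0:
            labels.append(college)
            sizes.append(total)
    return labels, sizes


def draw_pie(labels, sizes):
    import matplotlib.pyplot as plt  # imported lazily; pie_inputs works without it

    # Assumption: SimHei is available; replace it with a CJK font you have.
    plt.rcParams['font.sans-serif'] = ['SimHei']
    plt.rcParams['axes.unicode_minus'] = False  # keep minus signs readable with that font
    plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90)
    plt.axis('equal')
    plt.show()


# Made-up counts for illustration.
labels, sizes = pie_inputs({'文化与新闻传播学院': 0, '计算机学院': 1}, [(3, 9), (0, 0)])
print(labels, sizes)
```

With the real data the call would be `draw_pie(*pie_inputs(mapping, statistic))`.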

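Now write the first sheet, with the following columns (index, header):

0  姓名 (name)
1  填表时长/s (fill time in seconds)
2  学院 (college)
3  性别 (gender)
4  学历 (education level)
5  手机号码 (mobile number)
6  照片上传情况 (photo uploaded or not)
7  介绍 (introduction)
8  要求 (requirements)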

import xlwt
workspace = xlwt.Workbook(encoding='ascii')
excel = workspace.add_sheet('报名表完整版', cell_overwrite_ok=True)  # first sheet: full version
headers = ['姓名', '填表时长/s', '学院', '性别', '学历', '手机号码', '照片上传情况', '介绍', '要求']
for col, title in enumerate(headers):
    excel.write(0, col, title)  # header row
for row in range(1, st.nrows):
    excel.write(row, 0, st.cell_value(row, 6))  # name
    # The raw value looks like 'xxx秒'; drop the last character so the
    # number can be used for filtering later
    excel.write(row, 1, st.cell_value(row, 2)[0:-1])
    temp_college = st.cell_value(row, 9)  # college name as typed
    for keyword in dictionary:
        if keyword in temp_college:
            excel.write(row, 2, dictionary[keyword])  # full name found in the dictionary
            break
    excel.write(row, 3, st.cell_value(row, 7))  # gender
    excel.write(row, 4, st.cell_value(row, 8))  # education level
    excel.write(row, 5, st.cell_value(row, 10))  # mobile number
    # 0 if no photo was uploaded, 1 otherwise
    excel.write(row, 6, 0 if st.cell_value(row, 11) == '(空)' else 1)
    excel.write(row, 7, st.cell_value(row, 12))  # introduction
    excel.write(row, 8, st.cell_value(row, 13))  # requirements

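The second sheet is built the same way. It is meant to be posted in the group chat so people can look for matches themselves, so names and mobile numbers are partially masked. To make it easier to read, jieba is used to extract the key words from each introduction and requirements text: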

import jieba.posseg as posseg
excel2 = workspace.add_sheet('报名表简洁版', cell_overwrite_ok=True)  # second sheet: short version
headers = ['姓名', '填表时长/s', '学院', '性别', '学历', '手机号码', '照片上传情况', '介绍', '要求']
for col, title in enumerate(headers):
    excel2.write(0, col, title)
word_type = ['n', 'nz', 'ns', 'vn', 'v', 'a', 'an']  # parts of speech to keep
for row in range(1, st.nrows):
    temp_name = st.cell_value(row, 6)
    # Keep the first character and mask what follows: two characters
    # become 'X*'; longer names keep the first and last character
    if len(temp_name) <= 1:
        write_name = temp_name
    elif len(temp_name) == 2:
        write_name = temp_name[0] + '*'
    else:
        write_name = temp_name[0] + '*' * (len(temp_name) - 2) + temp_name[-1]
    excel2.write(row, 0, write_name)
    excel2.write(row, 1, st.cell_value(row, 2)[0:-1])
    temp_college = st.cell_value(row, 9)
    for keyword in dictionary:
        if keyword in temp_college:
            excel2.write(row, 2, dictionary[keyword])
            break
    excel2.write(row, 3, st.cell_value(row, 7))
    excel2.write(row, 4, st.cell_value(row, 8))
    phone = str(st.cell_value(row, 10))
    excel2.write(row, 5, phone[:3] + '****' + phone[-4:])  # mask the middle four digits
    excel2.write(row, 6, 0 if st.cell_value(row, 11) == '(空)' else 1)
    # Split the introduction and requirements into words tagged by part of speech
    words_info = posseg.cut(st.cell_value(row, 12))
    words_demm = posseg.cut(st.cell_value(row, 13))
    # Keep only the words whose part of speech is in word_type
    info = ''.join(word for word, flag in words_info if flag in word_type)
    demm = ''.join(word for word, flag in words_demm if flag in word_type)
    excel2.write(row, 7, info)
    excel2.write(row, 8, demm)
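The masking rules can also be pulled into two small helpers, so that unusual inputs (a one-character name, a name longer than four characters, a short phone number) are handled in one place. This is a sketch of the same idea rather than the code used for the real sheet. The function names and the sample values are made up.

```python
def mask_name(name):
    """Keep the first character (and the last one for 3+ characters), star the rest."""
    if len(name) <= 1:
        return name
    if len(name) == 2:
        return name[0] + '*'
    return name[0] + '*' * (len(name) - 2) + name[-1]


def mask_phone(phone):
    """Hide the middle of a phone number, keeping the first 3 and last 4 digits."""
    phone = str(phone)
    if len(phone) < 8:
        return '*' * len(phone)  # too short to show 3 + 4 digits without overlap
    return phone[:3] + '*' * (len(phone) - 7) + phone[-4:]


# Placeholder values, not real sign-ups.
print(mask_name('张三'), mask_name('欧阳小明'), mask_phone('13800001234'))
```

For an 11-digit number this stars exactly the middle four digits, matching the rule described above.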

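Some of the relevant part-of-speech tags:

a	adjective
an	adjective used as a noun
d	adverb
e	interjection
m	numeral
n	noun
ns	place name
nr	person name
nt	organization name
nz	other proper noun
v	verb
vn	verb used as a noun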
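The tag filter itself is independent of jieba: it only needs (word, tag) pairs. A minimal sketch of that step follows, with `keep_core_words` and `summarize` as illustrative names. The sample pairs are tagged by hand for the example and are not claimed to be what jieba would return for that sentence.

```python
KEEP_FLAGS = {'n', 'nz', 'ns', 'vn', 'v', 'a', 'an'}


def keep_core_words(tagged_words, keep_flags=KEEP_FLAGS, sep=''):
    """Join the words whose part-of-speech tag is in keep_flags."""
    return sep.join(word for word, flag in tagged_words if flag in keep_flags)


def summarize(text):
    import jieba.posseg as posseg  # lazy import: only needed for real text

    # posseg.cut yields items that unpack into (word, flag)
    return keep_core_words(posseg.cut(text))


# Hand-tagged pairs for illustration only.
tagged = [('我', 'r'), ('喜欢', 'v'), ('安静', 'a'), ('地', 'uv'), ('看书', 'v')]
print(keep_core_words(tagged))
```

Passing `sep=' '` keeps the surviving words visually separate, which may read better in a spreadsheet cell than one concatenated string.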

Compute columns 3 to 6 of statistic and write the per-college statistics to the third sheet:

excel3 = workspace.add_sheet('数据统计', cell_overwrite_ok=True)  # third sheet: statistics
headers3 = ['学院名', '报名人数', '总占比', '男生报名数', '男生占比', '女生报名数', '女生占比', '男女比例']
for col, title in enumerate(headers3):
    excel3.write(0, col, title)
total = max(st.nrows - 1, 1)  # sign-ups; the header row is not one
# Totals taken from the counts instead of the hard-coded 191 and 384
total_boys = max(statistic[:, 0].sum(), 1)
total_girls = max(statistic[:, 1].sum(), 1)
for number, college in enumerate(mapping, start=1):
    idx = mapping[college]
    boy = statistic[idx, 0]  # male sign-ups for this college, counted earlier
    girl = statistic[idx, 1]
    excel3.write(number, 0, college)  # college name
    excel3.write(number, 1, int(boy + girl))  # total sign-ups
    excel3.write(number, 3, int(boy))  # male sign-ups
    excel3.write(number, 5, int(girl))  # female sign-ups
    statistic[idx, 2] = round((boy + girl) / total, 5)
    statistic[idx, 3] = round(boy / total_boys, 3)
    statistic[idx, 4] = round(girl / total_girls, 3)
    # A college may have no female sign-ups; use 1 as the divisor then
    statistic[idx, 5] = round(boy / max(girl, 1), 3)
    excel3.write(number, 2, '{:.3%}'.format(statistic[idx, 2]))  # shares as percentages
    excel3.write(number, 4, '{:.1%}'.format(statistic[idx, 3]))
    excel3.write(number, 6, '{:.1%}'.format(statistic[idx, 4]))
    excel3.write(number, 7, str(statistic[idx, 5]))
    print(college + ' total sign-ups: ' + str(int(boy + girl)))
    print('share of all sign-ups: ' + str(statistic[idx, 2]))
    print('share of males: ' + str(statistic[idx, 3]))
    print('share of females: ' + str(statistic[idx, 4]))
    print('male-to-female ratio: ' + str(statistic[idx, 5]))
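The per-college figures can also be computed by a small pure function that takes the overall male and female totals as arguments, so no total is baked into the code. The sketch below makes one different choice from the loop above: when a college has no female sign-ups it returns `None` for the ratio instead of dividing by 1. The function name and the numbers in the example are invented.

```python
def college_stats(boys, girls, total_boys, total_girls):
    """Return (share of all sign-ups, share of boys, share of girls, boy/girl ratio).

    Shares are fractions between 0 and 1; the ratio is None when girls == 0.
    """
    total = total_boys + total_girls

    def share(part, whole):
        # an empty denominator yields 0.0 instead of raising ZeroDivisionError
        return part / whole if whole else 0.0

    ratio = boys / girls if girls else None
    return (
        share(boys + girls, total),
        share(boys, total_boys),
        share(girls, total_girls),
        ratio,
    )


# Invented numbers for illustration.
stats = college_stats(boys=10, girls=30, total_boys=200, total_girls=400)
print('{:.1%} {:.1%} {:.1%} {}'.format(*stats))
```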

Finally, save the workbook, which contains the three sheets just written (excel, excel2, and excel3):

workspace.save('报名表.xls')

Complete code:

# -*- coding: utf-8 -*-
"""
Created on Sat Nov 16 17:20:04 2019
@author: 71405
"""
import xlrd
import xlwt
import numpy as np
import matplotlib.pyplot as plt
import jieba.posseg as posseg

# Keyword -> full college name, and full college name -> index
college_name = xlrd.open_workbook(r'collegename.xls')
cn = college_name.sheet_by_index(0)
dictionary = dict()
mapping = dict()
map_number = 0
start_row = 1
for row in range(start_row, cn.nrows):
    full_name = cn.cell_value(row, 0)
    for word in cn.cell_value(row, 1).split():
        dictionary[word] = full_name
    if full_name not in mapping:
        mapping[full_name] = map_number
        map_number += 1

# Count male and female sign-ups per college
signtable = xlrd.open_workbook(r'联谊活动报名表.xls')
st = signtable.sheet_by_index(0)
statistic = np.zeros([len(mapping), 6])
for row in range(start_row, st.nrows):
    college = st.cell_value(row, 9)
    found = False
    for keyword in dictionary:
        if keyword in college:
            found = True
            statistic[mapping[dictionary[keyword]], int(st.cell_value(row, 7) - 1)] += 1
            break
    if not found:
        print(college + ' can not find')

# Pie chart of sign-ups per college
labels = [str(i) for i in range(len(mapping))]
sizes = [i[0] + i[1] for i in statistic]
plt.pie(sizes, labels=labels, autopct='%1.1f%%', shadow=True, startangle=90)
plt.axis('equal')
plt.show()

# Sheet 1: full version
workspace = xlwt.Workbook(encoding='ascii')
headers = ['姓名', '填表时长/s', '学院', '性别', '学历', '手机号码', '照片上传情况', '介绍', '要求']
excel = workspace.add_sheet('报名表完整版', cell_overwrite_ok=True)
for col, title in enumerate(headers):
    excel.write(0, col, title)
for row in range(1, st.nrows):
    excel.write(row, 0, st.cell_value(row, 6))
    excel.write(row, 1, st.cell_value(row, 2)[0:-1])
    temp_college = st.cell_value(row, 9)
    for keyword in dictionary:
        if keyword in temp_college:
            excel.write(row, 2, dictionary[keyword])
            break
    excel.write(row, 3, st.cell_value(row, 7))
    excel.write(row, 4, st.cell_value(row, 8))
    excel.write(row, 5, st.cell_value(row, 10))
    excel.write(row, 6, 0 if st.cell_value(row, 11) == '(空)' else 1)
    excel.write(row, 7, st.cell_value(row, 12))
    excel.write(row, 8, st.cell_value(row, 13))

# Sheet 2: short version with masked names and phone numbers
excel2 = workspace.add_sheet('报名表简洁版', cell_overwrite_ok=True)
for col, title in enumerate(headers):
    excel2.write(0, col, title)
word_type = ['n', 'nz', 'ns', 'vn', 'v', 'a', 'an']
for row in range(1, st.nrows):
    temp_name = st.cell_value(row, 6)
    if len(temp_name) <= 1:
        write_name = temp_name
    elif len(temp_name) == 2:
        write_name = temp_name[0] + '*'
    else:
        write_name = temp_name[0] + '*' * (len(temp_name) - 2) + temp_name[-1]
    excel2.write(row, 0, write_name)
    excel2.write(row, 1, st.cell_value(row, 2)[0:-1])
    temp_college = st.cell_value(row, 9)
    for keyword in dictionary:
        if keyword in temp_college:
            excel2.write(row, 2, dictionary[keyword])
            break
    excel2.write(row, 3, st.cell_value(row, 7))
    excel2.write(row, 4, st.cell_value(row, 8))
    phone = str(st.cell_value(row, 10))
    excel2.write(row, 5, phone[:3] + '****' + phone[-4:])
    excel2.write(row, 6, 0 if st.cell_value(row, 11) == '(空)' else 1)
    words_info = posseg.cut(st.cell_value(row, 12))
    words_demm = posseg.cut(st.cell_value(row, 13))
    info = ''.join(word for word, flag in words_info if flag in word_type)
    demm = ''.join(word for word, flag in words_demm if flag in word_type)
    excel2.write(row, 7, info)
    excel2.write(row, 8, demm)

# Sheet 3: per-college statistics
excel3 = workspace.add_sheet('数据统计', cell_overwrite_ok=True)
headers3 = ['学院名', '报名人数', '总占比', '男生报名数', '男生占比', '女生报名数', '女生占比', '男女比例']
for col, title in enumerate(headers3):
    excel3.write(0, col, title)
total = max(st.nrows - 1, 1)  # the header row is not a sign-up
total_boys = max(statistic[:, 0].sum(), 1)  # instead of the hard-coded 191
total_girls = max(statistic[:, 1].sum(), 1)  # instead of the hard-coded 384
for number, college in enumerate(mapping, start=1):
    idx = mapping[college]
    boy = statistic[idx, 0]
    girl = statistic[idx, 1]
    excel3.write(number, 0, college)
    excel3.write(number, 1, int(boy + girl))
    excel3.write(number, 3, int(boy))
    excel3.write(number, 5, int(girl))
    statistic[idx, 2] = round((boy + girl) / total, 5)
    statistic[idx, 3] = round(boy / total_boys, 3)
    statistic[idx, 4] = round(girl / total_girls, 3)
    statistic[idx, 5] = round(boy / max(girl, 1), 3)
    excel3.write(number, 2, '{:.3%}'.format(statistic[idx, 2]))
    excel3.write(number, 4, '{:.1%}'.format(statistic[idx, 3]))
    excel3.write(number, 6, '{:.1%}'.format(statistic[idx, 4]))
    excel3.write(number, 7, str(statistic[idx, 5]))
    print(college + ' total sign-ups: ' + str(int(boy + girl)))
    print('share of all sign-ups: ' + str(statistic[idx, 2]))
    print('share of males: ' + str(statistic[idx, 3]))
    print('share of females: ' + str(statistic[idx, 4]))
    print('male-to-female ratio: ' + str(statistic[idx, 5]))
workspace.save('报名表.xls')

------------------------------ divider ------------------------------

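After the analysis and processing, there were too many sign-ups to screen one by one, so I also wrote a filtering script:

It is controlled by four criteria: fill time, whether a photo was uploaded, whether an introduction was written, and whether requirements were written. In general, these four dimensions indicate how enthusiastic and serious a student is about the event. For example, one student finished the form in 20 s and left all three optional fields empty, which is too sloppy, while others wrote a lot, uploaded a photo, and spent more than 100 s on the form (the longest fill time in the data is 2600 s, about 43 minutes, which certainly looks serious!).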


import xlrd
import xlwt
workspace = xlwt.Workbook(encoding='ascii')
excel = workspace.add_sheet('报名表筛选版', cell_overwrite_ok=True)
select_list = [1, 1, 1, 1]  # whether to filter on: fill time / photo / introduction / requirements
col_num = [1, 6, 7, 8]  # columns holding those four attributes
time_thre = 100  # minimum fill time in seconds
table = xlrd.open_workbook(r'报名表.xls')  # the workbook written earlier
t = table.sheet_by_index(0)  # its first sheet
remain = 1  # next row to write in the filtered sheet
for i in range(t.ncols):  # t.ncols: number of columns in t
    excel.write(0, i, t.cell_value(0, i))  # copy the header row as usual
empty_values = ('0', 0, '(空)', '无')  # values that count as "not filled in"
for row in range(1, t.nrows):
    state = True  # whether to keep this row
    if select_list[0] == 1 and int(t.cell_value(row, col_num[0])) < time_thre:
        state = False  # filled in faster than the threshold
    for i in range(1, 4):
        # photo / introduction / requirements must be non-empty when enabled
        if select_list[i] == 1 and t.cell_value(row, col_num[i]) in empty_values:
            state = False
    if state:
        # the row passed every enabled check, so copy it to the new sheet
        for i in range(t.ncols):
            excel.write(remain, i, t.cell_value(row, i))
        remain += 1
workspace.save('报名表筛选版.xls')
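The keep-or-drop decision can be isolated in one function, which makes it easy to try different thresholds without touching the spreadsheet code. A minimal sketch follows, using the same four criteria and the same "empty" markers as the script. `keep_row` and its parameter names are illustrative, and the sample rows are made up.

```python
EMPTY_VALUES = ('0', 0, '(空)', '无')


def keep_row(seconds, photo, intro, demand, min_seconds=100,
             require=(True, True, True, True)):
    """Decide whether one sign-up passes the enabled checks.

    require switches on, in order: minimum fill time, photo,
    introduction, requirements.
    """
    if require[0] and int(seconds) < min_seconds:
        return False
    for enabled, value in zip(require[1:], (photo, intro, demand)):
        if enabled and value in EMPTY_VALUES:
            return False
    return True


# Made-up rows: (fill time, photo flag, introduction, requirements)
print(keep_row('20', 0, '(空)', '(空)'))      # rushed and empty
print(keep_row('150', 1, '喜欢跑步', '性格开朗'))  # everything filled in
```

Turning a check off, for example `require=(False, True, True, True)`, keeps people who filled the form quickly but completely.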
