Last time, we trained regression models on this data.

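Because most candidates in the dataset have a chance of admission above 70%, the regression models predicted many of the unsuccessful candidates poorly.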

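import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("../input/Admission_Predict.csv",sep = ",")
df = df.rename(columns = {"Chance of Admit ": "Chance of Admit"})  # strip a possible trailing space from the column name
df["Chance of Admit"].plot(kind = 'hist',bins = 200,figsize = (6,6))
plt.title("Chance of Admit")
plt.xlabel("Chance of Admit")
plt.ylabel("Frequency")
plt.show()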


Preparing the data for classification

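If a candidate's chance of admission is greater than 80%, the candidate gets label 1.
If a candidate's chance of admission is less than or equal to 80%, the candidate gets label 0.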
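The labeling rule above is a strict threshold: a chance of exactly 0.80 goes to class 0. A minimal stdlib sketch of the same rule follows; the helper names `to_binary_labels` and `positive_rate` are hypothetical, and the chances are toy values rather than rows from the dataset. Checking the share of label-1 samples is worthwhile because the two classes need not be balanced.

```python
def to_binary_labels(chances, threshold=0.8):
    """Map admission chances to 1 (above threshold) or 0 (at or below it)."""
    return [1 if c > threshold else 0 for c in chances]


def positive_rate(labels):
    """Fraction of samples labeled 1 (0.0 for an empty list)."""
    return sum(labels) / len(labels) if labels else 0.0


# toy chances, not taken from Admission_Predict.csv
chances = [0.92, 0.80, 0.65, 0.81, 0.72]
labels = to_binary_labels(chances)
print(labels)                 # [1, 0, 0, 1, 0]
print(positive_rate(labels))  # 0.4
```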

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
# reading the dataset
df = pd.read_csv("../input/Admission_Predict.csv",sep = ",")
# the target column may be named "Chance of Admit " (trailing space) in the raw CSV; normalize it
df = df.rename(columns = {"Chance of Admit ": "Chance of Admit"})
# the serial numbers may be needed in the future
serialNo = df["Serial No."].values
df.drop(["Serial No."],axis=1,inplace = True)
y = df["Chance of Admit"].values
x = df.drop(["Chance of Admit"],axis=1)
# separating train (80%) and test (20%) sets
x_train, x_test,y_train, y_test = train_test_split(x,y,test_size = 0.20,random_state = 42)
# normalization: fit the scaler on the training set only, then reuse it on the test set
scalerX = MinMaxScaler(feature_range=(0, 1))
x_train[x_train.columns] = scalerX.fit_transform(x_train[x_train.columns])
x_test[x_test.columns] = scalerX.transform(x_test[x_test.columns])
# label 1 if the chance of admit is above 0.8, otherwise 0 (list converted to array)
y_train_01 = np.array([1 if each > 0.8 else 0 for each in y_train])
y_test_01 = np.array([1 if each > 0.8 else 0 for each in y_test])
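MinMaxScaler with feature_range=(0, 1) maps each column through (x - min) / (max - min), where min and max are learned from the data passed to fit. The plain-Python sketch below (hypothetical helper names, toy GRE-like values) shows why the code calls fit_transform on the training set and only transform on the test set: test values are scaled with the training statistics, so they can land outside [0, 1]. Treating a zero range as 1 to avoid division by zero is a choice made for this sketch.

```python
def fit_min_max(column):
    """Learn the (min, max) of one feature column from the training data."""
    return min(column), max(column)


def transform_min_max(column, col_min, col_max):
    """Scale values with statistics learned from the training set."""
    span = col_max - col_min
    if span == 0:
        span = 1  # constant training column: avoid dividing by zero (sketch assumption)
    return [(v - col_min) / span for v in column]


train_gre = [290, 340, 315]  # toy values
test_gre = [300, 350]
lo, hi = fit_min_max(train_gre)
print(transform_min_max(train_gre, lo, hi))  # [0.0, 1.0, 0.5]
print(transform_min_max(test_gre, lo, hi))   # [0.2, 1.2]
```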

Logistic Regression

from sklearn.linear_model import LogisticRegression
lrc = LogisticRegression()
lrc.fit(x_train,y_train_01)
print("score: ", lrc.score(x_test,y_test_01))
print("real value of y_test_01[1]: " + str(y_test_01[1]) + " -> the predict: " + str(lrc.predict(x_test.iloc[[1],:])))
print("real value of y_test_01[2]: " + str(y_test_01[2]) + " -> the predict: " + str(lrc.predict(x_test.iloc[[2],:])))
# confusion matrix
from sklearn.metrics import confusion_matrix
cm_lrc = confusion_matrix(y_test_01,lrc.predict(x_test))
# print("y_test_01 == 1 :" + str(len(y_test_01[y_test_01==1]))) # 29
# cm visualization
import seaborn as sns
import matplotlib.pyplot as plt
f, ax = plt.subplots(figsize =(5,5))
sns.heatmap(cm_lrc,annot = True,linewidths=0.5,linecolor="red",fmt = ".0f",ax=ax)
plt.title("Test for Test Dataset")
plt.xlabel("predicted y values")
plt.ylabel("real y values")
plt.show()
from sklearn.metrics import precision_score, recall_score
print("precision_score: ", precision_score(y_test_01,lrc.predict(x_test)))
print("recall_score: ", recall_score(y_test_01,lrc.predict(x_test)))
from sklearn.metrics import f1_score
print("f1_score: ",f1_score(y_test_01,lrc.predict(x_test)))

score: 0.9
real value of y_test_01[1]: 0 -> the predict: [0]
real value of y_test_01[2]: 1 -> the predict: [1]


precision_score: 0.9565217391304348
recall_score: 0.7586206896551724
f1_score: 0.8461538461538461
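Precision, recall and F1 follow directly from the confusion-matrix counts. As a worked check, the logistic-regression scores printed above are consistent with TP = 22, FP = 1, FN = 7 and TN = 50 on the 80-sample test set (29 actual positives, matching the commented-out `# 29` in the code). These counts are inferred from the printed scores, not read from the original heatmap, and `scores_from_counts` is a hypothetical helper; returning 0.0 when a denominator is zero is an assumption of this sketch.

```python
def scores_from_counts(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


acc, p, r, f1 = scores_from_counts(tp=22, fp=1, fn=7, tn=50)
print(round(acc, 4))  # 0.9
print(round(p, 4))    # 0.9565
print(round(r, 4))    # 0.7586
print(round(f1, 4))   # 0.8462
```

The low recall (22 of 29 admitted candidates found) is what pulls F1 down, even though precision is high.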

Test for Train Dataset:

cm_lrc_train = confusion_matrix(y_train_01,lrc.predict(x_train))
f, ax = plt.subplots(figsize =(5,5))
sns.heatmap(cm_lrc_train,annot = True,linewidths=0.5,linecolor="red",fmt = ".0f",ax=ax)
plt.xlabel("predicted y values")
plt.ylabel("real y values")
plt.title("Test for Train Dataset")
plt.show()
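scikit-learn's confusion_matrix puts the true labels on the rows and the predicted labels on the columns, which is why the heatmaps label the y axis "real y values" and the x axis "predicted y values". For labels {0, 1} the layout is [[TN, FP], [FN, TP]]. The stdlib sketch below (hypothetical helper names, toy labels) builds the same layout and reads accuracy off the diagonal, which makes it easy to compare the train-set and test-set matrices: a training accuracy far above the test accuracy would suggest overfitting.

```python
def binary_confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]: rows are real labels, columns are predictions."""
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy_from_cm(cm):
    """Share of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in cm)
    return (cm[0][0] + cm[1][1]) / total if total else 0.0


cm = binary_confusion_matrix([0, 0, 1, 1, 1], [0, 1, 1, 0, 1])
print(cm)                    # [[1, 1], [1, 2]]
print(accuracy_from_cm(cm))  # 0.6
```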

SVC

from sklearn.svm import SVC
svm = SVC(random_state = 1)
svm.fit(x_train,y_train_01)
print("score: ", svm.score(x_test,y_test_01))
print("real value of y_test_01[1]: " + str(y_test_01[1]) + " -> the predict: " + str(svm.predict(x_test.iloc[[1],:])))
print("real value of y_test_01[2]: " + str(y_test_01[2]) + " -> the predict: " + str(svm.predict(x_test.iloc[[2],:])))
# confusion matrix
from sklearn.metrics import confusion_matrix
cm_svm = confusion_matrix(y_test_01,svm.predict(x_test))
# print("y_test_01 == 1 :" + str(len(y_test_01[y_test_01==1]))) # 29
# cm visualization
import seaborn as sns
import matplotlib.pyplot as plt
f, ax = plt.subplots(figsize =(5,5))
sns.heatmap(cm_svm,annot = True,linewidths=0.5,linecolor="red",fmt = ".0f",ax=ax)
plt.title("Test for Test Dataset")
plt.xlabel("predicted y values")
plt.ylabel("real y values")
plt.show()
from sklearn.metrics import precision_score, recall_score
print("precision_score: ", precision_score(y_test_01,svm.predict(x_test)))
print("recall_score: ", recall_score(y_test_01,svm.predict(x_test)))
from sklearn.metrics import f1_score
print("f1_score: ",f1_score(y_test_01,svm.predict(x_test)))

score: 0.9
real value of y_test_01[1]: 0 -> the predict: [0]
real value of y_test_01[2]: 1 -> the predict: [1]

precision_score: 0.9565217391304348
recall_score: 0.7586206896551724
f1_score: 0.8461538461538461

Test for Train Dataset

cm_svm_train = confusion_matrix(y_train_01,svm.predict(x_train))
f, ax = plt.subplots(figsize =(5,5))
sns.heatmap(cm_svm_train,annot = True,linewidths=0.5,linecolor="red",fmt = ".0f",ax=ax)
plt.xlabel("predicted y values")
plt.ylabel("real y values")
plt.title("Test for Train Dataset")
plt.show()
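SVC(random_state = 1) runs with scikit-learn's default hyperparameters, which in current versions means an RBF kernel K(a, b) = exp(-gamma * ||a - b||^2) with C = 1.0 and gamma='scale' (1 / (n_features * X.var())). According to the scikit-learn documentation, random_state only matters when probability=True, so it has no effect here. The stdlib sketch below shows the kernel itself, i.e. how similarity decays with squared distance; `rbf_kernel` is a hypothetical helper and gamma is passed in explicitly rather than derived from data.

```python
import math


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


print(rbf_kernel([0.0, 0.0], [0.0, 0.0], gamma=0.5))  # 1.0 (identical points)
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))  # exp(-1), about 0.3679
```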

Naive Bayes

from sklearn.naive_bayes import GaussianNB
nb = GaussianNB()
nb.fit(x_train,y_train_01)
print("score: ", nb.score(x_test,y_test_01))
print("real value of y_test_01[1]: " + str(y_test_01[1]) + " -> the predict: " + str(nb.predict(x_test.iloc[[1],:])))
print("real value of y_test_01[2]: " + str(y_test_01[2]) + " -> the predict: " + str(nb.predict(x_test.iloc[[2],:])))
# confusion matrix
from sklearn.metrics import confusion_matrix
cm_nb = confusion_matrix(y_test_01,nb.predict(x_test))
# print("y_test_01 == 1 :" + str(len(y_test_01[y_test_01==1]))) # 29
# cm visualization
import seaborn as sns
import matplotlib.pyplot as plt
f, ax = plt.subplots(figsize =(5,5))
sns.heatmap(cm_nb,annot = True,linewidths=0.5,linecolor="red",fmt = ".0f",ax=ax)
plt.title("Test for Test Dataset")
plt.xlabel("predicted y values")
plt.ylabel("real y values")
plt.show()
from sklearn.metrics import precision_score, recall_score
print("precision_score: ", precision_score(y_test_01,nb.predict(x_test)))
print("recall_score: ", recall_score(y_test_01,nb.predict(x_test)))
from sklearn.metrics import f1_score
print("f1_score: ",f1_score(y_test_01,nb.predict(x_test)))

score: 0.9625
real value of y_test_01[1]: 0 -> the predict: [0]
real value of y_test_01[2]: 1 -> the predict: [1]


precision_score: 0.9333333333333333
recall_score: 0.9655172413793104
f1_score: 0.9491525423728815

Test for Train Dataset:

cm_nb_train = confusion_matrix(y_train_01,nb.predict(x_train))
f, ax = plt.subplots(figsize =(5,5))
sns.heatmap(cm_nb_train,annot = True,linewidths=0.5,linecolor="red",fmt = ".0f",ax=ax)
plt.xlabel("predicted y values")
plt.ylabel("real y values")
plt.title("Test for Train Dataset")
plt.show()
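GaussianNB treats each feature, within each class, as an independent normal distribution and predicts the class with the largest prior times likelihood. A minimal stdlib sketch of that idea follows, working in log-probabilities for numerical stability. The helper names and toy data are hypothetical, and the small variance floor `eps` is a simplification, not scikit-learn's exact var_smoothing rule.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, eps=1e-9):
    """Per-class prior, feature means and feature variances."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / n + eps
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Class with the largest log prior + sum of per-feature log densities."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.1]))  # 0
print(predict_gaussian_nb(model, [4.9, 6.1]))  # 1
```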

Decision Tree

from sklearn.tree import DecisionTreeClassifier
dtc = DecisionTreeClassifier()
dtc.fit(x_train,y_train_01)
print("score: ", dtc.score(x_test,y_test_01))
print("real value of y_test_01[1]: " + str(y_test_01[1]) + " -> the predict: " + str(dtc.predict(x_test.iloc[[1],:])))
print("real value of y_test_01[2]: " + str(y_test_01[2]) + " -> the predict: " + str(dtc.predict(x_test.iloc[[2],:])))
# confusion matrix
from sklearn.metrics import confusion_matrix
cm_dtc = confusion_matrix(y_test_01,dtc.predict(x_test))
# print("y_test_01 == 1 :" + str(len(y_test_01[y_test_01==1]))) # 29
# cm visualization
import seaborn as sns
import matplotlib.pyplot as plt
f, ax = plt.subplots(figsize =(5,5))
sns.heatmap(cm_dtc,annot = True,linewidths=0.5,linecolor="red",fmt = ".0f",ax=ax)
plt.title("Test for Test Dataset")
plt.xlabel("predicted y values")
plt.ylabel("real y values")
plt.show()
from sklearn.metrics import precision_score, recall_score
print("precision_score: ", precision_score(y_test_01,dtc.predict(x_test)))
print("recall_score: ", recall_score(y_test_01,dtc.predict(x_test)))
from sklearn.metrics import f1_score
print("f1_score: ",f1_score(y_test_01,dtc.predict(x_test)))

score: 0.9375
real value of y_test_01[1]: 0 -> the predict: [0]
real value of y_test_01[2]: 1 -> the predict: [1]

precision_score: 0.9615384615384616
recall_score: 0.8620689655172413
f1_score: 0.9090909090909091

Test for Train Dataset

cm_dtc_train = confusion_matrix(y_train_01,dtc.predict(x_train))
f, ax = plt.subplots(figsize =(5,5))
sns.heatmap(cm_dtc_train,annot = True,linewidths=0.5,linecolor="red",fmt = ".0f",ax=ax)
plt.xlabel("predicted y values")
plt.ylabel("real y values")
plt.title("Test for Train Dataset")
plt.show()
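DecisionTreeClassifier() defaults to criterion='gini': at each node it picks the split that most reduces Gini impurity, 1 - sum_k p_k^2, and with no max_depth it keeps splitting until the leaves are pure (or too small to split), so it can fit the training set very closely. The stdlib sketch below scores every threshold on a single feature; the helper names and the CGPA-like toy values are hypothetical, and this is a simplification, not the library's implementation.

```python
def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Threshold t minimising the weighted Gini of (value <= t, value > t)."""
    n = len(values)
    best_t, best_score = None, float("inf")
    for t in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        if not left or not right:
            continue  # a split must send samples to both sides
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


print(gini([0, 0, 1, 1]))  # 0.5
print(best_threshold([7.5, 8.0, 8.6, 9.1, 9.4], [0, 0, 0, 1, 1]))  # (8.6, 0.0)
```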

Random Forest

from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(n_estimators = 100,random_state = 1)
rfc.fit(x_train,y_train_01)
print("score: ", rfc.score(x_test,y_test_01))
print("real value of y_test_01[1]: " + str(y_test_01[1]) + " -> the predict: " + str(rfc.predict(x_test.iloc[[1],:])))
print("real value of y_test_01[2]: " + str(y_test_01[2]) + " -> the predict: " + str(rfc.predict(x_test.iloc[[2],:])))
# confusion matrix
from sklearn.metrics import confusion_matrix
cm_rfc = confusion_matrix(y_test_01,rfc.predict(x_test))
# print("y_test_01 == 1 :" + str(len(y_test_01[y_test_01==1]))) # 29
# cm visualization
import seaborn as sns
import matplotlib.pyplot as plt
f, ax = plt.subplots(figsize =(5,5))
sns.heatmap(cm_rfc,annot = True,linewidths=0.5,linecolor="red",fmt = ".0f",ax=ax)
plt.title("Test for Test Dataset")
plt.xlabel("predicted y values")
plt.ylabel("real y values")
plt.show()
from sklearn.metrics import precision_score, recall_score
print("precision_score: ", precision_score(y_test_01,rfc.predict(x_test)))
print("recall_score: ", recall_score(y_test_01,rfc.predict(x_test)))
from sklearn.metrics import f1_score
print("f1_score: ",f1_score(y_test_01,rfc.predict(x_test)))

score: 0.9375
real value of y_test_01[1]: 0 -> the predict: [0]
real value of y_test_01[2]: 1 -> the predict: [1]

precision_score: 0.9615384615384616
recall_score: 0.8620689655172413
f1_score: 0.9090909090909091

Test for Train Dataset

cm_rfc_train = confusion_matrix(y_train_01,rfc.predict(x_train))
f, ax = plt.subplots(figsize =(5,5))
sns.heatmap(cm_rfc_train,annot = True,linewidths=0.5,linecolor="red",fmt = ".0f",ax=ax)
plt.xlabel("predicted y values")
plt.ylabel("real y values")
plt.title("Test for Train Dataset")
plt.show()
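RandomForestClassifier(n_estimators = 100) trains 100 trees, each on a bootstrap sample of the training rows and with a random subset of features considered at each split, then combines them. The scikit-learn documentation notes that its forests average the trees' predicted class probabilities instead of taking a hard majority vote. The stdlib sketch below shows only those two ingredients, bootstrap sampling and probability averaging; the helper names and the three "trees" are hypothetical, tree training is omitted, and breaking ties towards class 0 is an assumption of the sketch.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]


def average_proba(per_tree_proba):
    """Average each tree's [P(class 0), P(class 1)] and pick the larger class."""
    n_trees = len(per_tree_proba)
    mean = [sum(p[k] for p in per_tree_proba) / n_trees for k in range(2)]
    return mean, (1 if mean[1] > mean[0] else 0)


rng = random.Random(1)
print(bootstrap_indices(5, rng))  # some indices repeat, others are left out
# two trees lean towards 0, one is very confident about 1
mean, label = average_proba([[0.6, 0.4], [0.6, 0.4], [0.0, 1.0]])
print(label)  # 1 (mean is about [0.4, 0.6]); a hard majority vote would give 0
```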

kNN

from sklearn.neighbors import KNeighborsClassifier
# finding k value
scores = []
for each in range(1,50):
    knn_n = KNeighborsClassifier(n_neighbors = each)
    knn_n.fit(x_train,y_train_01)
    scores.append(knn_n.score(x_test,y_test_01))
plt.plot(range(1,50),scores)
plt.xlabel("k")
plt.ylabel("accuracy")
plt.show()
knn = KNeighborsClassifier(n_neighbors = 3) # n_neighbors = k
knn.fit(x_train,y_train_01)
print("score of 3 :",knn.score(x_test,y_test_01))
print("real value of y_test_01[1]: " + str(y_test_01[1]) + " -> the predict: " + str(knn.predict(x_test.iloc[[1],:])))
print("real value of y_test_01[2]: " + str(y_test_01[2]) + " -> the predict: " + str(knn.predict(x_test.iloc[[2],:])))
# confusion matrix
from sklearn.metrics import confusion_matrix
cm_knn = confusion_matrix(y_test_01,knn.predict(x_test))
# print("y_test_01 == 1 :" + str(len(y_test_01[y_test_01==1]))) # 29
# cm visualization
import seaborn as sns
import matplotlib.pyplot as plt
f, ax = plt.subplots(figsize =(5,5))
sns.heatmap(cm_knn,annot = True,linewidths=0.5,linecolor="red",fmt = ".0f",ax=ax)
plt.title("Test for Test Dataset")
plt.xlabel("predicted y values")
plt.ylabel("real y values")
plt.show()
from sklearn.metrics import precision_score, recall_score
print("precision_score: ", precision_score(y_test_01,knn.predict(x_test)))
print("recall_score: ", recall_score(y_test_01,knn.predict(x_test)))
from sklearn.metrics import f1_score
print("f1_score: ",f1_score(y_test_01,knn.predict(x_test)))


score of 3 : 0.9375
real value of y_test_01[1]: 0 -> the predict: [0]
real value of y_test_01[2]: 1 -> the predict: [1]


precision_score: 0.9285714285714286
recall_score: 0.896551724137931
f1_score: 0.912280701754386

Test for Train Dataset:

cm_knn_train = confusion_matrix(y_train_01,knn.predict(x_train))
f, ax = plt.subplots(figsize =(5,5))
sns.heatmap(cm_knn_train,annot = True,linewidths=0.5,linecolor="red",fmt = ".0f",ax=ax)
plt.xlabel("predicted y values")
plt.ylabel("real y values")
plt.title("Test for Train Dataset")
plt.show()
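KNeighborsClassifier with n_neighbors = 3 labels a candidate by a majority vote among the 3 closest training rows, using Euclidean distance by default (metric='minkowski' with p=2). Because the features were min-max scaled, no single feature such as the GRE score dominates that distance. A stdlib sketch follows; `knn_predict` and the toy points are hypothetical, and `math.dist` needs Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, row, k=3):
    """Majority label among the k training rows closest to `row`."""
    dists = sorted(
        (math.dist(x, row), label) for x, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


train_rows = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.9, 0.8], [0.8, 0.9]]
train_labels = [0, 0, 0, 1, 1]
print(knn_predict(train_rows, train_labels, [0.2, 0.2], k=3))    # 0
print(knn_predict(train_rows, train_labels, [0.85, 0.85], k=3))  # 1
```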


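All of the classifiers reach a test accuracy of roughly 90% or higher. The best is Gaussian Naive Bayes, with a score of about 96% (0.9625).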

# test-set accuracy of each classifier (new names so the data arrays x and y are not overwritten)
model_scores = np.array([lrc.score(x_test,y_test_01),svm.score(x_test,y_test_01),nb.score(x_test,y_test_01),dtc.score(x_test,y_test_01),rfc.score(x_test,y_test_01),knn.score(x_test,y_test_01)])
#model_names = ["LogisticRegression","SVM","GaussianNB","DecisionTreeClassifier","RandomForestClassifier","KNeighborsClassifier"]
model_names = ["LogisticReg.","SVM","GNB","Dec.Tree","Ran.Forest","KNN"]
plt.bar(model_names,model_scores)
plt.title("Comparison of Classification Algorithms")
plt.xlabel("Classification")
plt.ylabel("Score")
plt.show()
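The bar chart compares test accuracy only. The sketch below ranks the models using the test-set scores printed earlier in this article (copied by hand, so it does not rerun any model; `rank_by` is a hypothetical helper). It also shows what accuracy hides: logistic regression and SVC both score 0.9, yet they recall only about 76% of the label-1 candidates, while Gaussian NB recalls about 97%.

```python
# test-set scores copied from the outputs printed above
reported = {
    "LogisticReg.": {"accuracy": 0.9, "recall": 0.7586206896551724},
    "SVM": {"accuracy": 0.9, "recall": 0.7586206896551724},
    "GNB": {"accuracy": 0.9625, "recall": 0.9655172413793104},
    "Dec.Tree": {"accuracy": 0.9375, "recall": 0.8620689655172413},
    "Ran.Forest": {"accuracy": 0.9375, "recall": 0.8620689655172413},
    "KNN": {"accuracy": 0.9375, "recall": 0.896551724137931},
}


def rank_by(metric):
    """Model names sorted from best to worst on the given metric."""
    return sorted(reported, key=lambda name: reported[name][metric], reverse=True)


print(rank_by("accuracy")[0])  # GNB
print(rank_by("recall")[0])    # GNB
```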


The previous article applied regression algorithms; this one applies classification algorithms.
