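In the previous article, "[12] You people are so naive! A light-hearted chat about Naive Bayes", we looked at what the Naive Bayes algorithm is, what characterises it, and some background on the Bayesian school. Here we implement the Naive Bayes algorithm in Python. Part 1 computes the probability that a sneezing construction worker has contracted COVID-19; Part 2 is the watermelon classification project mentioned in the previous article.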

Project 1: How likely is it that a sneezing construction worker is ill?


class NBClassify(object):
    """Naive Bayes classifier for discrete (categorical) features."""

    def __init__(self, fillNa=1):
        # Count assigned to a (tag, feature value) pair that never occurs in
        # the training set, so that its probability is small but not zero.
        self.fillNa = fillNa

    def train(self, trainSet):
        # trainSet looks like [(featureDict, tag), ...]

        # Count how often each tag (class) occurs.
        dictTag = {}
        for subTuple in trainSet:
            tag = str(subTuple[1])
            dictTag[tag] = dictTag.get(tag, 0) + 1

        # Prior probability of each tag, P(tag).
        tagProbablity = {}
        totalFreq = sum(dictTag.values())
        for key, value in dictTag.items():
            tagProbablity[key] = value / totalFreq
        self.tagProbablity = tagProbablity

        # Collect every value each feature takes, with its overall frequency:
        # {feature1: {value1: 5, value2: 1}, feature2: {value1: 1, value2: 5}}
        dictFeaturesBase = {}
        for subTuple in trainSet:
            for key, value in subTuple[0].items():
                if key not in dictFeaturesBase:
                    dictFeaturesBase[key] = {value: 1}
                elif value not in dictFeaturesBase[key]:
                    dictFeaturesBase[key][value] = 1
                else:
                    dictFeaturesBase[key][value] += 1
        # For the illness data set this gives:
        # dictFeaturesBase = {
        #     '职业': {'农夫': 1, '教师': 2, '建筑工人': 2, '**': 1},
        #     '症状': {'打喷嚏': 3, '头痛': 3}
        # }

        # Build an empty table dictFeatures[tag][feature][value] = None.
        dictFeatures = {}
        for tag in dictTag:
            dictFeatures[tag] = {}
            for featureName, valueCounts in dictFeaturesBase.items():
                dictFeatures[tag][featureName] = dict.fromkeys(valueCounts)
        # dictFeatures = {
        #     '感冒': {'症状': {'打喷嚏': None, '头痛': None}, '职业': {'**': None, '农夫': None, '建筑工人': None, '教师': None}},
        #     '脑震荡': {'症状': {'打喷嚏': None, '头痛': None}, '职业': {'**': None, '农夫': None, '建筑工人': None, '教师': None}},
        #     '过敏': {'症状': {'打喷嚏': None, '头痛': None}, '职业': {'**': None, '农夫': None, '建筑工人': None, '教师': None}}
        # }

        # Count how often each feature value occurs within each tag.
        for subTuple in trainSet:
            tag = str(subTuple[1])
            for key, value in subTuple[0].items():
                count = dictFeatures[tag][key][value]
                dictFeatures[tag][key][value] = 1 if count is None else count + 1

        # Combinations missing from the sample get the count fillNa instead of
        # None, so their probability is small rather than zero. Only the missing
        # counts are changed, so this is a crude form of smoothing and not
        # standard Laplace smoothing.
        for tag, featuresDict in dictFeatures.items():
            for featureName, fetureValueDict in featuresDict.items():
                for featureKey, featureValues in fetureValueDict.items():
                    if featureValues is None:
                        fetureValueDict[featureKey] = self.fillNa

        # Turn the counts into conditional probabilities P(feature value | tag).
        for tag, featuresDict in dictFeatures.items():
            for featureName, fetureValueDict in featuresDict.items():
                totalCount = sum(fetureValueDict.values())
                for featureKey, featureValues in fetureValueDict.items():
                    fetureValueDict[featureKey] = featureValues / totalCount
        self.featuresProbablity = dictFeatures

    def classify(self, featureDict):
        # Score each tag by P(tag) * product of P(feature value | tag).
        resultDict = {}
        for key, value in self.tagProbablity.items():
            conditionPr = 1
            for f, v in featureDict.items():
                conditionPr *= self.featuresProbablity[key][f][v]
            resultDict[key] = value * conditionPr
        # Return the tag with the highest score.
        resultList = sorted(resultDict.items(), key=lambda x: x[1], reverse=True)
        return resultList[0][0]


if __name__ == '__main__':
    # Illness data set. 症状 = symptom (打喷嚏 sneezing, 头痛 headache),
    # 职业 = occupation (农夫 farmer, 建筑工人 construction worker, 教师 teacher),
    # tags: 感冒 cold, 过敏 allergy, 脑震荡 concussion.
    illnessSet = [
        ({"症状": "打喷嚏", "职业": "**"}, "感冒"),
        ({"症状": "打喷嚏", "职业": "农夫"}, "过敏"),
        ({"症状": "头痛", "职业": "建筑工人"}, "脑震荡"),
        ({"症状": "头痛", "职业": "建筑工人"}, "感冒"),
        ({"症状": "打喷嚏", "职业": "教师"}, "感冒"),
        ({"症状": "头痛", "职业": "教师"}, "脑震荡"),
    ]

    # Purchase data set. 收入 = income (高 high, 中等 medium, 低 low),
    # 学生 = student, 信用 = credit rating, tags: 买 buys, 不买 does not buy.
    # In the original listing this second assignment reused the name trainSet
    # and silently replaced the illness data, so the two sets are named apart here.
    purchaseSet = [
        ({"age": "youth", "收入": "高", "学生": "no", "信用": "fair"}, "不买"),
        ({"age": "youth", "收入": "高", "学生": "no", "信用": "excellent"}, "不买"),
        ({"age": "midden_aged", "收入": "高", "学生": "no", "信用": "fair"}, "买"),
        ({"age": "senior", "收入": "中等", "学生": "no", "信用": "fair"}, "买"),
        ({"age": "senior", "收入": "低", "学生": "yes", "信用": "fair"}, "买"),
        ({"age": "senior", "收入": "低", "学生": "yes", "信用": "excellent"}, "不买"),
        ({"age": "midden_aged", "收入": "低", "学生": "yes", "信用": "excellent"}, "买"),
        ({"age": "youth", "收入": "中等", "学生": "no", "信用": "fair"}, "不买"),
        ({"age": "youth", "收入": "低", "学生": "yes", "信用": "fair"}, "买"),
        ({"age": "senior", "收入": "中等", "学生": "yes", "信用": "fair"}, "买"),
        ({"age": "youth", "收入": "中等", "学生": "yes", "信用": "excellent"}, "买"),
        ({"age": "midden_aged", "收入": "中等", "学生": "no", "信用": "excellent"}, "买"),
        ({"age": "midden_aged", "收入": "高", "学生": "yes", "信用": "fair"}, "买"),
        ({"age": "senior", "收入": "中等", "学生": "no", "信用": "excellent"}, "不买"),
    ]

    monitor = NBClassify()

    # A sneezing construction worker: how likely is he to have pneumonia?
    # classify() returns the most probable tag among those in the training set.
    monitor.train(illnessSet)
    print(monitor.classify({"症状": "打喷嚏", "职业": "建筑工人"}))
    # result = monitor.classify({"症状": "头痛", "职业": "教师"})

    # The same classifier, retrained on the purchase data.
    monitor.train(purchaseSet)
    result = monitor.classify({"age": "midden_aged", "收入": "高", "学生": "yes", "信用": "excellent"})
    print(result)
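To see by hand what the classifier is estimating, the sketch below recomputes the sneezing construction worker case directly from the six illness samples, using exact fractions and no smoothing at all. The names `rows`, `score`, `scores` and `posterior` are illustrative and do not appear in the listing above; the trailing spaces on some tags are dropped.

```python
from collections import Counter
from fractions import Fraction

# (symptom, occupation, diagnosis), the six illness samples from this project.
rows = [
    ("打喷嚏", "**", "感冒"),
    ("打喷嚏", "农夫", "过敏"),
    ("头痛", "建筑工人", "脑震荡"),
    ("头痛", "建筑工人", "感冒"),
    ("打喷嚏", "教师", "感冒"),
    ("头痛", "教师", "脑震荡"),
]

tag_count = Counter(tag for _, _, tag in rows)
symptom_count = Counter((tag, symptom) for symptom, _, tag in rows)
job_count = Counter((tag, job) for _, job, tag in rows)


def score(symptom, job, tag):
    """Unnormalised posterior P(tag) * P(symptom | tag) * P(job | tag), unsmoothed."""
    prior = Fraction(tag_count[tag], len(rows))
    p_symptom = Fraction(symptom_count[(tag, symptom)], tag_count[tag])
    p_job = Fraction(job_count[(tag, job)], tag_count[tag])
    return prior * p_symptom * p_job


# Score every diagnosis for a sneezing construction worker, then normalise.
scores = {tag: score("打喷嚏", "建筑工人", tag) for tag in tag_count}
total = sum(scores.values())
posterior = {tag: s / total for tag, s in scores.items()}

for tag in scores:
    print(tag, scores[tag], posterior[tag])
```

Without smoothing, 过敏 (allergy) and 脑震荡 (concussion) both score exactly zero, because no allergic construction worker and no sneezing concussion patient appears in the sample. That is the situation the class above avoids by giving missing combinations a non-zero count.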

Project 2: Judging watermelons

import math

import numpy as np
import pandas as pd

dataset = pd.read_csv('watermelon_3.csv', delimiter=",")
del dataset['编号']  # drop the ID column
print(dataset)

X = dataset.values[:, :-1]
m, n = np.shape(X)
# Round the last two (continuous) columns to 3 decimal places.
for i in range(m):
    X[i, n - 1] = round(X[i, n - 1], 3)
    X[i, n - 2] = round(X[i, n - 2], 3)
y = dataset.values[:, -1]

columnName = dataset.columns
colIndex = {}
for i in range(len(columnName)):
    colIndex[columnName[i]] = i

# P() is slow and is often asked for the same value again, so its results
# are memoised in this map to avoid recomputation.
Pmap = {}
# Number of distinct values per attribute, e.g. kindsOfAttribute[0] = 3
# because "色泽" (colour) takes 3 different values.
kindsOfAttribute = {}
for i in range(n):
    kindsOfAttribute[i] = len(set(X[:, i]))
# Cached (mean, std) of each continuous attribute per class.
continuousPara = {}

# Row indices of good melons ('是' = yes) and bad melons ('否' = no).
goodList = []
badList = []
for i in range(len(y)):
    if y[i] == '是':
        goodList.append(i)
    else:
        badList.append(i)


def P(colID, attribute, C):
    """Return P(column colID = attribute | class C), e.g. P(色泽=青绿 | 是)."""
    if (colID, attribute, C) in Pmap:
        return Pmap[(colID, attribute, C)]
    curJudgeList = goodList if C == '是' else badList
    ans = 0
    if colID >= 6:
        # Columns from index 6 onward are treated as continuous:
        # use a Gaussian density with the class mean and standard deviation.
        if (colID, C) in continuousPara:
            mean, std = continuousPara[(colID, C)]
        else:
            # X has dtype object, so cast to float before taking statistics.
            curData = X[curJudgeList, colID].astype(float)
            mean = curData.mean()
            std = curData.std()
            continuousPara[(colID, C)] = (mean, std)
        ans = 1 / (math.sqrt(math.pi * 2) * std) * math.exp((-(attribute - mean) ** 2) / (2 * std * std))
    else:
        # Discrete attribute: Laplace-smoothed relative frequency.
        for i in curJudgeList:
            if X[i, colID] == attribute:
                ans += 1
        ans = (ans + 1) / (len(curJudgeList) + kindsOfAttribute[colID])
    Pmap[(colID, attribute, C)] = ans
    return ans


def predictOne(single):
    # Laplace-smoothed class priors, in log space.
    ansYes = math.log2((len(goodList) + 1) / (len(y) + 2))
    ansNo = math.log2((len(badList) + 1) / (len(y) + 2))
    # The textbook multiplies the probabilities; in practice the product is
    # turned into a sum of logarithms to avoid numerical underflow.
    for i in range(len(single)):
        ansYes += math.log2(P(i, single[i], '是'))
        ansNo += math.log2(P(i, single[i], '否'))
    if ansYes > ansNo:
        return '是'
    else:
        return '否'


def predictAll(iX):
    predictY = []
    for i in range(len(iX)):
        predictY.append(predictOne(iX[i]))
    return predictY


predictY = predictAll(X)
print(y)
print(np.array(predictY))

# Confusion matrix: rows are the true class, columns the predicted class,
# both in the order ('否', '是').
confusionMatrix = np.zeros((2, 2))
for i in range(len(y)):
    if predictY[i] == y[i]:
        if y[i] == '否':
            confusionMatrix[0, 0] += 1
        else:
            confusionMatrix[1, 1] += 1
    else:
        if y[i] == '否':
            confusionMatrix[0, 1] += 1
        else:
            confusionMatrix[1, 0] += 1
print(confusionMatrix)
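The comment in `predictOne` says the product of probabilities is replaced by a sum of logarithms to avoid underflow. The sketch below illustrates why, using two made-up lists of 1000 small probabilities; `product_score`, `log_score`, `probs_a` and `probs_b` are hypothetical names chosen for this illustration and are not part of the watermelon code.

```python
import math


def product_score(probs):
    """Multiply the probabilities directly, as the textbook formula does."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def log_score(probs):
    """Sum the base-2 logarithms instead of multiplying."""
    return sum(math.log2(p) for p in probs)


# Two hypothetical classes; every factor of the second one is twice as likely.
probs_a = [0.01] * 1000
probs_b = [0.02] * 1000

# Both direct products fall below the smallest representable float,
# so the two classes can no longer be told apart.
print(product_score(probs_a), product_score(probs_b))

# The log scores stay finite and keep the correct ordering.
print(log_score(probs_a), log_score(probs_b))
print(log_score(probs_a) < log_score(probs_b))
```

Because the logarithm is monotonic, comparing the sums picks the same class that comparing the exact products would, which is all `predictOne` needs.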
