Naive Bayes with Laplace Smoothing: A Python Implementation

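Today's data analysis class covered the naive Bayes algorithm. I remembered implementing it once in MATLAB, but that was long ago and the code is long gone. Since I have been focusing on Python lately and wanted the practice, I sat down after class and coded the basic idea in Python against the homework problem our teacher assigned. This post is a study note rather than an explanation of the algorithm, and time is short (there is a lot of homework), so below I only outline the key points and give the Python code.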

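First, what naive Bayes is for: classification. It can be applied to text classification, spam filtering, sentiment prediction, recommender systems, and similar areas. Since the classifier is built on Bayes' theorem, that is the first thing to have clear: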

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
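As a quick numeric illustration of the theorem, here is a minimal sketch. The function name `posterior` and the spam-filter numbers are made up for illustration and do not come from the homework problem; P(B) is expanded with the law of total probability.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A) via Bayes' theorem."""
    # P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / evidence


# Hypothetical numbers: 20% of messages are spam, a certain word appears
# in 50% of spam messages and in 5% of non-spam messages.
p_spam_given_word = posterior(0.2, 0.5, 0.05)
print(p_spam_given_word)
```

With these assumed numbers the posterior is 0.1 / 0.14, about 0.71: seeing the word raises the spam probability from 20% to roughly 71%.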

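Next, a few key points to understand:

The concepts of independence and conditional independence.
Independence ≠ uncorrelatedness; keep the two clearly apart. Correlation only describes whether two variables tend to move in the same or opposite directions (a linear relationship). Independent variables are always uncorrelated, but uncorrelated variables can still be dependent.
Laplace smoothing, which mainly addresses the zero-probability problem: a feature value that never appears together with some class in the training set would otherwise get a conditional probability of 0 and wipe out the whole product.
(My earlier mistake came from exactly this, so take care: smoothing the feature components x changes the effective sample counts, which in turn changes the class prior on the right-hand side, so y needs Laplace smoothing as well. Think of it this way: could a class y not also end up with zero probability? Seen like that, it should be easy to understand.)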
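The smoothing rule can be sketched in a few lines. This is a minimal illustration, assuming the usual add-λ form (count + λ) / (total + S·λ), where S is the number of distinct values the variable can take; the helper name `smoothed_prob` is my own. The counts are read off the training table used later in this post.

```python
from fractions import Fraction


def smoothed_prob(count, total, n_values, lam=1):
    """Add-lambda estimate (count + lam) / (total + n_values * lam), kept exact."""
    return Fraction(count + lam, total + n_values * lam)


# Conditional probability: X1 = 3 never occurs with Y = -1 (0 of the 5 rows),
# so the unsmoothed estimate is exactly zero.
unsmoothed = Fraction(0, 5)
smoothed = smoothed_prob(0, 5, n_values=3)    # X1 has 3 possible values

# Prior: 5 of the 14 rows have Y = -1, and Y has 2 possible values.
prior_neg = smoothed_prob(5, 14, n_values=2)

print(unsmoothed, smoothed, prior_neg)
```

The same helper serves both the conditional probabilities and the prior, which is the point of the note above: the prior is smoothed with the number of classes in the denominator, just as each feature is smoothed with its number of possible values.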

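Next, I will use the following example to demonstrate the Python code:

Example: learn a naive Bayes classifier from the training set in the table below and determine the class y of x = (2, Small). In the table, X1 and X2 are features with value sets X1 = {1, 2, 3} and X2 = {Small, Medium, Large}; Y is the class label, with Y = {1, -1}.

Table: training set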

ID	X1	X2	Y	ID	X1	X2	Y
1	1	Small	-1	8	2	Medium	1
2	1	Medium	-1	9	2	Large	1
3	1	Medium	1	10	2	Large	1
4	1	Small	1	11	3	Large	1
5	1	Small	-1	12	3	Medium	1
6	2	Small	-1	13	3	Medium	1
7	2	Medium	-1	14	3	Large	1

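I will not walk through the hand calculation for this problem; the focus is the code. (It was written in a hurry, so it is not very concise, but the logic is clear and it handles every possible input for this problem.)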
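For readers who still want to check the numbers, the smoothed estimates for x = (2, Small) can be reproduced with exact fractions. This is only a sketch of the arithmetic: the counts are read off the training table (9 rows with Y = 1, 5 rows with Y = -1), λ = 1 is assumed, and the variable names are my own.

```python
from fractions import Fraction as F

# Y = 1: 9 rows; X1 = 2 in 3 of them, X2 = Small in 1 of them.
score_pos = F(3 + 1, 9 + 3) * F(1 + 1, 9 + 3) * F(9 + 1, 14 + 2)

# Y = -1: 5 rows; X1 = 2 in 2 of them, X2 = Small in 3 of them.
score_neg = F(2 + 1, 5 + 3) * F(3 + 1, 5 + 3) * F(5 + 1, 14 + 2)

# Normalize so the two posteriors sum to 1.
post_pos = score_pos / (score_pos + score_neg)
post_neg = score_neg / (score_pos + score_neg)

print(score_pos, score_neg)   # unnormalized scores
print(post_pos, post_neg)     # normalized posteriors
```

The unnormalized scores come out to 5/144 and 9/128, and the normalized posteriors to 40/121 and 81/121, so x = (2, Small) is assigned to class -1.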

Without further ado, here is the code:

import numpy as np
import pandas as pd

# Training set: 14 samples of (X1, X2, Y).
# Note: np.array stores mixed ints and strings as strings,
# so every value is compared as a string below.
a = np.array([[1, "Small", -1], [1, "Medium", -1], [1, "Medium", 1],
              [1, "Small", 1], [1, "Small", -1], [2, "Small", -1],
              [2, "Medium", -1], [2, "Medium", 1], [2, "Large", 1],
              [2, "Large", 1], [3, "Large", 1], [3, "Medium", 1],
              [3, "Medium", 1], [3, "Large", 1]])
b = pd.DataFrame(a, columns=['X1', 'X2', 'Y'])

# Possible feature values; the position in each list is the row index in the counters
x1_values = ["1", "2", "3"]
x2_values = ["Small", "Medium", "Large"]

# Counters: one row per feature value, column 0 for Y = 1, column 1 for Y = -1
count_x1 = np.zeros((3, 2))
count_x2 = np.zeros((3, 2))

# Test sample
test = pd.Series({'X1': 2, 'X2': 'Small'})

# P(Yk|X1,X2) = P(X1|Yk) * P(X2|Yk) * P(Yk) / P(X)
# Tally the training set into the counters
for index, row in b.iterrows():
    col = 0 if row['Y'] == "1" else 1
    count_x1[x1_values.index(row['X1'])][col] += 1
    count_x2[x2_values.index(row['X2'])][col] += 1


# Naive Bayes computation with Laplace smoothing
def train(test):
    i = x1_values.index(str(test['X1']))
    j = x2_values.index(test['X2'])
    n_y = count_x1.sum(axis=0)  # number of training samples in each class

    # Smoothed P(X1|Y) * P(X2|Y) * P(Y):
    # +3 because each feature has 3 possible values, 16 = 14 samples + 2 classes
    p_y11 = ((count_x1[i][0] + 1) / (n_y[0] + 3)) * ((count_x2[j][0] + 1) / (n_y[0] + 3)) * ((n_y[0] + 1) / 16)
    p_y12 = ((count_x1[i][1] + 1) / (n_y[1] + 3)) * ((count_x2[j][1] + 1) / (n_y[1] + 3)) * ((n_y[1] + 1) / 16)

    # Normalize so the two posteriors sum to 1
    p_y1 = p_y11 / (p_y11 + p_y12)
    p_y2 = p_y12 / (p_y11 + p_y12)
    print("Probability that Y is 1:", p_y1)
    print("Probability that Y is -1:", p_y2)
    if p_y1 > p_y2:
        print("The test sample belongs to class 1")
    else:
        print("The test sample belongs to class -1")


# Run
train(test)
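The same idea can be written without hard-coding the feature values. The following is an alternative sketch, not the code from class: the names `DATA`, `fit`, and `predict` are my own, it uses only the standard library, and it assumes every possible value of each feature appears at least once in the training rows (true for this table).

```python
from collections import Counter
from fractions import Fraction

# Training rows (X1, X2, Y), copied from the table in this post.
DATA = [
    (1, "Small", -1), (1, "Medium", -1), (1, "Medium", 1), (1, "Small", 1),
    (1, "Small", -1), (2, "Small", -1), (2, "Medium", -1), (2, "Medium", 1),
    (2, "Large", 1), (2, "Large", 1), (3, "Large", 1), (3, "Medium", 1),
    (3, "Medium", 1), (3, "Large", 1),
]


def fit(rows):
    """Collect the counts needed for the smoothed estimates."""
    n_features = len(rows[0]) - 1
    class_counts = Counter(row[-1] for row in rows)
    # feature_counts[(j, value, label)]: rows with feature j == value in class label
    feature_counts = Counter(
        (j, row[j], row[-1]) for row in rows for j in range(n_features)
    )
    # Number of distinct values per feature, used in the smoothing denominator
    n_values = [len({row[j] for row in rows}) for j in range(n_features)]
    return class_counts, feature_counts, n_values


def predict(model, x, lam=1):
    """Return (predicted label, {label: posterior}) with add-lam smoothing."""
    class_counts, feature_counts, n_values = model
    n_total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        # Smoothed prior P(Y = label)
        score = Fraction(n_label + lam, n_total + lam * len(class_counts))
        # Smoothed conditionals P(Xj = value | Y = label); missing keys count as 0
        for j, value in enumerate(x):
            score *= Fraction(feature_counts[(j, value, label)] + lam,
                              n_label + lam * n_values[j])
        scores[label] = score
    total = sum(scores.values())
    posteriors = {label: s / total for label, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors


model = fit(DATA)
label, posteriors = predict(model, (2, "Small"))
print(label, posteriors)
```

For x = (2, Small) this should give class -1 with posteriors 40/121 for Y = 1 and 81/121 for Y = -1. Using `Fraction` keeps the result exact, which makes it easy to compare against a hand calculation.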

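That is the code for this problem; it prints the two posterior probabilities and the predicted class. You should first get familiar with the idea behind the naive Bayes classifier (this is aimed at beginners; experts can ignore me and feel free to call me out in the comments, since progress comes from discussion). The rest should be clear from the comments. I am a beginner myself and the code uses nothing advanced, so I expect everyone can follow it. Leave anything else in the comments; I log in to csdn roughly every day and will see most messages.
(Special thanks to my classmate 逯敬一 for the corrections and to our teacher for the explanations.)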
