【AI Math Principles】Probabilistic Machine Learning (4): A Worked Example of the TAN Semi-Naive Bayes Algorithm
This probabilistic machine learning series has reached its fourth post, and it still draws few readers. Granted, these are mainly my own study notes, but carrying on like this makes my contribution look rather small. So today I am trying a different format: writing the post around a code implementation.
Comments and discussion are welcome.
Prerequisites:
1. The naive Bayes algorithm.
2. Ability to read Python code: the implementation targets Python 3.6 and requires no machine-learning packages.
1. The "semi" in semi-naive Bayes
As mentioned earlier, the "naive" in naive Bayes (NB) lies in its assumption that the attributes are mutually independent given the class, which simplifies computing P(x|c) in Bayes' formula. In practice, however, attributes that are completely independent of one another are rare. Modeling with plain naive Bayes regardless tends to give unsatisfactory generalization and robustness.
This is where our protagonist, semi-naive Bayes, comes in. It does account for dependencies between attributes, but assumes that each attribute depends on at most one other attribute (ODE, one-dependent estimator) or at most k others (kDE). That is what the "semi" refers to. Hence:
$P(c \mid \vec{x}) \propto P(c) \prod_{i=1}^{d} P(x_i \mid c, pa_i)$
where $pa_i$ denotes the parent attribute of $x_i$.
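To make the contrast concrete, here is a minimal sketch of how NB and ODE factor the class-conditional likelihood differently. The probability values are hypothetical toy numbers for illustration only, not estimated from any dataset:

# Minimal sketch contrasting the NB and ODE factorizations of P(x|c).
# All probability values below are hypothetical toy numbers.

# Naive Bayes: P(x|c) = P(x1|c) * P(x2|c)  (attributes independent given c)
p_x1_given_c = 0.6
p_x2_given_c = 0.3
nb_likelihood = p_x1_given_c * p_x2_given_c

# ODE: x2's parent is x1, so P(x|c) = P(x1|c) * P(x2|c, x1)
p_x2_given_c_x1 = 0.5  # conditioning on the parent's value changes the estimate
ode_likelihood = p_x1_given_c * p_x2_given_c_x1

print(nb_likelihood, ode_likelihood)  # 0.18 vs 0.30 on these toy numbers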
2. Two common semi-naive Bayes algorithms: SPODE and TAN
SPODE: assumes all attributes depend on one and the same attribute, called the "super-parent" (in the notation above, $pa_i$ is this same attribute for every $x_i$).
TAN: determines the dependencies between attributes via a maximum weight spanning tree. Put simply, each attribute is linked to the attribute it is most correlated with, and each link then becomes a directed edge (pointing in one direction only).
The figure below is borrowed from Zhou Zhihua's "watermelon book" (Machine Learning):
3. The TAN algorithm in detail
Rigorous descriptions of the TAN implementation steps can be found online. I will explain the steps in easy-to-follow rather than fully rigorous language, so please don't scrutinize the wording too closely. The steps are:
- Compute the conditional mutual information (CMI, a measure of mutual dependence) between every pair of attributes; the formula is given after this list.
- Build a graph with each attribute as a node and the CMI values as edge weights, and find the maximum weight spanning tree of this graph, i.e. a set of connections between nodes that satisfies three conditions:
  - it connects all nodes;
  - it uses the fewest possible edges;
  - the total edge weight (sum of CMI) is maximal.
- Make the connections directed, pointing from parent node to child node. Here the attribute that appears first is taken as the root, and the edge directions are determined by walking outward from the root. In the code, steps 2 and 3 are executed together.
- Compute $\prod_{i=1}^{d} P(x_i \mid c, pa_i)$.
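For reference, the conditional mutual information in step 1 (which is exactly what calculate_conditional_mutual_information in the code below computes, using base-2 logarithms and Laplace-smoothed probability estimates) is:

$$ I(x_i, x_j \mid c) = \sum_{x_i,\, x_j,\, c} P(x_i, x_j, c)\, \log_2 \frac{P(x_i, x_j \mid c)}{P(x_i \mid c)\, P(x_j \mid c)} $$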
The code, the training data, and the test data are all on my GitHub:
https://github.com/leviome/TAN
import re as pattern
import sys
import math

if len(sys.argv) != 4:
    train = open('lymph_train.arff')
    test = open('lymph_test.arff')
    mode = 't'
else:
    train = open(sys.argv[1])
    test = open(sys.argv[2])
    mode = sys.argv[3]

# name list of the features : ['A1', 'A2', 'A3', 'A4', 'A5', 'A8', 'A14', 'A15']
feature_names = []
# value lists of the features : one inner list per feature, holding all possible values of that feature
# [ [..], [..], [..], [..], [..], [..], [..], [..] ]
feature_values = []
# value list of the class labels : ['+', '-']
class_list = []
# list of training data (features)
train_data = []
# list of training data (labels)
train_data_label = []
# list of testing data (features)
test_data = []
# list of testing data (labels)
test_data_label = []


# calculate the CMI between feature I and feature J, traversing all possible values and classes;
# return the CMI value between feature[I] and feature[J]
def calculate_conditional_mutual_information(I, J):
    # lists of all possible values of the two features
    XI = feature_values[I]
    XJ = feature_values[J]
    CMI = 0.0
    # build the list of (XI value, XJ value, class) combinations over the training data
    XI_XJ_Class = list([])
    for k in range(len(train_data)):
        XI_XJ_Class.append([train_data[k][I], train_data[k][J], train_data_label[k]])
    # get the lists of [XI], [XJ] and [XI, XJ] per class
    for single_class in class_list:
        # extract the columns that share the same class for the given feature(s)
        feature_is_XI = extract_feature_col(train_data_label, single_class, train_data, I, None)
        feature_is_XJ = extract_feature_col(train_data_label, single_class, train_data, J, None)
        feature_is_XIXJ = extract_feature_col(train_data_label, single_class, train_data, I, J)
        for i in range(len(XI)):
            # single value of XI
            xi = XI[i]
            for j in range(len(XJ)):
                # single value of XJ
                xj = XJ[j]
                # calculate the conditional probabilities of xi, xj and (xi, xj) given class y;
                # match the xi or xj value against all training data within the same class
                possibility_xi_given_y = laplace_estimate_possibility(feature_is_XI, [xi], len(feature_values[I]))
                possibility_xj_given_y = laplace_estimate_possibility(feature_is_XJ, [xj], len(feature_values[J]))
                possibility_xixj_given_y = laplace_estimate_possibility(feature_is_XIXJ, [xi, xj], len(feature_values[I]) * len(feature_values[J]))
                possibility_xixjy = laplace_estimate_possibility(XI_XJ_Class, [xi, xj, single_class], len(feature_values[I]) * len(feature_values[J]) * len(class_list))
                CMI = CMI + possibility_xixjy * math.log(possibility_xixj_given_y / (possibility_xi_given_y * possibility_xj_given_y), 2)
    return CMI


# extract train_data (only the column(s) of the given feature index) where train_data_label == goal_class_value;
# return 1 column or 2 columns restricted to that class
def extract_feature_col(class__train, goal_class_value, feature__train, index1_of_feature, index2_of_feature):
    col_of_certain_feature = list([])
    # traverse all training data
    for i in range(len(class__train)):
        if class__train[i] == goal_class_value:
            tem = list([])
            # row: i, column: index1_of_feature, in the training data
            tem.append(feature__train[i][index1_of_feature])
            if index2_of_feature is not None:
                tem.append(feature__train[i][index2_of_feature])
            # build a list of lists: [ [XI] or [XJ] or [XI, XJ], ... ]
            col_of_certain_feature.append(tem)
    return col_of_certain_feature


# within the same class, the ratio/probability of each xi or xj value;
# return a probability value
def laplace_estimate_possibility(feature_column_with_same_class, feature_value, amount_of_values_combination):
    num = 0
    for i in range(len(feature_column_with_same_class)):
        if feature_column_with_same_class[i] == feature_value:
            num += 1
    # add 1 to the numerator to avoid 0; correspondingly add the number of all value combinations to the denominator
    return float(num + 1) / (len(feature_column_with_same_class) + amount_of_values_combination)


# find a maximum weight spanning tree using the existing edges; finally, each node has one and only one parent;
# return a matrix (graph) which represents the maximum weight spanning tree (row is parent, column is child)
def prims_algorithm(edges, graph):
    all_candidates = set(range(0, len(feature_names)))
    # to root the maximal weight spanning tree, pick the first attribute (index=0) in the input file as the root
    parent_candidates = set()
    parent_candidates.add(0)
    child_candidates = set(range(1, len(feature_names)))
    parent_child_list = list([])
    # if there are ties in selecting maximum weight edges, use the following preference criteria:
    # 1. prefer edges emanating from attributes listed earlier in the input file
    while parent_candidates != all_candidates:
        current_max = float('-inf')
        parent = None
        child = None
        for i in parent_candidates:
            for j in child_candidates:
                # 2. if there are multiple maximal weight edges emanating from the first such attribute,
                # prefer edges going to attributes listed earlier in the input file
                if edges[i][j] <= current_max:
                    pass
                elif edges[i][j] > current_max:
                    parent = i
                    child = j
                    current_max = edges[i][j]
        parent_child_list.append([parent, child])
        parent_candidates.add(child)
        child_candidates.remove(child)
    # finally, each child parent_child_list[i][1] appears only once, no repetition, len(features)-1 in total
    # (excluding 0, namely the root)
    for i in range(len(parent_child_list)):
        graph[parent_child_list[i][0]][parent_child_list[i][1]] = True
    # print(parent_child_list)


# return the conditional probability p(child_node_value | class_node_value, parent_node_value),
# given class_value and parent_value
def conditional_probability(child_node_value, class_node_value, parent_node_value, parent_feature_index, child_feature_index):
    # extract the column of child_feature_index from the segment which has parent_node_value and class_node_value
    child_column = list([])
    for i in range(len(train_data)):
        if train_data[i][parent_feature_index] == parent_node_value:
            if train_data_label[i] == class_node_value:
                child_column.append(train_data[i][child_feature_index])
    # child_column has all matching values of feature[child_feature_index]
    num = 0
    for i in range(len(child_column)):
        if child_column[i] == child_node_value:
            num += 1
    # add 1 to the numerator to avoid 0; correspondingly add the number of all value combinations to the denominator
    return float(num + 1) / (len(child_column) + len(feature_values[child_feature_index]))


# return a dictionary covering all cases of XI, XJ and class values (three nested "for" loops);
# the key order is child_value_index + class_value_index + parent_value_index
def create_dictionary_of_conditional_probability(parent_feature_index, child_feature_index):
    parent_feature = feature_values[parent_feature_index]
    child_feature = feature_values[child_feature_index]
    dictionary = {}
    for i in range(len(parent_feature)):
        for j in range(len(child_feature)):
            for k in range(len(class_list)):
                # p(child_value | class_value, parent_value)
                key = str(j) + str(k) + str(i)
                dictionary[key] = conditional_probability(child_feature[j], class_list[k], parent_feature[i], parent_feature_index, child_feature_index)
    return dictionary


# return p( X | class_list[class_value_index] ), where X is determined by a_row_test_data
def prior_probability(graph, dictionaries, a_row_test_data, class_value_index):
    single_prior_probability = list([])
    chain_rule_prior_probability = 1.0
    for child in range(len(a_row_test_data)):
        parent_value_index = None
        child_value_index = None
        # find the current child feature's parent feature and their value indices
        for parent in range(len(a_row_test_data)):
            if graph[parent][child] == 1:
                # find parent_value_index using the parent value in a_row_test_data
                for parent_value_index in range(len(feature_values[parent])):
                    if feature_values[parent][parent_value_index] == a_row_test_data[parent]:
                        break
                # find child_value_index using the child value in a_row_test_data
                for child_value_index in range(len(feature_values[child])):
                    if feature_values[child][child_value_index] == a_row_test_data[child]:
                        break
                break
        if child == 0:  # root
            # find child_value_index inside the root feature, using a_row_test_data[child]
            for child_value_index in range(len(feature_values[child])):
                if feature_values[child][child_value_index] == a_row_test_data[child]:
                    break
            # p( x0 | class_list[class_value_index] )
            single_prior_probability.append(dictionaries[child][str(child_value_index) + str(class_value_index)])
        else:  # other nodes
            # p( xj | class_list[class_value_index], x_parent )
            single_prior_probability.append(dictionaries[child][str(child_value_index) + str(class_value_index) + str(parent_value_index)])
    for i in range(len(single_prior_probability)):
        chain_rule_prior_probability = chain_rule_prior_probability * single_prior_probability[i]
    return chain_rule_prior_probability


# binary classification problem
def TAN_predict(graph, dictionaries, a_row_test_data):
    probability_list = list([])  # for all class values
    sum = 0.0
    # class_value_index is i
    for i in range(len(class_list)):
        # p(yi)
        probability_of_yi = laplace_estimate_possibility(train_data_label, class_list[i], len(class_list))
        # compute the class-conditional probability p( X | yi )
        probability_of_feature_i = prior_probability(graph, dictionaries, a_row_test_data, i)
        probability_list.append(probability_of_yi * probability_of_feature_i)
        # the sum of all probability_list[i] is not 1 yet!
        sum += probability_list[i]
    max_probability = float('-inf')
    y_predict_index = 0
    for i in range(len(probability_list)):
        # after normalization the sum of all probability_list[i] is 1,
        # and probability_list[i] is the posterior probability p( y_predict | test_data[i] )
        probability_list[i] = probability_list[i] / sum
        if probability_list[i] > max_probability:
            max_probability = probability_list[i]
            y_predict_index = i
    return [class_list[y_predict_index], max_probability]


# compute edge weights -> find the maximum weight spanning tree (graph) ->
# according to the graph, construct dictionaries of all combinations' conditional probabilities ->
# classify test_data using the Bayes net chain rule (y_predict and probability_is_predict)
def TAN():
    # weight matrix ( k by k, where k is the number of features )
    edges = list([])
    for i in range(len(feature_names)):
        edges.append([])
        for j in range(len(feature_names)):
            edges[i].append(0)
    # calculate the edge weights
    for i in range(len(feature_names)):
        for j in range(i + 1, len(feature_names)):
            edges[i][j] = calculate_conditional_mutual_information(i, j)
            edges[j][i] = edges[i][j]
    # find the maximum weight spanning tree (MST)
    graph = list([])
    for i in range(len(feature_names)):
        graph.append([])
        for j in range(len(feature_names)):
            graph[i].append(0)
    prims_algorithm(edges, graph)
    # print(graph)
    # output the maximum weight spanning tree with edge directions;
    # after prims_algorithm, each column of the graph matrix has one and only one True
    # (each node has only one parent, except the root)
    for j in range(len(graph)):
        if j == 0:  # the root
            print('1: %s class' % feature_names[j])
        else:  # other nodes
            for i in range(len(graph)):
                if graph[i][j] is True:
                    # output the child first and then the parent
                    print('2: %s %s class' % (feature_names[j], feature_names[i]))
                    break
    print('')
    # construct a big dictionary to store all possible conditional probabilities according to the "graph" structure
    length = len(feature_names)
    dictionaries = list([{}] * length)
    for j in range(len(graph)):  # child_feature_index
        if j == 0:  # the root
            # root: p( x0 | all y )
            dictionary = {}
            for i in range(len(feature_values[j])):  # all values of the root feature
                for k in range(len(class_list)):
                    # certain class_value
                    feature_is_xroot = extract_feature_col(train_data_label, class_list[k], train_data, j, None)
                    # get the ratio of a certain feature_value in the column above
                    # p ( xroot_value | class_value )
                    dictionary[str(i) + str(k)] = laplace_estimate_possibility(feature_is_xroot, [feature_values[j][i]], len(feature_values[j]))
            dictionaries[j] = dictionary
        else:  # other nodes
            for i in range(len(graph)):  # parent_feature_index
                if graph[i][j] is True:
                    dictionaries[j] = create_dictionary_of_conditional_probability(i, j)
    # print(dictionaries)
    # output: (i) the predicted class, (ii) the actual class, (iii) the posterior probability of the predicted class
    correct = 0
    result = list([])
    for i in range(len(test_data)):
        [y_predict, probability_is_predict] = TAN_predict(graph, dictionaries, test_data[i])
        dummy = [y_predict, probability_is_predict]
        result.append(dummy)
    for i in range(len(result)):
        if result[i][0] == test_data_label[i]:
            correct += 1
        print('3: %s %s %.12f' % (result[i][0], test_data_label[i], result[i][1]))
    print('')
    print(correct)


# compute p(xj|class_value) -> p(X|class_value) -> p(X|class_value)*p(class_value) (numerator) ->
# sum p(X|class_value)*p(class_value) over all class_value (denominator) ->
# p(class_value|X) (the probability that the prediction is class_value) ->
# find the max probability over all class_value, namely the predicted label
def naive_bayes():
    correct = 0
    probability_is_class_i = list([])
    for i in range(len(feature_names)):
        print(feature_names[i] + ' class')
    print('')
    # pre-allocate memory
    for t in range(len(test_data)):
        probability_is_class_i.append([])
    # get p(Y|X) for each y of each test sample
    for t in range(len(test_data)):
        sample_data = test_data[t]
        # get p(Y|X) for each y
        for i in range(len(class_list)):
            # p(X|class_value) = p(x1|class_value) * p(x2|class_value) * ... * p(xj|class_value)
            probability_X_given_class_value = 1.0
            for k in range(len(feature_names)):
                temp_list = extract_feature_col(train_data_label, class_list[i], train_data, k, None)
                probability_X_given_class_value *= laplace_estimate_possibility(temp_list, [sample_data[k]], len(feature_values[k]))
            # p(class_value) * p(X | class_value)
            probability_class_value = laplace_estimate_possibility(train_data_label, class_list[i], len(class_list))
            probability_is_class_i[t].append(probability_class_value * probability_X_given_class_value)
    # find the max probability as the prediction and match it with the true label
    for t in range(len(test_data)):
        denominator = 0.0
        max_probability = 0.0
        index_of_class_value_prediction = 0
        # get the denominator of Bayes' rule
        for i in range(len(class_list)):
            denominator += float(probability_is_class_i[t][i])
        # p(class_list[i] | X)
        for i in range(len(class_list)):
            probability_is_class_i[t][i] = probability_is_class_i[t][i] / denominator
            if probability_is_class_i[t][i] > max_probability:
                max_probability = probability_is_class_i[t][i]
                index_of_class_value_prediction = i
        # match the prediction with the true label
        if class_list[index_of_class_value_prediction] == test_data_label[t]:
            correct += 1
        print('%s %s %.12f' % (class_list[index_of_class_value_prediction], test_data_label[t], max_probability))
    print('')
    print(correct)


# parse the training file (ARFF format)
begin_data = False
for line in train:
    if pattern.findall('@data', line) != []:
        begin_data = True
    elif pattern.findall('@attribute', line) != []:
        line = line.lstrip(' ')
        line = line.rstrip('\n')
        line = line.rstrip('\r')
        line = line.rstrip(' ')
        line = line.split(None, 2)
        line[1] = line[1].replace(' ', '')
        line[1] = line[1].replace('\'', '')
        line[2] = line[2].replace(' ', '')
        line[2] = line[2].replace('\'', '')
        line[2] = line[2].strip('{')
        line[2] = line[2].strip('}')
        line[2] = line[2].split(',')
        if line[1] != 'class':
            feature_names.append(line[1])
            feature_values.append(line[2])
        else:
            class_list = line[2]
    elif begin_data is True:
        line = line.strip('\n')
        line = line.strip('\r')
        line = line.replace(' ', '')
        line = line.replace('\'', '')
        line = line.split(',')
        temp = []
        for i in range(0, len(line) - 1):
            temp.append(line[i])
        train_data.append(temp)
        train_data_label.append(line[len(line) - 1])
    else:
        pass

# parse the test file (ARFF format)
begin_data = False
for line in test:
    if pattern.findall('@data', line) != []:
        begin_data = True
    elif begin_data is True:
        line = line.strip('\n')
        line = line.strip('\r')
        line = line.replace(' ', '')
        line = line.replace('\'', '')
        line = line.split(',')
        temp = []
        for i in range(0, len(line) - 1):
            temp.append(line[i])
        test_data.append(temp)
        test_data_label.append(line[len(line) - 1])
    else:
        pass

if mode == 'n':
    naive_bayes()
elif mode == 't':
    TAN()
else:
    pass
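To run the script (a usage sketch: the file name tan.py is hypothetical, save the code under any name you like; the .arff files are the ones in the GitHub repository above):

python tan.py lymph_train.arff lymph_test.arff t    # TAN mode
python tan.py lymph_train.arff lymph_test.arff n    # naive Bayes mode, for comparison

With no arguments the script falls back to lymph_train.arff / lymph_test.arff in TAN mode. In TAN mode it first prints the learned tree structure (each attribute followed by its parents), then one line per test instance with the predicted class, the actual class, and the posterior probability of the prediction, and finally the number of correctly classified instances.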