This is already the fourth post in my series on probabilistic machine learning, and it still attracts few readers. Granted, these posts are mainly my own study notes, but carrying on this way makes my contribution feel rather low. So today I'm switching to a different style of post: explaining the theory together with a code implementation.
Comments and discussion are welcome!


Prerequisites:
1. The naive Bayes algorithm
2. The ability to read Python code: the implementation targets Python 3.6 and requires no machine-learning packages.


1. The "semi" in semi-naive Bayes

As mentioned before, the "naive" in naive Bayes (NB) lies in its assumption that the attributes are mutually independent, which simplifies the computation of P(x|c) in Bayes' formula. In reality, however, attributes are rarely completely independent of one another. If we bluntly model with naive Bayes regardless, the generalization ability and robustness are unlikely to be satisfactory.
This brings in today's protagonist, semi-naive Bayes. It assumes that each attribute depends on at most one (or k) other attributes: dependence between attributes is taken into account, but each attribute is allowed only one other attribute as a dependency (one-dependent estimation, ODE) or k of them (kDE). That restriction is exactly where the "semi" comes from. So we have:

$$P(c \mid \vec{x}) \propto P(c) \prod_{i=1}^{d} P(x_i \mid c, pa_i)$$

where $pa_i$ denotes the parent attribute of $x_i$.
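For contrast, plain naive Bayes drops the parent term entirely; this standard form is also what the naive_bayes function in the code below computes:

$$P(c \mid \vec{x}) \propto P(c) \prod_{i=1}^{d} P(x_i \mid c)$$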


2. Two common semi-naive Bayes algorithms: SPODE and TAN

The SPODE algorithm assumes that all attributes depend on one and the same attribute, called the "super-parent" attribute.
The TAN algorithm determines the dependencies between attributes via a maximum weighted spanning tree; put simply, each attribute is linked to the attribute it is most strongly correlated with, and each link becomes a directed edge (pointing in one direction only).
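Written out, SPODE simply instantiates the formula above with one shared super-parent attribute $x_{sp}$ acting as $pa_i$ for every attribute:

$$P(c \mid \vec{x}) \propto P(c) \prod_{i=1}^{d} P(x_i \mid c, x_{sp})$$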
The figure below is borrowed from Zhou Zhihua's "watermelon book" (Machine Learning):


3. The TAN algorithm in detail

There are rigorous write-ups of the TAN implementation steps online. Here I'll explain them in the plainest language I can; this won't be perfectly rigorous, so please don't scrutinize it too closely. The steps are:

  1. Compute the conditional mutual information (CMI, i.e., the degree of mutual dependence) between every pair of attributes (see the formula after this list).
  2. Treat each attribute as a node and the CMI values as edge weights, forming a graph, and find the maximum weighted spanning tree of that graph. That is, find a rule for connecting the nodes that satisfies three conditions:

    1. it connects all the nodes;
    2. it uses the smallest possible number of edges;
    3. the total edge weight (CMI) is maximal.
  3. Then make the connections directed, pointing from parent node to child node. Here the first attribute to appear is chosen as the root node, and the edge directions are determined starting from the root. In the code, steps 2 and 3 are carried out together.

  4. Compute $\prod_{i=1}^{d} P(x_i \mid c, pa_i)$.
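The conditional mutual information of step 1, written in the form the code below actually computes (log base 2, summed over all value pairs and all classes):

$$I(x_i, x_j \mid y) = \sum_{x_i, x_j, c} P(x_i, x_j, c)\,\log_2\frac{P(x_i, x_j \mid c)}{P(x_i \mid c)\,P(x_j \mid c)}$$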

The code, the training data, and the test data are all available on my GitHub:
https://github.com/leviome/TAN
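For reference, the script expects a training ARFF file, a test ARFF file, and a mode flag ('t' runs TAN, 'n' runs plain naive Bayes); called with no arguments, it falls back to lymph_train.arff, lymph_test.arff, and TAN mode. The script filename in this invocation is only an assumed placeholder:

python tan.py lymph_train.arff lymph_test.arff t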

import re as pattern
import sys
import math
if len(sys.argv) != 4:
    train = open('lymph_train.arff')
    test = open('lymph_test.arff')
    mode = 't'
else:
    train = open(sys.argv[1])
    test = open(sys.argv[2])
    mode = sys.argv[3]

# name list of the different features : ['A1', 'A2', 'A3', 'A4', 'A5', 'A8', 'A14', 'A15']
feature_names = []
# value lists for the different features : each inner list holds all possible values of one feature
# [ [..], [..], [..], [..], [..], [..], [..], [..] ]
feature_values = []
# value list for the labels : ['+', '-']
class_list = []
# list of training data (features)
train_data = []
# list of training data (labels)
train_data_label = []
# list of testing data (features)
test_data = []
# list of testing data (labels)
test_data_label = []


# calculate the CMI between feature I and feature J, traversing all possible values and classes
# return the CMI value between feature[I] and feature[J]
def calculate_conditional_mutual_information(I, J):
    # lists of all possible values within a certain feature
    XI = feature_values[I]
    XJ = feature_values[J]
    CMI = 0.0
    # get the list of combinations of XI, XJ and all classes
    XI_XJ_Class = list([])
    for k in range(len(train_data)):
        XI_XJ_Class.append([train_data[k][I], train_data[k][J], train_data_label[k]])
    # get the lists of [XI], [XJ], [XI, XJ]
    for single_class in class_list:
        # extract the columns with the same class and a certain feature
        feature_is_XI = extract_feature_col(train_data_label, single_class, train_data, I, None)
        feature_is_XJ = extract_feature_col(train_data_label, single_class, train_data, J, None)
        feature_is_XIXJ = extract_feature_col(train_data_label, single_class, train_data, I, J)
        for i in range(len(XI)):
            # single value of XI
            xi = XI[i]
            for j in range(len(XJ)):
                # single value of XJ
                xj = XJ[j]
                # calculate the conditional probability of xi, xj, or (xi, xj), given class y
                # need to match the xi or xj value against all training data within the same class
                possibility_xi_given_y = laplace_estimate_possibility(feature_is_XI, [xi], len(feature_values[I]))
                possibility_xj_given_y = laplace_estimate_possibility(feature_is_XJ, [xj], len(feature_values[J]))
                possibility_xixj_given_y = laplace_estimate_possibility(feature_is_XIXJ, [xi, xj], len(feature_values[I]) * len(feature_values[J]))
                possibility_xixjy = laplace_estimate_possibility(XI_XJ_Class, [xi, xj, single_class], len(feature_values[I]) * len(feature_values[J]) * len(class_list))
                CMI = CMI + possibility_xixjy * math.log(possibility_xixj_given_y / (possibility_xi_given_y * possibility_xj_given_y), 2)
    return CMI


# extract from the training data the column(s) at the given feature index(es),
# keeping only the rows whose train_data_label == goal_class_value
# return 1 column or 2 columns which share the same class
def extract_feature_col(class__train, goal_class_value, feature__train, index1_of_feature, index2_of_feature):
    col_of_certain_feature = list([])
    # traverse all training data
    for i in range(len(class__train)):
        if class__train[i] == goal_class_value:
            tem = list([])
            # row: i, column: index1_of_feature, in the training data
            tem.append(feature__train[i][index1_of_feature])
            if index2_of_feature is not None:
                tem.append(feature__train[i][index2_of_feature])
            # build a list of lists: [ [XI] or [XJ] or [XI, XJ], ... ]
            col_of_certain_feature.append(tem)
    return col_of_certain_feature


# within the same class, the ratio/probability of each xi or xj value
# return a probability value
def laplace_estimate_possibility(feature_column_with_same_class, feature_value, amount_of_values_combination):
    num = 0
    for i in range(len(feature_column_with_same_class)):
        if feature_column_with_same_class[i] == feature_value:
            num += 1
    # numerator + 1 to avoid 0; correspondingly the denominator adds the number of all value combinations
    return float(num + 1) / (len(feature_column_with_same_class) + amount_of_values_combination)


# find a maximum spanning tree using the existing edges; in the end, each node has one and only one parent
# return a matrix (graph) which represents the maximum spanning tree (row is parent, column is child)
def prims_algorithm(edges, graph):
    all_candidates = set(range(0, len(feature_names)))
    # to root the maximal weight spanning tree, pick the first attribute (index=0) in the input file as the root
    parent_candidates = set()
    parent_candidates.add(0)
    child_candidates = set(range(1, len(feature_names)))
    parent_child_list = list([])
    # if there are ties in selecting maximum weight edges, use the following preference criteria:
    # 1. prefer edges emanating from attributes listed earlier in the input file
    while parent_candidates != all_candidates:
        current_max = float('-inf')
        parent = None
        child = None
        for i in parent_candidates:
            for j in child_candidates:
                # 2. if there are multiple maximal weight edges emanating from the first such attribute,
                #    prefer edges going to attributes listed earlier in the input file
                if edges[i][j] <= current_max:
                    pass
                elif edges[i][j] > current_max:
                    parent = i
                    child = j
                    current_max = edges[i][j]
        parent_child_list.append([parent, child])
        parent_candidates.add(child)
        child_candidates.remove(child)
    # finally, each child parent_child_list[i][1] appears only once, no repetition, len(features)-1 in total
    # (excluding 0, namely the root)
    for i in range(len(parent_child_list)):
        graph[parent_child_list[i][0]][parent_child_list[i][1]] = True
    # print(parent_child_list)


# return the conditional probability p(child_node_value | class_node_value, parent_node_value),
# given class_value and parent_value
def conditional_probability(child_node_value, class_node_value, parent_node_value, parent_feature_index, child_feature_index):
    # extract the column of child_feature_index from the segment which has parent_node_value and class_node_value
    child_column = list([])
    for i in range(len(train_data)):
        if train_data[i][parent_feature_index] == parent_node_value:
            if train_data_label[i] == class_node_value:
                child_column.append(train_data[i][child_feature_index])
    # child_column now has all values of feature[child_feature_index]
    num = 0
    for i in range(len(child_column)):
        if child_column[i] == child_node_value:
            num += 1
    # numerator + 1 to avoid 0; correspondingly the denominator adds the number of possible values
    return float(num + 1) / (len(child_column) + len(feature_values[child_feature_index]))


# return a dictionary covering all cases of XI, XJ and class values (three nested "for" loops)
# the key order is child_value_index + class_value_index + parent_value_index
def create_dictionary_of_conditional_probability(parent_feature_index, child_feature_index):
    parent_feature = feature_values[parent_feature_index]
    child_feature = feature_values[child_feature_index]
    dictionary = {}
    for i in range(len(parent_feature)):
        for j in range(len(child_feature)):
            for k in range(len(class_list)):
                # p(child_value | class_value, parent_value)
                key = str(j) + str(k) + str(i)
                dictionary[key] = conditional_probability(child_feature[j], class_list[k], parent_feature[i], parent_feature_index, child_feature_index)
    return dictionary


# return p( X | class_list[class_value_index] ), where X is determined by a_row_test_data
def prior_probability(graph, dictionaries, a_row_test_data, class_value_index):
    single_prior_probability = list([])
    chain_rule_prior_probability = 1.0
    for child in range(len(a_row_test_data)):
        parent_value_index = None
        child_value_index = None
        # find the current child feature's parent feature and their value indices
        for parent in range(len(a_row_test_data)):
            if graph[parent][child] == 1:
                # find parent_value_index using the parent value in a_row_test_data
                for parent_value_index in range(len(feature_values[parent])):
                    if feature_values[parent][parent_value_index] == a_row_test_data[parent]:
                        break
                # find child_value_index using the child value in a_row_test_data
                for child_value_index in range(len(feature_values[child])):
                    if feature_values[child][child_value_index] == a_row_test_data[child]:
                        break
                break
        if child == 0:  # root
            # find the child_value_index inside the root feature, using a_row_test_data[child]
            for child_value_index in range(len(feature_values[child])):
                if feature_values[child][child_value_index] == a_row_test_data[child]:
                    break
            # p( x0 | class_list[class_value_index] )
            single_prior_probability.append(dictionaries[child][str(child_value_index) + str(class_value_index)])
        else:  # other nodes
            # p( xj | class_list[class_value_index], x_parent )
            single_prior_probability.append(dictionaries[child][str(child_value_index) + str(class_value_index) + str(parent_value_index)])
    for i in range(len(single_prior_probability)):
        chain_rule_prior_probability = chain_rule_prior_probability * single_prior_probability[i]
    return chain_rule_prior_probability


# binary classification problem
def TAN_predict(graph, dictionaries, a_row_test_data):
    probability_list = list([])  # for all class values
    sum = 0.0
    # class_value_index is i
    for i in range(len(class_list)):
        # p(yi)
        probability_of_yi = laplace_estimate_possibility(train_data_label, class_list[i], len(class_list))
        # compute the prior probability p( X | yi )
        probability_of_feature_i = prior_probability(graph, dictionaries, a_row_test_data, i)
        probability_list.append(probability_of_yi * probability_of_feature_i)
        # the sum of all probability_list[i] is not 1 !!!
        sum += probability_list[i]
    max_probability = float('-inf')
    y_predict_index = 0
    for i in range(len(probability_list)):
        # now the sum of all probability_list[i] is 1
        # now probability_list[i] is the posterior probability p( y_predict | test_data[i] )
        probability_list[i] = probability_list[i] / sum
        if probability_list[i] > max_probability:
            max_probability = probability_list[i]
            y_predict_index = i
    return [class_list[y_predict_index], max_probability]


# compute the edge weights -> find the maximum spanning tree (graph) ->
# according to the graph, construct a dictionary of all combinations' conditional probabilities (dictionaries) ->
# classify test_data using the Bayes net chain rule (y_predict and probability_is_predict)
def TAN():
    # weight matrix ( k by k, where k is the number of features )
    edges = list([])
    for i in range(len(feature_names)):
        edges.append([])
        for j in range(len(feature_names)):
            edges[i].append(0)
    # calculate the edges (weights)
    for i in range(len(feature_names)):
        for j in range(i + 1, len(feature_names)):
            edges[i][j] = calculate_conditional_mutual_information(i, j)
            edges[j][i] = edges[i][j]
    # find the maximum weight spanning tree (MST)
    graph = list([])
    for i in range(len(feature_names)):
        graph.append([])
        for j in range(len(feature_names)):
            graph[i].append(0)
    prims_algorithm(edges, graph)
    # print(graph)
    # output the maximum weight spanning tree with edge directions
    # after prims_algorithm, each column of the graph matrix has one and only one True
    # (each node has only one parent, except the root)
    for j in range(len(graph)):
        if j == 0:  # the root
            print('1: %s class' % feature_names[j])
        else:  # other nodes
            for i in range(len(graph)):
                if graph[i][j] is True:
                    # output the child first and then the parent
                    print('2: %s %s class' % (feature_names[j], feature_names[i]))
                    break
    print('')
    # construct a big dictionary storing all possible conditional probabilities according to the "graph" structure
    length = len(feature_names)
    dictionaries = list([{}] * length)
    for j in range(len(graph)):  # child_feature_index
        if j == 0:  # the root
            # root: p( x0 | all y )
            dictionary = {}
            for i in range(len(feature_values[j])):  # all values of the root feature
                for k in range(len(class_list)):
                    # a certain class_value
                    feature_is_xroot = extract_feature_col(train_data_label, class_list[k], train_data, j, None)
                    # get the ratio of a certain feature_value in the column above
                    # p ( xroot_value | class_value )
                    dictionary[str(i) + str(k)] = laplace_estimate_possibility(feature_is_xroot, [feature_values[j][i]], len(feature_values[j]))
            dictionaries[j] = dictionary
        else:  # other nodes
            for i in range(len(graph)):  # parent_feature_index
                if graph[i][j] is True:
                    dictionaries[j] = create_dictionary_of_conditional_probability(i, j)
    # print(dictionaries)
    # output: (i) the predicted class, (ii) the actual class, (iii) the posterior probability of the predicted class
    correct = 0
    result = list([])
    for i in range(len(test_data)):
        [y_predict, probability_is_predict] = TAN_predict(graph, dictionaries, test_data[i])
        dummy = [y_predict, probability_is_predict]
        result.append(dummy)
    for i in range(len(result)):
        if result[i][0] == test_data_label[i]:
            correct += 1
        print('3: %s %s %.12f' % (result[i][0], test_data_label[i], result[i][1]))
    print('')
    print(correct)


# compute p(xj|class_value) -> p(X|class_value) -> p(X|class_value)*p(class_value) (numerator) ->
# sum p(X|class_value)*p(class_value) over all class_value (denominator) ->
# p(class_value|X) (the probability that the prediction is class_value) ->
# find the max probability over all class_value, namely the predicted label
def naive_bayes():
    correct = 0
    probability_is_class_i = list([])
    for i in range(len(feature_names)):
        print(feature_names[i] + ' class')
    print('')
    # pre-allocate memory
    for t in range(len(test_data)):
        probability_is_class_i.append([])
    # get p(Y|X) for each y of each test sample
    for t in range(len(test_data)):
        sample_data = test_data[t]
        # get p(Y|X) for each y
        for i in range(len(class_list)):
            # p(X|class_value) = p(x1|class_value) * p(x2|class_value) * ... * p(xj|class_value)
            probability_X_given_class_value = 1.0
            for k in range(len(feature_names)):
                temp_list = extract_feature_col(train_data_label, class_list[i], train_data, k, None)
                probability_X_given_class_value *= laplace_estimate_possibility(temp_list, [sample_data[k]], len(feature_values[k]))
            # p(class_value) * p(X | class_value)
            probability_class_value = laplace_estimate_possibility(train_data_label, class_list[i], len(class_list))
            probability_is_class_i[t].append(probability_class_value * probability_X_given_class_value)
    # find the max probability as the prediction and match it with the true label
    for t in range(len(test_data)):
        denominator = 0.0
        max_probability = 0.0
        index_of_class_value_prediction = 0
        # get the denominator of Bayes' rule
        for i in range(len(class_list)):
            denominator += float(probability_is_class_i[t][i])
        # p(class_list[i] | X)
        for i in range(len(class_list)):
            probability_is_class_i[t][i] = probability_is_class_i[t][i] / denominator
            if probability_is_class_i[t][i] > max_probability:
                max_probability = probability_is_class_i[t][i]
                index_of_class_value_prediction = i
        # match the prediction with the true label
        if class_list[index_of_class_value_prediction] == test_data_label[t]:
            correct += 1
        print('%s %s %.12f' % (class_list[index_of_class_value_prediction], test_data_label[t], max_probability))
    print('')
    print(correct)


# parse the training ARFF file
begin_data = False
for line in train:
    if pattern.findall('@data', line) != []:
        begin_data = True
    elif pattern.findall('@attribute', line) != []:
        line = line.lstrip(' ')
        line = line.rstrip('\n')
        line = line.rstrip('\r')
        line = line.rstrip(' ')
        line = line.split(None, 2)
        line[1] = line[1].replace(' ', '')
        line[1] = line[1].replace('\'', '')
        line[2] = line[2].replace(' ', '')
        line[2] = line[2].replace('\'', '')
        line[2] = line[2].strip('{')
        line[2] = line[2].strip('}')
        line[2] = line[2].split(',')
        if line[1] != 'class':
            feature_names.append(line[1])
            feature_values.append(line[2])
        else:
            class_list = line[2]
    elif begin_data is True:
        line = line.strip('\n')
        line = line.strip('\r')
        line = line.replace(' ', '')
        line = line.replace('\'', '')
        line = line.split(',')
        temp = []
        for i in range(0, len(line) - 1):
            temp.append(line[i])
        train_data.append(temp)
        train_data_label.append(line[len(line) - 1])
    else:
        pass

# parse the testing ARFF file
begin_data = False
for line in test:
    if pattern.findall('@data', line) != []:
        begin_data = True
    elif begin_data is True:
        line = line.strip('\n')
        line = line.strip('\r')
        line = line.replace(' ', '')
        line = line.replace('\'', '')
        line = line.split(',')
        temp = []
        for i in range(0, len(line) - 1):
            temp.append(line[i])
        test_data.append(temp)
        test_data_label.append(line[len(line) - 1])
    else:
        pass

if mode == 'n':
    naive_bayes()
elif mode == 't':
    TAN()
else:
    pass
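As a quick sanity check of the Laplace smoothing used throughout, here is a minimal sketch with made-up values (it assumes the definitions above have already been executed):

# a toy column of 3 training samples for a feature that can take the 2 values 'a' and 'b'
column = [['a'], ['a'], ['a']]
print(laplace_estimate_possibility(column, ['a'], 2))  # (3 + 1) / (3 + 2) = 0.8
print(laplace_estimate_possibility(column, ['b'], 2))  # (0 + 1) / (3 + 2) = 0.2, never 0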
