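The Apriori algorithm is one of the classic algorithms for association rule learning. It is designed to operate on databases that contain transactions (for example, the lists of items bought by customers, or the pages visited on a website). Other algorithms are designed to find association rules in data that has no transactions (the Winepi and Minepi algorithms) or no timestamps (DNA sequencing). The goal of association analysis is to find interesting relationships in large datasets. These relationships take two forms: frequent itemsets and association rules. A frequent itemset is a collection of items that often appear together; an association rule suggests that a strong relationship may exist between two items (or sets of items).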
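To make "frequent" and "strong" concrete, the usual measures are support (the fraction of transactions that contain an itemset) and confidence (the support of antecedent and consequent together, divided by the support of the antecedent). Below is a minimal sketch on made-up toy data; the helper names `support` and `confidence` and the transactions themselves are illustrative assumptions, not part of any library.

```python
# Toy transactions (made up for illustration); each one is a set of item ids.
transactions = [
    {1, 2, 3, 4},
    {1, 2, 4},
    {1, 2},
    {2, 3, 4},
    {2, 3},
    {3, 4},
    {2, 4},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# 3 of the 7 transactions contain both item 1 and item 2.
print(support({1, 2}, transactions))
# Every transaction that contains item 1 also contains item 2.
print(confidence({1}, {2}, transactions))
```

Here the rule 1 => 2 has high confidence, while the reverse rule 2 => 1 does not, because item 2 also appears in many transactions without item 1.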

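Apriori searches breadth-first and uses a tree structure to count candidate itemsets efficiently. It generates candidate itemsets of length k from the frequent itemsets of length k-1, then discards every candidate that contains an infrequent sub-pattern. By the downward closure lemma (every subset of a frequent itemset is itself frequent), the remaining candidate set still contains all frequent itemsets of length k. A scan of the transaction database then determines which of the candidates are actually frequent.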
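The level-wise loop described above can be sketched as follows. This is a simplified illustration that uses plain Python sets instead of a tree structure for counting; the function names `generate_candidates` and `frequent_itemsets` and the toy transactions are assumptions made for this example.

```python
from itertools import combinations


def generate_candidates(frequent_prev, k):
    """Build size-k candidates from the frequent itemsets of size k-1."""
    candidates = set()
    for a in frequent_prev:
        for b in frequent_prev:
            union = a | b
            if len(union) != k:
                continue  # JOIN only pairs that differ in exactly one item
            # PRUNE: by downward closure, every (k-1)-subset must be frequent
            if all(frozenset(s) in frequent_prev
                   for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


def frequent_itemsets(transactions, min_sup):
    """Return a list of levels; levels[i] holds the frequent itemsets of size i+1."""
    n = len(transactions)

    def is_frequent(itemset):
        # One pass over the transactions per candidate (no tree structure here)
        return sum(1 for t in transactions if itemset <= t) / n >= min_sup

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if is_frequent(frozenset([i]))}
    levels = []
    k = 1
    while current:
        levels.append(current)
        k += 1
        current = {c for c in generate_candidates(current, k) if is_frequent(c)}
    return levels


# Toy data, made up for illustration
transactions = [{1, 2, 3, 4}, {1, 2, 4}, {1, 2}, {2, 3, 4}, {2, 3}, {3, 4}, {2, 4}]
levels = frequent_itemsets(transactions, min_sup=0.3)
for size, level in enumerate(levels, start=1):
    print(size, sorted(sorted(s) for s in level))
```

On this data the candidates {1, 2, 3} and {1, 2, 4} are never counted against the database, because their subsets {1, 3} and {1, 4} were already found to be infrequent; only {2, 3, 4} survives pruning, and it then fails the support threshold.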

from __future__ import division, print_function
import numpy as np
import itertools


class Rule():
    def __init__(self, antecedent, concequent, confidence, support):
        self.antecedent = antecedent
        self.concequent = concequent
        self.confidence = confidence
        self.support = support


class Apriori():
    """A method for determining frequent itemsets in a transactional database and
    also for generating rules for those itemsets.
    Items are assumed to be plain Python ints (see the isinstance checks below).

    Parameters:
    -----------
    min_sup: float
        The minimum fraction of transactions an itemset needs to
        occur in to be deemed frequent
    min_conf: float
        The minimum fraction of times the antecedent needs to imply
        the consequent to justify a rule
    """
    def __init__(self, min_sup=0.3, min_conf=0.81):
        self.min_sup = min_sup
        self.min_conf = min_conf
        self.freq_itemsets = None       # List of frequent itemsets
        self.transactions = None        # List of transactions

    def _calculate_support(self, itemset):
        count = 0
        for transaction in self.transactions:
            if self._transaction_contains_items(transaction, itemset):
                count += 1
        support = count / len(self.transactions)
        return support

    def _get_frequent_itemsets(self, candidates):
        """ Prunes the candidates that are not frequent => returns list with
        only frequent itemsets """
        frequent = []
        # Find frequent items
        for itemset in candidates:
            support = self._calculate_support(itemset)
            if support >= self.min_sup:
                frequent.append(itemset)
        return frequent

    def _has_infrequent_itemsets(self, candidate):
        """ True or false depending on whether the candidate has any
        subset of size k - 1 that is not in the frequent itemsets """
        k = len(candidate)
        # Find all combinations of size k-1 in candidate
        # E.g [1,2,3] => [[1,2],[1,3],[2,3]]
        subsets = list(itertools.combinations(candidate, k - 1))
        for t in subsets:
            # t is a tuple. If size == 1 get the element
            subset = list(t) if len(t) > 1 else t[0]
            if subset not in self.freq_itemsets[-1]:
                return True
        return False

    def _generate_candidates(self, freq_itemset):
        """ Joins the elements in the frequent itemset and prunes
        resulting sets if they contain subsets that have been determined
        to be infrequent. """
        candidates = []
        for itemset1 in freq_itemset:
            for itemset2 in freq_itemset:
                # Valid if every element but the last are the same
                # and the last element in itemset1 is smaller than the last
                # in itemset2
                valid = False
                single_item = isinstance(itemset1, int)
                if single_item and itemset1 < itemset2:
                    valid = True
                elif (not single_item
                        and np.array_equal(itemset1[:-1], itemset2[:-1])
                        and itemset1[-1] < itemset2[-1]):
                    valid = True

                if valid:
                    # JOIN: Add the last element in itemset2 to itemset1 to
                    # create a new candidate
                    if single_item:
                        candidate = [itemset1, itemset2]
                    else:
                        candidate = itemset1 + [itemset2[-1]]
                    # PRUNE: Check if any subset of candidate has been determined
                    # to be infrequent
                    infrequent = self._has_infrequent_itemsets(candidate)
                    if not infrequent:
                        candidates.append(candidate)
        return candidates

    def _transaction_contains_items(self, transaction, items):
        """ True or false depending on whether each item in the itemset is
        in the transaction """
        # If items is in fact only one item
        if isinstance(items, int):
            return items in transaction
        # Iterate through list of items and make sure that
        # all items are in the transaction
        for item in items:
            if item not in transaction:
                return False
        return True

    def find_frequent_itemsets(self, transactions):
        """ Returns the set of frequent itemsets in the list of transactions """
        self.transactions = transactions
        # Get all unique items in the transactions
        unique_items = set(item for transaction in self.transactions for item in transaction)
        # Get the frequent items
        self.freq_itemsets = [self._get_frequent_itemsets(unique_items)]
        while True:
            # Generate new candidates from last added frequent itemsets
            candidates = self._generate_candidates(self.freq_itemsets[-1])
            # Get the frequent itemsets among those candidates
            frequent_itemsets = self._get_frequent_itemsets(candidates)

            # If there are no frequent itemsets we're done
            if not frequent_itemsets:
                break

            # Add them to the total list of frequent itemsets and start over
            self.freq_itemsets.append(frequent_itemsets)

        # Flatten the array and return every frequent itemset
        frequent_itemsets = [
            itemset for sublist in self.freq_itemsets for itemset in sublist]
        return frequent_itemsets

    def _rules_from_itemset(self, initial_itemset, itemset):
        """ Recursive function which returns the rules where confidence >= min_confidence
        Starts with large itemset and recursively explores rules for subsets """
        rules = []
        k = len(itemset)
        # Get all combinations of sub-itemsets of size k - 1 from itemset
        # E.g [1,2,3] => [[1,2],[1,3],[2,3]]
        subsets = list(itertools.combinations(itemset, k - 1))
        support = self._calculate_support(initial_itemset)
        for antecedent in subsets:
            # itertools.combinations returns tuples => convert to list
            antecedent = list(antecedent)
            antecedent_support = self._calculate_support(antecedent)
            # Calculate the confidence as sup(A and B) / sup(B), if antecedent
            # is B in an itemset of A and B (rounded to two decimals)
            confidence = float("{0:.2f}".format(support / antecedent_support))
            if confidence >= self.min_conf:
                # The consequent is the initial_itemset except for antecedent
                concequent = [item for item in initial_itemset if item not in antecedent]
                # If single item => get item
                if len(antecedent) == 1:
                    antecedent = antecedent[0]
                if len(concequent) == 1:
                    concequent = concequent[0]
                # Create new rule
                rule = Rule(
                    antecedent=antecedent,
                    concequent=concequent,
                    confidence=confidence,
                    support=support)
                rules.append(rule)

                # If there are subsets that could result in rules
                # recursively add rules from subsets
                if k - 1 > 1:
                    rules += self._rules_from_itemset(initial_itemset, antecedent)
        return rules

    def generate_rules(self, transactions):
        self.transactions = transactions
        frequent_itemsets = self.find_frequent_itemsets(transactions)
        # Only consider itemsets of size >= 2 items
        frequent_itemsets = [itemset for itemset in frequent_itemsets
                             if not isinstance(itemset, int)]
        rules = []
        for itemset in frequent_itemsets:
            rules += self._rules_from_itemset(itemset, itemset)
        return rules
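The listing above derives rules by trying antecedents of size k-1 and recursing into smaller ones. For small inputs, a brute-force enumeration of every antecedent/consequent split can serve as a cross-check. The sketch below is such a check under stated assumptions: the toy transactions, the hard-coded `frequent` list (taken as already mined at a minimum support of 0.3) and the helper names `count` and `rules_from_itemsets` are all made up for illustration and are not part of the class above.

```python
from itertools import combinations

# Toy data, made up for illustration
transactions = [{1, 2, 3, 4}, {1, 2, 4}, {1, 2}, {2, 3, 4}, {2, 3}, {3, 4}, {2, 4}]

# Frequent itemsets of size >= 2, assumed to have been mined already
frequent = [frozenset({1, 2}), frozenset({2, 3}),
            frozenset({2, 4}), frozenset({3, 4})]


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def rules_from_itemsets(frequent, min_conf):
    """Try every non-empty proper subset of each itemset as an antecedent."""
    rules = []
    for itemset in frequent:
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                # confidence = sup(itemset) / sup(antecedent); the common
                # denominator len(transactions) cancels, so raw counts suffice
                conf = count(itemset) / count(antecedent)
                if conf >= min_conf:
                    rules.append((set(antecedent), set(itemset - antecedent), conf))
    return rules


rules = rules_from_itemsets(frequent, min_conf=0.8)
for antecedent, consequent, conf in rules:
    print(antecedent, "=>", consequent, "confidence:", conf)
```

Unlike the recursive version, this enumeration does not prune and visits each split exactly once, so it is slower on large itemsets but easy to reason about.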
