Apriori Association Rules

pip install mlxtend  # Note: inside a Jupyter notebook, prefix the command with !

The following command must be run outside of the IPython shell:

    $ pip install mlxtend

The Python package manager (pip) can only be used from outside of IPython.
Please reissue the `pip` command in a separate terminal or command prompt.

See the Python documentation for more information on how to install packages:

    https://docs.python.org/3/installing/

!pip install mlxtend

Collecting mlxtend
  Downloading https://files.pythonhosted.org/packages/86/30/781c0b962a70848db83339567ecab656638c62f05adb064cb33c0ae49244/mlxtend-0.18.0-py2.py3-none-any.whl (1.3MB)
Collecting scipy>=1.2.1 (from mlxtend)
  Downloading https://files.pythonhosted.org/packages/e1/8b/d05bd3bcd0057954f08f61472db95f4ac71c3f0bf5432abe651694025396/scipy-1.6.3-cp37-cp37m-win_amd64.whl (32.6MB)
Collecting scikit-learn>=0.20.3 (from mlxtend)
  Downloading https://files.pythonhosted.org/packages/33/ac/98a9c3f4b6e810c45196f6e15e04f9d83fe3d6000eebbb74dfd084446432/scikit_learn-0.24.2-cp37-cp37m-win_amd64.whl (6.8MB)
Collecting joblib>=0.13.2 (from mlxtend)
  Downloading https://files.pythonhosted.org/packages/55/85/70c6602b078bd9e6f3da4f467047e906525c355a4dacd4f71b97a35d9897/joblib-1.0.1-py3-none-any.whl (303kB)
Requirement already satisfied: matplotlib>=3.0.0 in c:\programdata\anaconda3\lib\site-packages (from mlxtend) (3.0.2)
Collecting pandas>=0.24.2 (from mlxtend)
  Downloading https://files.pythonhosted.org/packages/74/8c/9cf2e5304f4466dbc759a799b97bfd75cd3dc93b00d49558ca93bfc29173/pandas-1.2.4-cp37-cp37m-win_amd64.whl (9.1MB)
Requirement already satisfied: setuptools in c:\programdata\anaconda3\lib\site-packages (from mlxtend) (40.6.3)
Collecting numpy>=1.16.2 (from mlxtend)
  Downloading https://files.pythonhosted.org/packages/ce/de/0ed39fd77c5584cd9e44b4305ee4444ea7af1b38d4d71734ae684fc14184/numpy-1.20.3-cp37-cp37m-win_amd64.whl (13.6MB)
Collecting threadpoolctl>=2.0.0 (from scikit-learn>=0.20.3->mlxtend)
  Downloading https://files.pythonhosted.org/packages/f7/12/ec3f2e203afa394a149911729357aa48affc59c20e2c1c8297a60f33f133/threadpoolctl-2.1.0-py3-none-any.whl
Requirement already satisfied: cycler>=0.10 in c:\programdata\anaconda3\lib\site-packages (from matplotlib>=3.0.0->mlxtend) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\programdata\anaconda3\lib\site-packages (from matplotlib>=3.0.0->mlxtend) (1.0.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in c:\programdata\anaconda3\lib\site-packages (from matplotlib>=3.0.0->mlxtend) (2.3.0)
Requirement already satisfied: python-dateutil>=2.1 in c:\programdata\anaconda3\lib\site-packages (from matplotlib>=3.0.0->mlxtend) (2.7.5)
Requirement already satisfied: pytz>=2017.3 in c:\programdata\anaconda3\lib\site-packages (from pandas>=0.24.2->mlxtend) (2018.7)
Requirement already satisfied: six in c:\programdata\anaconda3\lib\site-packages (from cycler>=0.10->matplotlib>=3.0.0->mlxtend) (1.12.0)
Installing collected packages: numpy, scipy, joblib, threadpoolctl, scikit-learn, pandas, mlxtend
  Found existing installation: numpy 1.15.4
    Uninstalling numpy-1.15.4:
      Successfully uninstalled numpy-1.15.4
  Found existing installation: scipy 1.1.0
    Uninstalling scipy-1.1.0:
      Successfully uninstalled scipy-1.1.0
  Found existing installation: joblib 0.13.0
    Uninstalling joblib-0.13.0:
      Successfully uninstalled joblib-0.13.0
  Found existing installation: scikit-learn 0.20.1
    Uninstalling scikit-learn-0.20.1:
      Successfully uninstalled scikit-learn-0.20.1
  Found existing installation: pandas 0.23.4
    Uninstalling pandas-0.23.4:
      Successfully uninstalled pandas-0.23.4
Successfully installed joblib-1.0.1 mlxtend-0.18.0 numpy-1.20.3 pandas-1.2.4 scikit-learn-0.24.2 scipy-1.6.3 threadpoolctl-2.1.0

import pandas as pd
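# Five shopping baskets. Items: 牛奶 = milk, 面包 = bread, 尿布 = diapers, 啤酒 = beer, 土豆 = potatoes, 可乐 = cola
item_list = [['牛奶','面包'],['面包','尿布','啤酒','土豆'],['牛奶','尿布','啤酒','可乐'],['面包','牛奶','尿布','啤酒'],['面包','牛奶','尿布','可乐']]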
item_df = pd.DataFrame(item_list)

from mlxtend.preprocessing import TransactionEncoder
te = TransactionEncoder()
df_tf = te.fit_transform(item_list)
df = pd.DataFrame(df_tf,columns=te.columns_)
print(df)     # Data formatting: the model expects a boolean (one-hot encoded) DataFrame

      可乐     啤酒     土豆     尿布     牛奶     面包
0  False  False  False  False   True   True
1  False   True   True   True  False   True
2   True   True  False   True   True  False
3  False   True  False   True   True   True
4   True  False  False   True   True   True
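
To make the encoding step concrete, here is a minimal pure-Python sketch of the same one-hot transformation, without mlxtend. The names `transactions`, `columns`, and `encoded` are illustrative only, and the sketch is not a description of how TransactionEncoder is implemented internally; it only reproduces the table above.

```python
# The five shopping baskets used in this post.
transactions = [
    ['牛奶', '面包'],
    ['面包', '尿布', '啤酒', '土豆'],
    ['牛奶', '尿布', '啤酒', '可乐'],
    ['面包', '牛奶', '尿布', '啤酒'],
    ['面包', '牛奶', '尿布', '可乐'],
]

# One column per distinct item, in sorted order.
columns = sorted({item for basket in transactions for item in basket})

# One row per basket: True where the basket contains the column's item.
encoded = [[item in basket for item in columns] for basket in transactions]

print(columns)
for row in encoded:
    print(row)
```

Each row of `encoded` should line up with the corresponding row of the printed DataFrame.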

from mlxtend.frequent_patterns import apriori

# use_colnames=True reports each itemset by item name; the default (False) reports column indices instead.
# min_support sets the minimum support threshold.
frequent_itemsets = apriori(df, min_support=0.05, use_colnames=True)
frequent_itemsets.sort_values(by='support', ascending=False, inplace=True)
# Select the frequent 2-itemsets
print(frequent_itemsets[frequent_itemsets.itemsets.apply(lambda x: len(x)) == 2])

    support  itemsets
17      0.6  (面包, 尿布)
18      0.6  (面包, 牛奶)
11      0.6  (啤酒, 尿布)
16      0.6  (牛奶, 尿布)
13      0.4  (面包, 啤酒)
7       0.4  (可乐, 尿布)
12      0.4  (啤酒, 牛奶)
8       0.4  (可乐, 牛奶)
14      0.2  (尿布, 土豆)
6       0.2  (可乐, 啤酒)
15      0.2  (面包, 土豆)
9       0.2  (面包, 可乐)
10      0.2  (啤酒, 土豆)
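
The support values in this table can be checked by hand: the support of an itemset is the fraction of baskets that contain every item in it. The sketch below recomputes the 2-itemset supports in plain Python. The helper `support` and the variable names are hypothetical, chosen for this illustration; this is a brute-force count over all pairs, not the level-wise candidate pruning that Apriori itself performs.

```python
from itertools import combinations

transactions = [
    ['牛奶', '面包'],
    ['面包', '尿布', '啤酒', '土豆'],
    ['牛奶', '尿布', '啤酒', '可乐'],
    ['面包', '牛奶', '尿布', '啤酒'],
    ['面包', '牛奶', '尿布', '可乐'],
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item of `itemset`."""
    wanted = set(itemset)
    return sum(wanted <= set(basket) for basket in baskets) / len(baskets)

items = sorted({item for basket in transactions for item in basket})
min_support = 0.05

# Brute force: score every possible pair and keep the frequent ones.
frequent_pairs = {}
for pair in combinations(items, 2):
    s = support(pair, transactions)
    if s >= min_support:
        frequent_pairs[frozenset(pair)] = s

# Print pairs from highest to lowest support.
for pair, s in sorted(frequent_pairs.items(), key=lambda kv: -kv[1]):
    print(s, tuple(pair))
```

With only five baskets, any pair that appears at least once has support 0.2, so a threshold of 0.05 keeps every pair that occurs at all.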

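# Compute the association rules
# metric accepts several measures; the metric columns of the returned table (e.g. support, confidence, lift, leverage, conviction) can be passed as its value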
from mlxtend.frequent_patterns import association_rules
association_rule = association_rules(frequent_itemsets,metric='confidence',min_threshold=0.9)
# Sort the rules by lift
association_rule.sort_values(by='lift',ascending=False,inplace=True)
association_rule
# Each rule reads: antecedents -> consequents

	antecedents	consequents	antecedent support	consequent support	support	confidence	lift	leverage	conviction
15	(土豆)	(面包, 啤酒, 尿布)	0.2	0.4	0.2	1.0	2.500000	0.12	inf
30	(土豆)	(面包, 啤酒)	0.2	0.4	0.2	1.0	2.500000	0.12	inf
12	(土豆, 尿布)	(面包, 啤酒)	0.2	0.4	0.2	1.0	2.500000	0.12	inf
24	(土豆)	(面包, 尿布)	0.2	0.6	0.2	1.0	1.666667	0.08	inf
36	(土豆)	(啤酒)	0.2	0.6	0.2	1.0	1.666667	0.08	inf
21	(可乐, 啤酒)	(牛奶, 尿布)	0.2	0.6	0.2	1.0	1.666667	0.08	inf
25	(土豆, 尿布)	(啤酒)	0.2	0.6	0.2	1.0	1.666667	0.08	inf
5	(可乐)	(牛奶, 尿布)	0.4	0.6	0.4	1.0	1.666667	0.16	inf
18	(可乐, 面包)	(牛奶, 尿布)	0.2	0.6	0.2	1.0	1.666667	0.08	inf
27	(土豆)	(啤酒, 尿布)	0.2	0.6	0.2	1.0	1.666667	0.08	inf
9	(面包, 土豆, 尿布)	(啤酒)	0.2	0.6	0.2	1.0	1.666667	0.08	inf
28	(面包, 土豆)	(啤酒)	0.2	0.6	0.2	1.0	1.666667	0.08	inf
13	(面包, 土豆)	(啤酒, 尿布)	0.2	0.6	0.2	1.0	1.666667	0.08	inf
14	(啤酒, 土豆)	(面包, 尿布)	0.2	0.6	0.2	1.0	1.666667	0.08	inf
26	(啤酒, 土豆)	(尿布)	0.2	0.8	0.2	1.0	1.250000	0.04	inf
0	(啤酒)	(尿布)	0.6	0.8	0.6	1.0	1.250000	0.12	inf
23	(面包, 土豆)	(尿布)	0.2	0.8	0.2	1.0	1.250000	0.04	inf
31	(土豆)	(面包)	0.2	0.8	0.2	1.0	1.250000	0.04	inf
32	(可乐, 面包)	(牛奶)	0.2	0.8	0.2	1.0	1.250000	0.04	inf
33	(可乐, 面包)	(尿布)	0.2	0.8	0.2	1.0	1.250000	0.04	inf
34	(可乐, 啤酒)	(牛奶)	0.2	0.8	0.2	1.0	1.250000	0.04	inf
35	(可乐, 啤酒)	(尿布)	0.2	0.8	0.2	1.0	1.250000	0.04	inf
29	(啤酒, 土豆)	(面包)	0.2	0.8	0.2	1.0	1.250000	0.04	inf
19	(可乐, 啤酒, 牛奶)	(尿布)	0.2	0.8	0.2	1.0	1.250000	0.04	inf
22	(土豆, 尿布)	(面包)	0.2	0.8	0.2	1.0	1.250000	0.04	inf
20	(可乐, 啤酒, 尿布)	(牛奶)	0.2	0.8	0.2	1.0	1.250000	0.04	inf
1	(面包, 啤酒)	(尿布)	0.4	0.8	0.4	1.0	1.250000	0.08	inf
17	(可乐, 面包, 尿布)	(牛奶)	0.2	0.8	0.2	1.0	1.250000	0.04	inf
16	(可乐, 面包, 牛奶)	(尿布)	0.2	0.8	0.2	1.0	1.250000	0.04	inf
11	(面包, 啤酒, 土豆)	(尿布)	0.2	0.8	0.2	1.0	1.250000	0.04	inf
10	(啤酒, 土豆, 尿布)	(面包)	0.2	0.8	0.2	1.0	1.250000	0.04	inf
8	(土豆)	(尿布)	0.2	0.8	0.2	1.0	1.250000	0.04	inf
7	(可乐)	(牛奶)	0.4	0.8	0.4	1.0	1.250000	0.08	inf
6	(可乐)	(尿布)	0.4	0.8	0.4	1.0	1.250000	0.08	inf
4	(可乐, 尿布)	(牛奶)	0.4	0.8	0.4	1.0	1.250000	0.08	inf
3	(可乐, 牛奶)	(尿布)	0.4	0.8	0.4	1.0	1.250000	0.08	inf
2	(啤酒, 牛奶)	(尿布)	0.4	0.8	0.4	1.0	1.250000	0.08	inf
37	(面包, 啤酒, 牛奶)	(尿布)	0.2	0.8	0.2	1.0	1.250000	0.04	inf
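
The metric columns follow from the support values alone. As a rough check, the sketch below recomputes them for a single rule using the standard definitions: confidence = support(A ∪ C) / support(A), lift = confidence / support(C), leverage = support(A ∪ C) - support(A) × support(C), and conviction = (1 - support(C)) / (1 - confidence). The function names are hypothetical, and the sketch assumes the antecedent occurs in at least one basket.

```python
import math

transactions = [
    ['牛奶', '面包'],
    ['面包', '尿布', '啤酒', '土豆'],
    ['牛奶', '尿布', '啤酒', '可乐'],
    ['面包', '牛奶', '尿布', '啤酒'],
    ['面包', '牛奶', '尿布', '可乐'],
]

def support(itemset):
    """Fraction of baskets that contain every item of `itemset`."""
    wanted = set(itemset)
    return sum(wanted <= set(basket) for basket in transactions) / len(transactions)

def rule_metrics(antecedent, consequent):
    """Metrics for the rule antecedent -> consequent (antecedent must occur at least once)."""
    s_a = support(antecedent)
    s_c = support(consequent)
    s_ac = support(set(antecedent) | set(consequent))
    confidence = s_ac / s_a
    lift = confidence / s_c
    leverage = s_ac - s_a * s_c
    # Conviction divides by (1 - confidence), so it is infinite when confidence is 1.
    conviction = math.inf if confidence == 1 else (1 - s_c) / (1 - confidence)
    return {
        'confidence': confidence,
        'lift': lift,
        'leverage': leverage,
        'conviction': conviction,
    }

print(rule_metrics({'啤酒'}, {'尿布'}))          # rule (啤酒) -> (尿布)
print(rule_metrics({'土豆'}, {'面包', '啤酒'}))  # rule (土豆) -> (面包, 啤酒)
```

Every rule in the table has confidence 1.0, which is why the conviction column shows inf throughout: with min_threshold=0.9 and only five baskets, the only confidence values that clear the threshold are exactly 1.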
