Association Rule Mining in SQL Server

Association Rule Mining in SQL Server is the next article in our data mining series, which has so far covered Naïve Bayes, Decision Trees, and Time Series. Association Rule Mining is also known as Market Basket Analysis, because it is mainly used to find items that customers buy together while shopping.

The most popular Association Rule Mining example is the story of a supermarket chain in the US. It is said that the chain found that customers who buy beer also tend to buy nappies for their kids. After this finding, management decided to move the beer pallet close to the nappy pallet. By doing so, they were reportedly able to increase sales. Beyond the money, they also made their customers happier: shopping time was reduced, and so was congestion in the supermarket. This shows that Association Rule Mining can be helpful in many ways.

Though Association Rule Mining is usually discussed in the context of shopping, it applies to other areas such as troubleshooting, medicine, and marketing. In troubleshooting, association rules can help diagnose which issues occur together. In medicine, they can help find out which diseases occur together. There are many ways of using association rules in business.

How to Use Association Rule Mining in SQL Server

This time, the SSAS data mining project differs slightly from the ones we built before. We will use two views in the AdventureWorksDW database, whereas all previous examples used only one. The two views are vAssocSeqOrders and vAssocSeqLineItems. The vAssocSeqOrders view holds the orders, while the vAssocSeqLineItems view holds the order lines for those orders. The following screenshot shows sample data from the two views:

If you look carefully at the above screenshot, order number SO61313 has three order lines in the second view.

Let us open SQL Server Data Tools (SSDT) and create an SSAS project to set up Association Rule Mining. Then create a data source pointing to the AdventureWorksDW database, as we did in the previous articles.

For the data source view, let us add the two views, as shown in the screenshot below:

When these two views are added to the data source view, the relationship between them is not added by default. This means you need to join the two views manually.

Verify the relationship by double-clicking the arrow. The Source and Destination tables should be as shown, joined on the OrderNumber column. If they are reversed, click the Reverse button to swap them.

Next, we need to choose the case and nested tables. In our previous examples, we only had to choose the case table. However, in Association Rule Mining, since there are two views, we need to choose both a case table and a nested table, as shown in the screenshot below.

The vAssocSeqOrders view is chosen as the Case table, and the vAssocSeqLineItems view is chosen as the Nested table.
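To make the case and nested table idea concrete, here is a minimal Python sketch of how order lines can be attached to their orders. It is only an illustration, not how SSAS stores the data. The rows are made up: only the column names OrderNumber and Model and the order number SO61313 come from the article, and the models listed for each order are assumptions.

```python
from collections import defaultdict

# Hypothetical rows standing in for vAssocSeqOrders (case table).
orders = [
    {"OrderNumber": "SO61313"},
    {"OrderNumber": "SO61314"},
]

# Hypothetical rows standing in for vAssocSeqLineItems (nested table).
# The models assigned to each order are invented for illustration.
line_items = [
    {"OrderNumber": "SO61313", "Model": "Water Bottle"},
    {"OrderNumber": "SO61313", "Model": "Mountain Bottle Cage"},
    {"OrderNumber": "SO61313", "Model": "Mountain-200"},
    {"OrderNumber": "SO61314", "Model": "Sport-100"},
]


def build_cases(orders, line_items):
    """Return one case per order: OrderNumber -> set of models bought."""
    nested = defaultdict(set)
    for row in line_items:
        nested[row["OrderNumber"]].add(row["Model"])
    # Every order becomes a case, even if it has no order lines.
    return {o["OrderNumber"]: nested.get(o["OrderNumber"], set()) for o in orders}


cases = build_cases(orders, line_items)
print(cases["SO61313"])
```

Each case is then a basket of models, which is the shape of data that association rules are mined from.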

The objective of Association Rule Mining here is to find out which product models sell together. Therefore, the product model is both the input and the predictable attribute. OrderNumber and Model are the keys. You can see these selections in the screenshot below:

After the Association Rule configuration is completed, the model can be processed. Users can then review the mining model and perform predictions.

Mining Model Viewer

Let us view the data patterns from the Association Rule model that we built above.

In the Mining Model Viewer, there are three tabs for viewing the data patterns. The Rules tab shows the rules that the Association Rule Mining model derived from the sample set.

The main part of the Rules tab is the rule grid, which displays all qualified association rules along with their probabilities and importance. The importance score tells how useful the rule is. If the importance score is high, most likely greater than 1, the rule is of higher quality.

In the above screenshot, a customer who buys LL Mountain Tire and Fender Set-Mountain will also buy Mountain Tire Tube. A probability of 1 means that the rule held in every matching case in the data set. The importance of this rule is 0.850.
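The two numbers in the rule grid can be sketched in Python. The probability is the share of orders containing the left-hand side that also contain the right-hand side. For importance, this sketch assumes the commonly described form log10(P(rhs | lhs) / P(rhs | not lhs)). The exact calculation inside SSAS, including any smoothing, is not reproduced here, and the baskets are made-up data.

```python
import math


def rule_stats(baskets, lhs, rhs):
    """Return (probability, importance) for the rule lhs -> rhs."""
    lhs = set(lhs)
    with_lhs = [b for b in baskets if lhs <= b]
    without_lhs = [b for b in baskets if not lhs <= b]
    if not with_lhs:
        raise ValueError("no basket contains the left-hand side")

    # Probability: share of baskets with lhs that also contain rhs.
    probability = sum(rhs in b for b in with_lhs) / len(with_lhs)

    # Assumed importance: log10 of P(rhs | lhs) / P(rhs | not lhs).
    p_without = (
        sum(rhs in b for b in without_lhs) / len(without_lhs)
        if without_lhs
        else 0.0
    )
    if probability == 0 or p_without == 0:
        importance = None  # ratio is undefined without smoothing
    else:
        importance = math.log10(probability / p_without)
    return probability, importance


# Made-up baskets: two orders contain the full left-hand side.
baskets = [
    {"LL Mountain Tire", "Fender Set-Mountain", "Mountain Tire Tube"},
    {"LL Mountain Tire", "Fender Set-Mountain", "Mountain Tire Tube"},
    {"Mountain Tire Tube", "Water Bottle"},
    {"Water Bottle"},
    {"LL Mountain Tire", "Water Bottle"},
    {"Sport-100"},
]

probability, importance = rule_stats(
    baskets, ["LL Mountain Tire", "Fender Set-Mountain"], "Mountain Tire Tube"
)
print(probability, importance)
```

Under this assumed formula, a positive importance means the right-hand item is more likely when the left-hand items are present than when they are absent.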

The minimum importance level can be set for a model before processing to improve processing performance.

The next tab is the Itemsets tab, which displays the frequent itemsets discovered by the Association Rule algorithm.

Users can set the minimum support in this view, as well as through a model parameter, so that model processing performance is improved. Users can also select the minimum itemset size. In this example, the minimum itemset size is set to 3, which means that only combinations of three models are shown. If needed, it is also possible to show only the itemsets that contain a specific model by setting a filter in Filter Itemset.

In the above data set, Mountain Bottle Cage, Mountain-200, and Water Bottle appear together in 240 orders, so you can identify the frequency of each itemset combination.
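The support count shown for an itemset can be illustrated with a short Python sketch. It simply enumerates every combination in every basket, which is fine for a handful of rows but is not the algorithm SSAS uses. The function name, its parameters, and the baskets are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(baskets, min_support, min_size=1, max_size=3):
    """Count how many baskets contain each itemset and keep the frequent ones."""
    counts = Counter()
    for basket in baskets:
        items = sorted(basket)  # sorted so each itemset has one canonical tuple
        for size in range(min_size, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Made-up baskets; the article's real data set has 240 orders with this trio.
baskets = [
    {"Mountain Bottle Cage", "Mountain-200", "Water Bottle"},
    {"Mountain Bottle Cage", "Mountain-200", "Water Bottle"},
    {"Mountain Bottle Cage", "Water Bottle"},
    {"Sport-100"},
]

result = frequent_itemsets(baskets, min_support=2, min_size=3)
print(result)
```

Setting `min_size=3` mirrors the minimum itemset size of 3 used in the viewer, so only three-model combinations are returned.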

The third tab in the model viewer, Dependency Network, graphically illustrates the relationships between the itemsets, as shown in the screenshot below.

If you analyze the above screenshot, you will see that Sport-100 tends to be bought by customers who bought Touring-1000, Touring-2000, Road-550-W, or Half-Finger Gloves. Another notable finding is that customers who buy Touring Tire will buy Touring Tire Tube and, importantly, the reverse is also true. This is indicated by the arrow that points both ways.
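The difference between one-way and two-way arrows can be sketched in Python by treating each rule as a directed edge. The edge list below is taken from the relationships described above. The function name and the data layout are assumptions for illustration, not part of the viewer.

```python
def dependency_edges(rules):
    """Split (from_item, to_item) rules into one-way edges and mutual pairs."""
    edges = set(rules)
    # A pair is mutual when a rule exists in both directions.
    mutual = {frozenset((a, b)) for a, b in edges if a != b and (b, a) in edges}
    one_way = {(a, b) for a, b in edges if (b, a) not in edges}
    return one_way, mutual


# Relationships as described for the Dependency Network screenshot.
rules = [
    ("Touring-1000", "Sport-100"),
    ("Touring-2000", "Sport-100"),
    ("Road-550-W", "Sport-100"),
    ("Half-Finger Gloves", "Sport-100"),
    ("Touring Tire", "Touring Tire Tube"),
    ("Touring Tire Tube", "Touring Tire"),
]

one_way, mutual = dependency_edges(rules)
print(sorted(one_way))
print(mutual)
```

The mutual set corresponds to the double-headed arrow between Touring Tire and Touring Tire Tube, while the four one-way edges all point at Sport-100.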

If you click a node, the related nodes are highlighted in different colors, as shown in the screenshot below.

The colors indicate the different relationships between the selected node and the other nodes in the diagram.

Model Parameters

Model parameters can be set to tune Association Rule Mining for performance and accuracy.

They can be set from the following dialog box.

MAXIMUM_ITEMSET_COUNT

The default value for this parameter is 200,000. This parameter defines the maximum number of itemsets that will be generated.

MAXIMUM_ITEMSET_SIZE

This parameter defines the maximum number of items allowed in an itemset. The default value for this parameter is 3. Reducing this number will reduce the model processing time.

MAXIMUM_SUPPORT

This parameter defines the maximum support threshold for a frequent itemset. It can be used to filter out items that are so frequent that their associations are obvious. This parameter is available only in Enterprise edition.

MINIMUM_IMPORTANCE

This threshold filters out rules whose importance is less than the specified value. This parameter is available only in Enterprise edition.

MINIMUM_ITEMSET_SIZE

This parameter defines the minimum number of items allowed in an itemset. The default value for this parameter is 1. Increasing this number will not reduce the model processing time. This parameter is available only in Enterprise edition.

MINIMUM_PROBABILITY

This parameter specifies the minimum probability that a rule is true. For example, setting this value to 0.5 specifies that no rule with less than 50% probability is generated.

MINIMUM_SUPPORT

This parameter specifies the minimum number of cases that must contain the itemset before generating a rule. Setting this value to less than 1 specifies the minimum number of cases as a percentage of the total cases. Setting this value to a whole number greater than 1 specifies the minimum number of cases as the absolute number of cases that must contain the itemset. The algorithm may increase the value of this parameter if memory is limited.
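The two ways of reading MINIMUM_SUPPORT can be written out as a small Python helper. This is a sketch of the rule described above, not SSAS code. The function name is made up, and two details are assumptions: a value of exactly 1 is treated as an absolute count, and fractional results are rounded up.

```python
import math


def resolve_minimum_support(minimum_support, total_cases):
    """Turn a MINIMUM_SUPPORT setting into an absolute number of cases."""
    if minimum_support < 0:
        raise ValueError("minimum_support must not be negative")
    if minimum_support < 1:
        # Values below 1 are a fraction of all cases (rounding up is assumed).
        return math.ceil(minimum_support * total_cases)
    # Values of 1 or more are taken as an absolute case count (assumed for 1).
    return int(minimum_support)


print(resolve_minimum_support(0.25, 1000))
print(resolve_minimum_support(50, 1000))
```

With 1,000 cases, a setting of 0.25 requires an itemset to appear in 250 cases, while a setting of 50 requires 50 cases regardless of the total.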

Prediction

Prediction is an important part of any data mining algorithm. Predictions can be made from the Mining Model Prediction tab, as shown below:

The above screenshot shows how to predict which items will be bought by customers who have bought Water Bottle.
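The idea behind this prediction can be sketched in Python by ranking the other items found in baskets that contain the given item. This is a simplified stand-in based on conditional probability, not the query that the Mining Model Prediction tab generates. The function name and the baskets are made up for illustration.

```python
from collections import Counter


def predict_associated(baskets, given_item, top_n=3):
    """Rank items by how often they appear in baskets containing given_item."""
    matching = [b for b in baskets if given_item in b]
    counts = Counter(
        item for b in matching for item in b if item != given_item
    )
    # Most frequent first; ties are broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(item, n / len(matching)) for item, n in ranked[:top_n]]


# Made-up baskets for illustration.
baskets = [
    {"Water Bottle", "Mountain Bottle Cage"},
    {"Water Bottle", "Mountain Bottle Cage", "Mountain-200"},
    {"Water Bottle", "Road Bottle Cage"},
    {"Sport-100"},
]

predictions = predict_associated(baskets, "Water Bottle")
for item, probability in predictions:
    print(item, round(probability, 2))
```

Each returned pair is an item and the share of Water Bottle orders that also contain it, which is the same kind of probability the Rules tab reports for single-item rules.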

Conclusion

In conclusion, Association Rule Mining is an excellent way of finding associated items and, in shopping applications, of optimizing item procurement and allocation.

Table of contents

Introduction to SQL Server Data Mining
Naive Bayes Prediction in SQL Server
Microsoft Decision Trees in SQL Server
Microsoft Time Series in SQL Server
Association Rule Mining in SQL Server
Microsoft Clustering in SQL Server
Microsoft Linear Regression in SQL Server
Implement Artificial Neural Networks (ANNs) in SQL Server
Implementing Sequence Clustering in SQL Server
Measuring the Accuracy in Data Mining in SQL Server
Data Mining Query in SSIS
Text Mining in SQL Server

Translated from: https://www.sqlshack.com/the-association-rule-mining-in-sql-server/
