Association Rule Mining in SQL Server

Association Rule Mining in SQL Server is the next article in our data mining series, in which we have so far discussed Naïve Bayes, Decision Trees, and Time Series. Association Rule Mining is also known as Market Basket Analysis, mainly because it is used to find the items that customers buy together during their shopping.

The most popular Association Rule Mining example is the story of a supermarket chain in the US. It is said that the chain found that customers who buy beer also tend to buy nappies for their kids. After this finding, management decided to move the beer pallet close to the nappy pallet. By doing so, the story goes, they were able to increase sales. Beyond the money, they made their customers happier: customers spent less time shopping, which in turn reduced congestion in the supermarket. In other words, Association Rule Mining can be helpful in more ways than one.

Though Association Rule Mining is almost always discussed in the context of shopping, it applies to other areas such as troubleshooting, medicine, and marketing. In troubleshooting, association rules can help you diagnose which issues occur together. In medicine, they can help find out which diseases occur together. So there are many ways to use association rules in business.

How to Use Association Rule Mining in SQL Server

This time, the SSAS data mining project differs slightly from what we have done before, because we will use two views from the AdventureWorksDW database, whereas all previous examples used only one. The two views are vAssocSeqOrders and vAssocSeqLineItems. The vAssocSeqOrders view holds the orders, while the vAssocSeqLineItems view holds the order lines for those orders. The following screenshot shows sample data from the two views:

If you look carefully at the above screenshot, you will see that order number SO61313 has three order lines in the second view.
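To make the orders and order lines structure concrete, here is a minimal Python sketch that groups order lines into one basket per order, which is the shape of data an association algorithm works on. The rows are hypothetical values shaped like (OrderNumber, LineNumber, Model); they are not taken from AdventureWorksDW, and the function name build_baskets is made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows shaped like vAssocSeqLineItems: (OrderNumber, LineNumber, Model).
# The order numbers and contents are invented for illustration only.
line_items = [
    ("SO-A", 1, "Mountain-200"),
    ("SO-A", 2, "Mountain Bottle Cage"),
    ("SO-A", 3, "Water Bottle"),
    ("SO-B", 1, "Touring Tire"),
    ("SO-B", 2, "Touring Tire Tube"),
]


def build_baskets(rows):
    """Group order lines by order number into one set of models per order."""
    baskets = defaultdict(set)
    for order_number, _line_number, model in rows:
        baskets[order_number].add(model)
    return dict(baskets)


baskets = build_baskets(line_items)
print(len(baskets["SO-A"]))  # prints 3: order SO-A has three order lines
```

Each order becomes one case, and its order lines become the nested items of that case.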

Let us open SQL Server Data Tools (SSDT) and create an SSAS project to set up Association Rule Mining. Then create a data source pointing to the AdventureWorksDW database, as we did in the previous articles.

For the data source view, let us add the two views, as shown in the screenshot below:

When these two views are added to the data source view, the relationship between them is not created by default. This means you need to relate the two views manually.

Verify the relationship by double-clicking the arrow. The Source and Destination tables should be as shown, joined on the OrderNumber column. If they are reversed, click the Reverse button to swap them.

Next, we need to choose the case table and the nested table. In our previous examples, we only had to choose a case table. In Association Rule Mining, however, there are two views, so we need to choose both a case table and a nested table, as shown in the screenshot below.

The vAssocSeqOrders view is chosen as the Case table, while the vAssocSeqLineItems view is chosen as the Nested table.

The objective of Association Rule Mining here is to find out which product models sell together. Therefore, the product model is both the input and the predictable attribute. OrderNumber and Model are the keys. You can see these selections in the screenshot below:

After the Association Rule configuration is completed, the model can be processed. Users can then review the mining model and perform predictions.

Mining Model Viewer

Let us view the data patterns from the Association Rule model that we have just built.

In the Mining Model Viewer, there are three tabs for viewing the data patterns. The Rules tab shows the rules that the Association Rule Mining model derived from the sample data set.

The main part of the Rules tab is the rule grid, which displays all qualified association rules along with their probabilities and their importance. The importance score tells you how useful the rule is. A high importance score, typically greater than 1, indicates a higher-quality rule.

In the above screenshot, a customer who buys LL Mountain Tire and Fender Set-Mountain will also buy Mountain Tire Tube. A probability of 1 means that the rule held for every such order in the data set. The importance of this rule is 0.850.
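The probability and importance columns can be illustrated with a small Python sketch. It assumes the commonly documented definitions: probability is the share of orders containing the left-hand items that also contain the right-hand item, and rule importance is the base-10 logarithm of that probability divided by the same probability among orders without the left-hand items. SSAS may apply additional corrections, so the numbers need not match the viewer exactly. The function name rule_metrics and the sample baskets are hypothetical.

```python
import math


def rule_metrics(baskets, antecedent, consequent):
    """Return (probability, importance) for the rule antecedent -> consequent.

    Assumed definitions (not taken from the article):
      probability = P(consequent | antecedent)
      importance  = log10(P(consequent | antecedent) / P(consequent | not antecedent))
    Importance is None when either probability is zero, because the ratio is undefined.
    """
    antecedent = set(antecedent)
    with_a = [b for b in baskets if antecedent <= b]
    without_a = [b for b in baskets if not antecedent <= b]
    if not with_a or not without_a:
        raise ValueError("need orders both with and without the antecedent")
    p_with = sum(consequent in b for b in with_a) / len(with_a)
    p_without = sum(consequent in b for b in without_a) / len(without_a)
    importance = None
    if p_with > 0 and p_without > 0:
        importance = math.log10(p_with / p_without)
    return p_with, importance


# Hypothetical baskets, invented for illustration.
sample_baskets = [
    {"Touring Tire", "Touring Tire Tube"},
    {"Touring Tire", "Touring Tire Tube"},
    {"Touring Tire Tube"},
    {"Water Bottle"},
    {"Water Bottle"},
    {"Water Bottle"},
]

probability, importance = rule_metrics(
    sample_baskets, {"Touring Tire"}, "Touring Tire Tube"
)
print(probability, round(importance, 3))  # prints: 1.0 0.602
```

In this toy data the rule has probability 1 because every order with a Touring Tire also has a Touring Tire Tube, while only one of the four other orders contains the tube, which gives a ratio of 4.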

The minimum importance level can be set for a model before processing, which can improve processing performance.

The next tab is the Itemsets tab, which displays the frequent itemsets discovered by the Association Rule algorithm.

Users can set the minimum support in this view, and also as a model parameter, which improves model processing performance. Users can also select the minimum itemset size. In this example, it is set to 3, which means that only itemsets containing at least three models are listed. If needed, you can also list only the itemsets that contain a specific model by entering a filter in Filter Itemset.

In the above data set, Mountain Bottle Cage, Mountain-200, and Water Bottle appear together in 240 orders, which tells you how frequent this itemset combination is.
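The support figure shown for an itemset is simply the number of orders that contain all of its items. The Python sketch below counts every itemset of a given size by brute force and keeps those that reach a minimum support. It is a simplified illustration, not the algorithm SSAS actually runs, and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(baskets, size, min_support):
    """Count itemsets of exactly `size` items and keep those found in at least
    `min_support` baskets. Brute force: no candidate pruning is attempted."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each itemset one canonical tuple representation.
        for itemset in combinations(sorted(basket), size):
            counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical baskets, invented for illustration.
itemset_baskets = [
    {"Mountain-200", "Mountain Bottle Cage", "Water Bottle"},
    {"Mountain-200", "Mountain Bottle Cage", "Water Bottle", "Sport-100"},
    {"Mountain-200", "Water Bottle"},
]

triples = frequent_itemsets(itemset_baskets, size=3, min_support=2)
print(triples)
```

Only the combination of Mountain Bottle Cage, Mountain-200, and Water Bottle survives here, with a support of 2, because every other three-item combination appears in a single order.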

The third tab in the model viewer, Dependency Network, graphically illustrates the relationships between the items, as shown in the screenshot below.

If you analyze the above screenshot, you will see that Sport-100 is bought by customers who bought the Touring-1000, the Touring-2000, the Road-550-W, or Half-Finger Gloves; each of these items leads to Sport-100 on its own. Another notable finding is that customers who buy Touring Tire will buy Touring Tire Tube and, importantly, the reverse is also true. This is indicated by the arrow that points both ways.

If you click a node, the related nodes are highlighted in different colors, as shown in the screenshot below.

The colors indicate the different kinds of relationship between the selected node and the other nodes in the diagram.

Model Parameters

Model parameters can be set to tune the Association Rule Mining model for better performance and accuracy.

These parameters can be set from the following dialog box.

MAXIMUM_ITEMSET_COUNT

The default value for this parameter is 200,000. This parameter defines the maximum number of itemsets that will be generated.

MAXIMUM_ITEMSET_SIZE

This parameter defines the maximum number of items allowed in an itemset. The default value for this parameter is 3. Reducing this number will reduce the model processing time.

MAXIMUM_SUPPORT

This parameter defines the maximum support threshold of a frequent itemset. It can be used to filter out items that appear so frequently that the rules involving them are obvious. This parameter is available only in Enterprise edition.

MINIMUM_IMPORTANCE

This threshold filters out rules whose importance is lower than the defined parameter value. This parameter is available only in Enterprise edition.

MINIMUM_ITEMSET_SIZE

This parameter defines the minimum number of items allowed in an itemset. The default value for this parameter is 1. Reducing this number will not reduce the model processing time. This parameter is available only in Enterprise edition.

MINIMUM_PROBABILITY

This parameter specifies the minimum probability that a rule is true. For example, setting this value to 0.5 specifies that no rule with less than 50% probability is generated.

MINIMUM_SUPPORT

This parameter specifies the minimum number of cases that must contain the itemset before a rule is generated. Setting this value to less than 1 specifies the minimum number of cases as a fraction of the total cases. Setting this value to a whole number greater than 1 specifies the minimum as the absolute number of cases that must contain the itemset. The algorithm may increase the value of this parameter if memory is limited.
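The two readings of MINIMUM_SUPPORT can be sketched in Python as a small helper that converts the parameter into an absolute case count. This is an illustration of the rule described above, not SSAS code. Rounding the fractional case up and treating a value of exactly 1 as an absolute count are assumptions, and the function name resolve_min_support is hypothetical.

```python
import math


def resolve_min_support(minimum_support, total_cases):
    """Convert a MINIMUM_SUPPORT-style value into an absolute number of cases.

    Values below 1 are read as a fraction of all cases; values of 1 or more
    are read as an absolute count. Rounding up is an assumption made here.
    """
    if minimum_support <= 0:
        raise ValueError("minimum_support must be positive")
    if minimum_support < 1:
        return math.ceil(minimum_support * total_cases)
    return int(minimum_support)


print(resolve_min_support(0.5, 240))  # prints 120: half of 240 cases
print(resolve_min_support(10, 240))   # prints 10: an absolute count
```

With this reading, the same model can be tuned either relative to the size of the data set or with a fixed number of orders.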

Prediction

Prediction is an important part of any data mining algorithm. Predictions can be made from the Mining Model Prediction tab, as shown below:

The above screenshot shows how to predict which items will be bought by customers who have bought a Water Bottle.
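The idea behind this kind of prediction can be sketched in Python: given the items already in a basket, rank the other items by how often they appear in orders that contain the given items. This is a simplified stand-in for what the prediction query does, not the query SSAS generates, and both the function name recommend and the baskets are hypothetical.

```python
from collections import Counter


def recommend(baskets, given_items, top_n=3):
    """Rank items that co-occur with `given_items`.

    Returns up to `top_n` (item, probability) pairs, where probability is the
    share of matching baskets that also contain the item. Ties are broken
    alphabetically so the result is deterministic.
    """
    given = set(given_items)
    matching = [b for b in baskets if given <= b]
    if not matching:
        return []
    counts = Counter(item for b in matching for item in b - given)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(item, n / len(matching)) for item, n in ranked[:top_n]]


# Hypothetical baskets, invented for illustration.
prediction_baskets = [
    {"Water Bottle", "Mountain Bottle Cage"},
    {"Water Bottle", "Mountain Bottle Cage", "Mountain-200"},
    {"Water Bottle", "Road Bottle Cage"},
    {"Sport-100"},
]

suggestions = recommend(prediction_baskets, {"Water Bottle"}, top_n=2)
for item, probability in suggestions:
    print(item, round(probability, 2))
```

In this toy data, Mountain Bottle Cage ranks first because it appears in two of the three orders that contain a Water Bottle.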

Conclusion

In conclusion, Association Rule Mining is an excellent way of finding associated items and, in shopping applications, of optimizing item procurement and placement.

Table of contents

Introduction to SQL Server Data Mining

Naive Bayes Prediction in SQL Server

Microsoft Decision Trees in SQL Server

Microsoft Time Series in SQL Server

Association Rule Mining in SQL Server

Microsoft Clustering in SQL Server

Microsoft Linear Regression in SQL Server

Implement Artificial Neural Networks (ANNs) in SQL Server

Implementing Sequence Clustering in SQL Server

Measuring the Accuracy in Data Mining in SQL Server

Data Mining Query in SSIS

Text Mining in SQL Server

Translated from: https://www.sqlshack.com/the-association-rule-mining-in-sql-server/
