Just like Santa's bag of goodies, every release of SQL Server has something for everyone, be it enhancements to DMVs for DBAs, new functions for T-SQL developers, or new SSIS control tasks for ETL developers. Likewise, the ability of SQL Graph to support many-to-many relationships means that SQL Server 2017 has something for data warehouse developers too. In this article, we walk through the challenges of modelling many-to-many relationships in relational data warehouse environments, and then demonstrate how data warehouse teams can use the graph database feature in SQL Server 2017 to model and support their data warehouse solutions.

Traditional data warehouse modelling

Typical data warehouse models usually depict a collection of dimensions and fact tables linked together to form a star or snowflake schema. Figure 1 depicts one such multidimensional star-schema model for a sample Book Sales Data Mart wherein all the dimensions are linked together by a centralised FactSales table.

Figure 2 shows a preview of the data for DimAuthors, DimBooks as well as FactSales tables that have been created based off the design in Figure 1.

Given the nature of the data in our tables, we can easily answer business questions such as: How many books have been sold? Such a question can be answered by writing a T-SQL query that aggregates the quantity column in the FactSales table, as shown in Script 1.

-- Script 1: total quantity sold per book title
SELECT b.title, SUM(a.quantity) AS [Number Of Books Sold]
FROM [BookSalesMart].[Fact].[Sales] a
INNER JOIN [BookSalesMart].[Dim].[Books] b ON a.bookKey = b.bookKey
GROUP BY b.title;

Figure 3 shows the results of executing Script 1, and it can be seen that only a single copy of Introduction to SQL Graph has been sold thus far.

Many-to-many relationships in a data warehouse

The multidimensional model represented in Figure 1 is typically suitable for scenarios in which only one-to-one and one-to-many relationship types exist, i.e. a single author writes one or many books. However, it is quite plausible that several authors collaborate to write a single book. Thus, whilst a single author can be linked to many books, a single book can in turn be linked to several authors. Given the multidimensional model shown in Figure 1, it would be difficult to link a book sale to multiple authors in the fact table. To demonstrate the challenge, let's assume that for every book sold, the authors of that book should each get a portion of the revenue. This can only be done if the authors are correctly linked to the book being sold. Now, let's further assume that our sample book Introduction to SQL Graph was actually co-authored by myself and the team at ApexSQL. To ensure that both authors are financially credited whenever a sale of the book occurs, we would need to add another author entry to our Authors dimension, so that when we later query that dimension we see two records, as shown in Figure 4:

Next, we would need to find a way to indicate that the existing sale in our fact table (as shown in Figure 2) should be linked to both authorKey 1 and 2. The only way we could go about doing this – without having to change our design – would be to add another entry in the fact table that would be linked to the sale of our book as per the results in Figure 5.

However, notice that when we rerun Script 1, the number of books sold has increased to 2, as shown in Figure 6.

This is clearly incorrect, as only one book has been sold thus far. The change made to accommodate a many-to-many relationship in our existing star-schema model is causing incorrect calculations, because the same sale is now stored once per author.
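
The double counting can be reproduced in isolation. The following is a minimal sketch, assuming hypothetical temporary tables (#DimBooks and #FactSales) that mimic only the columns discussed above; they are illustrative stand-ins, not the actual data mart objects.

```sql
-- Hypothetical stand-ins for the dimension and fact tables; temporary
-- tables are used so the sketch does not touch the real data mart.
CREATE TABLE #DimBooks (bookKey INT PRIMARY KEY, title VARCHAR(50) NOT NULL);
CREATE TABLE #FactSales (bookKey INT NOT NULL, authorKey INT NOT NULL, quantity INT NOT NULL);

INSERT INTO #DimBooks (bookKey, title) VALUES (1, 'Introduction to SQL Graph');

-- One real sale, stored twice so that each co-author is linked to it.
INSERT INTO #FactSales (bookKey, authorKey, quantity) VALUES (1, 1, 1), (1, 2, 1);

-- Same shape as Script 1: the duplicated row inflates the total to 2.
SELECT b.title, SUM(f.quantity) AS [Number Of Books Sold]
FROM #FactSales f
INNER JOIN #DimBooks b ON f.bookKey = b.bookKey
GROUP BY b.title;

DROP TABLE #FactSales;
DROP TABLE #DimBooks;
```

Because the aggregation has no way of knowing that the two fact rows describe the same sale, any measure summed from this table is overstated by the number of co-authors.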

Bridge tables in many-to-many relationships

One way to cater for many-to-many relationships without causing incorrect counts against our fact table is to refactor the multidimensional model depicted in Figure 1 to introduce a bridge (or junction) table. A bridge table can be implemented in several ways, but we are interested in one that links several dimension values to a single fact transaction. Figure 7 shows such a design, in which the DimAuthorBridge table is used to link multiple DimAuthors dimension values to a single fact transaction in FactSales.

In terms of the data stored within the table, both authorKey 1 and 2 have been allocated a bridge table surrogate key (authorBridgeKey) value of 1 as shown in Figure 8.

In addition to refactoring the multidimensional model in Figure 1 to include a bridge table, you would have noticed in Figure 7 that we have also refactored the fact table to replace authorKey with authorBridgeKey. This bridge table surrogate key is then used in the fact table to link authors to the sale of a particular book as shown in Figure 9.

If we rerun Script 1 against the updated fact table shown in Figure 9, it should return the correct number of books sold thus far, which is 1.
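
The bridge design can be sketched as follows. This is an illustration only: the temporary tables, the composite primary key on the bridge table and the author names 'Author A' and 'Author B' are assumptions made for the example, while the key columns (authorKey, authorBridgeKey, bookKey) follow the names used in the text.

```sql
-- Hypothetical temporary stand-ins for the tables in the refactored model.
CREATE TABLE #DimAuthors (authorKey INT PRIMARY KEY, fullname VARCHAR(50) NOT NULL);
CREATE TABLE #DimBooks (bookKey INT PRIMARY KEY, title VARCHAR(50) NOT NULL);
CREATE TABLE #DimAuthorBridge (
    authorBridgeKey INT NOT NULL,   -- shared by every author in the same group
    authorKey INT NOT NULL,
    PRIMARY KEY (authorBridgeKey, authorKey)
);
CREATE TABLE #FactSales (bookKey INT NOT NULL, authorBridgeKey INT NOT NULL, quantity INT NOT NULL);

INSERT INTO #DimAuthors (authorKey, fullname) VALUES (1, 'Author A'), (2, 'Author B');  -- placeholder names
INSERT INTO #DimBooks (bookKey, title) VALUES (1, 'Introduction to SQL Graph');
INSERT INTO #DimAuthorBridge (authorBridgeKey, authorKey) VALUES (1, 1), (1, 2);
INSERT INTO #FactSales (bookKey, authorBridgeKey, quantity) VALUES (1, 1, 1);  -- the single sale

-- Books sold: the fact table holds one row per sale, so the total is 1.
SELECT b.title, SUM(f.quantity) AS [Number Of Books Sold]
FROM #FactSales f
INNER JOIN #DimBooks b ON f.bookKey = b.bookKey
GROUP BY b.title;

-- Authors to credit for each sale: reach the authors through the bridge.
SELECT b.title, a.fullname, f.quantity
FROM #FactSales f
INNER JOIN #DimBooks b ON f.bookKey = b.bookKey
INNER JOIN #DimAuthorBridge br ON f.authorBridgeKey = br.authorBridgeKey
INNER JOIN #DimAuthors a ON br.authorKey = a.authorKey;

DROP TABLE #FactSales;
DROP TABLE #DimAuthorBridge;
DROP TABLE #DimBooks;
DROP TABLE #DimAuthors;
```

Note that the second query returns one row per author, so quantity should not be summed across its output; the sale count comes from the fact table alone.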

Many-to-many relationships using SQL Graph

In an ideal data warehouse environment, you would want the joins between your tables to be on primary keys, but this is not always possible when bridge tables are used. One limitation of bridge tables is that assigning the same surrogate key value (i.e. 1) to two or more authors in order to group them means that the bridge surrogate key is not unique, which prevents the fact table from joining to the bridge table on a primary key. The downside of this approach is that not only could it lead to incorrect keys being assigned to a group of authors, it could also negatively affect the performance of queries against the bridge table.

Fortunately, SQL Server 2017's support for graph databases provides us with another mechanism for implementing many-to-many relationships in our data warehouse environment. This is done by first breaking down the dimension and fact tables in Figure 7 into nodes and edges. Script 2 provides the CREATE TABLE statements for the objects that have been identified as either nodes or edges.

CREATE TABLE Books (
    [bookKey] [int] IDENTITY(1,1) NOT NULL, [title] [varchar](50) NOT NULL,
    [InsertDate] [datetime2](7) NOT NULL DEFAULT (getdate())) AS NODE;
CREATE TABLE Authors (
    [authorKey] [int] IDENTITY(1,1) NOT NULL, [fullname] [varchar](50) NOT NULL,
    [InsertDate] [datetime2](7) NOT NULL DEFAULT (getdate())) AS NODE;
CREATE TABLE Customer (
    [customerKey] [int] IDENTITY(1,1) NOT NULL, [fullname] [varchar](50) NOT NULL,
    [InsertDate] [datetime2](7) NOT NULL DEFAULT (getdate())) AS NODE;
CREATE TABLE bought (quantity INTEGER) AS EDGE;
CREATE TABLE writerOf AS EDGE;

Take note of the CREATE syntax for the edge table bought. It has a quantity column, which records the number of copies sold and plays much the same role as the FactSales table in Figures 1 and 7.

The next step involves populating the objects created by Script 2. The key to capturing data in a graph database, particularly in edge tables, is that we need to specify the IDs of the FROM and TO nodes (the $from_id and $to_id columns), which indicate how the nodes relate to each other. Having populated the objects in our graph database based on the data shown in Figure 2, we should end up with a Power BI preview of the data as shown in Figure 10.
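
A population script could look roughly like the sketch below. It assumes the tables from Script 2 already exist and are empty, that writerOf edges run from Authors to Books, and that bought edges run from Customer to Books (the direction used by the MATCH pattern in this article). The author and customer names are placeholders, not the actual sample data.

```sql
-- Node rows: SQL Server generates $node_id automatically on insert.
INSERT INTO Books (title) VALUES ('Introduction to SQL Graph');
INSERT INTO Authors (fullname) VALUES ('Author A'), ('Author B');   -- placeholder names
INSERT INTO Customer (fullname) VALUES ('Customer A');              -- placeholder name

-- Edge rows: the first two values are $from_id and $to_id, looked up
-- from the $node_id of the nodes being connected.
INSERT INTO writerOf VALUES
    ((SELECT $node_id FROM Authors WHERE fullname = 'Author A'),
     (SELECT $node_id FROM Books WHERE title = 'Introduction to SQL Graph')),
    ((SELECT $node_id FROM Authors WHERE fullname = 'Author B'),
     (SELECT $node_id FROM Books WHERE title = 'Introduction to SQL Graph'));

-- The bought edge carries the quantity attribute after the two node IDs.
INSERT INTO bought VALUES
    ((SELECT $node_id FROM Customer WHERE fullname = 'Customer A'),
     (SELECT $node_id FROM Books WHERE title = 'Introduction to SQL Graph'),
     1);
```

Each sale is stored once, as a single bought edge, while the two writerOf edges record the co-authorship separately, so the many-to-many relationship no longer requires duplicating the sale.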

Finally, Script 3 gives us the query that we could utilise to calculate the number of books sold.

SELECT Books.title, SUM(bought.quantity) AS [Number Of Books Sold]
FROM Customer, bought, Books
WHERE MATCH(Customer-(bought)->Books)
GROUP BY Books.title;
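
To credit each author for the copies sold, the same pattern can be extended with the writerOf edge. The query below is a sketch that assumes writerOf edges have been populated from Authors to Books; the column alias is illustrative.

```sql
-- Copies sold per book and author: a customer bought the book
-- AND an author wrote that same book.
SELECT Books.title,
       Authors.fullname AS [Author],
       SUM(bought.quantity) AS [Copies Credited]
FROM Customer, bought, Books, writerOf, Authors
WHERE MATCH(Customer-(bought)->Books AND Authors-(writerOf)->Books)
GROUP BY Books.title, Authors.fullname;
```

This returns one row per author of each sold book. The figures should be read per author; adding them up across authors would count each sale once per co-author, so the overall sales total should still come from the query that matches only Customer, bought and Books.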

Summary

The star-schema model is often very useful where one-to-one and one-to-many relationship types exist between dimensions and the fact table. When a many-to-many relationship type occurs, a bridge table can be used to deal with it. Furthermore, the introduction of SQL Graph in SQL Server 2017 gives us an alternative approach to modelling many-to-many relationships in data warehouse environments.

References

  • About Data Warehouse Dimensional Modeling Using a Star Schema
  • Design Tip #142 Building Bridges
  • MATCH (Transact-SQL)

Translated from: https://www.sqlshack.com/replace-bridge-tables-data-warehouse-sql-server-2017-graph-database/
