图数据库 graph

Earlier this year, I published several articles on SQLShack with an aim of demonstrating tools available for visualising SQL Server 2017 graph databases. I was so caught up in the excitement of having SQL Server finally support graph databases that I forgot that some people still do not have a good grasp of how graph databases work let alone consider replacing their relational databases models in favour of graph. Although there are several ways that one can go about explaining the usefulness of graph databases over its relational counterpart, I have opted to focus on the benefits and strengths of graph databases by demonstrating the differences in which graph and relational databases deal with hierarchical datasets.

今年早些时候,我在SQLShack上发表了几篇文章,目的是演示可用于可视化SQL Server 2017图形数据库的工具。 令SQL Server最终支持图数据库的激动使我非常着迷,以至于我忘记了有些人仍然对图数据库的工作原理不甚了解,更不用说考虑用​​关系图代替关系数据库模型了。 尽管可以通过多种方法来解释图数据库相对于关系数据库的有用性,但我还是通过展示图和关系数据库处理分层数据集的差异来重点关注图数据库的优点和优势。

分层数据集的案例: 足球运动员与经理的关系 (The Case for Hierarchical Datasets: Footballer-Manager Relationship)

Hierarchical datasets are often synonymous with tree-like data structures and at one point or another, we’ve all been part of hierarchical data structures; be at places of work through employee-manager relationships, in our homes via child-parent relationships or through our levels of friendships on online social media platforms. The hierarchical relationship that I want to look at in this article is the Footballer-Manager relationship and how such a relationship can be represented in both graph and relational databases.

分层数据集通常是树状数据结构的同义词,在某一点或另一点,我们都属于分层数据结构的一部分。 通过员工与经理的关系,通过子女与父母的关系在家中或通过在线社交媒体平台上的友谊水平来工作。 我想在本文中探讨的层次关系是Footballer-Manager关系,以及如何在图形数据库和关系数据库中表示这种关系。

Figure 1 depicts a Footballer-Manager hierarchical relationship between players that – at some point – were part of the Spanish La Liga club, Atlético Madrid. As it can be seen in Figure 1, the majority of Footballer-Manager relationships are between manager Gregorio Manzano and players Gabi, Rubén Olivera, and Kiki Musampa. Whilst Diego Simeone was at some point managed by Gregorio Manzano, he has gone on to manage players of his own including the likes of Diego Costa, Gabi and Juanfran.

图1描绘了球员之间足球运动员-经理之间的等级关系,在某种程度上,他们是西班牙西甲俱乐部马德里竞技的一部分。 由于它可以在图1中可以看出,大多数足球-经理关系是经理格雷戈里奥·曼萨诺和球员加比奥利维拉基基·马桑帕之间。 尽管迭戈·西蒙内在某个时候由格雷戈里奥·曼萨诺Gregorio Manzano)管理,但他继续管理自己的球员,包括像迭戈·科斯塔Diego Costa),加比(Gabi)胡安弗兰(Juanfran)这样的球员。

The information represented in Figure 1 can be modelled for both relational and graph databases. Consequently, I’ve gone ahead and produced such models as shown in Figure 2 wherein the left-hand side of the black vertical bar represents the relational database model whilst the other side represents the graph. Noticeably in both models is the fact that the FootballerManager entity has a relationship with itself – what is often called a self-join.

图1中表示的信息可以为关系数据库和图形数据库建模。 因此,我继续制作了如图2所示的模型,其中黑色竖条的左侧代表关系数据库模型,而另一侧则代表图形。 值得注意的是,在这两个模型中, FootballerManager实体与自身都有关系-通常称为自联接。

One significant difference in Figure 2 is the way self-joined relationships are indicated. For instance, keys are used in the relational model to denote relationships such that the foreign key links back to its primary key. Graph databases, on the other hand, take an entirely different approach to representing relationships. This is because, in graph databases, relationships are stored as standalone tables and thus the managedBY relationship is a standalone table separate to the FootballerManager entity. In other words, for every entity-relationship that you define in a graph database, a separate standalone table is created to store such a relationship. To demonstrate this argument, we’ve used information from Figure 2 to create a logical entity-relationship-diagram (ERD), shown in Figure 3. Again, the left-hand side of the black vertical bar in Figure 3 represents logical ERD for the relational database whilst the right-hand side signifies graph.

图2中的一个重要区别是表示自联接关系的方式。 例如,键在关系模型中用于表示关系,以使外键链接回其主键。 另一方面,图形数据库采用完全不同的方法来表示关系。 这是因为在图形数据库中,关系存储为独立表,因此managedBY关系是独立于FootballerManager实体的独立表。 换句话说,对于在图形数据库中定义的每个实体关系,都会创建一个单独的独立表来存储这种关系。 为了证明这一观点,我们使用了图2中的信息来创建逻辑实体-关系图(ERD),如图3所示。 同样, 图3中黑色竖线的左侧代表关系数据库的逻辑ERD,而右侧则代表图形。

As per the total number of objects in Figure 3, graph database creates more physical objects than its relational counterpart. This means that we should expect the exercise of creating and populating objects in a graph database to be quite lengthier than a relational database. However – as it will be demonstrated later – all of your object-creation and populating efforts get rewarded later-on during data retrieval exercise as it is generally faster to retrieve hierarchical data out of graph compared to relational databases.

根据图3中的对象总数,图数据库创建的物理对象比其关系对应的对象更多。 这意味着我们应该期望在图形数据库中创建和填充对象的工作比关系数据库要长得多。 但是,正如稍后将要演示的那样,在数据检索过程中,您的所有对象创建和填充工作都会在以后获得回报,因为与关系数据库相比,从图形中检索层次结构数据通常更快。

衡量查询性能:关系数据库与图形数据库 (Measuring Query Performance: Relational Databases vs. Graph Databases)

In this section, we compare query performances for both relational and graph databases by creating physical tables based on logical ERD design shown in Figure 3.

在本节中,我们通过基于图3所示的逻辑ERD设计创建物理表,比较关系数据库和图形数据库的查询性能。

  1. Measuring Relational Database Query Performance

    衡量关系数据库查询性能

    According to our ERD diagram for the relational database, we need to create only a single table – FootballerManager. I have gone ahead and created and populated such a table as shown in Figure 4 – (A database backup containing all tables used in this demo is available under the Downloads section at the bottom of this article).

    根据关系数据库的ERD图,我们只需要创建一个表– FootballerManager 。 我已经继续创建并填充了这样一个表, 如图4所示-(包含此演示中使用的所有表的数据库备份位于本文底部的“ 下载”部分下)。

    Figure 4

    图4

    Having created and populated our table, we next move on to querying the table using Script 1.

    创建并填充了表格后,接下来我们继续使用脚本1查询表格。

    SELECT e.[Name] AS 'Manager',m.[Name] AS 'Player'
    FROM [dbo].[FootballerManager] e,[dbo].[FootballerManager] m
    WHERE e.[PlayerID] = m.[ManagedByID]AND e.[Name] = 'Diego Simeone';
    

    The results of Script 1 execution are shown in Figure 5 – which basically returns a list of all players managed by Diego Simeone.

    脚本1执行的结果如图5所示-基本上返回了Diego Simeone管理的所有玩家的列表。

    Figure 5

    图5

    Furthermore, using ApexSQL Plan I’ve gone ahead and generated an execution plan for Script 1. Notice in the generated execution plan in Figure 6 that the majority of Script 1 execution was spent performing hash match joins.

    此外,使用ApexSQL Plan 我已经为脚本1生成了执行计划。 请注意,在图6所生成的执行计划中, 脚本1的大部分执行都花费在执行哈希匹配联接上。

    Figure 6图6

    Hash match joins often indicate TempDB activity and as indicated in the query’s IO statistics shown in Figure 7, the SQL Server optimizer made use of Workfile and Worktable temporary objects to identify correct combinations that will satisfy query conditions stipulated in Script 1.

    哈希联接比赛往往表明tempdb活动,并如在图7所示的查询的IO统计表明,SQL Server优化利用工作文件工作台临时对象,以确定正确的组合,将满足查询条件的脚本1规定。

    Figure 7图7

  2. Measuring Graph Database Query Performance

    测量图数据库查询性能

    Script 2 is graph database query version of Script 1.

    脚本2脚本1的图形数据库查询版本。

    SELECT person1.name AS 'Manager',person2.name AS 'Player'
    FROM Footballer person1,Footballer person2,managedby
    WHERE MATCH(person1-(managedby)->person2)AND person1.name = 'Diego Simeone';
    

    Script 2剧本2

    The execution of Script 2 returns an output similar to what is shown in Figure 5 but as expected this version of the query in a graph database uses an entirely different execution plan to that of a relational database query.

    脚本2的执行返回的输出类似于图5中所示的输出,但是正如预期的那样,图形数据库中的该版本的查询使用与关系数据库查询完全不同的执行计划。

    Figure 8图8

    Figure 8 shows execution plan of Script 2. The execution plan indicates that the graph database query relied on nested loop joins instead of hash match joins used in the relational database version of the query. Nested loops basically identify matches by taking each record from the outer input and applying it against the inner input. In this case, our managedBY relationship table is the outer input being applied against the Footballer table. The predicate “Diego Simeone” reduces the size of our outer input from 27 to 3 – as 3 is the total number of records (relationships) linked to Diego Simeone. Now when we go looking for the names of the players linked to Diego Simeone, we only scan the Footballer table against 3 relationships instead of 27. So, the formula for our nested loop becomes 3*22 (number of possible player names) instead of 27 * 22.

    图8显示了脚本2的执行计划。 执行计划表明,图数据库查询依赖于嵌套循环联接,而不是查询的关系数据库版本中使用的哈希匹配联接。 嵌套循环基本上通过从外部输入中获取每个记录并将其应用于内部输入来标识匹配项。 在这种情况下,我们的managedBY关系表是对Footballer表应用的外部输入。 谓词“ Diego Simeone ”将外部输入的大小从27减少到3 –因为3是与Diego Simeone相关的记录(关系)的总数。 现在,当我们查找链接到Diego Simeone的球员的姓名时,我们仅针对3个关系而不是27个关系来扫描Footballer表。因此,嵌套循环的公式变为3 * 22(可能的球员姓名数),而不是27 * 22。

摘要 (Summary)

Hierarchical datasets easily highlight the modelling and query execution differences in both relational and graph databases. However, this is by no means an indication that you should start throwing away the one database system over the other. Graph databases are more efficient when data relationships are at the core of your requirement. Below is a list of some key differences between Relational and Graph databases. We conclude this article by summarising some of the key differences between relational databases and graph databases.

层次数据集可以轻松突出关系数据库和图形数据库中的建模和查询执行差异。 但是,这绝不表示您应该开始抛弃一个数据库系统而取代另一个数据库系统。 当数据关系是您的核心要求时,图形数据库将更加高效。 下面列出了关系数据库和图数据库之间的一些关键区别。 我们通过总结关系数据库和图形数据库之间的一些关键区别来结束本文。

Relational Database Graph Database
Entity Relationships exist as keys within dataset Entity Relationships are represented as tables
Increase in size of dataset reduces query performance Increase in connections/relationships degrades query performance
Harder to introduce new relationships/keys as it requires altering definition of underlying table Easy to add new relationships
关系型数据库 图数据库
实体关系作为数据集中的键存在 实体关系表示为表格
数据集大小的增加会降低查询性能 连接/关系的增加会降低查询性能
难以引入新的关系/键,因为它需要更改基础表的定义 轻松添加新关系

资料下载 (Downloads)

  • UGDemo UGDemo

参考资料 (References)

  • Graph processing with SQL Server and Azure SQL Database 使用SQL Server和Azure SQL数据库进行图形处理
  • Hash Match – SQL Server Graphical Execution Plan 哈希匹配– SQL Server图形执行计划
  • Hierarchical database model 分层数据库模型

翻译自: https://www.sqlshack.com/understanding-benefits-of-graph-databases-over-relational-databases-through-self-joins-in-sql-server/

图数据库 graph

图数据库 graph_通过SQL Server中的自连接了解Graph数据库相对于关系数据库的好处相关推荐

  1. 如何将用户迁移到SQL Server中的部分包含的数据库

    介绍 (Introduction) Microsoft introduced the Contained Database feature in SQL Server 2012. In this ar ...

  2. SQL Server中的自连接和全外连接

                                SQL Server中的自连接和全外连接 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ...

  3. SQL Server中的查询优化技术:数据库设计和体系结构

    描述 (Description) One of the best ways to optimize performance in a database is to design it right th ...

  4. sql2008能否打开mysql数据库_mysql数据库数据能不能导入到sql server中

    点"测试"按钮确认你的链接是正确的. Press the "Test" button to ensure your connection settings ar ...

  5. SQL Server中的角色(服务器级别和数据库级别角色)

    参考文献 http://msdn.microsoft.com/zh-cn/library/ms188659.aspx 服务器级别角色 为帮助您管理服务器上的权限,SQL Server 提供了若干角色. ...

  6. SQL Server中的数据库文件组和零碎还原

    So far, we discussed many de-facto details about SQL Server database backup and restore. In this 15t ...

  7. SQL Server 中4个系统数据库详细介绍

    SQL Server 中4个系统数据库,Master.Model.Msdb.Tempdb. (1)Master数据库是SQL Server系统最重要的数据库,它记录了SQL Server系统的所有系统 ...

  8. Java数据库基础--以SQL Server为例

    sql server数据库基本概念 使用文件保存数据存在几个缺点: 1.文件的安全性问题: 2.文件不利于查询和对数据的管理: 3.文件不利于存放海量数据 4.文件在程序中控制不方便. 数据库的定义( ...

  9. SQL SERVER 中identity

    SQL SERVER 中identity用法: 在数据库中, 常用的一个流水编号通常会使用 identity 栏位来进行设置, 这种编号的好处是一定不会重覆, 而且一定是唯一的, 这对table中的唯 ...

最新文章

  1. CountDownTimer 实现验证码倒计时
  2. Using NUnit with Visual Studio 2005 Express Editions
  3. VS2015配置并运行汇编(一步一步照图做)【vs2017的链接在最后】
  4. 地砖中间高四边低_地砖上墙到底好不好?幸好我家没这么做否则全毁了!
  5. _编程语言_C++_Lambda函数与表达式
  6. MIP 官方发布 v1稳定版本
  7. Shell脚本8种字符串截取方法总结
  8. 小明是个急性子,上小学的时候经常吧老师写在黑板上的题目抄错 有一次,老师出的题目是:36x495=? 他却给抄成了:396x45=? 但是结果很戏剧性,他的 答案是对的 因为36*495 = 39
  9. FireFox插件拖放安装法
  10. ペイペイ mini program_小姐姐最爱的MINI,你了解吗?
  11. pandas读取csv文件数据并对数据分类使用matplotlib画出折线图
  12. QCC512x QCC302x Earbud 工程增加三击事件
  13. android取消输入法联想,输入法联想功能,怎么清除输入法联想
  14. 怎样在表格中选出同一类_怎样在excel中筛选出带同样文字的
  15. PHP中的SAPI是什么,都有那些模式?
  16. Web实现:完整版垃圾分类网站 html+css 内含效果图
  17. 春天来了,该播种了。久久荒芜的博客重新耕种起来
  18. 服务器租用前如何测试网络速度?
  19. 【C语言】世界上不同国家有不同的写日期的习惯。比如美国人习惯写成“月-日-年”,而中国人习惯写成“年-月-日”。下面请你写个程序,自动把读入的美国格式的日期改写成中国习惯的日期。
  20. C++连接SQL的简单例子(win 和 linux)

热门文章

  1. js json数组_JaveScript对象篇和数组篇
  2. python创建树_python – 从SQLalchemy中的自引用表创建树
  3. .net webim 源码_Netty服务器启动过程源码带你分析「你能坚持看完吗?」
  4. Java笔记(十七) 异步任务执行服务
  5. Luogu1515 青蛙的约会
  6. Docker管理工具-Swarm部署记录
  7. python初心记录一
  8. 用sed替换文件中的空格
  9. 自定义UITabBar
  10. 学者当自树其帜——为一本书专建的“第二次宣言网”上线有感