Beware of the Eager pipe

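Although I love how easy Cypher's LOAD CSV command makes it to get data into Neo4j, it currently breaks the rule of least surprise: for some queries it eagerly loads all rows, even when those queries use periodic commit.

My colleague Michael noted this in the second of his blog posts explaining how to use LOAD CSV successfully:

"The biggest issue that people ran into, even when following the advice I gave earlier, was that for large imports of more than one million rows, Cypher ran into an out-of-memory situation. That was not related to commit sizes, so it happened even with PERIODIC COMMIT of small batches."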

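I recently spent a few days importing data into Neo4j on a Windows machine with 4GB RAM, so I was seeing this problem with even fewer rows than Michael suggested.

Michael explains how to work out whether your query is suffering from unexpected eager evaluation:

"If you profile that query you see that there is an 'Eager' step in the query plan. That is where the 'pull in all data' happens."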
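To make the "pull in all data" point concrete, here is a toy model in Python. It is only a sketch of the general idea and not how Neo4j implements its pipes: `read_rows`, `eager`, `write_in_batches` and `run` are all hypothetical names invented for this illustration. It counts how many rows are held in memory at once when batches are written lazily, and when an eager step sits in the middle of the pipeline.

```python
from itertools import islice


def read_rows(n, stats):
    """Stand-in for LOAD CSV: yields one row at a time and counts rows held."""
    for i in range(n):
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        yield {"OrderID": i}


def eager(rows):
    """Toy Eager step: materialise every upstream row before passing any on."""
    return iter(list(rows))


def write_in_batches(rows, batch_size, stats):
    """Stand-in for periodic commit: rows are released once their batch is written."""
    commits = 0
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return commits
        stats["in_flight"] -= len(batch)
        commits += 1


def run(n, batch_size, with_eager):
    """Return (peak rows held in memory, number of commits) for the toy pipeline."""
    stats = {"in_flight": 0, "peak": 0}
    rows = read_rows(n, stats)
    if with_eager:
        rows = eager(rows)
    commits = write_in_batches(rows, batch_size, stats)
    return stats["peak"], commits
```

With 10,000 rows and a batch size of 1,000, `run(10000, 1000, with_eager=False)` never holds more than 1,000 rows, while `run(10000, 1000, with_eager=True)` holds all 10,000 before the first commit. The number of commits is the same in both cases, which is why a smaller commit size does not help.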

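You can profile a query by prefixing it with the word PROFILE. You'll need to run it either in the /webadmin console in your web browser or with the Neo4j shell.

I did this for my queries and was able to identify query patterns which get evaluated eagerly. In some cases we can work around the problem.

We'll use the Northwind data set to demonstrate how the Eager pipe can sneak into our queries, but keep in mind that this data set is small enough that it won't cause any issues.

This is what a row in the file looks like: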

$ head -n 2 data/customerDb.csv
OrderID,CustomerID,EmployeeID,OrderDate,RequiredDate,ShippedDate,ShipVia,Freight,ShipName,ShipAddress,ShipCity,ShipRegion,ShipPostalCode,ShipCountry,CustomerID,CustomerCompanyName,ContactName,ContactTitle,Address,City,Region,PostalCode,Country,Phone,Fax,EmployeeID,LastName,FirstName,Title,TitleOfCourtesy,BirthDate,HireDate,Address,City,Region,PostalCode,Country,HomePhone,Extension,Photo,Notes,ReportsTo,PhotoPath,OrderID,ProductID,UnitPrice,Quantity,Discount,ProductID,ProductName,SupplierID,CategoryID,QuantityPerUnit,UnitPrice,UnitsInStock,UnitsOnOrder,ReorderLevel,Discontinued,SupplierID,SupplierCompanyName,ContactName,ContactTitle,Address,City,Region,PostalCode,Country,Phone,Fax,HomePage,CategoryID,CategoryName,Description,Picture
10248,VINET,5,1996-07-04,1996-08-01,1996-07-16,3,32.38,Vins et alcools Chevalier,59 rue de l'Abbaye,Reims,,51100,France,VINET,Vins et alcools Chevalier,Paul Henriot,Accounting Manager,59 rue de l'Abbaye,Reims,,51100,France,26.47.15.10,26.47.15.11,5,Buchanan,Steven,Sales Manager,Mr.,1955-03-04,1993-10-17,14 Garrett Hill,London,,SW1 8JR,UK,(71) 555-4848,3453,\x,"Steven Buchanan graduated from St. Andrews University, Scotland, with a BSC degree in 1976.  Upon joining the company as a sales representative in 1992, he spent 6 months in an orientation program at the Seattle office and then returned to his permanent post in London.  He was promoted to sales manager in March 1993.  Mr. Buchanan has completed the courses ""Successful Telemarketing"" and ""International Sales Management.""  He is fluent in French.",2,http://accweb/emmployees/buchanan.bmp,10248,11,14,12,0,11,Queso Cabrales,5,4,1 kg pkg.,21,22,30,30,0,5,Cooperativa de Quesos 'Las Cabras',Antonio del Valle Saavedra,Export Administrator,Calle del Rosal 4,Oviedo,Asturias,33007,Spain,(98) 598 76 54,,,4,Dairy Products,Cheeses,\x

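MERGE, MERGE, MERGE

The first thing we want to do is create a node for each employee and each order, and then create a relationship between them.

We might start with the following query: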

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MERGE (employee:Employee {employeeId: row.EmployeeID})
MERGE (order:Order {orderId: row.OrderID})
MERGE (employee)-[:SOLD]->(order)

This does the job, but if we profile the query like so…

PROFILE LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
WITH row LIMIT 0
MERGE (employee:Employee {employeeId: row.EmployeeID})
MERGE (order:Order {orderId: row.OrderID})
MERGE (employee)-[:SOLD]->(order)

…we'll notice an 'Eager' lurking on the third line of the plan:

==> +----------------+------+--------+----------------------------------+-----------------------------------------+
==> |       Operator | Rows | DbHits |                      Identifiers |                                   Other |
==> +----------------+------+--------+----------------------------------+-----------------------------------------+
==> |    EmptyResult |    0 |      0 |                                  |                                         |
==> | UpdateGraph(0) |    0 |      0 |    employee, order,   UNNAMED216 |                            MergePattern |
==> |          Eager |    0 |      0 |                                  |                                         |
==> | UpdateGraph(1) |    0 |      0 | employee, employee, order, order | MergeNode; :Employee; MergeNode; :Order |
==> |          Slice |    0 |      0 |                                  |                            {  AUTOINT0} |
==> |        LoadCSV |    1 |      0 |                              row |                                         |
==> +----------------+------+--------+----------------------------------+-----------------------------------------+

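You'll notice that whenever we profile a query we strip off the periodic commit section and add 'WITH row LIMIT 0'. This lets us generate enough of the query plan to identify the 'Eager' operator without actually importing any data.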
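That rewrite is mechanical, so it can be scripted. The following is a minimal sketch, assuming the LOAD CSV clause sits on a single line and the rows are aliased as `row`, as in the queries in this post. `to_profile_query` is a hypothetical helper name, and it only rewrites text: it does not talk to Neo4j.

```python
import re


def to_profile_query(import_query, alias="row"):
    """Turn an import query into the form used for profiling in this post.

    Drops the USING PERIODIC COMMIT line, prefixes LOAD CSV with PROFILE and
    adds 'WITH <alias> LIMIT 0' so that no data is actually imported.
    """
    out = []
    for line in import_query.strip().splitlines():
        if re.match(r"\s*USING\s+PERIODIC\s+COMMIT\b", line, re.IGNORECASE):
            continue  # periodic commit is stripped before profiling
        if re.match(r"\s*LOAD\s+CSV\b", line, re.IGNORECASE):
            out.append("PROFILE " + line.strip())
            out.append(f"WITH {alias} LIMIT 0")
        else:
            out.append(line.strip())
    return "\n".join(out)
```

The result can be pasted into the Neo4j shell as-is. The helper is deliberately naive: a query that splits LOAD CSV across several lines would need a smarter rewrite.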

We want to split that query in two so that it can be processed in a non-eager manner. The first query creates only the nodes:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MERGE (employee:Employee {employeeId: row.EmployeeID})
MERGE (order:Order {orderId: row.OrderID})
==> +-------------+------+--------+----------------------------------+-----------------------------------------+
==> |    Operator | Rows | DbHits |                      Identifiers |                                   Other |
==> +-------------+------+--------+----------------------------------+-----------------------------------------+
==> | EmptyResult |    0 |      0 |                                  |                                         |
==> | UpdateGraph |    0 |      0 | employee, employee, order, order | MergeNode; :Employee; MergeNode; :Order |
==> |       Slice |    0 |      0 |                                  |                            {  AUTOINT0} |
==> |     LoadCSV |    1 |      0 |                              row |                                         |
==> +-------------+------+--------+----------------------------------+-----------------------------------------+

Now that we've created the employees and orders we can join them together:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MATCH (employee:Employee {employeeId: row.EmployeeID})
MATCH (order:Order {orderId: row.OrderID})
MERGE (employee)-[:SOLD]->(order)
==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+
==> |       Operator | Rows | DbHits |                   Identifiers |                                                     Other |
==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+
==> |    EmptyResult |    0 |      0 |                               |                                                           |
==> |    UpdateGraph |    0 |      0 | employee, order,   UNNAMED216 |                                              MergePattern |
==> |      Filter(0) |    0 |      0 |                               |          Property(order,orderId) == Property(row,OrderID) |
==> | NodeByLabel(0) |    0 |      0 |                  order, order |                                                    :Order |
==> |      Filter(1) |    0 |      0 |                               | Property(employee,employeeId) == Property(row,EmployeeID) |
==> | NodeByLabel(1) |    0 |      0 |            employee, employee |                                                 :Employee |
==> |          Slice |    0 |      0 |                               |                                              {  AUTOINT0} |
==> |        LoadCSV |    1 |      0 |                           row |                                                           |
==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+

Not an Eager in sight!
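Scanning plans by eye gets tedious once an import script has more than a handful of queries. Here is a small sketch that pulls the operator names out of plan text in the table layout the Neo4j shell prints above and reports whether an Eager step is present. `plan_operators` and `has_eager` are hypothetical helpers, and the parsing assumes that table layout, including the optional `==>` prefix.

```python
def plan_operators(plan_text):
    """Return the operator names from a shell-style plan table, top to bottom."""
    operators = []
    for line in plan_text.splitlines():
        line = line.strip()
        if line.startswith("==>"):
            line = line[3:].strip()
        if not line.startswith("|"):
            continue  # skip the +----+ border lines and anything else
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if cells and cells[0] and cells[0] != "Operator":
            operators.append(cells[0])
    return operators


def has_eager(plan_text):
    """True if any operator in the plan is an Eager step."""
    # Operators can carry a numeric suffix such as UpdateGraph(0), so strip it.
    return any(op.split("(")[0] == "Eager" for op in plan_operators(plan_text))
```

Feeding it the plan for the single-query version returns True, and the plans for the two split queries return False.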

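MATCH, MATCH, MATCH, MERGE, MERGE

If we fast forward a few steps, we may now have refactored our import script to the point where we create our nodes in one query and the relationships in another.

Our create query works as expected: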

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MERGE (employee:Employee {employeeId: row.EmployeeID})
MERGE (order:Order {orderId: row.OrderID})
MERGE (product:Product {productId: row.ProductID})
==> +-------------+------+--------+----------------------------------------------------+--------------------------------------------------------------+
==> |    Operator | Rows | DbHits |                                        Identifiers |                                                        Other |
==> +-------------+------+--------+----------------------------------------------------+--------------------------------------------------------------+
==> | EmptyResult |    0 |      0 |                                                    |                                                              |
==> | UpdateGraph |    0 |      0 | employee, employee, order, order, product, product | MergeNode; :Employee; MergeNode; :Order; MergeNode; :Product |
==> |       Slice |    0 |      0 |                                                    |                                                 {  AUTOINT0} |
==> |     LoadCSV |    1 |      0 |                                                row |                                                              |
==> +-------------+------+--------+----------------------------------------------------+--------------------------------------------------------------+

We've now got employees, products and orders in the graph. Now let's create relationships between the trio:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MATCH (employee:Employee {employeeId: row.EmployeeID})
MATCH (order:Order {orderId: row.OrderID})
MATCH (product:Product {productId: row.ProductID})
MERGE (employee)-[:SOLD]->(order)
MERGE (order)-[:PRODUCT]->(product)

If we profile that, we'll notice Eager has sneaked in again!

==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+
==> |       Operator | Rows | DbHits |                   Identifiers |                                                     Other |
==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+
==> |    EmptyResult |    0 |      0 |                               |                                                           |
==> | UpdateGraph(0) |    0 |      0 |  order, product,   UNNAMED318 |                                              MergePattern |
==> |          Eager |    0 |      0 |                               |                                                           |
==> | UpdateGraph(1) |    0 |      0 | employee, order,   UNNAMED287 |                                              MergePattern |
==> |      Filter(0) |    0 |      0 |                               |    Property(product,productId) == Property(row,ProductID) |
==> | NodeByLabel(0) |    0 |      0 |              product, product |                                                  :Product |
==> |      Filter(1) |    0 |      0 |                               |          Property(order,orderId) == Property(row,OrderID) |
==> | NodeByLabel(1) |    0 |      0 |                  order, order |                                                    :Order |
==> |      Filter(2) |    0 |      0 |                               | Property(employee,employeeId) == Property(row,EmployeeID) |
==> | NodeByLabel(2) |    0 |      0 |            employee, employee |                                                 :Employee |
==> |          Slice |    0 |      0 |                               |                                              {  AUTOINT0} |
==> |        LoadCSV |    1 |      0 |                           row |                                                           |
==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+

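In this case the Eager happens on our second call to MERGE, as Michael identified in his post:

"The issue is that within a single Cypher statement you have to isolate changes that affect matches further on, e.g. when you CREATE nodes with a label that are suddenly matched by a later MATCH or MERGE operation."

We can work around the problem in this case by using a separate query to create each type of relationship: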

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MATCH (employee:Employee {employeeId: row.EmployeeID})
MATCH (order:Order {orderId: row.OrderID})
MERGE (employee)-[:SOLD]->(order)
==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+
==> |       Operator | Rows | DbHits |                   Identifiers |                                                     Other |
==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+
==> |    EmptyResult |    0 |      0 |                               |                                                           |
==> |    UpdateGraph |    0 |      0 | employee, order,   UNNAMED236 |                                              MergePattern |
==> |      Filter(0) |    0 |      0 |                               |          Property(order,orderId) == Property(row,OrderID) |
==> | NodeByLabel(0) |    0 |      0 |                  order, order |                                                    :Order |
==> |      Filter(1) |    0 |      0 |                               | Property(employee,employeeId) == Property(row,EmployeeID) |
==> | NodeByLabel(1) |    0 |      0 |            employee, employee |                                                 :Employee |
==> |          Slice |    0 |      0 |                               |                                              {  AUTOINT0} |
==> |        LoadCSV |    1 |      0 |                           row |                                                           |
==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MATCH (order:Order {orderId: row.OrderID})
MATCH (product:Product {productId: row.ProductID})
MERGE (order)-[:PRODUCT]->(product)
==> +----------------+------+--------+------------------------------+--------------------------------------------------------+
==> |       Operator | Rows | DbHits |                  Identifiers |                                                  Other |
==> +----------------+------+--------+------------------------------+--------------------------------------------------------+
==> |    EmptyResult |    0 |      0 |                              |                                                        |
==> |    UpdateGraph |    0 |      0 | order, product,   UNNAMED229 |                                           MergePattern |
==> |      Filter(0) |    0 |      0 |                              | Property(product,productId) == Property(row,ProductID) |
==> | NodeByLabel(0) |    0 |      0 |             product, product |                                               :Product |
==> |      Filter(1) |    0 |      0 |                              |       Property(order,orderId) == Property(row,OrderID) |
==> | NodeByLabel(1) |    0 |      0 |                 order, order |                                                 :Order |
==> |          Slice |    0 |      0 |                              |                                           {  AUTOINT0} |
==> |        LoadCSV |    1 |      0 |                          row |                                                        |
==> +----------------+------+--------+------------------------------+--------------------------------------------------------+
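Since each relationship type now gets its own query with the same shape, the queries can be generated rather than copied and pasted. The following is a sketch only: `relationship_query` is a hypothetical helper, and it assumes every relationship joins two nodes that are each looked up by a single property taken from a CSV column, as in the two queries above.

```python
def relationship_query(csv_url, start, rel_type, end, batch_size=1000):
    """Build one LOAD CSV query that MERGEs a single relationship type.

    start and end are (variable, label, property, csv_column) tuples.
    """
    def match(node):
        var, label, prop, col = node
        return f"MATCH ({var}:{label} {{{prop}: row.{col}}})"

    return "\n".join([
        f"USING PERIODIC COMMIT {batch_size}",
        f'LOAD CSV WITH HEADERS FROM "{csv_url}" AS row',
        match(start),
        match(end),
        f"MERGE ({start[0]})-[:{rel_type}]->({end[0]})",
    ])


# One query per relationship type, so no query contains two MERGE clauses.
# The file URL is a placeholder, not the path used in this post.
employee = ("employee", "Employee", "employeeId", "EmployeeID")
order = ("order", "Order", "orderId", "OrderID")
product = ("product", "Product", "productId", "ProductID")
queries = [
    relationship_query("file:/path/to/customerDb.csv", employee, "SOLD", order),
    relationship_query("file:/path/to/customerDb.csv", order, "PRODUCT", product),
]
```

Keeping one MERGE per generated query is the point of the helper. Each query should still be profiled, since this post only shows that these particular shapes avoid Eager on Neo4j 2.1.5.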

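MERGE, SET

I try to make LOAD CSV scripts as idempotent as possible, so that if we add more rows or columns of data to our CSV we can rerun the query without having to recreate everything.

This can lead you towards the following pattern, where we're creating suppliers: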

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MERGE (supplier:Supplier {supplierId: row.SupplierID})
SET supplier.companyName = row.SupplierCompanyName

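We want to ensure that there's only one Supplier with that SupplierID, but we might be incrementally adding new properties and decide to just replace everything by using the 'SET' command. If we profile that query, the Eager lurks: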

==> +----------------+------+--------+--------------------+----------------------+
==> |       Operator | Rows | DbHits |        Identifiers |                Other |
==> +----------------+------+--------+--------------------+----------------------+
==> |    EmptyResult |    0 |      0 |                    |                      |
==> | UpdateGraph(0) |    0 |      0 |                    |          PropertySet |
==> |          Eager |    0 |      0 |                    |                      |
==> | UpdateGraph(1) |    0 |      0 | supplier, supplier | MergeNode; :Supplier |
==> |          Slice |    0 |      0 |                    |         {  AUTOINT0} |
==> |        LoadCSV |    1 |      0 |                row |                      |
==> +----------------+------+--------+--------------------+----------------------+

We can work around this, at the cost of a bit of duplication, by using 'ON CREATE SET' and 'ON MATCH SET':

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MERGE (supplier:Supplier {supplierId: row.SupplierID})
ON CREATE SET supplier.companyName = row.SupplierCompanyName
ON MATCH SET supplier.companyName = row.SupplierCompanyName
==> +-------------+------+--------+--------------------+----------------------+
==> |    Operator | Rows | DbHits |        Identifiers |                Other |
==> +-------------+------+--------+--------------------+----------------------+
==> | EmptyResult |    0 |      0 |                    |                      |
==> | UpdateGraph |    0 |      0 | supplier, supplier | MergeNode; :Supplier |
==> |       Slice |    0 |      0 |                    |         {  AUTOINT0} |
==> |     LoadCSV |    1 |      0 |                row |                      |
==> +-------------+------+--------+--------------------+----------------------+
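The duplication grows with every property we add, so one option is to generate both clauses from a single mapping. A minimal sketch, assuming the rows are aliased as `row` and the node is identified by one key property. `merge_with_sets` is a hypothetical helper that only builds the Cypher text.

```python
def merge_with_sets(var, label, key, key_col, props):
    """Build a MERGE with identical ON CREATE SET and ON MATCH SET clauses.

    props maps node property names to CSV column names.
    """
    assignments = ", ".join(f"{var}.{prop} = row.{col}" for prop, col in props.items())
    return "\n".join([
        f"MERGE ({var}:{label} {{{key}: row.{key_col}}})",
        f"ON CREATE SET {assignments}",
        f"ON MATCH SET {assignments}",
    ])


supplier_clause = merge_with_sets(
    "supplier", "Supplier", "supplierId", "SupplierID",
    {"companyName": "SupplierCompanyName"},
)
```

Here `supplier_clause` holds the same three MERGE lines as the query above, and adding another property to the mapping updates both SET clauses together.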

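With the data set I've been working with, these changes let me avoid OutOfMemory exceptions in some cases and cut the time it took to run the query by a factor of 3 in others.

As time goes on I expect all of these scenarios will be addressed, but as of Neo4j 2.1.5 these are the patterns that I've identified as being overly eager.

If you know of any others, do let me know and I can add them to the post or write a second part.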

Translated from: https://www.javacodegeeks.com/2014/10/neo4j-cypher-avoiding-the-eager.html
