Neo4j: Cypher - Avoiding the Eager

Beware of the eager pipe

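Although I love how easy Cypher's LOAD CSV command makes it to get data into Neo4j, it currently breaks the principle of least surprise: for some queries it eagerly loads all rows, even when the query uses periodic commit.

My colleague Michael pointed this out in the second of his blog posts explaining how to use LOAD CSV successfully:

"The biggest issue that people ran into, even when following the advice I gave earlier, was that for large imports of more than one million rows, Cypher ran into an out-of-memory situation.

That was not related to commit sizes, so it happened even with PERIODIC COMMIT of small batches."

I recently spent a few days importing data into Neo4j on a Windows machine with 4GB of RAM, so I ran into this problem with far fewer rows than Michael suggests.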

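Michael explains how to work out whether your query is suffering from unexpected eager evaluation:

"If you profile that query you see that there is an 'Eager' step in the query plan.

That is where the 'pull in all data' happens."

You can profile a query by prefixing it with the word PROFILE. You need to run it either in the /webadmin console in your web browser or with the Neo4j shell.

I did this for my queries and was able to identify the query patterns that get evaluated eagerly. In some cases we can work around the problem.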

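We'll use the Northwind data set to demonstrate how the Eager pipe can sneak into our queries, but keep in mind that this data set is small enough not to cause any problems.

This is what a row in the file looks like: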

$ head -n 2 data/customerDb.csv
OrderID,CustomerID,EmployeeID,OrderDate,RequiredDate,ShippedDate,ShipVia,Freight,ShipName,ShipAddress,ShipCity,ShipRegion,ShipPostalCode,ShipCountry,CustomerID,CustomerCompanyName,ContactName,ContactTitle,Address,City,Region,PostalCode,Country,Phone,Fax,EmployeeID,LastName,FirstName,Title,TitleOfCourtesy,BirthDate,HireDate,Address,City,Region,PostalCode,Country,HomePhone,Extension,Photo,Notes,ReportsTo,PhotoPath,OrderID,ProductID,UnitPrice,Quantity,Discount,ProductID,ProductName,SupplierID,CategoryID,QuantityPerUnit,UnitPrice,UnitsInStock,UnitsOnOrder,ReorderLevel,Discontinued,SupplierID,SupplierCompanyName,ContactName,ContactTitle,Address,City,Region,PostalCode,Country,Phone,Fax,HomePage,CategoryID,CategoryName,Description,Picture
10248,VINET,5,1996-07-04,1996-08-01,1996-07-16,3,32.38,Vins et alcools Chevalier,59 rue de l'Abbaye,Reims,,51100,France,VINET,Vins et alcools Chevalier,Paul Henriot,Accounting Manager,59 rue de l'Abbaye,Reims,,51100,France,26.47.15.10,26.47.15.11,5,Buchanan,Steven,Sales Manager,Mr.,1955-03-04,1993-10-17,14 Garrett Hill,London,,SW1 8JR,UK,(71) 555-4848,3453,\x,"Steven Buchanan graduated from St. Andrews University, Scotland, with a BSC degree in 1976.  Upon joining the company as a sales representative in 1992, he spent 6 months in an orientation program at the Seattle office and then returned to his permanent post in London.  He was promoted to sales manager in March 1993.  Mr. Buchanan has completed the courses ""Successful Telemarketing"" and ""International Sales Management.""  He is fluent in French.",2,http://accweb/emmployees/buchanan.bmp,10248,11,14,12,0,11,Queso Cabrales,5,4,1 kg pkg.,21,22,30,30,0,5,Cooperativa de Quesos 'Las Cabras',Antonio del Valle Saavedra,Export Administrator,Calle del Rosal 4,Oviedo,Asturias,33007,Spain,(98) 598 76 54,,,4,Dairy Products,Cheeses,\x

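MERGE, MERGE, MERGE

The first thing we want to do is create a node for each employee and each order, and then create a relationship between them.

We might start with the following query: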

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MERGE (employee:Employee {employeeId: row.EmployeeID})
MERGE (order:Order {orderId: row.OrderID})
MERGE (employee)-[:SOLD]->(order)

This does the job, but if we profile the query like so…

PROFILE LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
WITH row LIMIT 0
MERGE (employee:Employee {employeeId: row.EmployeeID})
MERGE (order:Order {orderId: row.OrderID})
MERGE (employee)-[:SOLD]->(order)

…we'll notice an 'Eager' lurking on the third row of the plan:

==> +----------------+------+--------+----------------------------------+-----------------------------------------+
==> |       Operator | Rows | DbHits |                      Identifiers |                                   Other |
==> +----------------+------+--------+----------------------------------+-----------------------------------------+
==> |    EmptyResult |    0 |      0 |                                  |                                         |
==> | UpdateGraph(0) |    0 |      0 |    employee, order,   UNNAMED216 |                            MergePattern |
==> |          Eager |    0 |      0 |                                  |                                         |
==> | UpdateGraph(1) |    0 |      0 | employee, employee, order, order | MergeNode; :Employee; MergeNode; :Order |
==> |          Slice |    0 |      0 |                                  |                            {  AUTOINT0} |
==> |        LoadCSV |    1 |      0 |                              row |                                         |
==> +----------------+------+--------+----------------------------------+-----------------------------------------+
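Scanning a plan by eye gets tedious once an import script contains more than a handful of queries. As a minimal sketch, assuming the plan has been captured as text in the Neo4j shell's table layout shown above, a few lines of Python can pull out the operator column and report whether Eager appears. The function names and the trimmed sample plan are made up for this illustration and are not part of any Neo4j tooling.

```python
def plan_operators(plan_text):
    """Return the operator names from a shell-style plan table, top row first."""
    operators = []
    for line in plan_text.splitlines():
        line = line.strip()
        # Drop the "==>" prefix that the Neo4j shell puts in front of output.
        if line.startswith("==>"):
            line = line[3:].strip()
        # Border rows (+-----+) and blank lines carry no operator.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        name = cells[0]
        if name and name != "Operator":  # skip the header row
            operators.append(name)
    return operators


def has_eager(plan_text):
    """True if any operator in the plan is Eager (ignoring a numeric suffix)."""
    return any(op.split("(")[0] == "Eager" for op in plan_operators(plan_text))


# The first plan from this post, trimmed to two columns for brevity.
SAMPLE_PLAN = """\
==> +----------------+------+
==> |       Operator | Rows |
==> +----------------+------+
==> |    EmptyResult |    0 |
==> | UpdateGraph(0) |    0 |
==> |          Eager |    0 |
==> | UpdateGraph(1) |    0 |
==> |          Slice |    0 |
==> |        LoadCSV |    1 |
==> +----------------+------+
"""

if __name__ == "__main__":
    print(plan_operators(SAMPLE_PLAN))
    print(has_eager(SAMPLE_PLAN))
```

The check is purely textual, so it only works for plans printed in this table format; a different Neo4j version or client may lay the plan out differently.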

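You'll notice that whenever we profile a query we strip off the periodic commit line and add 'WITH row LIMIT 0'. This lets us generate enough of the query plan to spot the 'Eager' operator without actually importing any data.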
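That rewrite is mechanical, so it can be scripted. The sketch below assumes the import query keeps the LOAD CSV clause on a single line ending in "AS <name>", as every query in this post does. The helper name to_profile_query is hypothetical.

```python
import re

# "USING PERIODIC COMMIT" with an optional batch size, on the first line.
_PERIODIC_COMMIT = re.compile(
    r"^USING\s+PERIODIC\s+COMMIT(?:[ \t]+\d+)?[ \t]*\n", re.IGNORECASE
)
# A single-line LOAD CSV clause; group 1 is the row identifier after AS.
_LOAD_CSV_LINE = re.compile(
    r"^[ \t]*LOAD\s+CSV\b.*\bAS[ \t]+(\w+)[ \t]*$", re.IGNORECASE | re.MULTILINE
)


def to_profile_query(import_query):
    """Turn an import query into one that can be profiled without loading data."""
    # 1. Remove the periodic commit line, if present.
    query = _PERIODIC_COMMIT.sub("", import_query.strip() + "\n", count=1)
    # 2. Limit the rows to zero straight after the LOAD CSV line.
    match = _LOAD_CSV_LINE.search(query)
    if match is None:
        raise ValueError("expected a line of the form: LOAD CSV ... AS <name>")
    limit_line = "\nWITH {} LIMIT 0".format(match.group(1))
    query = query[:match.end()] + limit_line + query[match.end():]
    # 3. Ask for the plan.
    return "PROFILE " + query.strip()


if __name__ == "__main__":
    print(to_profile_query(
        "USING PERIODIC COMMIT 1000\n"
        'LOAD CSV WITH HEADERS FROM "file:/tmp/customerDb.csv" AS row\n'
        "MERGE (employee:Employee {employeeId: row.EmployeeID})\n"
    ))
```

The file path in the usage example is a placeholder. The output can be pasted into the Neo4j shell like any other profiled query.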

We want to split that query in two so that it can be processed without the Eager step:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MERGE (employee:Employee {employeeId: row.EmployeeID})
MERGE (order:Order {orderId: row.OrderID})

Profiled in the same way as before, this query has no Eager operator:

==> +-------------+------+--------+----------------------------------+-----------------------------------------+
==> |    Operator | Rows | DbHits |                      Identifiers |                                   Other |
==> +-------------+------+--------+----------------------------------+-----------------------------------------+
==> | EmptyResult |    0 |      0 |                                  |                                         |
==> | UpdateGraph |    0 |      0 | employee, employee, order, order | MergeNode; :Employee; MergeNode; :Order |
==> |       Slice |    0 |      0 |                                  |                            {  AUTOINT0} |
==> |     LoadCSV |    1 |      0 |                              row |                                         |
==> +-------------+------+--------+----------------------------------+-----------------------------------------+

Now that we've created the employees and orders, we can join them together:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MATCH (employee:Employee {employeeId: row.EmployeeID})
MATCH (order:Order {orderId: row.OrderID})
MERGE (employee)-[:SOLD]->(order)

==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+
==> |       Operator | Rows | DbHits |                   Identifiers |                                                     Other |
==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+
==> |    EmptyResult |    0 |      0 |                               |                                                           |
==> |    UpdateGraph |    0 |      0 | employee, order,   UNNAMED216 |                                              MergePattern |
==> |      Filter(0) |    0 |      0 |                               |          Property(order,orderId) == Property(row,OrderID) |
==> | NodeByLabel(0) |    0 |      0 |                  order, order |                                                    :Order |
==> |      Filter(1) |    0 |      0 |                               | Property(employee,employeeId) == Property(row,EmployeeID) |
==> | NodeByLabel(1) |    0 |      0 |            employee, employee |                                                 :Employee |
==> |          Slice |    0 |      0 |                               |                                              {  AUTOINT0} |
==> |        LoadCSV |    1 |      0 |                           row |                                                           |
==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+

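Not an Eager in sight!

MATCH, MATCH, MATCH, MERGE, MERGE

If we fast forward a few steps, we may have refactored our import script to the point where we create our nodes in one query and the relationships in another.

Our node creation query works as expected: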

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MERGE (employee:Employee {employeeId: row.EmployeeID})
MERGE (order:Order {orderId: row.OrderID})
MERGE (product:Product {productId: row.ProductID})

==> +-------------+------+--------+----------------------------------------------------+--------------------------------------------------------------+
==> |    Operator | Rows | DbHits |                                        Identifiers |                                                        Other |
==> +-------------+------+--------+----------------------------------------------------+--------------------------------------------------------------+
==> | EmptyResult |    0 |      0 |                                                    |                                                              |
==> | UpdateGraph |    0 |      0 | employee, employee, order, order, product, product | MergeNode; :Employee; MergeNode; :Order; MergeNode; :Product |
==> |       Slice |    0 |      0 |                                                    |                                                 {  AUTOINT0} |
==> |     LoadCSV |    1 |      0 |                                                row |                                                              |
==> +-------------+------+--------+----------------------------------------------------+--------------------------------------------------------------+

We now have employees, products and orders in the graph. Let's create the relationships between the three:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MATCH (employee:Employee {employeeId: row.EmployeeID})
MATCH (order:Order {orderId: row.OrderID})
MATCH (product:Product {productId: row.ProductID})
MERGE (employee)-[:SOLD]->(order)
MERGE (order)-[:PRODUCT]->(product)

If we profile that query, we'll notice that Eager has sneaked in again!

==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+
==> |       Operator | Rows | DbHits |                   Identifiers |                                                     Other |
==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+
==> |    EmptyResult |    0 |      0 |                               |                                                           |
==> | UpdateGraph(0) |    0 |      0 |  order, product,   UNNAMED318 |                                              MergePattern |
==> |          Eager |    0 |      0 |                               |                                                           |
==> | UpdateGraph(1) |    0 |      0 | employee, order,   UNNAMED287 |                                              MergePattern |
==> |      Filter(0) |    0 |      0 |                               |    Property(product,productId) == Property(row,ProductID) |
==> | NodeByLabel(0) |    0 |      0 |              product, product |                                                  :Product |
==> |      Filter(1) |    0 |      0 |                               |          Property(order,orderId) == Property(row,OrderID) |
==> | NodeByLabel(1) |    0 |      0 |                  order, order |                                                    :Order |
==> |      Filter(2) |    0 |      0 |                               | Property(employee,employeeId) == Property(row,EmployeeID) |
==> | NodeByLabel(2) |    0 |      0 |            employee, employee |                                                 :Employee |
==> |          Slice |    0 |      0 |                               |                                              {  AUTOINT0} |
==> |        LoadCSV |    1 |      0 |                           row |                                                           |
==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+

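In this case the Eager appears before our second MERGE, for the reason Michael identified in his post:

"The issue is that within a single Cypher statement you have to isolate changes that affect matches further on, e.g. when you CREATE nodes with a label that are suddenly matched by a later MATCH or MERGE operation."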
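The need for that isolation is not specific to Cypher. The following is only an analogy in plain Python, not a description of how Neo4j is implemented: a loop that reads from a list while appending to it keeps seeing its own writes, whereas taking a full snapshot first (the "eager" approach) is safe but holds every input row in memory at once. The function names and the safety limit are invented for the illustration.

```python
def double_lazily(store, limit=10):
    """Read and write the same list in one pass.

    Each appended value is later read by the same loop, so the loop would
    never finish on its own; `limit` is a safety stop for the demo.
    Returns the number of values written.
    """
    written = 0
    for value in store:  # iterates over the live, growing list
        if written >= limit:
            break
        store.append(value * 2)
        written += 1
    return written


def double_eagerly(store):
    """Materialise all input first, then write.

    The snapshot isolates the reads from the writes, at the cost of
    holding a full copy of the input in memory.
    """
    snapshot = list(store)
    for value in snapshot:
        store.append(value * 2)
    return len(snapshot)


if __name__ == "__main__":
    lazy_store = [1, 2, 3]
    print(double_lazily(lazy_store), len(lazy_store))

    eager_store = [1, 2, 3]
    print(double_eagerly(eager_store), eager_store)
```

With three input values the eager version writes exactly three new values, while the lazy version only stops because of the artificial limit. The snapshot is what buys correctness, and with a large input it is also what exhausts the memory.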

We can work around the problem in this case by using a separate query to create each type of relationship:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MATCH (employee:Employee {employeeId: row.EmployeeID})
MATCH (order:Order {orderId: row.OrderID})
MERGE (employee)-[:SOLD]->(order)

==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+
==> |       Operator | Rows | DbHits |                   Identifiers |                                                     Other |
==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+
==> |    EmptyResult |    0 |      0 |                               |                                                           |
==> |    UpdateGraph |    0 |      0 | employee, order,   UNNAMED236 |                                              MergePattern |
==> |      Filter(0) |    0 |      0 |                               |          Property(order,orderId) == Property(row,OrderID) |
==> | NodeByLabel(0) |    0 |      0 |                  order, order |                                                    :Order |
==> |      Filter(1) |    0 |      0 |                               | Property(employee,employeeId) == Property(row,EmployeeID) |
==> | NodeByLabel(1) |    0 |      0 |            employee, employee |                                                 :Employee |
==> |          Slice |    0 |      0 |                               |                                              {  AUTOINT0} |
==> |        LoadCSV |    1 |      0 |                           row |                                                           |
==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MATCH (order:Order {orderId: row.OrderID})
MATCH (product:Product {productId: row.ProductID})
MERGE (order)-[:PRODUCT]->(product)

==> +----------------+------+--------+------------------------------+--------------------------------------------------------+
==> |       Operator | Rows | DbHits |                  Identifiers |                                                  Other |
==> +----------------+------+--------+------------------------------+--------------------------------------------------------+
==> |    EmptyResult |    0 |      0 |                              |                                                        |
==> |    UpdateGraph |    0 |      0 | order, product,   UNNAMED229 |                                           MergePattern |
==> |      Filter(0) |    0 |      0 |                              | Property(product,productId) == Property(row,ProductID) |
==> | NodeByLabel(0) |    0 |      0 |             product, product |                                               :Product |
==> |      Filter(1) |    0 |      0 |                              |       Property(order,orderId) == Property(row,OrderID) |
==> | NodeByLabel(1) |    0 |      0 |                 order, order |                                                 :Order |
==> |          Slice |    0 |      0 |                              |                                           {  AUTOINT0} |
==> |        LoadCSV |    1 |      0 |                          row |                                                        |
==> +----------------+------+--------+------------------------------+--------------------------------------------------------+

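MERGE, SET

I try to make LOAD CSV scripts as idempotent as possible, so that if we add more rows or columns of data to the CSV file we can rerun the query without having to recreate everything.

This can lead you towards the following pattern, shown here for creating suppliers: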

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MERGE (supplier:Supplier {supplierId: row.SupplierID})
SET supplier.companyName = row.SupplierCompanyName

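We want to ensure that there is only one Supplier with a given SupplierID, but we might be adding new properties incrementally and so decide to just replace everything with the SET command. If we profile that query, the Eager lurks: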

==> +----------------+------+--------+--------------------+----------------------+
==> |       Operator | Rows | DbHits |        Identifiers |                Other |
==> +----------------+------+--------+--------------------+----------------------+
==> |    EmptyResult |    0 |      0 |                    |                      |
==> | UpdateGraph(0) |    0 |      0 |                    |          PropertySet |
==> |          Eager |    0 |      0 |                    |                      |
==> | UpdateGraph(1) |    0 |      0 | supplier, supplier | MergeNode; :Supplier |
==> |          Slice |    0 |      0 |                    |         {  AUTOINT0} |
==> |        LoadCSV |    1 |      0 |                row |                      |
==> +----------------+------+--------+--------------------+----------------------+

We can work around this, at the cost of a bit of duplication, by using 'ON CREATE SET' and 'ON MATCH SET':

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row
MERGE (supplier:Supplier {supplierId: row.SupplierID})
ON CREATE SET supplier.companyName = row.SupplierCompanyName
ON MATCH SET supplier.companyName = row.SupplierCompanyName

==> +-------------+------+--------+--------------------+----------------------+
==> |    Operator | Rows | DbHits |        Identifiers |                Other |
==> +-------------+------+--------+--------------------+----------------------+
==> | EmptyResult |    0 |      0 |                    |                      |
==> | UpdateGraph |    0 |      0 | supplier, supplier | MergeNode; :Supplier |
==> |       Slice |    0 |      0 |                    |         {  AUTOINT0} |
==> |     LoadCSV |    1 |      0 |                row |                      |
==> +-------------+------+--------+--------------------+----------------------+
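The duplication grows with every property we add, so it may be worth generating the two clauses instead of typing them twice. This is a minimal sketch under the assumption that the import queries are assembled as strings before being sent to Neo4j. The helper merge_with_sets is hypothetical, and it does no escaping, so it is only suitable for identifiers and column names you control.

```python
def merge_with_sets(variable, label, key_property, key_column, properties, row="row"):
    """Build a MERGE clause that sets the same properties on create and on match.

    `properties` maps node property names to CSV column names.
    """
    assignments = ", ".join(
        "{}.{} = {}.{}".format(variable, prop, row, column)
        for prop, column in properties.items()
    )
    return "\n".join([
        "MERGE ({}:{} {{{}: {}.{}}})".format(variable, label, key_property, row, key_column),
        "ON CREATE SET " + assignments,
        "ON MATCH SET " + assignments,
    ])


if __name__ == "__main__":
    print(merge_with_sets(
        "supplier", "Supplier", "supplierId", "SupplierID",
        {"companyName": "SupplierCompanyName"},
    ))
```

Called with the supplier mapping above, it produces the same three lines as the hand-written query. Adding another entry to the dictionary extends both SET clauses at once.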

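With the data set I've been working with, these changes let me avoid OutOfMemory exceptions in some cases and cut the time taken to run the query by a factor of 3 in others.

I expect all of these scenarios to be addressed over time, but as of Neo4j 2.1.5 these are the patterns I've identified as being overly eager.

If you know of any others, do let me know and I can add them to this post or write a second part.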

Original post: https://www.javacodegeeks.com/2014/10/neo4j-cypher-avoiding-the-eager.html
