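The previous article walked through the detailed steps for importing the Yelp dataset into Neo4j, but a number of problems can come up when actually running them.

To avoid the problems that arise in those intermediate steps, this article reads the JSON files directly, parses the required fields, and imports them into Neo4j. The full code is given below.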
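As a minimal sketch of the parsing step, assuming each line of a Yelp file holds one JSON object (which is how the main code reads it): the helper name `iter_records` and the two sample records are made up for illustration and are not part of the original code.

```python
import io
import json


def iter_records(lines, fields):
    """Yield one dict per non-empty JSON line, keeping only the requested fields."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        item = json.loads(line)
        # Missing fields become None instead of raising KeyError
        yield {field: item.get(field) for field in fields}


# Two made-up business records in the one-object-per-line layout
sample = io.StringIO(
    '{"business_id": "b1", "name": "Marco\'s Pizza", "city": "Phoenix", "stars": 4.0}\n'
    '{"business_id": "b2", "name": "Cafe 9", "city": "Las Vegas", "stars": 3.5}\n'
)

records = list(iter_records(sample, ["business_id", "name", "city"]))
for record in records:
    print(record)
```

Passing an open file object instead of the `io.StringIO` sample would read a real file the same way, one line at a time, without loading it all into memory.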

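Notes:

1. Records may repeat, so the code keeps separate sets to avoid importing duplicate nodes and relationships.

2. Special characters may appear. For example, a name or address can contain a single quote ('), as in "name":"Marco's Pizza", so string values must be wrapped in double quotes when they are imported.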
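Both notes can be illustrated without a database connection. The sketch below removes duplicates with a set and, as an alternative to quoting values by hand, passes them as Cypher parameters so that quotes inside a value need no special treatment. It assumes a Neo4j version that accepts the `$name` parameter syntax; the function name `build_business_statements` and the sample records are hypothetical.

```python
def build_business_statements(records):
    """Return (cypher, parameters) pairs, one per distinct business_id."""
    seen_business = set()  # business_ids that already produced a statement
    statements = []
    cypher = (
        "MERGE (b:Business {business_id: $business_id}) "
        "SET b.name = $name, b.address = $address"
    )
    for record in records:
        business_id = record["business_id"]
        if not business_id or business_id in seen_business:
            continue  # skip empty ids and duplicates
        seen_business.add(business_id)
        # Values travel as parameters, so quotes inside them need no escaping
        statements.append((cypher, {
            "business_id": business_id,
            "name": record["name"],
            "address": record["address"],
        }))
    return statements


# Made-up records: the second one repeats the first business_id
sample_records = [
    {"business_id": "b1", "name": "Marco's Pizza", "address": "1 Main St"},
    {"business_id": "b1", "name": "Marco's Pizza", "address": "1 Main St"},
    {"business_id": "b2", "name": 'The "Best" Diner', "address": "2 Oak Ave"},
]

statements = build_business_statements(sample_records)
for cypher, parameters in statements:
    print(cypher, parameters)
```

With py2neo, each pair could then be sent with `graph.run(cypher, parameters)`; that call is left out here so the sketch runs without a Neo4j instance.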

import json

import reverse_geocoder as rg
from py2neo import Graph

# Connect to the local Neo4j instance (adjust the URL and credentials to your setup)
graph = Graph("http://localhost:11006/", username="admin", password="password")


def quote(value):
    # Render a value as a double-quoted Cypher string literal,
    # escaping backslashes, double quotes and line breaks inside it
    text = str(value).replace('\\', '\\\\').replace('"', '\\"')
    text = text.replace('\n', '\\n').replace('\r', '\\r')
    return '"' + text + '"'


def read_json(path, filename):
    # Sets that record what has already been written, so duplicates are skipped
    unique_cities = set()
    unique_categorys = set()
    unique_business = set()
    unique_users = set()
    unique_reviews = set()
    unique_business_cities = set()
    unique_business_categorys = set()
    unique_city_state = set()
    unique_state_country = set()

    # Each line is one JSON object; iterate lazily because the files are large
    with open(path + filename, 'r', encoding='utf-8') as f:
        for line in f:
            item = json.loads(line)
            if filename == 'business.json':
                business_id = item['business_id']
                name = item['name']
                address = item['address']
                city = item['city']
                if business_id != "" and name != "" and address != "" and business_id not in unique_business:
                    teststr = ("merge (b:Business {business_id:" + quote(business_id)
                               + ", name:" + quote(name) + ", address:" + quote(address) + "})")
                    try:
                        graph.run(teststr)
                        # A business may have several categories; remember each (business, category) pair
                        for category in (item["categories"] or "").split(','):
                            category = category.strip()
                            if category:
                                unique_categorys.add(category)
                                unique_business_categorys.add((business_id, category))
                        unique_business.add(business_id)
                    except Exception as e:
                        print("Error writing business to Neo4j:", e)
                        print(teststr)
                # Look up state and country from the coordinates
                if item["latitude"] and item["longitude"] and business_id in unique_business:
                    try:
                        location = rg.search((item["latitude"], item["longitude"]))[0]
                        state = location["admin1"]
                        country = location["cc"]
                        if city not in unique_cities:
                            graph.run("merge (c:City {city:" + quote(city) + "})")
                            unique_cities.add(city)
                        graph.run("merge (s:State {state:" + quote(state) + "})")
                        graph.run("merge (c:Country {country:" + quote(country) + "})")
                        if (city, state) not in unique_city_state:
                            graph.run("match (s:State {state:" + quote(state) + "}),(c:City {city:" + quote(city) + "}) "
                                      + "create (c)-[:in_state]->(s)")
                            unique_city_state.add((city, state))
                        if (state, country) not in unique_state_country:
                            graph.run("match (c:Country {country:" + quote(country) + "}),(s:State {state:" + quote(state) + "}) "
                                      + "create (s)-[:in_country]->(c)")
                            unique_state_country.add((state, country))
                        if (business_id, city) not in unique_business_cities:
                            graph.run("match (b:Business {business_id:" + quote(business_id) + "}),(c:City {city:" + quote(city) + "}) "
                                      + "create (b)-[:in_city]->(c)")
                            unique_business_cities.add((business_id, city))
                    except Exception as e:
                        print("Error writing location:", e)
                        continue
            elif filename == 'user.json':
                user_id = item["user_id"]
                user_name = item["name"]
                if user_id in unique_users:
                    continue
                try:
                    # Merge on user_id only and then set the name: the node may already
                    # exist because an earlier user listed this one as a friend
                    graph.run("merge (u:User {user_id:" + quote(user_id) + "}) set u.name=" + quote(user_name))
                    unique_users.add(user_id)
                    for friends_id in item["friends"].split(','):
                        friends_id = friends_id.strip()
                        # Skip empty entries (and the "None" placeholder, if present)
                        if not friends_id or friends_id == "None":
                            continue
                        # Merge the friend node too, since it may appear later in the file
                        graph.run("match (b:User {user_id:" + quote(user_id) + "}) "
                                  + "merge (a:User {user_id:" + quote(friends_id) + "}) "
                                  + "merge (b)-[:friends]->(a)")
                except Exception as e:
                    print("Error writing user:", e)
                    continue
            elif filename == 'review.json':
                review_id = item["review_id"]
                user_id = item["user_id"]
                business_id = item["business_id"]
                text = item["text"]
                stars = item["stars"]
                date1 = item["date"]
                if review_id in unique_reviews:
                    continue
                try:
                    # stars is numeric in the JSON, so it is written as a number, not a string
                    graph.run("merge (r:Review {review_id:" + quote(review_id) + ", stars:" + str(stars)
                              + ", date:" + quote(date1) + ", text:" + quote(text) + "})")
                    unique_reviews.add(review_id)
                    graph.run("match (b:User {user_id:" + quote(user_id) + "}),(a:Review {review_id:" + quote(review_id) + "}) "
                              + "create (b)-[:write]->(a)")
                    graph.run("match (b:Business {business_id:" + quote(business_id) + "}),(a:Review {review_id:" + quote(review_id) + "}) "
                              + "create (a)-[:reviews]->(b)")
                except Exception as e:
                    print("Error writing review:", e)
                    continue

    # Create the category nodes, then the relationships between businesses and categories
    for category in unique_categorys:
        try:
            graph.run("merge (c:Category {category:" + quote(category) + "})")
        except Exception as e:
            print("Error writing category:", e)
            continue
    for business_id, category in unique_business_categorys:
        try:
            graph.run("match (b:Business {business_id:" + quote(business_id) + "}),(c:Category {category:" + quote(category) + "}) "
                      + "create (b)-[:belong_to]->(c)")
        except Exception as e:
            print("Error writing business-category relationship:", e)
            continue


if __name__ == '__main__':
    # Load users before reviews, because the review relationships match existing User nodes
    list1 = ['business.json', 'user.json', 'review.json']
    path = 'D:/share/yelp/'
    for filename in list1:
        read_json(path, filename)
