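Chapter 9 of Neo4j Graph Algorithms walks through applying the algorithms to the Yelp dataset. This post first covers how to import the Yelp dataset into Neo4j.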

1. The Yelp dataset can be downloaded from https://www.yelp.com/dataset after filling in a short form. It is also available at https://pan.baidu.com/s/1n3PXAtOWqj1cS0XajZyruA.

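2. Unpacking the archive yields the JSON files shown on the left of the figure below. The next step is to convert them into the CSV files shown on the right.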

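3. json_to_csv

The README at https://github.com/mneedham/yelp-graph-algorithms/blob/master/README.adoc explains how to convert and import the data, and its scripts are open source. They kept failing when I ran them, though, so I wrote my own.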

The following code reads businessLocations.json and converts it into area.csv, country.csv, city_IN_AREA_area.csv, and area_IN_COUNTRY_country.csv.

import csv
import json

DATA_DIR = "D:/share/yelp/"


def write_rows(filename, rows):
    # Write fully quoted rows; "w" mode avoids piling up duplicates on a re-run.
    with open(DATA_DIR + filename, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, escapechar="\\", quotechar='"', quoting=csv.QUOTE_ALL)
        writer.writerows(rows)


def read_json(path, filename):
    # businessLocations.json is a single JSON object whose values describe locations.
    with open(path + filename, "r", encoding="utf-8") as f:
        content = json.load(f, strict=False)

    # Sets drop repeated values, so each node ID and each relationship is written once.
    areas, countries, city_area, area_country = set(), set(), set(), set()
    for location in content.values():
        admin1 = location["admin1"]
        city = location["city"]
        country = location["country"]
        areas.add((admin1,))
        countries.add((country,))
        city_area.add((city, admin1))
        area_country.add((admin1, country))

    # The output files are CSV, so they get a .csv extension (not .json).
    write_rows("area.csv", areas)
    write_rows("country.csv", countries)
    write_rows("city_IN_AREA_area.csv", city_area)
    write_rows("area_IN_COUNTRY_country.csv", area_country)


if __name__ == "__main__":
    read_json(DATA_DIR, "businessLocations.json")

Next, generate the following files.

business.csv
category.csv
user.csv
review.csv
city.csv
business_IN_CATEGORY_category.csv
user_FRIENDS_user.csv
user_WROTE_review.csv
review_REVIEWS_business.csv
business_IN_CITY_city.csv

import csv
import json
from contextlib import ExitStack

DATA_DIR = "D:/share/yelp/"


def open_csv(stack, filename):
    # Open one output file for the whole run and return a writer that quotes every field.
    f = stack.enter_context(open(DATA_DIR + filename, "w", encoding="utf-8", newline=""))
    return csv.writer(f, escapechar="\\", quotechar='"', quoting=csv.QUOTE_ALL)


def read_records(filename):
    # Each Yelp file holds one JSON object per line; stream it instead of calling readlines().
    with open(DATA_DIR + filename, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def split_list(value):
    # Comma-separated fields may be null and carry spaces after the commas.
    return [part.strip() for part in (value or "").split(",") if part.strip()]


def convert_business():
    unique_cities, unique_categories = set(), set()
    with ExitStack() as stack:
        business = open_csv(stack, "business.csv")
        in_city = open_csv(stack, "business_IN_CITY_city.csv")
        in_category = open_csv(stack, "business_IN_CATEGORY_category.csv")
        for item in read_records("business.json"):
            business.writerow([item["business_id"], item["name"], item["address"],
                               item["city"], item["state"]])
            in_city.writerow([item["business_id"], item["city"]])
            unique_cities.add(item["city"])
            for category in split_list(item["categories"]):
                in_category.writerow([item["business_id"], category])
                unique_categories.add(category)
        # City and category nodes are written once, after all businesses are seen.
        open_csv(stack, "city.csv").writerows([city] for city in unique_cities)
        open_csv(stack, "category.csv").writerows([c] for c in unique_categories)


def convert_user():
    with ExitStack() as stack:
        user = open_csv(stack, "user.csv")
        friends = open_csv(stack, "user_FRIENDS_user.csv")
        for item in read_records("user.json"):
            user.writerow([item["user_id"], item["name"]])
            for friend_id in split_list(item["friends"]):
                friends.writerow([item["user_id"], friend_id])


def convert_review():
    with ExitStack() as stack:
        review = open_csv(stack, "review.csv")
        wrote = open_csv(stack, "user_WROTE_review.csv")
        reviews = open_csv(stack, "review_REVIEWS_business.csv")
        for item in read_records("review.json"):
            review.writerow([item["review_id"], item["text"], item["stars"], item["date"]])
            wrote.writerow([item["user_id"], item["review_id"]])
            reviews.writerow([item["review_id"], item["business_id"]])


if __name__ == "__main__":
    convert_business()
    convert_review()
    convert_user()
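To make the conversion concrete, here is a small sketch of what one line of business.json turns into. The record is made up for illustration (it is not real Yelp data), and `business_rows` is a hypothetical helper rather than part of the script above; it assumes `categories` is a comma-separated string that may be null.

```python
import json

# A made-up record in the one-object-per-line layout of business.json.
line = ('{"business_id": "b1", "name": "Cafe X", "address": "1 Main St", '
        '"city": "Phoenix", "state": "AZ", "categories": "Coffee & Tea, Food"}')


def business_rows(raw_line):
    """Return the node row and the relationship rows derived from one line."""
    item = json.loads(raw_line)
    node = [item["business_id"], item["name"], item["address"],
            item["city"], item["state"]]
    in_city = [item["business_id"], item["city"]]
    # Strip the space after each comma so " Food" and "Food" are the same category.
    categories = [c.strip() for c in (item.get("categories") or "").split(",")
                  if c.strip()]
    in_category = [[item["business_id"], c] for c in categories]
    return node, in_city, in_category


node, in_city, in_category = business_rows(line)
print(node)         # one row for business.csv
print(in_city)      # one row for business_IN_CITY_city.csv
print(in_category)  # one row per category for business_IN_CATEGORY_category.csv
```

One business therefore produces one node row, one IN_CITY row, and as many IN_CATEGORY rows as it has categories.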

Besides the files above, header files also need to be created; see the import.sh file.

The code is as follows:

import csv

DATA_DIR = "D:/share/yelp/"

# Header columns for every node and relationship file, keyed by file name prefix.
HEADERS = {
    "area": ["name:ID(Area)"],
    "country": ["name:ID(Country)"],
    "city_IN_AREA_area": [":START_ID(City)", ":END_ID(Area)"],
    "area_IN_COUNTRY_country": [":START_ID(Area)", ":END_ID(Country)"],
    "business": ["id:ID(Business)", "name", "address", "city", "state"],
    "city": ["name:ID(City)"],
    "business_IN_CITY_city": [":START_ID(Business)", ":END_ID(City)"],
    "category": ["name:ID(Category)"],
    "business_IN_CATEGORY_category": [":START_ID(Business)", ":END_ID(Category)"],
    "user": ["id:ID(User)", "name"],
    "user_FRIENDS_user": [":START_ID(User)", ":END_ID(User)"],
    "review": ["id:ID(Review)", "text", "stars:int", "date"],
    "user_WROTE_review": [":START_ID(User)", ":END_ID(Review)"],
    "review_REVIEWS_business": [":START_ID(Review)", ":END_ID(Business)"],
}


def write_header(file_name, columns):
    # newline="" keeps the csv module from emitting blank lines on Windows;
    # the with block closes the file, so no explicit close() is needed.
    with open(file_name, "w", encoding="utf-8", newline="") as file_csv:
        csv.writer(file_csv).writerow(columns)


if __name__ == "__main__":
    for name, columns in HEADERS.items():
        write_header(DATA_DIR + name + "_header.csv", columns)
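Because each header file is kept separate from its data file, a quick sanity check before importing is to confirm that every data row has as many columns as its header. The sketch below is one way to do that; `column_mismatches` is a hypothetical helper, and the sample strings are made up for illustration.

```python
import csv
import io


def column_mismatches(header_text, data_text):
    """Return the 1-based numbers of data rows whose width differs from the header's."""
    header = next(csv.reader(io.StringIO(header_text)))
    rows = csv.reader(io.StringIO(data_text), escapechar="\\", quotechar='"')
    return [n for n, row in enumerate(rows, start=1) if len(row) != len(header)]


# Made-up sample: the second data row is missing its name column.
header = "id:ID(User),name\n"
data = '"u1","Alice"\n"u2"\n'
print(column_mismatches(header, data))  # → [2]
```

For real files, read the text with `open(path, encoding="utf-8", newline="")` and pass a file object to `csv.reader` instead of wrapping a string in `io.StringIO`.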

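4. Once all the CSV files have been generated, run import.sh to perform the import. Personally, I would rather read the JSON files and write the data straight into Neo4j, which saves the extra conversion step.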

#!/usr/bin/env bash
export DATA=D:/share/yelp/
./bin/neo4j-admin import \
  --mode=csv \
  --database=yelp.db \
  --nodes:Business $DATA/business_header.csv,$DATA/business.csv \
  --nodes:Category $DATA/category_header.csv,$DATA/category.csv \
  --nodes:User $DATA/user_header.csv,$DATA/user.csv \
  --nodes:Review $DATA/review_header.csv,$DATA/review.csv \
  --nodes:City $DATA/city_header.csv,$DATA/city.csv \
  --nodes:Area $DATA/area_header.csv,$DATA/area.csv \
  --nodes:Country $DATA/country_header.csv,$DATA/country.csv \
  --relationships:IN_CATEGORY $DATA/business_IN_CATEGORY_category_header.csv,$DATA/business_IN_CATEGORY_category.csv \
  --relationships:FRIENDS $DATA/user_FRIENDS_user_header.csv,$DATA/user_FRIENDS_user.csv \
  --relationships:WROTE $DATA/user_WROTE_review_header.csv,$DATA/user_WROTE_review.csv \
  --relationships:REVIEWS $DATA/review_REVIEWS_business_header.csv,$DATA/review_REVIEWS_business.csv \
  --relationships:IN_CITY $DATA/business_IN_CITY_city_header.csv,$DATA/business_IN_CITY_city.csv \
  --relationships:IN_AREA $DATA/city_IN_AREA_area_header.csv,$DATA/city_IN_AREA_area.csv \
  --relationships:IN_COUNTRY $DATA/area_IN_COUNTRY_country_header.csv,$DATA/area_IN_COUNTRY_country.csv \
  --ignore-missing-nodes=true \
  --multiline-fields=true
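As for the alternative mentioned in step 4, writing the JSON straight into Neo4j, a rough sketch for the Business nodes might look like the following. It is only an illustration under several assumptions: the official `neo4j` Python driver is installed separately, the `bolt://localhost:7687` address and the credentials are placeholders, and the names `FIELDS`, `MERGE_BUSINESS`, `batches`, and `load_businesses` are hypothetical. The node label and properties mirror business_header.csv.

```python
import json
from itertools import islice

# Properties kept for each Business node (same columns as business_header.csv).
FIELDS = ("business_id", "name", "address", "city", "state")

# Cypher that merges one batch of businesses passed in as the $rows parameter.
MERGE_BUSINESS = """
UNWIND $rows AS row
MERGE (b:Business {id: row.business_id})
SET b.name = row.name, b.address = row.address,
    b.city = row.city, b.state = row.state
"""


def batches(lines, size=1000):
    """Parse JSON lines lazily and yield lists of at most `size` trimmed records."""
    parsed = (json.loads(line) for line in lines if line.strip())
    records = ({key: rec.get(key) for key in FIELDS} for rec in parsed)
    while True:
        batch = list(islice(records, size))
        if not batch:
            return
        yield batch


def load_businesses(path, uri="bolt://localhost:7687", auth=("neo4j", "password")):
    """Stream business.json into Neo4j; uri and auth are placeholder assumptions."""
    from neo4j import GraphDatabase  # imported here so batches() works without the driver

    driver = GraphDatabase.driver(uri, auth=auth)
    try:
        with driver.session() as session, open(path, encoding="utf-8") as f:
            for batch in batches(f):
                session.run(MERGE_BUSINESS, rows=batch).consume()
    finally:
        driver.close()
```

Calling `load_businesses("D:/share/yelp/business.json")` would send the file in batches of 1000 records. For a dataset of this size, creating a uniqueness constraint on `Business.id` beforehand would likely make the `MERGE` much faster, and the bulk importer above will probably still be the quicker route for the initial load.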

5. The knowledge graph design is shown in the figure below.

Once the data is imported, Neo4j's built-in graph algorithms can be called to analyze it.
