Crawler - 07: MongoDB
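- MongoDB
- I. Concepts
- II. SQL vs. NoSQL
- Advantages of MongoDB
- III. Installation
- IV. Basic usage
- 1. List databases
- 2. Use/create a database
- 3. Show the current database
- 4. List the collections in a database
- 5. Insert data into the current database
- 6. Drop the database
- 7. View the data in a collection
- 8. Check whether a collection is capped
- 9. Drop a collection
- 10. More on inserting data
- 11. More on querying data
- 12. Comparison operators
- 13. Logical operators
- 14. Working with query results
- 15. Updating data
- 16. Deleting data
- V. Exercises
- VI. Aggregation
- 1. Querying data
- 2. Grouping
- 3. Counting the documents in each group
- 4. Collecting the data in each group
- 5. Filtering by a range
- 6. Skipping and limiting documents
- VII. Indexes in MongoDB
- 1. Why create an index?
- 2. Commands
- View indexes
- Show detailed information about a query
- Create an index
- Drop an index
- VIII. Using MongoDB from Python
- 1. Installation
- 2. Import the module
- 3. Connect to MongoDB
- 4. Operate on MongoDB
- 5. Code summary
- IX. Using MongoDB with Scrapy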
MongoDB
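I. Concepts
- MongoDB is a non-relational database that stores data very flexibly
- MongoDB sits between relational and non-relational databases: of the non-relational databases it is the most feature-rich and the most similar to a relational one. Its data structure is very loose, so it can store fairly complex data types. Its most notable feature is a powerful query language whose syntax resembles an object-oriented query language; it covers most of what single-table queries in a relational database can do, and it also supports indexes on the data.
II. SQL vs. NoSQL
- SQL: database ------ table ------ row (data)
- NoSQL: database ------ collection (table) ------ document (data)
Advantages of MongoDB
- No fixed schema is imposed on the data
- Convenient for application development
- High performance
- Good support
- A long development history and thorough documentation
- Good cross-platform support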
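III. Installation
- For macOS, see the MongoDB installation tutorial
- Windows installation
- Download the .msi installer
- Installation steps: https://www.yuque.com/books/share/bc5d784a-9b7f-4b9a-bc78-510ce82b6a33/vezdrq
- Notes
- Add the bin directory to the PATH environment variable
- Start the server (the path contains a space, so it must be quoted):
mongod --dbpath "C:\Program Files\MongoDB\Server\4.4.5\data"
- Connect with the shell:
mongo
- Output on a successful connection: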
MongoDB shell version v4.4.5
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("08893c5f-ca3a-40b5-948f-beadf475331f") }
MongoDB server version: 4.4.5
---
The server generated these startup warnings when booting:
2021-04-16T05:36:24.282+08:00: Access control is not enabled for the database. Read and write access to data and configuration is unrestricted
---
IV. Basic usage
1. List databases
show dbs
MongoDB Enterprise > show dbs
admin 0.000GB
config 0.000GB
local 0.000GB
These three databases ship with MongoDB.
2. Use/create a database
use admin
MongoDB Enterprise > use admin
switched to db admin
This switches to the admin database.
use demo
MongoDB Enterprise > use demo
switched to db demo
MongoDB Enterprise > db
demo
MongoDB Enterprise > show dbs
admin 0.000GB
config 0.000GB
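local 0.000GB
Here use switches to a new database named demo. It does not appear in show dbs yet, because MongoDB only creates a database on disk once data is stored in it, for example by inserting the first document into a collection.
3. Show the current database
db
MongoDB Enterprise > db
admin
db prints the name of the database currently in use.
4. List the collections in a database
Option 1:
MongoDB Enterprise > show tables
system.version
Option 2:
MongoDB Enterprise > show collections
system.version
5. Insert data into the current database
Implicit creation: inserting into a collection that does not exist yet creates it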
MongoDB Enterprise > db
demo
MongoDB Enterprise > db.jerry.insert({s:1})
WriteResult({ "nInserted" : 1 })
MongoDB Enterprise > show dbs
admin 0.000GB
config 0.000GB
demo 0.000GB
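local 0.000GB
Explicit creation: create the collection first
db.createCollection(name, options)
name: the name of the collection (table); it must not duplicate an existing collection name
options: optional settings, for example the maximum size of the collection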
MongoDB Enterprise > db.createCollection('wangjiaxin_cllection')
{ "ok" : 1 }
MongoDB Enterprise > db.createCollection('wangjiaxin1',{capped:true,size:4})
{ "ok" : 1 }
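Here size:4 is the maximum size of the collection in bytes, not a number of documents (the max option limits the document count).
MongoDB raises very small values: a size of 4096 bytes or less gives a 4096-byte cap, and larger sizes are rounded up to a multiple of 256 bytes.
6. Drop the database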
MongoDB Enterprise > show tables
jerry
MongoDB Enterprise > db.dropDatabase()
{ "dropped" : "demo", "ok" : 1 }
MongoDB Enterprise > show dbs
admin 0.000GB
config 0.000GB
local 0.000GB
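7. View the data in a collection
Query with:
db.name.find()
# The explicitly created collection (still empty, so nothing is printed)
MongoDB Enterprise > db.wangjiaxin_cllection.find()
# The implicitly created collection
MongoDB Enterprise > db.wangjiaxin.find()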
{ "_id" : ObjectId("60793c2b9dc82dc5d7862223"), "x" : 1 }
8. Check whether a collection is capped
MongoDB Enterprise > show tables
wangjiaxin
wangjiaxin1
wangjiaxin_cllection
MongoDB Enterprise > db.wangjiaxin.isCapped()
false
false means the collection has no size cap
MongoDB Enterprise > db.wangjiaxin1.isCapped()
true
true means the collection is capped
9. Drop a collection
MongoDB Enterprise > show tables
wangjiaxin
wangjiaxin1
wangjiaxin_cllection
MongoDB Enterprise > db.wangjiaxin_cllection.drop()
true
MongoDB Enterprise > show tables
wangjiaxin
wangjiaxin1
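10. More on inserting data
Insert into a collection that already exists (a document can be given an explicit _id)
Option 1: insert a single document
> db.wangjiaxin.insert({name:'wangjiaxin',age:25,gender:'madl',id:1})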
WriteResult({ "nInserted" : 1 })
> db.wangjiaxin.find()
{ "_id" : ObjectId("607a764e4474722e1152124d"), "x" : 1 }
{ "_id" : ObjectId("607a79704474722e1152124e"), "name" : "wangjiaxin", "age" : 25, "gender" : "male" }
{ "_id" : ObjectId("607a7a264474722e1152124f"), "name" : "wangjiaxin", "age" : 25, "gender" : "madl", "id" : 1 }
> db.wangjiaxin.insert({name:'wangjiaxin',age:25,gender:'madl',_id:1})
WriteResult({ "nInserted" : 1 })
> db.wangjiaxin.find()
{ "_id" : ObjectId("607a764e4474722e1152124d"), "x" : 1 }
{ "_id" : ObjectId("607a79704474722e1152124e"), "name" : "wangjiaxin", "age" : 25, "gender" : "male" }
{ "_id" : ObjectId("607a7a264474722e1152124f"), "name" : "wangjiaxin", "age" : 25, "gender" : "madl", "id" : 1 }
{ "_id" : 1, "name" : "wangjiaxin", "age" : 25, "gender" : "madl" }
Note: the primary key _id must be unique, so a second insert with _id:1 would fail.
Option 2: insert multiple documents
> db.wangjiaxin1.insert({name:'wangjiaxin',age:25,gender:'male'})
WriteResult({ "nInserted" : 1 })
> db.wangjiaxin1.find()
{ "_id" : ObjectId("607a7df54474722e11521251"), "name" : "wangjiaxin", "age" : 25, "gender" : "male" }
> db.wangjiaxin1.insert([{name:'wangjiaxin',age:25},{name:'lirui',age:23}])
BulkWriteResult({ "writeErrors" : [ ], "writeConcernErrors" : [ ], "nInserted" : 2, "nUpserted" : 0, "nMatched" : 0, "nModified" : 0, "nRemoved" : 0, "upserted" : [ ] })
> db.wangjiaxin1.find()
{ "_id" : ObjectId("607a7df54474722e11521251"), "name" : "wangjiaxin", "age" : 25, "gender" : "male" }
{ "_id" : ObjectId("607a7e5c4474722e11521252"), "name" : "wangjiaxin", "age" : 25 }
{ "_id" : ObjectId("607a7e5c4474722e11521253"), "name" : "lirui", "age" : 23 }
Insert in bulk with a loop
> for(i=2;i<10;i++)db.wangjiaxin3.insert({x:i})
WriteResult({ "nInserted" : 1 })
> db.wangjiaxin3.find()
{ "_id" : ObjectId("607a7f824474722e11521254"), "x" : 2 }
{ "_id" : ObjectId("607a7f824474722e11521255"), "x" : 3 }
{ "_id" : ObjectId("607a7f824474722e11521256"), "x" : 4 }
{ "_id" : ObjectId("607a7f824474722e11521257"), "x" : 5 }
{ "_id" : ObjectId("607a7f824474722e11521258"), "x" : 6 }
{ "_id" : ObjectId("607a7f824474722e11521259"), "x" : 7 }
{ "_id" : ObjectId("607a7f824474722e1152125a"), "x" : 8 }
{ "_id" : ObjectId("607a7f824474722e1152125b"), "x" : 9 }
Update by primary key: save() replaces the document that has the given _id
> db.wangjiaxin3.find()
{ "_id" : ObjectId("607a7f824474722e11521254"), "x" : 2 }
{ "_id" : ObjectId("607a7f824474722e11521255"), "x" : 3 }
{ "_id" : ObjectId("607a7f824474722e11521256"), "x" : 4 }
{ "_id" : ObjectId("607a7f824474722e11521257"), "x" : 5 }
> db.wangjiaxin3.save({_id:ObjectId("607a7f824474722e11521254"),name:18,gender:'male'})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.wangjiaxin3.find()
{ "_id" : ObjectId("607a7f824474722e11521254"), "name" : 18, "gender" : "male" }
{ "_id" : ObjectId("607a7f824474722e11521255"), "x" : 3 }
{ "_id" : ObjectId("607a7f824474722e11521256"), "x" : 4 }
{ "_id" : ObjectId("607a7f824474722e11521257"), "x" : 5 }
{ "_id" : ObjectId("607a7f824474722e11521258"), "x" : 6 }
{ "_id" : ObjectId("607a7f824474722e11521259"), "x" : 7 }
{ "_id" : ObjectId("607a7f824474722e1152125a"), "x" : 8 }
{ "_id" : ObjectId("607a7f824474722e1152125b"), "x" : 9 }
Without an _id, save() simply inserts a new document
> db.wangjiaxin3.save({name:'abc',gender:'male'})
WriteResult({ "nInserted" : 1 })
> db.wangjiaxin3.find()
{ "_id" : ObjectId("607a7f824474722e11521254"), "name" : 18, "gender" : "male" }
{ "_id" : ObjectId("607a7f824474722e11521255"), "x" : 3 }
{ "_id" : ObjectId("607a7f824474722e11521256"), "x" : 4 }
{ "_id" : ObjectId("607a7f824474722e11521257"), "x" : 5 }
{ "_id" : ObjectId("607a7f824474722e11521258"), "x" : 6 }
{ "_id" : ObjectId("607a7f824474722e11521259"), "x" : 7 }
{ "_id" : ObjectId("607a7f824474722e1152125a"), "x" : 8 }
{ "_id" : ObjectId("607a7f824474722e1152125b"), "x" : 9 }
{ "_id" : ObjectId("607a81d14474722e1152125c"), "name" : "abc", "gender" : "male" }
11. More on querying data
List databases
show dbs
List collections
show tables / show collections
Query the data in a collection
db.name.find()
> db.stu.find({name:'张三'})
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
Pretty-printed query (formatted output)
> db.stu.find({name:'老李'}).pretty()
{"_id" : ObjectId("607a96864474722e1152125e"),"name" : "老李","hometown" : "广州","age" : 18,"gender" : false
}
Query a single document
> db.stu.findOne()
{"_id" : ObjectId("607a96864474722e1152125d"),"name" : "张三","hometown" : "长沙","age" : 20,"gender" : true
}
Query with a condition
> db.stu.find({age:18})
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }
12. Comparison operators
- Greater than ($gt)
> db.stu.find({age:{$gt:18}})
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
- Greater than or equal to ($gte)
> db.stu.find({age:{$gte:18}})
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }
- Multiple conditions (all of them must hold)
> db.stu.find({age:{$gte:18},hometown:'长沙'})
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
13. Logical operators
- Match documents where either the age condition or the gender condition holds ($or)
> db.stu.find({$or:[{age:{$gt:18}},{gender:false}]})
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
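- Membership test: $in matches documents whose value equals any item in the list (here 18 or 28, not the range between them)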
> db.stu.find({age:{$in:[18,28]}})
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }
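To make the operator semantics above concrete, here is a small pure-Python sketch that evaluates a subset of the filter syntax ($gt, $gte, $lt, $lte, $in, $or and plain equality) against ordinary dictionaries. It only illustrates what the filters mean and is not how MongoDB is implemented; the function name matches and the sample documents are made up for this example, and it handles scalar fields only.

```python
import operator

# Comparison operators supported by this sketch, mapped to Python functions.
_COMPARATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, choices: value in choices,
}


def matches(doc, query):
    """Return True if `doc` satisfies `query` (a tiny subset of MongoDB filters)."""
    for key, condition in query.items():
        if key == "$or":
            # $or takes a list of sub-queries; at least one must match.
            if not any(matches(doc, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):
            # {field: {$op: operand, ...}}: every listed operator must hold.
            # Simplification: a document without the field never matches.
            if key not in doc:
                return False
            for op, operand in condition.items():
                if not _COMPARATORS[op](doc[key], operand):
                    return False
        elif doc.get(key) != condition:
            # {field: value}: plain equality.
            return False
    return True


# Hypothetical sample data, loosely modelled on the stu collection.
students = [
    {"name": "a", "age": 20, "gender": True},
    {"name": "b", "age": 18, "gender": False},
    {"name": "c", "age": 16, "gender": True},
]

older = [s["name"] for s in students if matches(s, {"age": {"$gt": 18}})]
either = [s["name"] for s in students
          if matches(s, {"$or": [{"age": {"$gt": 18}}, {"gender": False}]})]
listed = [s["name"] for s in students if matches(s, {"age": {"$in": [18, 28]}})]
```

Several top-level keys in one query behave like the multi-condition example: all of them must match, while $or needs only one of its sub-queries to match.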
14. Working with query results
- Count the results
> db.stu.find().count()
7
- limit returns at most the given number of documents
> db.stu.find().limit(2)
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
- skip skips the given number of documents
> db.stu.find().skip(2)
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "jerry", "hometown" : "长沙", "age" : 16, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }
- Projection (choose which fields are returned)
> db.stu.find({},{age:1})
{ "_id" : ObjectId("607a96864474722e1152125d"), "age" : 20 }
{ "_id" : ObjectId("607a96864474722e1152125e"), "age" : 18 }
{ "_id" : ObjectId("607a96864474722e1152125f"), "age" : 18 }
{ "_id" : ObjectId("607a96864474722e11521260"), "age" : 40 }
{ "_id" : ObjectId("607a96864474722e11521261"), "age" : 16 }
{ "_id" : ObjectId("607a96864474722e11521262"), "age" : 45 }
{ "_id" : ObjectId("607a96864474722e11521263"), "age" : 18 }
_id is returned by default; pass _id:0 to leave it out
> db.stu.find({},{age:1,_id:0})
{ "age" : 20 }
{ "age" : 18 }
{ "age" : 18 }
{ "age" : 40 }
{ "age" : 16 }
{ "age" : 45 }
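{ "age" : 18 }
A projection can be combined with a filter. The field name below is misspelled (genfder instead of gender); a projection ignores fields that do not exist, so only _id and age come back
> db.stu.find({age:18},{age:1,genfder:1})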
{ "_id" : ObjectId("607a96864474722e1152125e"), "age" : 18 }
{ "_id" : ObjectId("607a96864474722e1152125f"), "age" : 18 }
{ "_id" : ObjectId("607a96864474722e11521263"), "age" : 18 }
- Sorting
Ascending (1)
> db.stu.find().sort({age:1})
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "jerry", "hometown" : "长沙", "age" : 16, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
Descending (-1)
> db.stu.find().sort({age:-1})
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "jerry", "hometown" : "长沙", "age" : 16, "gender" : true }
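15. Updating data
db.name.update({query},{update},{multi:boolean})
query: the filter that selects documents
update: the changes to apply
multi: optional, defaults to false, which updates only the first matching document;
true updates every matching document
# Update specific fields with $set
> db.stu.update({name:'张三'},{$set:{name:'zhangsan'}})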
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.stu.find()
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "zhangsan", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "wangjiaxin" }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
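{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }
# Plain replacement: without an update operator the matched document is replaced entirely, so every field except _id is lost (this command was run earlier in the session, which is why the output above already shows wangjiaxin)
> db.stu.update({name:'jerry'},{name:'wangjiaxin'})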
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.stu.find()
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "wangjiaxin" }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }
Update every matching document with {multi:true}
> db.stu.update({},{$set:{gender:0}},{multi:true})
WriteResult({ "nMatched" : 7, "nUpserted" : 0, "nModified" : 7 })
> db.stu.find()
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "zhangsan", "hometown" : "长沙", "age" : 20, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "wangjiaxin", "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : 0 }
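The difference between a $set update and a plain replacement is easy to miss, so here is a minimal pure-Python sketch of the two behaviours on an ordinary dictionary. It only mimics the effect described above and is not MongoDB code; the helper name apply_update and the sample document are assumptions made for this illustration.

```python
import copy


def apply_update(doc, update):
    """Return a new document showing the effect of `update` on `doc`.

    Covers only the two cases from the text: a `$set` update and a plain
    replacement document. The `_id` is preserved in both cases.
    """
    if "$set" in update:
        # $set changes or adds the listed fields and leaves the rest alone.
        result = copy.deepcopy(doc)
        result.update(update["$set"])
        return result
    # Without an update operator the whole document is replaced.
    result = copy.deepcopy(update)
    result["_id"] = doc["_id"]
    return result


# Hypothetical document resembling the jerry record in the stu collection.
original = {"_id": 1, "name": "jerry", "hometown": "长沙", "age": 16}

with_set = apply_update(original, {"$set": {"name": "wangjiaxin"}})
replaced = apply_update(original, {"name": "wangjiaxin"})
```

Here with_set keeps hometown and age, while replaced holds only _id and name, which mirrors what happened to the jerry document in the shell session.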
16. Deleting data
Delete by condition
> db.stu.remove({name:'zhangsan'})
WriteResult({ "nRemoved" : 1 })
> db.stu.find()
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "wangjiaxin", "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : 0 }
Delete only the first matching document
> db.stu.remove({age:18},{justOne:true})
WriteResult({ "nRemoved" : 1 })
> db.stu.find()
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "wangjiaxin", "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : 0 }
Delete every document in the collection
> db.stu.remove({})
WriteResult({ "nRemoved" : 5 })
> db.stu.find()
Drop the collection
> show tables
stu
wangjiaxin
wangjiaxin1
wangjiaxin3
> db.stu.drop()
true
> show tables
wangjiaxin
wangjiaxin1
wangjiaxin3
V. Exercises
- Sample data
> db.persons.find().pretty()
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d0e"),"name" : "jim","age" : 25,"email" : "75431457@qq.com","c" : 89,"m" : 96,"e" : 87,"country" : "USA","books" : ["JS","C++","EXTJS","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d0f"),"name" : "tom","age" : 25,"email" : "214557457@qq.com","c" : 75,"m" : 66,"e" : 97,"country" : "USA","books" : ["PHP","JAVA","EXTJS","C++"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d10"),"name" : "lili","age" : 26,"email" : "344521457@qq.com","c" : 75,"m" : 63,"e" : 97,"country" : "USA","books" : ["JS","JAVA","C#","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d11"),"name" : "zhangsan","age" : 27,"email" : "2145567457@qq.com","c" : 89,"m" : 86,"e" : 67,"country" : "China","books" : ["JS","JAVA","EXTJS","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d12"),"name" : "lisi","age" : 26,"email" : "274521457@qq.com","c" : 53,"m" : 96,"e" : 83,"country" : "China","books" : ["JS","C#","PHP","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d13"),"name" : "wangwu","age" : 27,"email" : "65621457@qq.com","c" : 45,"m" : 65,"e" : 99,"country" : "China","books" : ["JS","JAVA","C++","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d14"),"name" : "zhaoliu","age" : 27,"email" : "214521457@qq.com","c" : 99,"m" : 96,"e" : 97,"country" : "China","books" : ["JS","JAVA","EXTJS","PHP"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d15"),"name" : "piaoyingjun","age" : 26,"email" : "piaoyingjun@uspcat.com","c" : 39,"m" : 54,"e" : 53,"country" : "Korea","books" : ["JS","C#","EXTJS","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d16"),"name" : "lizhenxian","age" : 27,"email" : "lizhenxian@uspcat.com","c" : 35,"m" : 56,"e" : 47,"country" : "Korea","books" : ["JS","JAVA","EXTJS","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d17"),"name" : "lixiaoli","age" : 21,"email" : "lixiaoli@uspcat.com","c" : 36,"m" : 86,"e" : 32,"country" : "Korea","books" : ["JS","JAVA","PHP","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d18"),"name" : "zhangsuying","age" : 22,"email" : "zhangsuying@uspcat.com","c" : 45,"m" : 63,"e" : 77,"country" : "Korea","books" : ["JS","JAVA","C#","MONGODB"]
}
- 1. Find the name and age of people older than 25 and younger than 27
> db.persons.find({age:{$gt:25,$lt:27}},{name:1,age:1})
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d10"), "name" : "lili", "age" : 26 }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d12"), "name" : "lisi", "age" : 26 }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d15"), "name" : "piaoyingjun", "age" : 26 }
find() selects the range, and the projection keeps only name and age
- 2. Find the names of people who are not from the USA
> db.persons.find({country:{$ne:'USA'}},{name:1,country:1})
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d11"), "name" : "zhangsan", "country" : "China" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d12"), "name" : "lisi", "country" : "China" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d13"), "name" : "wangwu", "country" : "China" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d14"), "name" : "zhaoliu", "country" : "China" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d15"), "name" : "piaoyingjun", "country" : "Korea" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d16"), "name" : "lizhenxian", "country" : "Korea" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d17"), "name" : "lixiaoli", "country" : "Korea" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d18"), "name" : "zhangsuying", "country" : "Korea" }
Same approach as above
Note: $ne means "not equal"
- 3. Find students whose country is China or the USA
> db.persons.find({$or:[{country:'USA'},{country:'China'}]}).pretty()
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d0e"),"name" : "jim","age" : 25,"email" : "75431457@qq.com","c" : 89,"m" : 96,"e" : 87,"country" : "USA","books" : ["JS","C++","EXTJS","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d0f"),"name" : "tom","age" : 25,"email" : "214557457@qq.com","c" : 75,"m" : 66,"e" : 97,"country" : "USA","books" : ["PHP","JAVA","EXTJS","C++"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d10"),"name" : "lili","age" : 26,"email" : "344521457@qq.com","c" : 75,"m" : 63,"e" : 97,"country" : "USA","books" : ["JS","JAVA","C#","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d11"),"name" : "zhangsan","age" : 27,"email" : "2145567457@qq.com","c" : 89,"m" : 86,"e" : 67,"country" : "China","books" : ["JS","JAVA","EXTJS","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d12"),"name" : "lisi","age" : 26,"email" : "274521457@qq.com","c" : 53,"m" : 96,"e" : 83,"country" : "China","books" : ["JS","C#","PHP","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d13"),"name" : "wangwu","age" : 27,"email" : "65621457@qq.com","c" : 45,"m" : 65,"e" : 99,"country" : "China","books" : ["JS","JAVA","C++","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d14"),"name" : "zhaoliu","age" : 27,"email" : "214521457@qq.com","c" : 99,"m" : 96,"e" : 97,"country" : "China","books" : ["JS","JAVA","EXTJS","PHP"]
}
- 4. Find students whose Chinese score (c) is above 85 or whose English score (e) is above 90
> db.persons.find({$or:[{c:{$gt:85}},{e:{$gt:90}}]},{c:1,e:1,name:1})
- 5. Find students whose name contains "li" (regular expression match)
> db.persons.find({name:/li/},{name:1})
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d10"), "name" : "lili" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d12"), "name" : "lisi" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d14"), "name" : "zhaoliu" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d16"), "name" : "lizhenxian" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d17"), "name" : "lixiaoli" }
- 6. Find students who like both MONGODB and PHP
> db.persons.find({books:{$all:['MONGODB','PHP']}},{name:1,books:1})
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d12"), "name" : "lisi", "books" : [ "JS", "C#", "PHP", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d17"), "name" : "lixiaoli", "books" : [ "JS", "JAVA", "PHP", "MONGODB" ] }
- 7. Find students whose second book is JAVA
Matching on books alone finds JAVA at any position in the array:
> db.persons.find({books:'JAVA'},{books:1})
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d0f"), "books" : [ "PHP", "JAVA", "EXTJS", "C++" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d10"), "books" : [ "JS", "JAVA", "C#", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d11"), "books" : [ "JS", "JAVA", "EXTJS", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d13"), "books" : [ "JS", "JAVA", "C++", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d14"), "books" : [ "JS", "JAVA", "EXTJS", "PHP" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d16"), "books" : [ "JS", "JAVA", "EXTJS", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d17"), "books" : [ "JS", "JAVA", "PHP", "MONGODB" ] }
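{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d18"), "books" : [ "JS", "JAVA", "C#", "MONGODB" ] }
The dotted path books.1 addresses the element at index 1 (indexes start at 0), that is, the second book. In this data set every document that lists JAVA has it in second place, so both queries return the same documents:
> db.persons.find({'books.1':'JAVA'},{books:1})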
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d0f"), "books" : [ "PHP", "JAVA", "EXTJS", "C++" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d10"), "books" : [ "JS", "JAVA", "C#", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d11"), "books" : [ "JS", "JAVA", "EXTJS", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d13"), "books" : [ "JS", "JAVA", "C++", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d14"), "books" : [ "JS", "JAVA", "EXTJS", "PHP" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d16"), "books" : [ "JS", "JAVA", "EXTJS", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d17"), "books" : [ "JS", "JAVA", "PHP", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d18"), "books" : [ "JS", "JAVA", "C#", "MONGODB" ] }
- 8. Find students who like exactly 4 books
> db.persons.find({books:{$size:4}},{books:1}).pretty()
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d0e"),"books" : ["JS","C++","EXTJS","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d0f"),"books" : ["PHP","JAVA","EXTJS","C++"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d10"),"books" : ["JS","JAVA","C#","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d11"),"books" : ["JS","JAVA","EXTJS","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d12"),"books" : ["JS","C#","PHP","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d13"),"books" : ["JS","JAVA","C++","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d14"),"books" : ["JS","JAVA","EXTJS","PHP"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d15"),"books" : ["JS","C#","EXTJS","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d16"),"books" : ["JS","JAVA","EXTJS","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d17"),"books" : ["JS","JAVA","PHP","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d18"),"books" : ["JS","JAVA","C#","MONGODB"]
}
$size: matches arrays with exactly the given number of elements
- 9. List the distinct countries in persons
> db.persons.distinct('country')
[ "China", "Korea", "USA" ]
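VI. Aggregation
- Aggregation is built on a data-processing pipeline: documents pass through a sequence of stages, each stage can group, filter, or otherwise transform them, and the final stage outputs the result
1. Querying data
- Aggregations are run with
aggregate()
which does the processing on the server and is generally more efficient than doing the same work on the client. With no stages it returns every document: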
> db.stu.aggregate()
{ "_id" : ObjectId("607d1d7a8b3cfdc5eb3dd1e6"), "name" : "a", "hometown" : "东北", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607d1d8a8b3cfdc5eb3dd1e7"), "name" : "b", "hometown" : "长沙", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607d1d938b3cfdc5eb3dd1e8"), "name" : "c", "hometown" : "武汉", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607d1d9a8b3cfdc5eb3dd1e9"), "name" : "d", "hometown" : "华山", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607d1da18b3cfdc5eb3dd1ea"), "name" : "e", "hometown" : "山东", "age" : 16, "gender" : true }
{ "_id" : ObjectId("607d1da68b3cfdc5eb3dd1eb"), "name" : "f", "hometown" : "江苏", "age" : 45, "gender" : true }
{ "_id" : ObjectId("607d1dad8b3cfdc5eb3dd1ec"), "name" : "g", "hometown" : "大理", "age" : 18, "gender" : true }
2. Grouping
Group by age
> db.stu.aggregate({$group:{_id:'$age'}})
{ "_id" : 40 }
{ "_id" : 16 }
{ "_id" : 20 }
{ "_id" : 45 }
{ "_id" : 18 }
Group by gender
> db.stu.aggregate({$group:{_id:'$gender'}})
{ "_id" : false }
{ "_id" : true }
3. Counting the documents in each group
> db.stu.aggregate({$group:{_id:'$gender',stu_count:{$sum:1}}})
{ "_id" : false, "stu_count" : 2 }
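{ "_id" : true, "stu_count" : 5 }
Note:
$sum:1 adds 1 for every document in the group, which gives the count
$sum:2 (or any other number) adds that number per document, so the result is a multiple of the count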
> db.stu.aggregate({$group:{_id:'$gender',stu_count:{$sum:2}}})
{ "_id" : false, "stu_count" : 4 }
{ "_id" : true, "stu_count" : 10 }
4. Collecting the data in each group
> db.stu.aggregate({$group:{_id:'$gender',stu_count:{$sum:1},name:{$push:"$name"}}})
{ "_id" : true, "stu_count" : 5, "name" : [ "a", "d", "e", "f", "g" ] }
{ "_id" : false, "stu_count" : 2, "name" : [ "b", "c" ] }
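As a rough illustration of what the $group stage above computes, the following pure-Python sketch groups a list of dictionaries by one field, counts the members of each group (the {$sum:1} part) and collects another field into a list (the {$push:'$name'} part). It is an emulation for understanding only, not MongoDB code; the function name group_by and the output keys count and values are assumptions, and the sample data copies the relevant fields of the stu collection shown earlier.

```python
from collections import defaultdict


def group_by(docs, key, push_field=None):
    """Group `docs` by `key`, counting members and optionally collecting a field.

    Roughly what {$group: {_id: '$key', count: {$sum: 1},
    values: {$push: '$push_field'}}} computes.
    """
    groups = defaultdict(lambda: {"count": 0, "values": []})
    for doc in docs:
        bucket = groups[doc[key]]
        bucket["count"] += 1  # {$sum: 1} adds 1 per document
        if push_field is not None:
            bucket["values"].append(doc[push_field])  # {$push: '$field'}
    return [{"_id": group_key, **bucket} for group_key, bucket in groups.items()]


stu = [
    {"name": "a", "age": 20, "gender": True},
    {"name": "b", "age": 18, "gender": False},
    {"name": "c", "age": 18, "gender": False},
    {"name": "d", "age": 40, "gender": True},
    {"name": "e", "age": 16, "gender": True},
    {"name": "f", "age": 45, "gender": True},
    {"name": "g", "age": 18, "gender": True},
]

# Index the result by group key so individual groups are easy to inspect.
by_gender = {g["_id"]: g for g in group_by(stu, "gender", push_field="name")}
```

The group for true contains 5 names and the group for false contains 2, matching the shell output above.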
5. Filtering by a range ($match)
> db.stu.aggregate({$match:{age:{$gt:20}}})
{ "_id" : ObjectId("607d1d9a8b3cfdc5eb3dd1e9"), "name" : "d", "hometown" : "华山", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607d1da68b3cfdc5eb3dd1eb"), "name" : "f", "hometown" : "江苏", "age" : 45, "gender" : true }
> db.stu.aggregate({$match:{age:{$gt:20}}},{$group:{_id:'$hometown'}})
{ "_id" : "江苏" }
{ "_id" : "华山" }
> db.stu.aggregate({$match:{age:{$gt:20}}},{$group:{_id:'$hometown',count:{$sum:1}}})
{ "_id" : "江苏", "count" : 1 }
{ "_id" : "华山", "count" : 1 }
6. Skip some documents and limit how many are returned
> db.stu.aggregate({$skip:1},{$limit:2})
{ "_id" : ObjectId("607d1d8a8b3cfdc5eb3dd1e7"), "name" : "b", "hometown" : "长沙", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607d1d938b3cfdc5eb3dd1e8"), "name" : "c", "hometown" : "武汉", "age" : 18, "gender" : false }
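VII. Indexes in MongoDB
1. Why create an index?
- To speed up queries (optimization)
- To deduplicate data (a unique index rejects duplicate values)
2. Commands
View indexes
> db.test.getIndexes()
[ { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_" } ]
This is the output before any index has been created: only the default _id index exists
Show detailed information about a query (in the output, "stage" : "COLLSCAN" and "totalDocsExamined" : 100000 show that every document was scanned)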
> db.test.find({name:'test9999'}).explain('executionStats')
{"queryPlanner" : {"plannerVersion" : 1,"namespace" : "suoyin.test","indexFilterSet" : false,"parsedQuery" : {"name" : {"$eq" : "test9999"}},"winningPlan" : {"stage" : "COLLSCAN","filter" : {"name" : {"$eq" : "test9999"}},"direction" : "forward"},"rejectedPlans" : [ ]},"executionStats" : {"executionSuccess" : true,"nReturned" : 1,"executionTimeMillis" : 46,"totalKeysExamined" : 0,"totalDocsExamined" : 100000,"executionStages" : {"stage" : "COLLSCAN","filter" : {"name" : {"$eq" : "test9999"}},"nReturned" : 1,"executionTimeMillisEstimate" : 1,"works" : 100002,"advanced" : 1,"needTime" : 100000,"needYield" : 0,"saveState" : 100,"restoreState" : 100,"isEOF" : 1,"direction" : "forward","docsExamined" : 100000}},"serverInfo" : {"host" : "wangjiaxindeMacBook-Pro-131.local","port" : 27017,"version" : "4.4.4","gitVersion" : "8db30a63db1a9d84bdcad0c83369623f708e0397"},"ok" : 1
}
Create an index (ensureIndex is an older alias; createIndex is the current name)
> db.test.ensureIndex({name:1})
{ "createdCollectionAutomatically" : false, "numIndexesBefore" : 1, "numIndexesAfter" : 2, "ok" : 1}
> db.test.getIndexes()
[ { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_" }, { "v" : 2, "key" : { "name" : 1 }, "name" : "name_1" }]
Drop an index
> db.test.dropIndex({name:1})
{ "nIndexesWas" : 2, "ok" : 1 }
> db.test.getIndexes()
[ { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_" } ]
> db.test.find({name:'test99999'}).explain('executionStats')
{"queryPlanner" : {"plannerVersion" : 1,"namespace" : "suoyin.test","indexFilterSet" : false,"parsedQuery" : {"name" : {"$eq" : "test99999"}},"winningPlan" : {"stage" : "COLLSCAN","filter" : {"name" : {"$eq" : "test99999"}},"direction" : "forward"},"rejectedPlans" : [ ]},"executionStats" : {"executionSuccess" : true,"nReturned" : 1,"executionTimeMillis" : 47,"totalKeysExamined" : 0,"totalDocsExamined" : 100000,"executionStages" : {"stage" : "COLLSCAN","filter" : {"name" : {"$eq" : "test99999"}},"nReturned" : 1,"executionTimeMillisEstimate" : 1,"works" : 100002,"advanced" : 1,"needTime" : 100000,"needYield" : 0,"saveState" : 100,"restoreState" : 100,"isEOF" : 1,"direction" : "forward","docsExamined" : 100000}},"serverInfo" : {"host" : "wangjiaxindeMacBook-Pro-131.local","port" : 27017,"version" : "4.4.4","gitVersion" : "8db30a63db1a9d84bdcad0c83369623f708e0397"},"ok" : 1
}
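Why an index helps can be sketched in plain Python: without one, a lookup has to examine every document (the COLLSCAN seen in the explain output), whereas a prebuilt lookup table jumps straight to the matching positions. This is only an analogy under simplifying assumptions: MongoDB indexes are B-tree structures rather than Python dictionaries, and the function names and the 100000 generated documents are made up for this example.

```python
def collection_scan(docs, name):
    """Find documents by name the slow way and report how many were examined."""
    found = []
    examined = 0
    for doc in docs:
        examined += 1  # every document is looked at, like a COLLSCAN
        if doc["name"] == name:
            found.append(doc)
    return found, examined


def build_index(docs, field):
    """Build a lookup table {field value: [positions in docs]}."""
    index = {}
    for position, doc in enumerate(docs):
        index.setdefault(doc[field], []).append(position)
    return index


# Hypothetical data shaped like the test collection used above.
docs = [{"_id": i, "name": f"test{i}"} for i in range(100000)]

found_scan, examined = collection_scan(docs, "test9999")

# Built once up front; afterwards each lookup touches only the matches.
name_index = build_index(docs, "name")
found_indexed = [docs[p] for p in name_index.get("test9999", [])]
```

Both approaches return the same document, but the scan examines all 100000 documents while the indexed lookup goes directly to the one match. The trade-off is that the index has to be built and kept up to date.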
VIII. Using MongoDB from Python
1. Installation
pip install pymongo
2. Import the module
import pymongo
3. Connect to MongoDB
mongo_client = pymongo.MongoClient(host='127.0.0.1', port=27017)
4. Operate on MongoDB
# insert() was removed in PyMongo 4; use insert_one() (or insert_many())
mongo_client['Wjx']['like'].insert_one({'name': 'lirui'})
5. Code summary
- Insert a single document
import pymongo


class MongoData():
    def __init__(self, name):
        # Connect to the MongoDB server
        self.client = pymongo.MongoClient(host='127.0.0.1', port=27017)
        # Select the database 'Wjx' and the collection called name
        self.db = self.client['Wjx'][name]

    # Insert a single document
    def add_one(self, data):
        result = self.db.insert_one(data)
        print(result.inserted_id)


if __name__ == '__main__':
    md = MongoData('like')
    md.add_one({'name':'pengli'})
- Insert multiple documents
import pymongo


class MongoData():
    def __init__(self, name):
        # Connect to the MongoDB server
        self.client = pymongo.MongoClient(host='127.0.0.1', port=27017)
        # Select the database and the collection
        self.db = self.client['Wjx'][name]

    # Insert multiple documents
    def add_many(self, data):
        result = self.db.insert_many(data)
        return result.inserted_ids


if __name__ == '__main__':
    md = MongoData('like')
    r = md.add_many([{'x': i} for i in range(2)])
    print(r)
- Query a single document
import pymongo


class MongoData():
    def __init__(self, name):
        # Connect to the MongoDB server
        self.client = pymongo.MongoClient(host='127.0.0.1', port=27017)
        # Select the database and the collection
        self.db = self.client['Wjx'][name]

    # Insert a single document
    def add_one(self, data):
        result = self.db.insert_one(data)
        print(result.inserted_id)

    # Insert multiple documents
    def add_many(self, data):
        result = self.db.insert_many(data)
        return result.inserted_ids

    # Query one document
    # query=None means no filter: return the first document found
    def get_one(self, query=None):
        if query is None:
            return self.db.find_one()
        else:
            return self.db.find_one(query)


if __name__ == '__main__':
    md = MongoData('like')
    # md.add_one({'name':'pengli'})
    # r = md.add_many([{'x': i} for i in range(2)])
    # print(r)
    r = md.get_one({'name': 'zhuqi'})
    print(r)
- Query multiple documents
import pymongo


class MongoData():
    def __init__(self, name):
        # Connect to the MongoDB server
        self.client = pymongo.MongoClient(host='127.0.0.1', port=27017)
        # Select the database and the collection
        self.db = self.client['Wjx'][name]

    # Insert a single document
    def add_one(self, data):
        result = self.db.insert_one(data)
        print(result.inserted_id)

    # Insert multiple documents
    def add_many(self, data):
        result = self.db.insert_many(data)
        return result.inserted_ids

    # Query one document
    # query=None means no filter: return the first document found
    def get_one(self, query=None):
        if query is None:
            return self.db.find_one()
        else:
            return self.db.find_one(query)

    # Query multiple documents (returns a cursor that can be iterated)
    def get_many(self, query=None):
        if query is None:
            return self.db.find()
        else:
            return self.db.find(query)


if __name__ == '__main__':
    md = MongoData('like')
    # md.add_one({'name':'pengli'})
    # r = md.add_many([{'x': i} for i in range(2)])
    # print(r)
    # r = md.get_one({'name':'zhuqi'})
    # print(r)
    r = md.get_many()
    for i in r:
        print(i)
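The summary above covers inserting and querying. Updating and deleting follow the same pattern. The following is a minimal sketch rather than part of the original class: it assumes PyMongo's update_one() and delete_one() collection methods, and the MongoDataExtra class and connect() helper are hypothetical names introduced here. The collection is passed in from outside so the class can be tried without a running server.

```python
class MongoDataExtra:
    """Hypothetical companion to MongoData that covers update and delete."""

    def __init__(self, collection):
        # `collection` is expected to be a PyMongo collection object,
        # e.g. pymongo.MongoClient(host='127.0.0.1', port=27017)['Wjx']['like']
        self.db = collection

    # Update the first document that matches `query`
    def update_one(self, query, changes):
        # $set changes only the listed fields and leaves the others untouched
        result = self.db.update_one(query, {'$set': changes})
        return result.modified_count

    # Delete the first document that matches `query`
    def delete_one(self, query):
        result = self.db.delete_one(query)
        return result.deleted_count


def connect(name, host='127.0.0.1', port=27017):
    """Build a MongoDataExtra for collection `name` in the 'Wjx' database."""
    import pymongo  # imported lazily so the class itself has no hard dependency
    return MongoDataExtra(pymongo.MongoClient(host=host, port=port)['Wjx'][name])
```

With a local mongod running, connect('like').update_one({'name': 'pengli'}, {'name': 'zhuqi'}) would rename one matching document and return the number of documents modified.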
IX. Using MongoDB with Scrapy
- Spider file
import scrapy
from chaoshenspider.items import ChaoshenspiderItem


class XintiantingSpider(scrapy.Spider):
    name = 'xintianting'
    allowed_domains = ['biduoxs.com']
    start_urls = ['https://www.biduoxs.com/biquge/51_51108/c20390104.html']

    def parse(self, response):
        chapter_name = response.xpath('//div[@class="content_read"]/div[@class="box_con"]/div[@class="bookname"]/h1/text()').get()
        chapter_content = response.xpath('//div[@class="content_read"]/div[@class="box_con"]/div[@id="content"]/text()').getall()
        chapter_text = '\n'.join(chapter_content)
        # print(chapter_name)
        # print(chapter_text)
        item = ChaoshenspiderItem()
        item['chapter_name'] = chapter_name
        item['chapter_text'] = chapter_text
        yield item
        # Crawl the next chapter: the third link in the bottom navigation bar
        chapter_href = response.xpath('//div[@class="content_read"]/div[@class="box_con"]/div[@class="bottem2"]/a/@href').getall()[2]
        # print(chapter_href)
        # Stop when the link points back to the book's index page
        if chapter_href != '/biquge/51_51108/':
            chapter_url = response.urljoin(chapter_href)
            yield scrapy.Request(url=chapter_url, callback=self.parse)
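The "next chapter" decision in parse() can be pulled out into a small function that is easy to check on its own. This is a sketch under assumptions: next_chapter_url is a hypothetical helper, it uses the standard library's urljoin as an approximation of response.urljoin, and like the spider it assumes the third navigation link is the "next chapter" link.

```python
from urllib.parse import urljoin

# Path the spider treats as "there is no next chapter".
BOOK_INDEX = '/biquge/51_51108/'


def next_chapter_url(page_url, hrefs, index_href=BOOK_INDEX):
    """Return the absolute URL of the next chapter, or None when crawling should stop.

    `hrefs` is the list of href values from the bottom navigation bar.
    """
    if len(hrefs) < 3:
        # Navigation bar missing or changed: stop instead of raising IndexError
        return None
    href = hrefs[2]
    if href == index_href:
        # The link leads back to the book's index page, so this was the last chapter
        return None
    return urljoin(page_url, href)
```

Inside parse(), the result could be passed to scrapy.Request whenever it is not None. The length check also guards against pages where the navigation bar has fewer than three links, a case in which the original indexing with [2] would raise an IndexError.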
- items.py
# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html

import scrapy


class ChaoshenspiderItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    chapter_name = scrapy.Field()
    chapter_text = scrapy.Field()
- settings.py
# Scrapy settings for chaoshenspider project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
# https://docs.scrapy.org/en/latest/topics/settings.html
# https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
# https://docs.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'chaoshenspider'

SPIDER_MODULES = ['chaoshenspider.spiders']
NEWSPIDER_MODULE = 'chaoshenspider.spiders'

LOG_LEVEL = 'WARNING'

# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'chaoshenspider (+http://www.yourdomain.com)'

# Obey robots.txt rules
ROBOTSTXT_OBEY = False

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
DEFAULT_REQUEST_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
}

# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    'chaoshenspider.middlewares.ChaoshenspiderSpiderMiddleware': 543,
#}

# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#    'chaoshenspider.middlewares.ChaoshenspiderDownloaderMiddleware': 543,
#}

# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
#}

# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'chaoshenspider.pipelines.ChaoshenspiderPipeline': 300,
}

# MongoDB connection settings read by the pipeline
MONGODB_HOST = '127.0.0.1'
MONGODB_PORT = 27017
MONGODB_DBNAME = 'fiction'
MONGODB_DBCNAME = '超神学院之新天庭'

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
- pipelines.py
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html

# useful for handling different item types with a single interface
import pymongo
from itemadapter import ItemAdapter
from chaoshenspider import settings


class ChaoshenspiderPipeline:
    def __init__(self):
        host = settings.MONGODB_HOST
        port = settings.MONGODB_PORT
        dbname = settings.MONGODB_DBNAME
        dbcname = settings.MONGODB_DBCNAME
        client = pymongo.MongoClient(host=host, port=port)  # connect to MongoDB
        fiction = client[dbname]  # select the database
        self.add = fiction[dbcname]  # select the collection
        self.book = open('超神学院之新天庭.txt', 'w', encoding='utf-8')
        print('Crawler started!')

    def process_item(self, item, spider):
        print(item['chapter_name'] + ' downloaded!')
        # Write the chapter to the novel's text file
        self.book.write(item['chapter_name'] + '\n')
        self.book.write(item['chapter_text'] + '\n\n')
        # Store the chapter in MongoDB
        data = dict(item)
        self.add.insert_one(data)
        return item

    # Scrapy passes the spider (not an item) to close_spider
    def close_spider(self, spider):
        self.book.close()
        print('Crawler finished!')
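The pipeline above imports the project's settings module directly. An alternative is to let Scrapy hand the settings to the pipeline through the from_crawler hook and to open and close the connection in open_spider and close_spider. The following is a minimal sketch under assumptions, not the author's implementation: MongoPipelineSketch is a hypothetical class name, the MONGODB_* keys are the ones defined in settings.py above, and the text-file output is left out for brevity.

```python
class MongoPipelineSketch:
    """Hypothetical pipeline that reads its MongoDB settings from the crawler."""

    def __init__(self, host, port, dbname, collection_name):
        self.host = host
        self.port = port
        self.dbname = dbname
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook, when it is defined, to build the pipeline
        s = crawler.settings
        return cls(
            host=s.get('MONGODB_HOST', '127.0.0.1'),
            port=s.get('MONGODB_PORT', 27017),
            dbname=s.get('MONGODB_DBNAME'),
            collection_name=s.get('MONGODB_DBCNAME'),
        )

    def open_spider(self, spider):
        import pymongo  # imported lazily so the class can be loaded without a server
        self.client = pymongo.MongoClient(host=self.host, port=self.port)
        self.collection = self.client[self.dbname][self.collection_name]

    def process_item(self, item, spider):
        # Store a plain-dict copy of the item and pass the item on unchanged
        self.collection.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        # Release the connection when the crawl ends
        if self.client is not None:
            self.client.close()
```

To try it, the class would be registered in ITEM_PIPELINES in place of (or next to) ChaoshenspiderPipeline. Reading values from crawler.settings also picks up overrides given on the command line, which a direct import of the settings module does not.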