Crawler - 07: MongoDB

  • MongoDB
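    • I. Concepts
    • II. SQL vs. NoSQL
      • Advantages of MongoDB
    • III. Installation
    • IV. Basic MongoDB usage
      • 1. List databases
      • 2. Use/create a database
      • 3. Show the current database
      • 4. List the collections in a database
      • 5. Insert data into the current database
      • 6. Drop a database
      • 7. View the data in a collection
      • 8. Check whether a collection is capped
      • 9. Drop a collection
      • 10. More on inserting data
      • 11. More on querying data
      • 12. Comparison operators
      • 13. Logical operators
      • 14. Working with query results
      • 15. Updating data
      • 16. Deleting data
    • V. Exercises
    • VI. Aggregation
      • 1. Querying data
      • 2. Grouping
      • 3. Counting the documents in each group
      • 4. Listing the data in each group
      • 5. Filtering with $match
      • 6. Skipping and limiting documents
    • VII. Indexes in MongoDB
      • 1. Why create an index?
      • 2. Commands
        • List indexes
        • Show detailed query execution information
        • Create an index
        • Drop an index
    • VIII. Using MongoDB from Python
      • 1. Installation
      • 2. Importing the module
      • 3. Connecting to MongoDB
      • 4. Working with MongoDB
      • 5. Code summary
    • IX. Using MongoDB with Scrapy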

MongoDB

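I. Concepts

  • MongoDB is a non-relational (NoSQL) database that stores data very flexibly.
  • It sits between relational and non-relational databases: among the non-relational databases it is the most feature-rich and the closest to a relational one. Its data model is very loose, so it can store fairly complex data types. Its standout feature is a powerful query language whose syntax somewhat resembles an object-oriented query language. It can do most of what a single-table query in a relational database can do, and it also supports indexes on the data.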

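II. SQL vs. NoSQL

  • SQL: database -> table -> row (data)
  • NoSQL: database -> collection (table) -> document (data)

Advantages of MongoDB

  • No fixed schema
    • Convenient for application development
  • High performance
  • Good support
    • It has been developed for a relatively long time and has thorough documentation
    • Good cross-platform support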

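III. Installation

  • macOS: see the MongoDB installation tutorial
  • Windows installation
    • Download the .msi file
    • Installation steps: https://www.yuque.com/books/share/bc5d784a-9b7f-4b9a-bc78-510ce82b6a33/vezdrq
    • Note
      • Add the bin directory to the PATH environment variable
    • Start the server (quote the path because it contains a space): mongod --dbpath "C:\Program Files\MongoDB\Server\4.4.5\data"
    • Connect with: mongo
    • A successful connection prints: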
MongoDB shell version v4.4.5
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("08893c5f-ca3a-40b5-948f-beadf475331f") }
MongoDB server version: 4.4.5
---
The server generated these startup warnings when booting:
        2021-04-16T05:36:24.282+08:00: Access control is not enabled for the database. Read and write access to data and configuration is unrestricted
---

IV. Basic MongoDB usage

1. List databases

Command: show dbs
MongoDB Enterprise > show dbs
admin   0.000GB
config  0.000GB
local   0.000GB
These three databases all ship with MongoDB.

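2. Use/create a database

Command: use admin
MongoDB Enterprise > use admin
switched to db admin
This switches to the admin database.

Command: use demo
MongoDB Enterprise > use demo
switched to db demo
MongoDB Enterprise > db
demo
MongoDB Enterprise > show dbs
admin   0.000GB
config  0.000GB
local   0.000GB
Here use creates a new database called demo.
It does not appear in show dbs yet because a new database exists only in memory and has not been written to disk.
It shows up once a collection (table) containing at least one document has been created in it.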

3. Show the current database

Command: db
MongoDB Enterprise > db
admin
This prints the database currently in use.

4. List the collections in a database

Option 1
MongoDB Enterprise > show tables
system.version
Option 2
MongoDB Enterprise > show collections
system.version

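5. Insert data into the current database

Creating a collection implicitly (inserting into a collection that does not exist yet creates it)
MongoDB Enterprise > db
demo
MongoDB Enterprise > db.jerry.insert({s:1})
WriteResult({ "nInserted" : 1 })
MongoDB Enterprise > show dbs
admin   0.000GB
config  0.000GB
demo    0.000GB
local   0.000GB
Creating a collection explicitly
db.createCollection(name, options)
name: the name of the collection (table); it must be unique within the database
options: optional settings, for example a size cap for the collection
MongoDB Enterprise > db.createCollection('wangjiaxin_cllection')
{ "ok" : 1 }
MongoDB Enterprise > db.createCollection('wangjiaxin1',{capped:true,size:4})
{ "ok" : 1 }
Here size:4 is the maximum size of the capped collection in bytes, not a number of documents; to limit the number of documents, add the max option alongside size.
If the requested size is below 256 bytes, MongoDB uses 256 bytes instead.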

6. Drop a database

MongoDB Enterprise > show tables
jerry
MongoDB Enterprise > db.dropDatabase()
{ "dropped" : "demo", "ok" : 1 }
MongoDB Enterprise > show dbs
admin   0.000GB
config  0.000GB
local   0.000GB

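7. View the data in a collection

Querying data
db.name.find()
# An explicitly created collection (still empty, so nothing is printed)
MongoDB Enterprise > db.wangjiaxin_cllection.find()
# An implicitly created collection
MongoDB Enterprise > db.wangjiaxin.find()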
{ "_id" : ObjectId("60793c2b9dc82dc5d7862223"), "x" : 1 }

8. Check whether a collection is capped

MongoDB Enterprise > show tables
wangjiaxin
wangjiaxin1
wangjiaxin_cllection
MongoDB Enterprise > db.wangjiaxin.isCapped()
false
false means the collection has no cap.
MongoDB Enterprise > db.wangjiaxin1.isCapped()
true
true means the collection is capped.

9. Drop a collection

MongoDB Enterprise > show tables
wangjiaxin
wangjiaxin1
wangjiaxin_cllection
MongoDB Enterprise > db.wangjiaxin_cllection.drop()
true
MongoDB Enterprise > show tables
wangjiaxin
wangjiaxin1

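10. More on inserting data

Inserting into an existing collection (a document can also be replaced through its _id, shown further down)
Option 1: insert a single document
> db.wangjiaxin.insert({name:'wangjiaxin',age:25,gender:'madl',id:1})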
WriteResult({ "nInserted" : 1 })
> db.wangjiaxin.find()
{ "_id" : ObjectId("607a764e4474722e1152124d"), "x" : 1 }
{ "_id" : ObjectId("607a79704474722e1152124e"), "name" : "wangjiaxin", "age" : 25, "gender" : "male" }
{ "_id" : ObjectId("607a7a264474722e1152124f"), "name" : "wangjiaxin", "age" : 25, "gender" : "madl", "id" : 1 }
> db.wangjiaxin.insert({name:'wangjiaxin',age:25,gender:'madl',_id:1})
WriteResult({ "nInserted" : 1 })
> db.wangjiaxin.find()
{ "_id" : ObjectId("607a764e4474722e1152124d"), "x" : 1 }
{ "_id" : ObjectId("607a79704474722e1152124e"), "name" : "wangjiaxin", "age" : 25, "gender" : "male" }
{ "_id" : ObjectId("607a7a264474722e1152124f"), "name" : "wangjiaxin", "age" : 25, "gender" : "madl", "id" : 1 }
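{ "_id" : 1, "name" : "wangjiaxin", "age" : 25, "gender" : "madl" }
Note: the primary key _id must be unique, so inserting a second document with _id:1 fails with a duplicate key error.
Option 2: insert multiple documents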
> db.wangjiaxin1.insert({name:'wangjiaxin',age:25,gender:'male'})
WriteResult({ "nInserted" : 1 })
> db.wangjiaxin1.find()
{ "_id" : ObjectId("607a7df54474722e11521251"), "name" : "wangjiaxin", "age" : 25, "gender" : "male" }
> db.wangjiaxin1.insert([{name:'wangjiaxin',age:25},{name:'lirui',age:23}])
BulkWriteResult({"writeErrors" : [ ],"writeConcernErrors" : [ ],"nInserted" : 2,"nUpserted" : 0,"nMatched" : 0,"nModified" : 0,"nRemoved" : 0,"upserted" : [ ]
})
> db.wangjiaxin1.find()
{ "_id" : ObjectId("607a7df54474722e11521251"), "name" : "wangjiaxin", "age" : 25, "gender" : "male" }
{ "_id" : ObjectId("607a7e5c4474722e11521252"), "name" : "wangjiaxin", "age" : 25 }
{ "_id" : ObjectId("607a7e5c4474722e11521253"), "name" : "lirui", "age" : 23 }
Bulk insert with a loop
> for(i=2;i<10;i++)db.wangjiaxin3.insert({x:i})
WriteResult({ "nInserted" : 1 })
> db.wangjiaxin3.find()
{ "_id" : ObjectId("607a7f824474722e11521254"), "x" : 2 }
{ "_id" : ObjectId("607a7f824474722e11521255"), "x" : 3 }
{ "_id" : ObjectId("607a7f824474722e11521256"), "x" : 4 }
{ "_id" : ObjectId("607a7f824474722e11521257"), "x" : 5 }
{ "_id" : ObjectId("607a7f824474722e11521258"), "x" : 6 }
{ "_id" : ObjectId("607a7f824474722e11521259"), "x" : 7 }
{ "_id" : ObjectId("607a7f824474722e1152125a"), "x" : 8 }
{ "_id" : ObjectId("607a7f824474722e1152125b"), "x" : 9 }
Updating a document by its primary key (save() replaces the document whose _id matches)
> db.wangjiaxin3.find()
{ "_id" : ObjectId("607a7f824474722e11521254"), "x" : 2 }
{ "_id" : ObjectId("607a7f824474722e11521255"), "x" : 3 }
{ "_id" : ObjectId("607a7f824474722e11521256"), "x" : 4 }
{ "_id" : ObjectId("607a7f824474722e11521257"), "x" : 5 }
> db.wangjiaxin3.save({_id:ObjectId("607a7f824474722e11521254"),name:18,gender:'male'})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.wangjiaxin3.find()
{ "_id" : ObjectId("607a7f824474722e11521254"), "name" : 18, "gender" : "male" }
{ "_id" : ObjectId("607a7f824474722e11521255"), "x" : 3 }
{ "_id" : ObjectId("607a7f824474722e11521256"), "x" : 4 }
{ "_id" : ObjectId("607a7f824474722e11521257"), "x" : 5 }
{ "_id" : ObjectId("607a7f824474722e11521258"), "x" : 6 }
{ "_id" : ObjectId("607a7f824474722e11521259"), "x" : 7 }
{ "_id" : ObjectId("607a7f824474722e1152125a"), "x" : 8 }
{ "_id" : ObjectId("607a7f824474722e1152125b"), "x" : 9 }
save() also works as a plain insert when no _id is given
> db.wangjiaxin3.save({name:'abc',gender:'male'})
WriteResult({ "nInserted" : 1 })
> db.wangjiaxin3.find()
{ "_id" : ObjectId("607a7f824474722e11521254"), "name" : 18, "gender" : "male" }
{ "_id" : ObjectId("607a7f824474722e11521255"), "x" : 3 }
{ "_id" : ObjectId("607a7f824474722e11521256"), "x" : 4 }
{ "_id" : ObjectId("607a7f824474722e11521257"), "x" : 5 }
{ "_id" : ObjectId("607a7f824474722e11521258"), "x" : 6 }
{ "_id" : ObjectId("607a7f824474722e11521259"), "x" : 7 }
{ "_id" : ObjectId("607a7f824474722e1152125a"), "x" : 8 }
{ "_id" : ObjectId("607a7f824474722e1152125b"), "x" : 9 }
{ "_id" : ObjectId("607a81d14474722e1152125c"), "name" : "abc", "gender" : "male" }

11. More on querying data

List databases
show dbs
List collections
show tables / show collections
Query the data in a collection
db.name.find()
> db.stu.find({name:'张三'})
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
Pretty-printed query (formatted output)
> db.stu.find({name:'老李'}).pretty()
{
  "_id" : ObjectId("607a96864474722e1152125e"),
  "name" : "老李",
  "hometown" : "广州",
  "age" : 18,
  "gender" : false
}
Query a single document
> db.stu.findOne()
{
  "_id" : ObjectId("607a96864474722e1152125d"),
  "name" : "张三",
  "hometown" : "长沙",
  "age" : 20,
  "gender" : true
}
Query with a condition
> db.stu.find({age:18})
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }

12. Comparison operators

  • Greater than ($gt)
> db.stu.find({age:{$gt:18}})
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
  • Greater than or equal to ($gte)
> db.stu.find({age:{$gte:18}})
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }
  • Multiple conditions (all of them must hold)
> db.stu.find({age:{$gte:18},hometown:'长沙'})
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
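
To make the operator semantics concrete, here is a minimal pure-Python sketch that evaluates a filter document against plain dicts, with no MongoDB server involved. The `matches` and `find` helpers and the shortened `stu` list are hypothetical, written only for illustration. The sketch covers equality plus `$gt`, `$gte`, `$lt`, `$lte` and `$ne`; real MongoDB matching has many more rules.

```python
import operator

# Hypothetical helper table: only these five operators are supported.
OPS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$ne": operator.ne,
}


def matches(doc, query):
    """Return True if doc satisfies every condition in query (implicit AND)."""
    for field, cond in query.items():
        if isinstance(cond, dict):
            # Operator form, e.g. {"age": {"$gte": 18}}
            for op, target in cond.items():
                if field not in doc:
                    if op != "$ne":  # a missing field can only satisfy $ne
                        return False
                elif not OPS[op](doc[field], target):
                    return False
        elif doc.get(field) != cond:
            # Plain equality, e.g. {"hometown": "长沙"}
            return False
    return True


def find(docs, query):
    """Return the documents that match the filter, in their original order."""
    return [d for d in docs if matches(d, query)]


# Shortened copy of the stu sample data (illustrative only).
stu = [
    {"name": "张三", "hometown": "长沙", "age": 20},
    {"name": "老李", "hometown": "广州", "age": 18},
    {"name": "刘六", "hometown": "深圳", "age": 40},
    {"name": "jerry", "hometown": "长沙", "age": 16},
]

adults_from_changsha = find(stu, {"age": {"$gte": 18}, "hometown": "长沙"})
print([d["name"] for d in adults_from_changsha])
```

Listing several fields in one filter document acts as a logical AND, which is why only one of the two people from 长沙 is returned.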

13. Logical operators

  • Match documents where either the age or the gender condition holds ($or)
> db.stu.find({$or:[{age:{$gt:18}},{gender:false}]})
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
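  • Membership test ($in matches any of the listed values; [18,28] means 18 or 28, not the range from 18 to 28)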
> db.stu.find({age:{$in:[18,28]}})
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }
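
As a rough mental model, `$or` behaves like Python's `any()` over its sub-conditions and `$in` like a membership test. The sketch below restates the two queries above as plain Python predicates over a shortened, hypothetical copy of the sample data, and adds a true range check for comparison. It is not how MongoDB evaluates queries internally.

```python
# Shortened copy of the stu sample data (illustrative only).
stu = [
    {"name": "张三", "age": 20, "gender": True},
    {"name": "老李", "age": 18, "gender": False},
    {"name": "刘六", "age": 40, "gender": True},
    {"name": "jerry", "age": 16, "gender": True},
]


def or_query(doc):
    """Mirrors {$or: [{age: {$gt: 18}}, {gender: false}]}."""
    return any([doc["age"] > 18, doc["gender"] is False])


def in_query(doc):
    """Mirrors {age: {$in: [18, 28]}}: exact membership, not a range."""
    return doc["age"] in (18, 28)


def range_query(doc):
    """A real range needs two comparisons: {age: {$gte: 18, $lte: 28}}."""
    return 18 <= doc["age"] <= 28


or_names = [d["name"] for d in stu if or_query(d)]
in_names = [d["name"] for d in stu if in_query(d)]
range_names = [d["name"] for d in stu if range_query(d)]
print(or_names, in_names, range_names)
```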

14. Working with query results

  • Count the results
> db.stu.find().count()
7
  • limit: return at most the given number of documents
> db.stu.find().limit(2)
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
  • skip: skip the given number of documents
> db.stu.find().skip(2)
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "jerry", "hometown" : "长沙", "age" : 16, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }
  • Projection (choose which fields to return; 1 includes a field, 0 excludes it)
> db.stu.find({},{age:1})
{ "_id" : ObjectId("607a96864474722e1152125d"), "age" : 20 }
{ "_id" : ObjectId("607a96864474722e1152125e"), "age" : 18 }
{ "_id" : ObjectId("607a96864474722e1152125f"), "age" : 18 }
{ "_id" : ObjectId("607a96864474722e11521260"), "age" : 40 }
{ "_id" : ObjectId("607a96864474722e11521261"), "age" : 16 }
{ "_id" : ObjectId("607a96864474722e11521262"), "age" : 45 }
{ "_id" : ObjectId("607a96864474722e11521263"), "age" : 18 }
> db.stu.find({},{age:1,_id:0})
{ "age" : 20 }
{ "age" : 18 }
{ "age" : 18 }
{ "age" : 40 }
{ "age" : 16 }
{ "age" : 45 }
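{ "age" : 18 }
Note: in the next command genfder is a typo for gender; no such field exists, so only _id and age come back.
> db.stu.find({age:18},{age:1,genfder:1})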
{ "_id" : ObjectId("607a96864474722e1152125e"), "age" : 18 }
{ "_id" : ObjectId("607a96864474722e1152125f"), "age" : 18 }
{ "_id" : ObjectId("607a96864474722e11521263"), "age" : 18 }
  • Sorting
Ascending (1)
> db.stu.find().sort({age:1})
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "jerry", "hometown" : "长沙", "age" : 16, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
Descending (-1)
> db.stu.find().sort({age:-1})
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "jerry", "hometown" : "长沙", "age" : 16, "gender" : true }
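
The cursor helpers above can be combined. The following pure-Python sketch models one reasonable reading of how sort, skip, limit and projection compose: sort first, then skip, then limit, then pick fields. The `query` helper and the shortened `ages` list are hypothetical and exist only for this illustration.

```python
# Shortened copy of the stu sample data (illustrative only).
ages = [
    {"name": "张三", "age": 20},
    {"name": "老李", "age": 18},
    {"name": "刘六", "age": 40},
    {"name": "jerry", "age": 16},
    {"name": "小永", "age": 45},
]


def query(docs, skip=0, limit=None, sort_key=None, direction=1, fields=None):
    """Hypothetical helper: sort, then skip, then limit, then project."""
    out = list(docs)
    if sort_key is not None:
        # direction 1 is ascending, -1 is descending, as in sort({age: -1})
        out = sorted(out, key=lambda d: d[sort_key], reverse=(direction == -1))
    out = out[skip:]
    if limit is not None:
        out = out[:limit]
    if fields is not None:
        # Keep only the requested fields, like the projection {age: 1, _id: 0}
        out = [{k: d[k] for k in fields if k in d} for d in out]
    return out


page = query(ages, skip=1, limit=2, sort_key="age", direction=-1, fields=["age"])
print(page)
```

In shell terms this corresponds roughly to db.stu.find({},{age:1,_id:0}).sort({age:-1}).skip(1).limit(2).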

15. Updating data

db.name.update(query, update, {multi: boolean})
query: the selection criteria
update: the changes to apply
multi: optional; the default false updates only the first matching document,
true updates every matching document
# Replacing a whole document
Without an update operator such as $set, the matched document is replaced by the new one (only _id is kept), which is why the jerry document loses hometown, age and gender.
> db.stu.update({name:'jerry'},{name:'wangjiaxin'})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.stu.find()
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "张三", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "wangjiaxin" }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }
# Updating specific fields with $set (all other fields are left untouched)
> db.stu.update({name:'张三'},{$set:{name:'zhangsan'}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.stu.find()
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "zhangsan", "hometown" : "长沙", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "wangjiaxin" }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : true }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : true }
# Updating every matching document
{multi:true}
> db.stu.update({},{$set:{gender:0}},{multi:true})
WriteResult({ "nMatched" : 7, "nUpserted" : 0, "nModified" : 7 })
> db.stu.find()
{ "_id" : ObjectId("607a96864474722e1152125d"), "name" : "zhangsan", "hometown" : "长沙", "age" : 20, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "wangjiaxin", "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : 0 }
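
The difference between a full replacement, a `$set` update and `multi:true` is easy to get wrong, so here is a small pure-Python model of the three cases. The `update` function below is a hypothetical stand-in that supports only equality filters and `$set`; it illustrates the behaviour seen in the transcripts above and is not MongoDB's implementation.

```python
def update(docs, query, change, multi=False):
    """Hypothetical model of update(); returns how many documents were modified."""
    modified = 0
    for doc in docs:
        if all(doc.get(k) == v for k, v in query.items()):
            if "$set" in change:
                # $set touches only the listed fields
                doc.update(change["$set"])
            else:
                # No operator: the document is replaced, only _id survives
                kept_id = doc["_id"]
                doc.clear()
                doc.update(change)
                doc["_id"] = kept_id
            modified += 1
            if not multi:
                break  # default behaviour: stop after the first match
    return modified


# Two illustrative documents with simplified _id values.
stu = [
    {"_id": 1, "name": "张三", "age": 20, "gender": True},
    {"_id": 2, "name": "jerry", "age": 16, "gender": True},
]

update(stu, {"name": "jerry"}, {"name": "wangjiaxin"})           # replacement
update(stu, {"name": "张三"}, {"$set": {"name": "zhangsan"}})     # field update
changed = update(stu, {}, {"$set": {"gender": 0}}, multi=True)   # all documents
print(changed, stu)
```

After the replacement, the second document has lost `age`, mirroring the wangjiaxin document in the shell output.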

16. Deleting data

Delete by condition
> db.stu.remove({name:'zhangsan'})
WriteResult({ "nRemoved" : 1 })
> db.stu.find()
{ "_id" : ObjectId("607a96864474722e1152125e"), "name" : "老李", "hometown" : "广州", "age" : 18, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "wangjiaxin", "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : 0 }
Delete only the first match
> db.stu.remove({age:18},{justOne:true})
WriteResult({ "nRemoved" : 1 })
> db.stu.find()
{ "_id" : ObjectId("607a96864474722e1152125f"), "name" : "王麻子", "hometown" : "北京", "age" : 18, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521260"), "name" : "刘六", "hometown" : "深圳", "age" : 40, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521261"), "name" : "wangjiaxin", "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521262"), "name" : "小永", "hometown" : "广州", "age" : 45, "gender" : 0 }
{ "_id" : ObjectId("607a96864474722e11521263"), "name" : "老amy", "hometown" : "衡阳", "age" : 18, "gender" : 0 }
Delete every document in the collection
> db.stu.remove({})
WriteResult({ "nRemoved" : 5 })
> db.stu.find()
Drop the collection
> show tables
stu
wangjiaxin
wangjiaxin1
wangjiaxin3
> db.stu.drop()
true
> show tables
wangjiaxin
wangjiaxin1
wangjiaxin3

V. Exercises

  • Sample data
> db.persons.find().pretty()
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d0e"),"name" : "jim","age" : 25,"email" : "75431457@qq.com","c" : 89,"m" : 96,"e" : 87,"country" : "USA","books" : ["JS","C++","EXTJS","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d0f"),"name" : "tom","age" : 25,"email" : "214557457@qq.com","c" : 75,"m" : 66,"e" : 97,"country" : "USA","books" : ["PHP","JAVA","EXTJS","C++"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d10"),"name" : "lili","age" : 26,"email" : "344521457@qq.com","c" : 75,"m" : 63,"e" : 97,"country" : "USA","books" : ["JS","JAVA","C#","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d11"),"name" : "zhangsan","age" : 27,"email" : "2145567457@qq.com","c" : 89,"m" : 86,"e" : 67,"country" : "China","books" : ["JS","JAVA","EXTJS","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d12"),"name" : "lisi","age" : 26,"email" : "274521457@qq.com","c" : 53,"m" : 96,"e" : 83,"country" : "China","books" : ["JS","C#","PHP","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d13"),"name" : "wangwu","age" : 27,"email" : "65621457@qq.com","c" : 45,"m" : 65,"e" : 99,"country" : "China","books" : ["JS","JAVA","C++","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d14"),"name" : "zhaoliu","age" : 27,"email" : "214521457@qq.com","c" : 99,"m" : 96,"e" : 97,"country" : "China","books" : ["JS","JAVA","EXTJS","PHP"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d15"),"name" : "piaoyingjun","age" : 26,"email" : "piaoyingjun@uspcat.com","c" : 39,"m" : 54,"e" : 53,"country" : "Korea","books" : ["JS","C#","EXTJS","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d16"),"name" : "lizhenxian","age" : 27,"email" : "lizhenxian@uspcat.com","c" : 35,"m" : 56,"e" : 47,"country" : "Korea","books" : ["JS","JAVA","EXTJS","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d17"),"name" : "lixiaoli","age" : 21,"email" : "lixiaoli@uspcat.com","c" : 36,"m" : 86,"e" : 32,"country" : "Korea","books" : ["JS","JAVA","PHP","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d18"),"name" : "zhangsuying","age" : 22,"email" : "zhangsuying@uspcat.com","c" : 45,"m" : 63,"e" : 77,"country" : "Korea","books" : ["JS","JAVA","C#","MONGODB"]
}
  • 1. Find the name and age of everyone older than 25 and younger than 27
> db.persons.find({age:{$gt:25,$lt:27}},{name:1,age:1})
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d10"), "name" : "lili", "age" : 26 }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d12"), "name" : "lisi", "age" : 26 }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d15"), "name" : "piaoyingjun", "age" : 26 }
find() filters by the age range, and the projection keeps only name and age.
  • 2. Find the names of everyone who is not from the USA
> db.persons.find({country:{$ne:'USA'}},{name:1,country:1})
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d11"), "name" : "zhangsan", "country" : "China" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d12"), "name" : "lisi", "country" : "China" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d13"), "name" : "wangwu", "country" : "China" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d14"), "name" : "zhaoliu", "country" : "China" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d15"), "name" : "piaoyingjun", "country" : "Korea" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d16"), "name" : "lizhenxian", "country" : "Korea" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d17"), "name" : "lixiaoli", "country" : "Korea" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d18"), "name" : "zhangsuying", "country" : "Korea" }
Same approach as above.
Note: $ne means "not equal".
  • 3. Find the students whose country is China or the USA
> db.persons.find({$or:[{country:'USA'},{country:'China'}]}).pretty()
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d0e"),"name" : "jim","age" : 25,"email" : "75431457@qq.com","c" : 89,"m" : 96,"e" : 87,"country" : "USA","books" : ["JS","C++","EXTJS","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d0f"),"name" : "tom","age" : 25,"email" : "214557457@qq.com","c" : 75,"m" : 66,"e" : 97,"country" : "USA","books" : ["PHP","JAVA","EXTJS","C++"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d10"),"name" : "lili","age" : 26,"email" : "344521457@qq.com","c" : 75,"m" : 63,"e" : 97,"country" : "USA","books" : ["JS","JAVA","C#","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d11"),"name" : "zhangsan","age" : 27,"email" : "2145567457@qq.com","c" : 89,"m" : 86,"e" : 67,"country" : "China","books" : ["JS","JAVA","EXTJS","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d12"),"name" : "lisi","age" : 26,"email" : "274521457@qq.com","c" : 53,"m" : 96,"e" : 83,"country" : "China","books" : ["JS","C#","PHP","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d13"),"name" : "wangwu","age" : 27,"email" : "65621457@qq.com","c" : 45,"m" : 65,"e" : 99,"country" : "China","books" : ["JS","JAVA","C++","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d14"),"name" : "zhaoliu","age" : 27,"email" : "214521457@qq.com","c" : 99,"m" : 96,"e" : 97,"country" : "China","books" : ["JS","JAVA","EXTJS","PHP"]
}
  • 4. Find the students whose Chinese score (c) is above 85 or whose English score (e) is above 90
> db.persons.find({$or:[{c:{$gt:85}},{e:{$gt:90}}]},{c:1,e:1,name:1})
  • 5. Find the students whose name contains "li" (regular expression match)
> db.persons.find({name:/li/},{name:1})
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d10"), "name" : "lili" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d12"), "name" : "lisi" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d14"), "name" : "zhaoliu" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d16"), "name" : "lizhenxian" }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d17"), "name" : "lixiaoli" }
  • 6. Find the students who like both MONGODB and PHP
> db.persons.find({books:{$all:['MONGODB','PHP']}},{name:1,books:1})
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d12"), "name" : "lisi", "books" : [ "JS", "C#", "PHP", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d17"), "name" : "lixiaoli", "books" : [ "JS", "JAVA", "PHP", "MONGODB" ] }
  • 7. Find the students whose second book is JAVA
> db.persons.find({books:'JAVA'},{books:1})
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d0f"), "books" : [ "PHP", "JAVA", "EXTJS", "C++" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d10"), "books" : [ "JS", "JAVA", "C#", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d11"), "books" : [ "JS", "JAVA", "EXTJS", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d13"), "books" : [ "JS", "JAVA", "C++", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d14"), "books" : [ "JS", "JAVA", "EXTJS", "PHP" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d16"), "books" : [ "JS", "JAVA", "EXTJS", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d17"), "books" : [ "JS", "JAVA", "PHP", "MONGODB" ] }
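{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d18"), "books" : [ "JS", "JAVA", "C#", "MONGODB" ] }
The query above matches JAVA at any position in books. To match only the second book, address array index 1 (indexes start at 0); with this data set both queries happen to return the same documents.
> db.persons.find({'books.1':'JAVA'},{books:1})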
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d0f"), "books" : [ "PHP", "JAVA", "EXTJS", "C++" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d10"), "books" : [ "JS", "JAVA", "C#", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d11"), "books" : [ "JS", "JAVA", "EXTJS", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d13"), "books" : [ "JS", "JAVA", "C++", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d14"), "books" : [ "JS", "JAVA", "EXTJS", "PHP" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d16"), "books" : [ "JS", "JAVA", "EXTJS", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d17"), "books" : [ "JS", "JAVA", "PHP", "MONGODB" ] }
{ "_id" : ObjectId("607c1a14cd2a2ff6578a8d18"), "books" : [ "JS", "JAVA", "C#", "MONGODB" ] }
  • 8. Find the students who like exactly 4 books
> db.persons.find({books:{$size:4}},{books:1}).pretty()
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d0e"),"books" : ["JS","C++","EXTJS","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d0f"),"books" : ["PHP","JAVA","EXTJS","C++"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d10"),"books" : ["JS","JAVA","C#","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d11"),"books" : ["JS","JAVA","EXTJS","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d12"),"books" : ["JS","C#","PHP","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d13"),"books" : ["JS","JAVA","C++","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d14"),"books" : ["JS","JAVA","EXTJS","PHP"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d15"),"books" : ["JS","C#","EXTJS","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d16"),"books" : ["JS","JAVA","EXTJS","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d17"),"books" : ["JS","JAVA","PHP","MONGODB"]
}
{"_id" : ObjectId("607c1a14cd2a2ff6578a8d18"),"books" : ["JS","JAVA","C#","MONGODB"]
}
$size matches arrays that have exactly the given number of elements.
  • 9. List the distinct countries in persons
>  db.persons.distinct('country')
[ "China", "Korea", "USA" ]
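
The array operators used in exercises 6 to 9 map onto familiar Python operations. The sketch below spells out that correspondence on three documents taken from the sample data, plus one hypothetical extra person (`demo`) added only to show how matching anywhere in the array differs from matching at index 1.

```python
persons = [
    {"name": "jim", "country": "USA", "books": ["JS", "C++", "EXTJS", "MONGODB"]},
    {"name": "tom", "country": "USA", "books": ["PHP", "JAVA", "EXTJS", "C++"]},
    {"name": "lisi", "country": "China", "books": ["JS", "C#", "PHP", "MONGODB"]},
    # Hypothetical document, not part of the original data set.
    {"name": "demo", "country": "Korea", "books": ["JAVA", "JS"]},
]

# {books: {$all: ['MONGODB', 'PHP']}}: every listed value must be present
has_all = [p["name"] for p in persons if {"MONGODB", "PHP"} <= set(p["books"])]

# {books: 'JAVA'}: the value appears anywhere in the array
has_java = [p["name"] for p in persons if "JAVA" in p["books"]]

# {'books.1': 'JAVA'}: the element at index 1 (the second book) is JAVA
second_is_java = [
    p["name"] for p in persons if len(p["books"]) > 1 and p["books"][1] == "JAVA"
]

# {books: {$size: 4}}: the array has exactly four elements
four_books = [p["name"] for p in persons if len(p["books"]) == 4]

# distinct('country'): the distinct values of one field
countries = sorted({p["country"] for p in persons})

print(has_all, has_java, second_is_java, four_books, countries)
```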

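VI. Aggregation

  • Aggregation is built on a data-processing pipeline: documents pass through a pipeline made of several stages, each stage can group, filter or otherwise transform them, and after all stages have run the result is returned.

1. Querying data

  • For aggregation work, aggregate() is the more efficient choice because the processing happens on the server; called with no stages, it simply returns every document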
> db.stu.aggregate()
{ "_id" : ObjectId("607d1d7a8b3cfdc5eb3dd1e6"), "name" : "a", "hometown" : "东北", "age" : 20, "gender" : true }
{ "_id" : ObjectId("607d1d8a8b3cfdc5eb3dd1e7"), "name" : "b", "hometown" : "长沙", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607d1d938b3cfdc5eb3dd1e8"), "name" : "c", "hometown" : "武汉", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607d1d9a8b3cfdc5eb3dd1e9"), "name" : "d", "hometown" : "华山", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607d1da18b3cfdc5eb3dd1ea"), "name" : "e", "hometown" : "山东", "age" : 16, "gender" : true }
{ "_id" : ObjectId("607d1da68b3cfdc5eb3dd1eb"), "name" : "f", "hometown" : "江苏", "age" : 45, "gender" : true }
{ "_id" : ObjectId("607d1dad8b3cfdc5eb3dd1ec"), "name" : "g", "hometown" : "大理", "age" : 18, "gender" : true }

2. Grouping

Group by age
> db.stu.aggregate({$group:{_id:'$age'}})
{ "_id" : 40 }
{ "_id" : 16 }
{ "_id" : 20 }
{ "_id" : 45 }
{ "_id" : 18 }
Group by gender
> db.stu.aggregate({$group:{_id:'$gender'}})
{ "_id" : false }
{ "_id" : true }

3. Counting the documents in each group

> db.stu.aggregate({$group:{_id:'$gender',stu_count:{$sum:1}}})
{ "_id" : false, "stu_count" : 2 }
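{ "_id" : true, "stu_count" : 5 }
Note:
$sum:1 adds 1 for every document in the group, so the result is the document count.
$sum:2 (or any other number n) adds that number per document, so the result is n times the count.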
> db.stu.aggregate({$group:{_id:'$gender',stu_count:{$sum:2}}})
{ "_id" : false, "stu_count" : 4 }
{ "_id" : true, "stu_count" : 10 }

4. Listing the data in each group

> db.stu.aggregate({$group:{_id:'$gender',stu_count:{$sum:1},name:{$push:"$name"}}})
{ "_id" : true, "stu_count" : 5, "name" : [ "a", "d", "e", "f", "g" ] }
{ "_id" : false, "stu_count" : 2, "name" : [ "b", "c" ] }
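
What `$group` does with `$sum` and `$push` can be sketched in a few lines of plain Python. The `group_by_gender` function is a hypothetical illustration over a copy of the sample data (hometown omitted); its `increment` argument plays the role of the number given to `$sum`.

```python
from collections import defaultdict

# Copy of the stu data used in this section, without the hometown field.
stu = [
    {"name": "a", "age": 20, "gender": True},
    {"name": "b", "age": 18, "gender": False},
    {"name": "c", "age": 18, "gender": False},
    {"name": "d", "age": 40, "gender": True},
    {"name": "e", "age": 16, "gender": True},
    {"name": "f", "age": 45, "gender": True},
    {"name": "g", "age": 18, "gender": True},
]


def group_by_gender(docs, increment=1):
    """Model of {$group: {_id: '$gender', stu_count: {$sum: n}, name: {$push: '$name'}}}."""
    groups = defaultdict(lambda: {"stu_count": 0, "name": []})
    for doc in docs:
        group = groups[doc["gender"]]
        group["stu_count"] += increment   # $sum adds this number once per document
        group["name"].append(doc["name"])  # $push collects the values into a list
    return dict(groups)


counted = group_by_gender(stu)
doubled = group_by_gender(stu, increment=2)
print(counted)
print(doubled[True]["stu_count"], doubled[False]["stu_count"])
```

With `increment=2` the totals are twice the group sizes, matching the `$sum:2` output shown earlier.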

5. Filtering with $match

> db.stu.aggregate({$match:{age:{$gt:20}}})
{ "_id" : ObjectId("607d1d9a8b3cfdc5eb3dd1e9"), "name" : "d", "hometown" : "华山", "age" : 40, "gender" : true }
{ "_id" : ObjectId("607d1da68b3cfdc5eb3dd1eb"), "name" : "f", "hometown" : "江苏", "age" : 45, "gender" : true }
> db.stu.aggregate({$match:{age:{$gt:20}}},{$group:{_id:'$hometown'}})
{ "_id" : "江苏" }
{ "_id" : "华山" }
> db.stu.aggregate({$match:{age:{$gt:20}}},{$group:{_id:'$hometown',count:{$sum:1}}})
{ "_id" : "江苏", "count" : 1 }
{ "_id" : "华山", "count" : 1 }

6. Skipping and limiting documents

> db.stu.aggregate({$skip:1},{$limit:2})
{ "_id" : ObjectId("607d1d8a8b3cfdc5eb3dd1e7"), "name" : "b", "hometown" : "长沙", "age" : 18, "gender" : false }
{ "_id" : ObjectId("607d1d938b3cfdc5eb3dd1e8"), "name" : "c", "hometown" : "武汉", "age" : 18, "gender" : false }
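
The pipeline idea, where each stage consumes the output of the previous one, can be modelled with chained Python functions. The stage helpers below (`match_age_gt`, `group_count`, `skip`, `limit`) are hypothetical names chosen for this sketch, and the data is a copy of the sample collection without the gender field.

```python
from collections import Counter
from itertools import islice

stu = [
    {"name": "a", "hometown": "东北", "age": 20},
    {"name": "b", "hometown": "长沙", "age": 18},
    {"name": "c", "hometown": "武汉", "age": 18},
    {"name": "d", "hometown": "华山", "age": 40},
    {"name": "e", "hometown": "山东", "age": 16},
    {"name": "f", "hometown": "江苏", "age": 45},
    {"name": "g", "hometown": "大理", "age": 18},
]


def match_age_gt(docs, threshold):
    """Stage like {$match: {age: {$gt: threshold}}}."""
    return (d for d in docs if d["age"] > threshold)


def group_count(docs, field):
    """Stage like {$group: {_id: '$field', count: {$sum: 1}}}."""
    return dict(Counter(d[field] for d in docs))


def skip(docs, n):
    """Stage like {$skip: n}."""
    return islice(docs, n, None)


def limit(docs, n):
    """Stage like {$limit: n}."""
    return islice(docs, n)


# $match then $group: each stage feeds the next one.
by_hometown = group_count(match_age_gt(stu, 20), "hometown")

# $skip then $limit: drop the first document, keep the next two.
page = [d["name"] for d in limit(skip(stu, 1), 2)]

print(by_hometown, page)
```

Stage order matters: applying `limit` before `skip` would keep two documents and then drop one of them, leaving a single result.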

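VII. Indexes in MongoDB

1. Why create an index?

  • To speed up queries (optimization)
  • To deduplicate data (with a unique index)

2. Commands

List indexes

> db.test.getIndexes()
[ { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_" } ]
This is the output before any index has been created: only the default _id index exists.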

Show detailed query execution information

> db.test.find({name:'test9999'}).explain('executionStats')
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "suoyin.test",
    "indexFilterSet" : false,
    "parsedQuery" : { "name" : { "$eq" : "test9999" } },
    "winningPlan" : {
      "stage" : "COLLSCAN",
      "filter" : { "name" : { "$eq" : "test9999" } },
      "direction" : "forward"
    },
    "rejectedPlans" : [ ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 1,
    "executionTimeMillis" : 46,
    "totalKeysExamined" : 0,
    "totalDocsExamined" : 100000,
    "executionStages" : {
      "stage" : "COLLSCAN",
      "filter" : { "name" : { "$eq" : "test9999" } },
      "nReturned" : 1,
      "executionTimeMillisEstimate" : 1,
      "works" : 100002,
      "advanced" : 1,
      "needTime" : 100000,
      "needYield" : 0,
      "saveState" : 100,
      "restoreState" : 100,
      "isEOF" : 1,
      "direction" : "forward",
      "docsExamined" : 100000
    }
  },
  "serverInfo" : {
    "host" : "wangjiaxindeMacBook-Pro-131.local",
    "port" : 27017,
    "version" : "4.4.4",
    "gitVersion" : "8db30a63db1a9d84bdcad0c83369623f708e0397"
  },
  "ok" : 1
}
The COLLSCAN stage with totalDocsExamined 100000 shows that, without an index, every document in the collection was examined to return one result.

Create an index

> db.test.ensureIndex({name:1})
{
  "createdCollectionAutomatically" : false,
  "numIndexesBefore" : 1,
  "numIndexesAfter" : 2,
  "ok" : 1
}
> db.test.getIndexes()
[
  { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_" },
  { "v" : 2, "key" : { "name" : 1 }, "name" : "name_1" }
]
ensureIndex() is a deprecated alias of createIndex(); db.test.createIndex({name:1}) is the current form.
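
To see why an index on name helps, compare a full scan with a lookup through a precomputed mapping. The sketch below is a simplified pure-Python analogy with 1,000 made-up documents: a dict stands in for the index, whereas MongoDB's indexes are B-tree structures, and the function names (`collscan`, `build_index`, `ixscan`) are hypothetical.

```python
# 1,000 hypothetical documents, named like the test data in the transcript.
docs = [{"_id": i, "name": f"test{i}"} for i in range(1000)]


def collscan(docs, name):
    """Full collection scan: every document is examined."""
    hits = [d for d in docs if d["name"] == name]
    return hits, len(docs)


def build_index(docs, field):
    """Map each field value to the positions of the documents that hold it."""
    index = {}
    for pos, doc in enumerate(docs):
        index.setdefault(doc[field], []).append(pos)
    return index


def ixscan(docs, index, value):
    """Index lookup: only the documents the index points at are examined."""
    positions = index.get(value, [])
    return [docs[p] for p in positions], len(positions)


name_index = build_index(docs, "name")
scan_hits, scan_examined = collscan(docs, "test999")
ix_hits, ix_examined = ixscan(docs, name_index, "test999")
print(scan_examined, ix_examined)
```

Both approaches return the same document, but the scan examines all 1,000 documents while the lookup examines one. The price is the extra structure that has to be built and kept up to date on every write.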

Dropping an index

> db.test.dropIndex({name:1})
{ "nIndexesWas" : 2, "ok" : 1 }
> db.test.getIndexes()
[ { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_" } ]

With the index gone, the same kind of query falls back to a full collection scan (COLLSCAN) and examines all 100000 documents:

> db.test.find({name:'test99999'}).explain('executionStats')
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "suoyin.test",
    "indexFilterSet" : false,
    "parsedQuery" : { "name" : { "$eq" : "test99999" } },
    "winningPlan" : {
      "stage" : "COLLSCAN",
      "filter" : { "name" : { "$eq" : "test99999" } },
      "direction" : "forward"
    },
    "rejectedPlans" : [ ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 1,
    "executionTimeMillis" : 47,
    "totalKeysExamined" : 0,
    "totalDocsExamined" : 100000,
    "executionStages" : {
      "stage" : "COLLSCAN",
      "filter" : { "name" : { "$eq" : "test99999" } },
      "nReturned" : 1,
      "executionTimeMillisEstimate" : 1,
      "works" : 100002,
      "advanced" : 1,
      "needTime" : 100000,
      "needYield" : 0,
      "saveState" : 100,
      "restoreState" : 100,
      "isEOF" : 1,
      "direction" : "forward",
      "docsExamined" : 100000
    }
  },
  "serverInfo" : {
    "host" : "wangjiaxindeMacBook-Pro-131.local",
    "port" : 27017,
    "version" : "4.4.4",
    "gitVersion" : "8db30a63db1a9d84bdcad0c83369623f708e0397"
  },
  "ok" : 1
}
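The explain output is long, and only a few fields matter when checking whether an index is used. The sketch below pulls those fields out of an explain document that has already been loaded as a Python dict. The helper name summarize_explain is an assumption for illustration, and the sample dict is a trimmed copy of the COLLSCAN output shown above.

```python
def summarize_explain(explain):
    """Extract the fields that show whether a query used an index."""
    stats = explain["executionStats"]

    # Walk the winning plan: an indexed query nests an IXSCAN stage
    # under "inputStage", while a full scan is a single COLLSCAN stage.
    stages = []
    node = explain["queryPlanner"]["winningPlan"]
    while node is not None:
        stages.append(node["stage"])
        node = node.get("inputStage")

    return {
        "stages": stages,
        "uses_index": "IXSCAN" in stages,
        "n_returned": stats["nReturned"],
        "docs_examined": stats["totalDocsExamined"],
        "keys_examined": stats["totalKeysExamined"],
        "millis": stats["executionTimeMillis"],
    }


# Trimmed copy of the explain output above (no index on "name")
collscan = {
    "queryPlanner": {
        "winningPlan": {"stage": "COLLSCAN", "direction": "forward"},
    },
    "executionStats": {
        "nReturned": 1,
        "executionTimeMillis": 47,
        "totalKeysExamined": 0,
        "totalDocsExamined": 100000,
    },
}

summary = summarize_explain(collscan)
print(summary)
```

A large gap between docs_examined and n_returned, as here (100000 examined for 1 returned), is the usual sign that the queried field would benefit from an index.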

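VIII. Using MongoDB from Python

1. Installation

pip install pymongo

2. Importing the module

import pymongo

3. Connecting to MongoDB

mongo_client = pymongo.MongoClient(host='127.0.0.1', port=27017)

4. Working with MongoDB

mongo_client['Wjx']['like'].insert_one({'name': 'lirui'})

Collection.insert() has been deprecated since PyMongo 3 and was removed in PyMongo 4, so insert_one() is used here.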

5. Code summary

  • Inserting a single document
import pymongo


class MongoData:
    def __init__(self, name):
        # Connect to the MongoDB server
        self.client = pymongo.MongoClient(host='127.0.0.1', port=27017)
        # Select the database, then the collection (self.db is a collection)
        self.db = self.client['Wjx'][name]

    # Insert a single document
    def add_one(self, data):
        result = self.db.insert_one(data)
        print(result.inserted_id)


if __name__ == '__main__':
    md = MongoData('like')
    md.add_one({'name': 'pengli'})
  • Inserting multiple documents
import pymongo


class MongoData:
    def __init__(self, name):
        # Connect to the MongoDB server
        self.client = pymongo.MongoClient(host='127.0.0.1', port=27017)
        # Select the database, then the collection (self.db is a collection)
        self.db = self.client['Wjx'][name]

    # Insert multiple documents
    def add_many(self, data):
        result = self.db.insert_many(data)
        return result.inserted_ids


if __name__ == '__main__':
    md = MongoData('like')
    r = md.add_many([{'x': i} for i in range(2)])
    print(r)
  • Querying a single document
import pymongo


class MongoData:
    def __init__(self, name):
        # Connect to the MongoDB server
        self.client = pymongo.MongoClient(host='127.0.0.1', port=27017)
        # Select the database, then the collection (self.db is a collection)
        self.db = self.client['Wjx'][name]

    # Insert a single document
    def add_one(self, data):
        result = self.db.insert_one(data)
        print(result.inserted_id)

    # Insert multiple documents
    def add_many(self, data):
        result = self.db.insert_many(data)
        return result.inserted_ids

    # Query a single document
    # query=None means "no filter": return the first document found
    def get_one(self, query=None):
        if query is None:
            return self.db.find_one()
        else:
            return self.db.find_one(query)


if __name__ == '__main__':
    md = MongoData('like')
    # md.add_one({'name': 'pengli'})
    # r = md.add_many([{'x': i} for i in range(2)])
    # print(r)
    r = md.get_one({'name': 'zhuqi'})
    print(r)
  • Querying multiple documents
import pymongo


class MongoData:
    def __init__(self, name):
        # Connect to the MongoDB server
        self.client = pymongo.MongoClient(host='127.0.0.1', port=27017)
        # Select the database, then the collection (self.db is a collection)
        self.db = self.client['Wjx'][name]

    # Insert a single document
    def add_one(self, data):
        result = self.db.insert_one(data)
        print(result.inserted_id)

    # Insert multiple documents
    def add_many(self, data):
        result = self.db.insert_many(data)
        return result.inserted_ids

    # Query a single document
    # query=None means "no filter": return the first document found
    def get_one(self, query=None):
        if query is None:
            return self.db.find_one()
        else:
            return self.db.find_one(query)

    # Query multiple documents; returns a cursor to iterate over
    def get_many(self, query=None):
        if query is None:
            return self.db.find()
        else:
            return self.db.find(query)


if __name__ == '__main__':
    md = MongoData('like')
    # md.add_one({'name': 'pengli'})
    # r = md.add_many([{'x': i} for i in range(2)])
    # print(r)
    # r = md.get_one({'name': 'zhuqi'})
    # print(r)
    r = md.get_many()
    for i in r:
        print(i)

IX. Using MongoDB with Scrapy

  • Spider file
import scrapy
from chaoshenspider.items import ChaoshenspiderItem


class XintiantingSpider(scrapy.Spider):
    name = 'xintianting'
    allowed_domains = ['biduoxs.com']
    start_urls = ['https://www.biduoxs.com/biquge/51_51108/c20390104.html']

    def parse(self, response):
        chapter_name = response.xpath('//div[@class="content_read"]/div[@class="box_con"]/div[@class="bookname"]/h1/text()').get()
        chapter_content = response.xpath('//div[@class="content_read"]/div[@class="box_con"]/div[@id="content"]/text()').getall()
        chapter_text = '\n'.join(chapter_content)
        # print(chapter_name)
        # print(chapter_text)
        item = ChaoshenspiderItem()
        item['chapter_name'] = chapter_name
        item['chapter_text'] = chapter_text
        yield item

        # Follow the "next chapter" link (the third link in the bottom navigation bar)
        chapter_href = response.xpath('//div[@class="content_read"]/div[@class="box_con"]/div[@class="bottem2"]/a/@href').getall()[2]
        # print(chapter_href)
        # On the last chapter the link points back to the book's index page, so stop there
        if chapter_href != '/biquge/51_51108/':
            chapter_url = response.urljoin(chapter_href)
            yield scrapy.Request(url=chapter_url, callback=self.parse)
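The spider's next-chapter step relies on response.urljoin, which resolves an href against the page URL in much the same way as the standard library's urllib.parse.urljoin. The sketch below reproduces that decision outside Scrapy. The function name resolve_next and the next-chapter href c20390105.html are hypothetical; the page URL and the index path come from the spider above.

```python
from urllib.parse import urljoin

PAGE_URL = 'https://www.biduoxs.com/biquge/51_51108/c20390104.html'
INDEX_HREF = '/biquge/51_51108/'  # on the last chapter, "next" points here


def resolve_next(page_url, href, index_href=INDEX_HREF):
    """Return the absolute next-chapter URL, or None when the crawl should stop."""
    if href == index_href:
        return None
    return urljoin(page_url, href)


# Hypothetical next-chapter links, absolute-path and relative forms
print(resolve_next(PAGE_URL, '/biquge/51_51108/c20390105.html'))
print(resolve_next(PAGE_URL, 'c20390105.html'))

# The index link ends the crawl
print(resolve_next(PAGE_URL, INDEX_HREF))
```

Both forms of the hypothetical href resolve to the same absolute URL, which is why the spider can pass whatever the page provides straight to urljoin.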
  • items.py
# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html

import scrapy


class ChaoshenspiderItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    chapter_name = scrapy.Field()
    chapter_text = scrapy.Field()
  • settings.py
# Scrapy settings for chaoshenspider project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://docs.scrapy.org/en/latest/topics/settings.html
#     https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://docs.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'chaoshenspider'

SPIDER_MODULES = ['chaoshenspider.spiders']
NEWSPIDER_MODULE = 'chaoshenspider.spiders'

LOG_LEVEL = 'WARNING'

# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'chaoshenspider (+http://www.yourdomain.com)'

# Obey robots.txt rules
ROBOTSTXT_OBEY = False

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
DEFAULT_REQUEST_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
}

# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    'chaoshenspider.middlewares.ChaoshenspiderSpiderMiddleware': 543,
#}

# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#    'chaoshenspider.middlewares.ChaoshenspiderDownloaderMiddleware': 543,
#}

# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
#}

# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'chaoshenspider.pipelines.ChaoshenspiderPipeline': 300,
}

MONGODB_HOST = '127.0.0.1'
MONGODB_PORT = 27017
MONGODB_DBNAME = 'fiction'
MONGODB_DBCNAME = '超神学院之新天庭'

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
  • pipelines.py
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html

# useful for handling different item types with a single interface
import pymongo
from itemadapter import ItemAdapter
from chaoshenspider import settings


class ChaoshenspiderPipeline:
    def __init__(self):
        host = settings.MONGODB_HOST
        port = settings.MONGODB_PORT
        dbname = settings.MONGODB_DBNAME
        dbcname = settings.MONGODB_DBCNAME
        client = pymongo.MongoClient(host=host, port=port)  # connect to MongoDB
        fiction = client[dbname]  # select the database
        self.add = fiction[dbcname]  # select the collection
        self.book = open('超神学院之新天庭.txt', 'w', encoding='utf-8')
        print('Crawl started!')

    def process_item(self, item, spider):
        print(item['chapter_name'] + ' downloaded!')
        # Append the chapter to the novel's text file
        self.book.write(item['chapter_name'] + '\n')
        self.book.write(item['chapter_text'] + '\n\n')
        # Store the chapter in MongoDB
        data = dict(item)
        self.add.insert_one(data)
        return item

    # Scrapy passes the spider, not an item, to close_spider
    def close_spider(self, spider):
        self.book.close()
        print('Crawl finished!')
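The pipeline above imports the project's settings module directly. Scrapy also supports a from_crawler class method, which hands the pipeline the crawler's settings object and makes it easier to override values per run. The sketch below is one possible variant under assumptions, not the author's code: the class name MongoPipeline and the fallback collection name 'chapters' are made up, pymongo is imported inside open_spider so the class can be loaded without it, and a SimpleNamespace with a plain dict stands in for the real crawler in the usage lines.

```python
from types import SimpleNamespace


class MongoPipeline:
    """Sketch: read the MONGODB_* settings through from_crawler."""

    def __init__(self, host, port, db_name, collection_name):
        self.host = host
        self.port = port
        self.db_name = db_name
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # crawler.settings offers .get(name, default), like a dict
        settings = crawler.settings
        return cls(
            host=settings.get('MONGODB_HOST', '127.0.0.1'),
            port=int(settings.get('MONGODB_PORT', 27017)),
            db_name=settings.get('MONGODB_DBNAME', 'fiction'),
            collection_name=settings.get('MONGODB_DBCNAME', 'chapters'),
        )

    def open_spider(self, spider):
        # Imported lazily so this sketch loads even where pymongo is absent
        import pymongo
        self.client = pymongo.MongoClient(host=self.host, port=self.port)
        self.collection = self.client[self.db_name][self.collection_name]

    def process_item(self, item, spider):
        self.collection.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()


# Stand-in for a Scrapy crawler: only the .settings attribute is needed here
fake_crawler = SimpleNamespace(settings={
    'MONGODB_HOST': '127.0.0.1',
    'MONGODB_PORT': 27017,
    'MONGODB_DBNAME': 'fiction',
})
pipeline = MongoPipeline.from_crawler(fake_crawler)
print(pipeline.host, pipeline.port, pipeline.db_name, pipeline.collection_name)
```

Opening the connection in open_spider and closing it in close_spider keeps the client's lifetime tied to the crawl rather than to object construction.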
