MongoDB Query Summary

MongoDB is a NoSQL database.

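Advantages:

    1. Data is stored as JSON-like documents (document-oriented storage)

    2. The aggregation framework makes queries fast

    3. Large binary objects are stored efficiently

Disadvantages:

    1. Older releases do not support multi-document transactions (these were added in MongoDB 4.0)

    2. Data files take up a lot of disk space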

Worked examples

Example 1: single-field query (find all cars whose manufacturer field is "Porsche")

{"layout" : "rear mid-engine rear-wheel-drive layout","name" : "Porsche Boxster","productionYears" : [ ],"modelYears" : [ ],"bodyStyle" : "roadster","assembly" : ["Finland","Germany","Stuttgart","Uusikaupunki"],"class" : "sports car","manufacturer" : "Porsche"
}

def porsche_query():
    # Equality filter of the form {'field name': 'field value'}
    query = {'manufacturer': 'Porsche'}
    return query

Example 2: range query (find all cities founded in the 21st century; note the comparison operators $gte and $lt)

{'areaCode': ['916'],'areaLand': 109271000.0,'country': 'United States','elevation': 13.716,'foundingDate': datetime.datetime(2000, 7, 1, 0, 0),'governmentType': ['Council\u2013manager government'],'homepage': ['http://elkgrovecity.org/'],'isPartOf': ['California', u'Sacramento County California'],'lat': 38.4383,'leaderTitle': 'Chief Of Police','lon': -121.382,'motto': 'Proud Heritage Bright Future','name': 'City of Elk Grove','population': 155937,'postalCode': '95624 95757 95758 95759','timeZone': ['Pacific Time Zone'],'utcOffset': ['-7', '-8']
}

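from datetime import datetime

def range_query():
    # $gte and $lt bound the range: from 1 January 2001 up to,
    # but not including, 1 January 2101
    query = {'foundingDate': {'$gte': datetime(2001, 1, 1),
                              '$lt': datetime(2101, 1, 1)}}
    return query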
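As a sanity check on the bounds, the same range can be written as a half-open interval in plain Python. An interval that ends at 1 January 2101 covers the whole final day of the century, which an upper bound of midnight on `datetime(2100, 12, 31)` would cut off. This is only an illustrative sketch: `founded_in_21st_century` is a hypothetical helper that is not part of the original exercise, and it assumes `foundingDate` values are `datetime` objects as in the sample document.

```python
from datetime import datetime

# Hypothetical helper mirroring {'$gte': START, '$lt': END} in plain Python.
START = datetime(2001, 1, 1)  # first instant of the 21st century
END = datetime(2101, 1, 1)    # first instant after it (exclusive bound)


def founded_in_21st_century(founding_date):
    # Half-open interval: START is included, END is excluded
    return START <= founding_date < END


# The Elk Grove sample document was founded on 1 July 2000
print(founded_in_21st_century(datetime(2000, 7, 1)))
# An evening timestamp on the last day of the century
print(founded_in_21st_century(datetime(2100, 12, 31, 18, 0)))
```

With these bounds the Elk Grove sample falls just outside the range, while any time on 31 December 2100 is still inside it.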

Example 3: find all cars assembled in Germany, the United Kingdom, or Japan

{"layout" : "rear mid-engine rear-wheel-drive layout","name" : "Porsche Boxster","productionYears" : [ ],"modelYears" : [ ],"bodyStyle" : "roadster","assembly" : ["Finland","Germany","Stuttgart","Uusikaupunki"],"class" : "sports car","manufacturer" : "Porsche"
}

def in_query():
    # $in matches documents whose assembly field contains any of the listed values
    query = {'assembly': {'$in': ['Germany', 'England', 'Japan']}}
    return query
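Because `assembly` is an array in the sample document, it may help to see what `$in` checks for such a field. The following is a simplified plain-Python imitation, not MongoDB's implementation; `matches_in` is a hypothetical helper introduced only for illustration.

```python
def matches_in(field_value, candidates):
    """Simplified imitation of {'field': {'$in': candidates}}."""
    # For an array field, a single matching element is enough.
    if isinstance(field_value, list):
        return any(item in candidates for item in field_value)
    # For a scalar field, the value itself must be one of the candidates.
    return field_value in candidates


assembly = ["Finland", "Germany", "Stuttgart", "Uusikaupunki"]
wanted = ["Germany", "England", "Japan"]

print(matches_in(assembly, wanted))     # the Boxster sample lists "Germany"
print(matches_in(["Finland"], wanted))  # no overlap with the wanted values
```

The Porsche Boxster sample matches because one of its assembly locations is "Germany", even though the other three are not in the list.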

Example 4: dot notation (find all cars whose width is greater than 2.5)

{"_id" : ObjectId("52fd438b5a98d65507d288cf"),"engine" : "Crawler-transporter__1","dimensions" : {"width" : 34.7472,"length" : 39.9288,"weight" : 2721000},"transmission" : "16 traction motors powered by four  generators","modelYears" : [ ],"productionYears" : [ ],"manufacturer" : "Marion Power Shovel Company","name" : "Crawler-transporter"
}

def dot_query():
    # A dot reaches a child field inside its parent: dimensions.width
    query = {'dimensions.width': {'$gt': 2.5}}
    return query
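To make the dot notation concrete, here is a small plain-Python sketch that follows a dotted path through nested dictionaries. `get_path` is a hypothetical helper and handles only nested dicts, which is a narrower case than MongoDB's own path resolution (it ignores arrays, for example).

```python
def get_path(document, dotted_path):
    """Follow a dotted path such as 'dimensions.width' through nested dicts."""
    value = document
    for key in dotted_path.split('.'):
        if not isinstance(value, dict) or key not in value:
            return None  # the path does not exist in this document
        value = value[key]
    return value


crawler = {
    "name": "Crawler-transporter",
    "dimensions": {"width": 34.7472, "length": 39.9288, "weight": 2721000},
}

width = get_path(crawler, 'dimensions.width')
print(width is not None and width > 2.5)
```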

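Aggregation framework queries

Example 5: find the application most often used to create tweets

Approach: group with $group, create a count field, and use $sum to count the documents in each group

Sample document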

{"_id" : ObjectId("5304e2e3cc9e684aa98bef97"),"text" : "First week of school is over :P","in_reply_to_status_id" : null,"retweet_count" : null,"contributors" : null,"created_at" : "Thu Sep 02 18:11:25 +0000 2010","geo" : null,"source" : "web","coordinates" : null,"in_reply_to_screen_name" : null,"truncated" : false,"entities" : {"user_mentions" : [ ],"urls" : [ ],"hashtags" : [ ]},"retweeted" : false,"place" : null,"user" : {"friends_count" : 145,"profile_sidebar_fill_color" : "E5507E","location" : "Ireland :)","verified" : false,"follow_request_sent" : null,"favourites_count" : 1,"profile_sidebar_border_color" : "CC3366","profile_image_url" : "http://a1.twimg.com/profile_images/1107778717/phpkHoxzmAM_normal.jpg","geo_enabled" : false,"created_at" : "Sun May 03 19:51:04 +0000 2009","description" : "","time_zone" : null,"url" : null,"screen_name" : "Catherinemull","notifications" : null,"profile_background_color" : "FF6699","listed_count" : 77,"lang" : "en","profile_background_image_url" : "http://a3.twimg.com/profile_background_images/138228501/149174881-8cd806890274b828ed56598091c84e71_4c6fd4d8-full.jpg","statuses_count" : 2475,"following" : null,"profile_text_color" : "362720","protected" : false,"show_all_inline_media" : false,"profile_background_tile" : true,"name" : "Catherine Mullane","contributors_enabled" : false,"profile_link_color" : "B40B43","followers_count" : 169,"id" : 37486277,"profile_use_background_image" : true,"utc_offset" : null},"favorited" : false,"in_reply_to_user_id" : null,"id" : NumberLong("22819398300")
}

def make_pipeline():
    pipeline = [
        # 1. Group by source and count the documents of each group into count
        {'$group': {'_id': '$source', 'count': {'$sum': 1}}},
        # 2. Sort by the count field in descending order
        {'$sort': {'count': -1}},
    ]
    return pipeline
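For readers more familiar with Python than with aggregation stages, grouping with `{'$sum': 1}` followed by a descending sort does roughly what `collections.Counter` does. The sketch below uses made-up tweets and only imitates the result shape; it does not talk to MongoDB.

```python
from collections import Counter

# Made-up tweets; only the source field matters here
tweets = [{"source": "web"}, {"source": "TweetDeck"}, {"source": "web"}]

# Counting one per tweet is the equivalent of {'$sum': 1} per group
counts = Counter(tweet["source"] for tweet in tweets)

# most_common() returns (value, count) pairs sorted by descending count
result = [{"_id": source, "count": n} for source, n in counts.most_common()]
print(result)
```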

Example 6: among users in the Brasilia time zone who have tweeted at least 100 times, find the one with the most followers

def make_pipeline():
    pipeline = [
        # 1. Use $match to filter the data
        {'$match': {'user.time_zone': 'Brasilia',
                    'user.statuses_count': {'$gte': 100}}},
        # 2. Use $project (projection) to choose the fields that are returned
        {'$project': {'followers': '$user.followers_count',
                      'screen_name': '$user.screen_name',
                      'tweets': '$user.statuses_count'}},
        # 3. Use $sort to order by followers, descending
        {'$sort': {'followers': -1}},
        # 4. Use $limit to keep a single document, which is the answer
        {'$limit': 1},
    ]
    return pipeline
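The four stages can be read as filter, reshape, sort, and take the first item. The sketch below imitates them in plain Python on made-up data; `top_brasilia_user` and the `sample` list are hypothetical and exist only for illustration.

```python
def top_brasilia_user(tweets):
    # $match: keep tweets whose user is in the Brasilia time zone
    # and has posted at least 100 statuses
    matched = [t for t in tweets
               if t["user"].get("time_zone") == "Brasilia"
               and (t["user"].get("statuses_count") or 0) >= 100]
    # $project: keep only three fields (the real stage would also
    # carry _id along unless it is explicitly excluded)
    projected = [{"followers": t["user"]["followers_count"],
                  "screen_name": t["user"]["screen_name"],
                  "tweets": t["user"]["statuses_count"]} for t in matched]
    # $sort by followers in descending order, then $limit 1
    projected.sort(key=lambda row: row["followers"], reverse=True)
    return projected[:1]


sample = [
    {"user": {"time_zone": "Brasilia", "statuses_count": 250,
              "followers_count": 40, "screen_name": "user_a"}},
    {"user": {"time_zone": "Brasilia", "statuses_count": 120,
              "followers_count": 900, "screen_name": "user_b"}},
    {"user": {"time_zone": "Brasilia", "statuses_count": 5,
              "followers_count": 5000, "screen_name": "user_c"}},
    {"user": {"time_zone": None, "statuses_count": 700,
              "followers_count": 9000, "screen_name": "user_d"}},
]
print(top_brasilia_user(sample))
```

Here `user_c` and `user_d` have more followers but are removed by the filter first, which is why the order of the stages matters.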

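Example 7: find which region of India contains the most cities

$match filters documents by a condition, similar to WHERE in SQL

$unwind splits list data: if a field is stored as a list, $unwind produces one copy of the document for each item in the list

$sort orders the documents by a field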
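The effect of `$unwind` is easiest to see on a single document. The generator below is a plain-Python sketch of the idea, assuming the field holds a list; `unwind` is a hypothetical helper, and the real stage has more options than this.

```python
def unwind(documents, field):
    """Yield one copy of each document per element of documents[field]."""
    for doc in documents:
        # Documents where the field is missing, None, or empty yield nothing
        for item in doc.get(field) or []:
            copy = dict(doc)    # shallow copy, so the input is not modified
            copy[field] = item  # the list is replaced by a single element
            yield copy


kud = {"name": "Kud", "country": "India",
       "isPartOf": ["Jammu and Kashmir", "Udhampur district"]}

for row in unwind([kud], "isPartOf"):
    print(row["name"], "->", row["isPartOf"])
```

After this step each city appears once per region it belongs to, so a later `$group` on `isPartOf` counts cities per region.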

Sample document

{"_id" : ObjectId("52fe1d364b5ab856eea75ebc"),"elevation" : 1855,"name" : "Kud","country" : "India","lon" : 75.28,"lat" : 33.08,"isPartOf" : ["Jammu and Kashmir","Udhampur district"],"timeZone" : ["Indian Standard Time"],"population" : 1140
}

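def make_pipeline():
    pipeline = [
        # 1. Use $match to select the country
        {'$match': {'country': 'India'}},
        # 2. Use $unwind to split the list-valued field
        {'$unwind': '$isPartOf'},
        # 3. Use $group to group the split items and count each group into count
        {'$group': {'_id': '$isPartOf', 'count': {'$sum': 1}}},
        # 4. Use $sort to order by count, descending; the first result is the answer
        {'$sort': {'count': -1}},
    ]
    return pipeline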

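Example 8: count the tweets of each user and return only the five users with the most tweets

$push collects each value into a list (duplicates are kept)

$addToSet collects each value into a list of distinct values (duplicates are dropped)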
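The difference between the two accumulators is whether repeated values are kept. A minimal plain-Python sketch on made-up tweet texts (this imitates the behaviour and is not MongoDB code):

```python
# Made-up tweet texts for one user
texts = ["hello", "hello", "bye"]

pushed = []        # behaves like $push: keeps every value
added_to_set = []  # behaves like $addToSet: keeps each distinct value once
for text in texts:
    pushed.append(text)
    if text not in added_to_set:
        added_to_set.append(text)

print(pushed)
print(added_to_set)
```

Note that MongoDB does not guarantee the order of elements produced by `$addToSet`, so only the membership of the result should be relied on.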

def make_pipeline():
    pipeline = [
        # 1. Use $group to group by screen_name
        # 2. Use $push to collect every text value into tweet_texts
        # 3. Use $sum to count the tweets
        {'$group': {'_id': '$user.screen_name',
                    'tweet_texts': {'$push': '$text'},
                    'count': {'$sum': 1}}},
        # 4. Use $sort to order by count, descending
        {'$sort': {'count': -1}},
        # 5. Use $limit to keep the top five users
        {'$limit': 5},
    ]
    return pipeline

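Example 9: find the average population of the regions of India

def make_pipeline():
    pipeline = [
        # 1. Use $match to select the country India
        {'$match': {'country': 'India'}},
        # 2. Use $unwind to split isPartOf
        {'$unwind': '$isPartOf'},
        # 3. Use $group to group by isPartOf, then $avg to get each region's average population
        {'$group': {'_id': '$isPartOf', 'avgp': {'$avg': '$population'}}},
        # 4. Use a second $group to put all regions into one group and average the regional averages
        {'$group': {'_id': 'India Regional City Population avg',
                    'avg': {'$avg': '$avgp'}}},
    ]
    return pipeline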
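The two `$group` stages compute an average of regional averages, which is generally not the same number as the average over all cities. The arithmetic below uses made-up populations to show the difference; the region names and the local `mean` helper are illustrative assumptions.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Made-up city populations per region, as they would look after $unwind
regions = {"Region A": [100, 300], "Region B": [1000]}

# First $group: average city population within each region
regional_avgs = {name: mean(pops) for name, pops in regions.items()}
# Second $group: average of those regional averages
avg_of_regions = mean(regional_avgs.values())
# For comparison: a single average over every city
overall_city_avg = mean(p for pops in regions.values() for p in pops)

print(regional_avgs)
print(avg_of_regions)
print(overall_city_avg)
```

In this example the average of the regional averages is (200 + 1000) / 2 = 600, while the average over the three cities is 1400 / 3, roughly 466.7, because each region counts equally in the first figure regardless of how many cities it has.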

Exercises

Problem Set 03

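1. Process only the fields that appear as keys in the FIELDS dictionary, and return a list of dictionaries holding the cleaned values

Requirements:

    1. Rename the dictionary keys according to the mapping in the FIELDS dictionary

    2. Remove the redundant parenthesised note from "rdf-schema#label", for example "(spider)"

    3. If "name" is "NULL" or contains non-alphanumeric characters, set it to the same value as "label"

    4. If a field's value is "NULL", convert it to None

    5. If "synonym" has a value, convert it to an array (list) by removing the "{}" characters and splitting the string on "|". Any further cleaning is up to you, for example removing a "*" prefix. If there is only a single synonym, the value should still be a list.

    6. Strip leading and trailing whitespace from every field (if there is any)

    7. The output structure should look like this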

[ { 'label': 'Argiope',
    'uri': 'http://dbpedia.org/resource/Argiope_(spider)',
    'description': 'The genus Argiope includes rather large and spectacular spiders that often ...',
    'name': 'Argiope',
    'synonym': ["One", "Two"],
    'classification': {
                      'family': 'Orb-weaver spider',
                      'class': 'Arachnid',
                      'phylum': 'Arthropod',
                      'order': 'Spider',
                      'kingdom': 'Animal',
                      'genus': None
                      }
  },
  { 'label': ... , }, ...
]

import csv
import pprint
import re

DATAFILE = 'arachnid.csv'
FIELDS = {'rdf-schema#label': 'label',
          'URI': 'uri',
          'rdf-schema#comment': 'description',
          'synonym': 'synonym',
          'name': 'name',
          'family_label': 'family',
          'class_label': 'class',
          'phylum_label': 'phylum',
          'order_label': 'order',
          'kingdom_label': 'kingdom',
          'genus_label': 'genus'}

# Output keys that belong in the nested 'classification' sub-dictionary
CLASSIFICATION_KEYS = ['kingdom', 'family', 'order', 'phylum', 'genus', 'class']


def clean_label(value):
    # Requirement 2: drop a parenthesised note such as "(spider)"
    return re.sub(r'\(.*\)', '', value).strip()


def process_file(filename, fields):
    # Holds the result set
    data = []
    with open(filename, 'r', newline='') as f:
        reader = csv.DictReader(f)
        # Skip the first 3 rows after the header
        for _ in range(3):
            next(reader)
        # Read the remaining rows
        for line in reader:
            # The output dictionary, with the classification sub-dictionary
            res = {'classification': {}}
            # field is the CSV column name, new_key is the output key (requirement 1)
            for field, new_key in fields.items():
                # Requirement 6: strip surrounding whitespace
                tmp_val = line[field].strip()
                # Requirement 2
                if field == 'rdf-schema#label':
                    tmp_val = clean_label(tmp_val)
                # Requirement 3: fall back to the cleaned label
                if field == 'name' and (tmp_val == 'NULL' or not tmp_val.isalnum()):
                    tmp_val = clean_label(line['rdf-schema#label'])
                # Requirement 4
                if tmp_val == 'NULL':
                    tmp_val = None
                # Requirement 5
                if field == 'synonym' and tmp_val:
                    tmp_val = parse_array(tmp_val)
                # Classification fields go into the sub-dictionary, the rest into res
                if new_key in CLASSIFICATION_KEYS:
                    res['classification'][new_key] = tmp_val
                else:
                    res[new_key] = tmp_val
            data.append(res)
    return data


def parse_array(v):
    # Parse an array: if the value starts with { and ends with }, remove the
    # braces, split on |, and strip the whitespace around each item
    if v.startswith('{') and v.endswith('}'):
        v = v.lstrip('{').rstrip('}')
        return [i.strip() for i in v.split('|')]
    # A single synonym is still returned as a list
    return [v]


def test():
    # Test function: if no assertion fails, the result is correct
    data = process_file(DATAFILE, FIELDS)
    print("Your first entry:")
    pprint.pprint(data[0])
    first_entry = {
        "synonym": None,
        "name": "Argiope",
        "classification": {
            "kingdom": "Animal",
            "family": "Orb-weaver spider",
            "order": "Spider",
            "phylum": "Arthropod",
            "genus": None,
            "class": "Arachnid"
        },
        "uri": "http://dbpedia.org/resource/Argiope_(spider)",
        "label": "Argiope",
        "description": "The genus Argiope includes rather large and spectacular spiders that often have a strikingly coloured abdomen. These spiders are distributed throughout the world. Most countries in tropical or temperate climates host one or more species that are similar in appearance. The etymology of the name is from a Greek name meaning silver-faced."
    }

    assert len(data) == 76
    assert data[0] == first_entry
    assert data[17]["name"] == "Ogdenia"
    assert data[48]["label"] == "Hydrachnidiae"
    assert data[14]["synonym"] == ["Cyrene Peckham & Peckham"]


if __name__ == "__main__":
    test()

2. Insert the data into MongoDB

import json


def insert_data(data, db):
    # data is a list of documents, so insert_many stores them in one call
    # (the older Collection.insert method was removed in PyMongo 4)
    db.arachnid.insert_many(data)


if __name__ == "__main__":
    from pymongo import MongoClient
    client = MongoClient("mongodb://localhost:27017")
    db = client.examples

    with open('arachnid.json') as f:
        data = json.loads(f.read())
        insert_data(data, db)
        print(db.arachnid.find_one())

Problem Set 04

Sample document

{"_id" : ObjectId("52fe1d364b5ab856eea75ebc"),"elevation" : 1855,"name" : "Kud","country" : "India","lon" : 75.28,"lat" : 33.08,"isPartOf" : ["Jammu and Kashmir","Udhampur district"],"timeZone" : ["Indian Standard Time"],"population" : 1140
}

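1. Find the most common city name

def make_pipeline():
    pipeline = [
        # 1. Use $match to filter out documents whose name is empty
        {'$match': {'name': {'$ne': None}}},
        # 2. Use $group to group by name and count each name into count
        {'$group': {'_id': '$name', 'count': {'$sum': 1}}},
        # 3. Use $sort to order by count, descending
        {'$sort': {'count': -1}},
        # 4. Use $limit 1 to return the final result
        {'$limit': 1},
    ]
    return pipeline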

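2. Among the regions with a longitude between 75 and 80, find the one that contains the most cities

def make_pipeline():
    pipeline = [
        # 1. Use $match to keep cities in India with a longitude between 75 and 80
        {'$match': {'country': 'India', 'lon': {'$gte': 75, '$lte': 80}}},
        # 2. Use $unwind to split the regions
        {'$unwind': '$isPartOf'},
        # 3. Use $group to group by region and count the cities in each
        {'$group': {'_id': '$isPartOf', 'count': {'$sum': 1}}},
        # 4. Use $sort to order by count, descending
        {'$sort': {'count': -1}},
        # 5. Use $limit 1 to return the final result
        {'$limit': 1},
    ]
    return pipeline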

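3. Compute the average population (the problem statement is not very clear; the intent is to compute, for each country, the average of its regions' average city populations)

def make_pipeline():
    pipeline = [
        # 1. Use $unwind to split the regions
        {'$unwind': '$isPartOf'},
        # 2. Use $group to group by (country, region) and compute the
        #    average city population of each region
        {'$group': {'_id': {'country': '$country', 'region': '$isPartOf'},
                    'avgCityPopulation': {'$avg': '$population'}}},
        # 3. Use a second $group to group by country and average the
        #    regional averages
        {'$group': {'_id': '$_id.country',
                    'avgRegionalPopulation': {'$avg': '$avgCityPopulation'}}},
    ]
    return pipeline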

Reference: https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/

Reposted from: https://www.cnblogs.com/luhuajun/p/8022755.html
