Python urllib.request crawler data processing: handling JSON data in a Python crawler

# -*- coding: utf-8 -*-
# @Time : 2019/11/5 23:18
# @Author : AForever
# @Site :
# @File : Spider_05.py
# @Software: PyCharm

# Handle JSON data

from urllib import request
import json


def get_data():
    # Douban endpoint that returns the "popular" movie list as JSON
    url = "https://movie.douban.com/j/search_subjects?type=movie&tag=%E7%83%AD%E9%97%A8&sort=recommend&page_limit=400&page_start=0"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
    }
    req = request.Request(url, headers=headers)
    response = request.urlopen(req)
    if response.getcode() == 200:
        result = response.read()
        # print(type(result))  # bytes
        # print(result)
        # Decode the raw bytes into a str before parsing
        result = str(result, encoding="utf8")
        print(result)
        return result


def parse_data(html):
    # Convert the JSON string into a dict
    data = json.loads(html)
    movies = data["subjects"]
    for movie in movies:
        print(movie["title"], movie["rate"])


if __name__ == "__main__":
    # get_data()
    parse_data(get_data())
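The script above hard-codes the percent-encoded query string and indexes the parsed dict directly. As a minimal sketch of the same two steps, the block below builds the URL with urllib.parse.urlencode and reads the "subjects" list more defensively. The names build_url, extract_movies, and SAMPLE are hypothetical helpers introduced only for illustration, and SAMPLE is made-up data shaped like the response the script expects, not real Douban output.

```python
import json
from urllib import parse

BASE_URL = "https://movie.douban.com/j/search_subjects"


def build_url(tag, page_limit=20, page_start=0):
    """Build the search URL; urlencode percent-encodes the non-ASCII tag as UTF-8."""
    query = parse.urlencode({
        "type": "movie",
        "tag": tag,
        "sort": "recommend",
        "page_limit": page_limit,
        "page_start": page_start,
    })
    return BASE_URL + "?" + query


def extract_movies(text):
    """Return (title, rate) pairs from a JSON string, skipping incomplete entries."""
    data = json.loads(text)
    pairs = []
    # .get() avoids a KeyError if the "subjects" key is missing
    for movie in data.get("subjects", []):
        if "title" in movie and "rate" in movie:
            pairs.append((movie["title"], movie["rate"]))
    return pairs


# Hypothetical sample shaped like the response the script expects.
SAMPLE = '{"subjects": [{"title": "Movie A", "rate": "8.1"}, {"title": "Movie B"}]}'

if __name__ == "__main__":
    print(build_url("热门", page_limit=400))
    print(extract_movies(SAMPLE))
```

Called with the tag "热门" and page_limit=400, build_url reproduces the URL used in get_data, so the encoded form %E7%83%AD%E9%97%A8 no longer has to be typed by hand.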

Original article: https://www.cnblogs.com/AForever01/p/11986622.html
