Python--Web Scraping for Beginners (11.5)

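import urllib3  # third-party HTTP client (an enhanced alternative to the standard library's urllib)

http = urllib3.PoolManager()  # create a PoolManager, which handles connection pooling and thread safety
response = http.request("GET", "http://www.baidu.com")
print(response.data.decode("utf-8"))

import requests

response = requests.get('http://www.baidu.com/')  # send the request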
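print("Status code:", response.status_code)  # HTTP status code
print("Request URL:", response.url)  # URL of the request
print("Response headers:", response.headers)  # response headers
print("Cookies:", response.cookies)  # cookies
print("Page source (text):", response.text)  # body decoded to str
print("Page source (bytes):", response.content)  # raw body as bytes

import requests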
# headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3573.0 Safari/537.36"}
url = "http://www.baidu.com/"
# To send the headers above, uncomment them and call requests.get(url, headers=headers)
response = requests.get(url)  # send the request
print(response.content.decode("utf-8"))  # always specify the encoding when decoding
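
The note about always specifying the encoding can be illustrated without touching the network. The following is a minimal sketch, not part of the original post: the byte string `raw` is a made-up stand-in for `response.content`, and `latin-1` is picked only as an example of a wrong codec.

```python
# Hypothetical stand-in for response.content: UTF-8 encoded bytes of a small HTML fragment.
raw = "<title>百度一下，你就知道</title>".encode("utf-8")

# Decoding with the codec the bytes were written in recovers the original text.
decoded_ok = raw.decode("utf-8")

# Decoding with a wrong codec raises no error here, but the Chinese characters come out garbled.
decoded_bad = raw.decode("latin-1")

print(decoded_ok)
print(decoded_bad == decoded_ok)  # False
```

With `requests`, the same idea applies to `response.content`: decode it with the encoding the page actually uses instead of relying on a guess.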
"""
"""
import requests

# set the proxy IPs
proxy = {'http': '194.177.0.74:53281', 'https': '118.172.201.251:42287'}
response = requests.get('https://www.google.com', proxies=proxy)
print(response.content.decode('utf-8'))

from bs4 import BeautifulSoup
import requests

response = requests.get('https://news.baidu.com')
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
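soup = BeautifulSoup(response.text, features="lxml")  # parses the downloaded page, not html_doc
print(soup.find("title").text)  # text content of the <title> tag
print("Title:", soup.title)  # the whole <title> tag
print(soup.prettify())  # pretty-print the parsed HTML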
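
The `html_doc` string above is defined but never parsed, since the snippet feeds `response.text` to BeautifulSoup instead. As a rough sketch of how the same calls behave on that sample, here is an offline version. It is an illustration rather than the original code: it uses a shortened copy of the sample, and it swaps in the built-in `"html.parser"` for `lxml` so that no extra package is needed.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample document from the post.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
"""

# "html.parser" ships with Python; the post itself uses features="lxml".
soup = BeautifulSoup(html_doc, "html.parser")

# Text of the <title> tag.
title = soup.title.string

# Collect (link text, href) pairs for every <a class="sister"> tag.
links = [(a.get_text(), a["href"]) for a in soup.find_all("a", class_="sister")]

print(title)
for text, href in links:
    print(text, href)
```

`find_all` returns every matching tag, while `find` (used above for the title) returns only the first match.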

That is a big pile of code and I am a bit tired, so I will explain it another day. Off to bed.
