A Python 3 crawler pitfall with HTTPS -- solved

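The following code runs without errors in IPython and returns the correct result, but it fails when run from PyCharm. The traceback is shown in the second listing.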

# coding=utf-8
import re
import urllib.request


def getHtml(url):
    page = urllib.request.urlopen(url)
    html = page.read()
    html = html.decode('utf-8')
    return html


def getImg(html):
    reg = r'<p class="img_title">(.*)</p>'
    img_title = re.compile(reg)
    imglist = re.findall(img_title, html)
    return imglist


url = "https://tieba.baidu.com"
html = getHtml(url)
imglist = getImg(html)
print(imglist)

/Users/zhu/python3/env/bin/python /Users/zhu/python3/the19/ex3.py
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1400, in connect
    server_hostname=server_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 407, in wrap_socket
    _context=self, _session=session)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 814, in __init__
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 1068, in do_handshake
    self._sslobj.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 689, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/zhu/python3/the19/ex3.py", line 21, in <module>
    html = getHtml(url)
  File "/Users/zhu/python3/the19/ex3.py", line 7, in getHtml
    page = urllib.request.urlopen(url)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 544, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1320, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)>

Process finished with exit code 1
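
The post does not say why IPython and PyCharm behave differently. One possible explanation, which is only an assumption here, is that the two run different Python interpreters with different CA certificate settings. A minimal sketch for comparing them: run the snippet below in both environments and compare the output. The helper name describe_tls_environment is made up for this illustration.

```python
import ssl
import sys


def describe_tls_environment():
    """Collect the interpreter path and its default CA certificate locations."""
    paths = ssl.get_default_verify_paths()
    return {
        "executable": sys.executable,            # which Python is running
        "openssl_version": ssl.OPENSSL_VERSION,  # OpenSSL build linked into it
        "cafile": paths.cafile,                  # CA bundle file, or None if not found
        "capath": paths.capath,                  # CA directory, or None if not found
    }


for key, value in describe_tls_environment().items():
    print(key, "=", value)
```

If the two environments report different executables or CA locations, the certificate failure is likely tied to the interpreter PyCharm uses rather than to the script itself.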

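The target is an HTTPS URL, and the interpreter that PyCharm uses fails to verify the server's certificate. Importing the ssl module and replacing the default HTTPS context with an unverified one solves the problem. Note that this turns off certificate verification for every HTTPS request the script makes through urllib. The final code is: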

# coding=utf-8
import re
import urllib.request
import ssl


def getHtml(url):
    page = urllib.request.urlopen(url)
    html = page.read()
    html = html.decode('utf-8')
    return html


def getImg(html):
    reg = r'<p class="img_title">(.*)</p>'
    img_title = re.compile(reg)
    imglist = re.findall(img_title, html)
    return imglist


ssl._create_default_https_context = ssl._create_unverified_context
url = "https://tieba.baidu.com"
html = getHtml(url)
imglist = getImg(html)
print(imglist)
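
The fix above overrides a private attribute of the ssl module and affects the whole process. As an alternative sketch, not the approach the post uses, the same effect can be limited to a single request by passing an explicit context to urllib.request.urlopen. The helper names make_unverified_context and fetch_html are made up for this illustration, and the 10-second timeout is an arbitrary choice.

```python
import ssl
import urllib.request


def make_unverified_context():
    """Build an SSL context that skips certificate checks (insecure)."""
    context = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be set to CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


def fetch_html(url, context, timeout=10):
    """Fetch a page using the given SSL context and decode it as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout, context=context) as page:
        return page.read().decode('utf-8')


# Example call (needs network access):
# html = fetch_html("https://tieba.baidu.com", make_unverified_context())
```

Only requests that receive this context skip verification. Other HTTPS calls in the same program keep the default checks.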
