[python3] Scraping the One Piece manga translated by Ishuhui (鼠绘汉化)

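Special note:

Ishuhui changed its API some time ago, so the earlier version of this code no longer works.

I happen to be learning scrapy at the moment, so I rewrote it. The project is on GitHub: https://github.com/TurboWay/ishuhui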

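1. Motivation:

I really enjoy the One Piece manga, and Ishuhui's Chinese translation is without doubt the best and the fastest to update. Because of copyright pressure, however, the older One Piece chapters can no longer be read on the Ishuhui website. The key point is that I found the API still works, so I wrote a crawler and downloaded all of the One Piece chapters translated by Ishuhui. I am sharing the source code for any One Piece fans who need it. Please avoid crawling during peak hours; after all, we all love Ishuhui.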

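2. How to use:

If you have a Python 3 environment installed, copy the source code into a .py file and run it. The script imports requests and PIL (provided by the Pillow package), so both need to be installed first.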
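Before running the script, it can help to confirm that its third-party imports are available. The following is a minimal sketch, not part of the original script; the helper name missing_modules is hypothetical, and it only checks that the modules can be found, not their versions.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The crawler imports `requests` and `PIL` (PIL is provided by Pillow).
    missing = missing_modules(["requests", "PIL"])
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("They can usually be installed with: pip install requests Pillow")
    else:
        print("All third-party imports used by the crawler are available.")
```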

3. The code:

# -*- coding: utf-8 -*-
import ast
import json
import logging
import os
import shutil
import sys
import time
from io import BytesIO

import requests
from PIL import Image

logger = logging.getLogger('log')
logger.setLevel(logging.DEBUG)

# log format
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')

# console log
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
ch.setFormatter(formatter)
logger.addHandler(ch)


# Request a URL and return the "data" field of the JSON response (None on failure)
def get_url(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)'
                             ' Chrome/62.0.3202.75 Safari/537.36'}
    response = requests.get(url=url, headers=headers, timeout=5)
    js = json.loads(response.text)
    if js["errNo"] == 0:
        return js["data"]
    else:
        logger.warning("Request failed: {0}".format(js))


# Replace characters that are not allowed in file names
def clean(text):
    # '？' (full-width) is from the original list; '?' (ASCII) is added because Windows forbids it
    kws = ['/', '\\', ':', '*', '"', '<', '>', '|', '？', '?']
    for kw in kws:
        text = text.replace(kw, '.')
    return text


# Create a folder; if istruncate is true, an existing folder is deleted and recreated
def makefile(path, istruncate):
    if os.path.exists(path) and istruncate:
        shutil.rmtree(path)
        os.mkdir(path)
    elif not os.path.exists(path):
        os.mkdir(path)


# Download one image; return True on success, False on failure
def save_pic(img_src, picname):
    try:
        response = requests.get(img_src)
        image = Image.open(BytesIO(response.content))
        image = image.convert('RGB')
        image.save(picname)
        logger.info("{0} downloaded".format(picname))
        flag = True
    except Exception as e:
        logger.info("{0} download failed: {1}".format(picname, e))
        flag = False
    return flag


# Save an image, retrying a few more times if the first attempt fails
def resave_pic(img_src, picname):
    count, flag = 0, save_pic(img_src, picname)
    while not flag:
        flag = save_pic(img_src, picname)
        count += 1
        if count > 5:
            break


# Download one post (chapter) and return the id of the previous post, if any
def get_data(path, nextid):
    url = 'http://hhzapi.ishuhui.com/cartoon/post/ver/76906890/id/{0}.json'.format(nextid)
    data = get_url(url)
    if data:
        server = 'http://pic04.ishuhui.com/'
        source, post_id, title, book, number = (data['source'], data['id'], data['title'],
                                                data['book_text'], data['number'])
        # content_img arrives as a string holding a dict literal; literal_eval replaces
        # the original eval() so that the string cannot execute arbitrary code
        content_img = ast.literal_eval(data['content_img']) if data['content_img'] else {}
        if source == 1:  # translated by Ishuhui
            makefile(os.path.join(path, book), False)
            title = clean(title)
            # folder name format: "<book> 第 <number> 话 <title>"
            filepath = os.path.join(path, book, '{0} 第 {1} 话 {2}'.format(book, number, title))
            makefile(filepath, True)  # create the chapter folder
            if content_img:  # download the images
                for img, imgurl in content_img.items():
                    imgurl = server + imgurl.replace('/upload/', '')
                    picname = os.path.join(filepath, img)
                    resave_pic(imgurl, picname)
            logger.info("ID:{2} chapter {0} {1} downloaded".format(number, title, post_id))
        prev_post = data['prev']
        if prev_post:
            return prev_post['id']
        elif nextid == 900:  # post 900 has no link to the previous post
            return 899


if __name__ == "__main__":
    path = sys.path[0]
    nextid = get_data(path, 10881)
    while nextid:
        nextid = get_data(path, nextid)
        time.sleep(3)
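The crawl above works like walking a linked list: it starts at post id 10881 and keeps following each post's prev field until no previous post is returned. The sketch below isolates that traversal so it can be tried without touching the API. The names walk_back and FAKE_POSTS are hypothetical, the sample data is made up, and the special case for post 900 in the original script is left out for brevity.

```python
def walk_back(start_id, fetch):
    """Yield post ids from start_id backwards by following each post's 'prev' link.

    `fetch` is a hypothetical callable mapping a post id to a dict that has a
    'prev' key (or to None if the request fails).
    """
    post_id = start_id
    while post_id:
        yield post_id
        data = fetch(post_id)
        if not data:
            break  # request failed: stop, as the original loop does
        prev = data.get('prev')
        post_id = prev['id'] if prev else None


# Made-up sample data standing in for API responses.
FAKE_POSTS = {
    3: {'prev': {'id': 2}},
    2: {'prev': {'id': 1}},
    1: {'prev': None},
}

if __name__ == "__main__":
    print(list(walk_back(3, FAKE_POSTS.get)))
```

In a real run, fetch would wrap the HTTP request, and a pause between iterations (the original uses time.sleep(3)) keeps the load on the server low.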

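4. Results:

From chapter 598 (2年后) through chapter 908 (世界會議開幕): 309 chapters in total, 3.22 GB. That range covers 311 chapter numbers; chapters 680 and 681 are missing, and a scan of the API did not turn them up either.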
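The chapter count can be checked against the downloaded folders. The sketch below assumes the folder naming used by the script ("<book> 第 <number> 话 <title>"); the function name missing_chapters and the sample folder names are made up for illustration.

```python
import re

# Matches the chapter number in folder names such as "<book> 第 598 话 <title>".
CHAPTER_RE = re.compile(r'第 (\d+) 话')


def missing_chapters(folder_names, first, last):
    """Return the chapter numbers in [first, last] that have no matching folder."""
    found = set()
    for name in folder_names:
        match = CHAPTER_RE.search(name)
        if match:
            found.add(int(match.group(1)))
    return [n for n in range(first, last + 1) if n not in found]


if __name__ == "__main__":
    # Made-up folder names simulating the result described above.
    names = ['book 第 {0} 话 title'.format(n)
             for n in range(598, 909) if n not in (680, 681)]
    print(len(names))
    print(missing_chapters(names, 598, 908))
```

For a real check, folder_names could come from os.listdir() on the book directory created by the script.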

Reposted from: https://www.cnblogs.com/TurboWay/p/9243971.html
