Python crawler tools for scraping images from Bing and Baidu

Scraping images from 360 Search

Scraping images from Baidu

Bulk crawler toolkit (can crawl Bing, Google, and Baidu): very handy and simple to use

Scraping images from 360 Search

import json
import os

import requests

# Root directory where images are saved
BASE_URL = './厨房'
# Search keyword ('厨房' means 'kitchen')
NAME = '厨房'


class PictureDownload(object):
    def __init__(self, q=None, sn=100):
        # First placeholder: keyword; second placeholder (sn): current start offset
        self.url = 'https://m.image.so.com/j?q={}&src=srp&pn=100&sn={}&kn=0&gn=0&cn=0'
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1'}
        self.q = q
        self.sn = sn
        self.num = 0
        # Placeholder so the loop runs once; replaced by the real total from the first response
        self.total = 2

    def makedir(self):
        if not os.path.exists(os.path.join(BASE_URL, self.q)):
            os.makedirs(os.path.join(BASE_URL, self.q))

    def parse_url(self):
        response = requests.get(self.url.format(self.q, self.num), headers=self.headers)
        return response.content.decode()

    def parse_image_list(self, html_json_str):
        # Parse the JSON once and read both fields from it
        data = json.loads(html_json_str)
        return data['list'], data['total']

    def save_image(self, image_list):
        for item in image_list:
            response = requests.get(item['thumb'], headers=self.headers)
            # os.path.join instead of a hard-coded backslash, so the path works on any OS
            with open(os.path.join(BASE_URL, self.q, '%s.jpg' % item['index']), 'wb') as f:
                f.write(response.content)

    def run(self):
        self.makedir()
        while self.num < self.total:
            html_json_str = self.parse_url()
            image_list, self.total = self.parse_image_list(html_json_str)
            self.save_image(image_list)
            self.num += 100
            print(self.num)


if __name__ == '__main__':
    xxx = PictureDownload(NAME)
    xxx.run()
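
The paging logic of the 360 Search script can be checked without touching the network. Below is a minimal sketch, assuming the endpoint behaves the way the script implies (the `sn` parameter is a start offset, `pn=100` is the page size, and each response reports a `total`). `page_offsets` and `build_url` are hypothetical helper names, not part of the original script.

```python
from urllib.parse import quote

URL_TEMPLATE = 'https://m.image.so.com/j?q={}&src=srp&pn=100&sn={}&kn=0&gn=0&cn=0'
PAGE_SIZE = 100  # assumption: pn=100 in the template is the page size


def page_offsets(total, page_size=PAGE_SIZE):
    """Return the start offsets the download loop requests once `total` is known."""
    return list(range(0, total, page_size))


def build_url(keyword, offset):
    """Fill the URL template with a percent-encoded keyword and a start offset."""
    return URL_TEMPLATE.format(quote(keyword), offset)


print(page_offsets(250))
print(build_url('kitchen', 100))
```

Note that the original loop always issues at least one request, because `self.total` starts at 2 and is only replaced by the real total after the first response.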

Scraping images from Baidu

"""
The script is used to crawl images from Baidu.
Example usage:
python scrapy.py -k 白衬衫 -sd ./dataset/ -np 1000
"""
import argparse
import os
import re
import urllib.error
import urllib.request
from urllib.parse import quote


def main(args):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36",
        "referer": "https://image.baidu.com",
    }
    keyword = args.keyword
    # Images are saved to <saving_dir>/<keyword>
    save_dir = os.path.join(args.saving_dir, keyword)
    if os.path.exists(save_dir):
        print("Directory already exists")
    else:
        os.makedirs(save_dir)  # also creates saving_dir if it is missing
        print(save_dir + " created successfully")

    keyword1 = quote(keyword, encoding="utf-8")
    num_pages = args.num_pages
    num = 0
    # Matches the thumbnail URLs embedded in the result page
    key1 = re.compile(r'thumbURL":"(.+?)"')
    # Each page advances the pn offset by 30
    for pn in range(1, num_pages * 30, 30):
        url = "http://image.baidu.com/search/index?tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&pn={}&word=".format(pn) + keyword1
        req = urllib.request.Request(url, headers=headers)
        f = urllib.request.urlopen(req).read().decode("utf-8")
        for string in re.findall(key1, f):
            print("Downloading " + string)
            f_req = urllib.request.Request(string, headers=headers)
            try:
                f_url = urllib.request.urlopen(f_req).read()
            except urllib.error.URLError:
                # Skip images that fail to download
                continue
            with open(os.path.join(save_dir, keyword + str(num) + ".jpg"), "wb") as fs:
                fs.write(f_url)
            num += 1
            print(string + " downloaded successfully")


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-k', '--keyword', required=True, help="The keyword that will be searched on www.baidu.com.")
    parser.add_argument('-sd', '--saving_dir', required=True, help="The directory in which the crawled images will be saved.")
    # type=int is required: without it num_pages is a string and num_pages * 30 repeats the string
    parser.add_argument('-np', '--num_pages', required=True, type=int, help="Number of pages you want to crawl through.")
    main(parser.parse_args())
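
The extraction step of the Baidu script (pulling `thumbURL` values out of the page text with a regular expression) can be tried offline. Below is a minimal sketch that uses the same pattern as the script; the helper name `extract_thumb_urls` and the sample string are made up for illustration and are not real Baidu output.

```python
import re

# Same pattern as the script: capture everything between thumbURL":" and the next quote
THUMB_PATTERN = re.compile(r'thumbURL":"(.+?)"')


def extract_thumb_urls(page_text):
    """Return every thumbnail URL found in the page text, in order of appearance."""
    return THUMB_PATTERN.findall(page_text)


# Hypothetical sample resembling JSON embedded in a result page
sample = '{"thumbURL":"https://example.com/a.jpg","x":1},{"thumbURL":"https://example.com/b.jpg"}'
print(extract_thumb_urls(sample))
```

The non-greedy `(.+?)` matters here: a greedy `(.+)` would run to the last quote in the text and merge several entries into one match.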

Bulk crawler toolkit (can crawl Bing, Google, and Baidu): very handy and simple to use

https://blog.csdn.net/aabbcccddd01/article/details/109647287

pip install icrawler

import os

from icrawler.builtin import BaiduImageCrawler
from icrawler.builtin import BingImageCrawler
from icrawler.builtin import GoogleImageCrawler

# Keywords to crawl ('水位尺' means 'water level gauge')
list_word = ['水位尺']
for word in list_word:
    # Bing crawler
    # Save path
    # bing_storage = {'root_dir': os.path.join('bing', word)}
    # Arguments, in order: parser threads, downloader threads, and the save path set above
    # bing_crawler = BingImageCrawler(parser_threads=2, downloader_threads=4, storage=bing_storage)
    # Start crawling: keyword plus number of images
    # bing_crawler.crawl(keyword=word, max_num=2000)

    # Baidu crawler
    baidu_storage = {'root_dir': os.path.join('baidu', word)}
    baidu_crawler = BaiduImageCrawler(parser_threads=2, downloader_threads=4, storage=baidu_storage)
    baidu_crawler.crawl(keyword=word, max_num=2000)

    # Google crawler
    # google_storage = {'root_dir': os.path.join('google', word)}
    # google_crawler = GoogleImageCrawler(parser_threads=4, downloader_threads=4, storage=google_storage)
    # google_crawler.crawl(keyword=word, max_num=2000)
