Python crawler for 糗百成人 (qiubaichengren.net)

from time import sleep

import requests
from lxml import etree

HOST = "http://www.qiubaichengren.net/"
DIR_PATH = "/www/spider/images/"
# Page range taken from the values in the original script (start 354, stop before 991)
START_PAGE = 354
END_PAGE = 991


def all_links(url, page):
    # Fetch one listing page, e.g. http://www.qiubaichengren.net/354.html
    page_url = url + str(page) + ".html"
    response = requests.get(page_url)
    print(page_url, response.status_code)
    html = etree.HTML(response.content.decode('gbk'))

    # Collect the image URLs on the page and save each one
    imgs = html.xpath('.//div[@id="wrapper"]//div[@class="ui-module"]//img/@src')
    for img in imgs:
        file_name = img.split('/')[-1]
        first = img.split('/')[0]
        if first != 'http:' and first != 'https:':
            print("Bad image URL: " + img)
            continue
        try:
            file_content = requests.get(img)
            if file_content.status_code != 200:
                print(img, "download failed")
            else:
                with open(DIR_PATH + file_name, "wb") as f:
                    f.write(file_content.content)
                print("Saved image " + DIR_PATH + file_name)
        except Exception as ee:
            print(str(ee))


try:
    # Iterate over page numbers instead of recursing without a stop condition
    for page in range(START_PAGE, END_PAGE):
        all_links(HOST, page)
        sleep(1)
except Exception as e:
    print(str(e))

Version that follows the "next page" link

import urllib.request
from time import sleep

import requests
from lxml import etree

HOST = 'http://www.qiubaichengren.net/'
DIR_PATH = "d:\\www\\spider\\images\\"


def all_links(url):
    # Stop once page 100 is reached
    if "100.html" in url:
        print("Done")
        return None
    response = requests.get(url)
    print(url, response.status_code)
    html = etree.HTML(response.content.decode('gbk'))

    # Collect the image URLs on the page and save each one
    imgs = html.xpath('.//div[@id="wrapper"]//div[@class="ui-module"]//img/@src')
    for img in imgs:
        file_name = img.split('/')[-1]
        first = img.split('/')[0]
        if first != 'http:' and first != 'https:':
            print("Bad image URL: " + img)
        else:
            urllib.request.urlretrieve(img, DIR_PATH + file_name)
            print("Saved image " + DIR_PATH + file_name)

    # "下一页" is the site's label for the "next page" link
    links = html.xpath('.//div[@class="page"]//a[contains(text(),"下一页")]/@href')
    print(links)
    if len(links) < 1:
        return None
    sleep(5)
    new_url = HOST + links[0]
    all_links(new_url)


try:
    all_links("http://www.qiubaichengren.net/8.html")
except Exception as e:
    print(str(e))
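Both versions decide whether an image URL is usable by looking at the text before the first "/", and build the file name from the text after the last "/". As a possible alternative, the same two steps can be sketched with the standard library's `urllib.parse` and `os.path`. The helper names `is_http_url` and `local_path_for` are hypothetical and not part of the original scripts; this is a minimal sketch, not a drop-in replacement.

```python
import os
from urllib.parse import urlparse


def is_http_url(url):
    """Return True if url has an http or https scheme and a host part."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def local_path_for(url, dir_path):
    """Build a save path from the last segment of the URL path (hypothetical helper)."""
    # urlparse(...).path excludes any query string, so "?x=1" does not end up in the name
    file_name = os.path.basename(urlparse(url).path)
    return os.path.join(dir_path, file_name)
```

Inside the image loop, `is_http_url(img)` could stand in for the `first != 'http:'` comparison, and `local_path_for(img, dir_path)` for `dir_path + file_name`. One difference worth noting: `img.split('/')[-1]` keeps a trailing query string in the file name, while this sketch drops it.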
