Python jokes: scraping jokes with a Python crawler

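# -*- coding: utf-8 -*-
# Python 3 version: the Python 2 reload(sys) / sys.setdefaultencoding('utf-8')
# hack is dropped because Python 3 strings are Unicode by default.

import requests
from bs4 import BeautifulSoup  # the 'lxml' parser used below needs the lxml package installed


class DZ:
    def __init__(self, url, pageIndex):
        # Listing-page URL: the base URL followed by the page number
        self.url = url + str(pageIndex)
        # The header name is 'User-Agent' (with a hyphen), not 'User_Agent'
        self.headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

    # Fetch the HTML source of one listing page
    def get_one_page_html(self):
        # Pass headers by keyword: the second positional argument of requests.get is params
        resp = requests.get(self.url, headers=self.headers)
        return resp.text

    # Collect the URL of every joke on the listing page
    def get_one_text_url(self):
        all_a = []  # links to the individual jokes
        # Only the single page selected by pageIndex is fetched for now
        html = self.get_one_page_html()
        soup = BeautifulSoup(html, 'lxml')
        for h2 in soup.find_all('h2'):
            a = h2.find('a')
            if a is not None and a.get('href'):  # skip headings without a link
                all_a.append(a.get('href'))
        return all_a

    # Download and print every joke
    def get_text(self):
        all_a = self.get_one_text_url()  # first collect all joke URLs
        for a in all_a:
            resp = requests.get(a, headers=self.headers)
            soup = BeautifulSoup(resp.text, 'lxml')
            article = soup.find('article', class_='article-content')
            if article is None:  # unexpected page layout; nothing to print
                continue
            for p in article.find_all('p'):
                print(p.text)


if __name__ == '__main__':
    url = 'https://duanziwang.com/category/duanzi/page/'
    app = DZ(url, 1)
    app.get_text()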
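
The class above fetches one listing page per instance, since the page index is fixed when the URL is built. The following is only a rough sketch of how several pages might be walked in turn. The names `page_urls`, `crawl_pages`, `fetch` and `delay` are hypothetical helpers introduced for illustration and are not part of the original script. The download function is passed in as an argument so the paging logic can be tried without touching the network.

```python
import time
from typing import Callable, Iterable, Iterator, List, Tuple


def page_urls(base_url: str, first: int, last: int) -> List[str]:
    """Build listing-page URLs by appending each page index to base_url."""
    return [base_url + str(i) for i in range(first, last + 1)]


def crawl_pages(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay: float = 1.0,
) -> Iterator[Tuple[str, str]]:
    """Yield (url, html) for each URL, pausing `delay` seconds between requests.

    `fetch` is any function that takes a URL and returns the page source
    (an assumption of this sketch, not something the original code defines).
    """
    for n, url in enumerate(urls):
        if n:  # no pause before the first request
            time.sleep(delay)
        yield url, fetch(url)
```

With `requests`, `fetch` could be something like `lambda u: requests.get(u, headers=headers, timeout=10).text`, and `page_urls('https://duanziwang.com/category/duanzi/page/', 1, 30)` would cover the 30 pages mentioned in the original comment. The pause between requests is a courtesy to the site and can be tuned or removed.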

