Scraping Baidu Baike content with Python - Scraping the full text of a novel with Python

The scraping code is as follows:

# coding: utf-8
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup


class xiaoShuo():

    def __init__(self, url, parLabelValue, parLabelType, parLabel,
                 childLabelValue, childLabelType, childLabel, enc):
        self.url = url                          # table-of-contents page
        self.parLabelValue = parLabelValue      # attribute value on the index page, e.g. "L"
        self.parLabelType = parLabelType        # attribute name on the index page, e.g. "class"
        self.parLabel = parLabel                # tag holding the chapter links, e.g. "td"
        self.childLabelValue = childLabelValue  # attribute value on a chapter page, e.g. "contents"
        self.childLabelType = childLabelType    # attribute name on a chapter page, e.g. "id"
        self.childLabel = childLabel            # tag holding the chapter text, e.g. "dd"
        self.enc = enc                          # page encoding, e.g. "gbk"

    def getUrlContent(self):
        # Download and decode the table-of-contents page
        response = urllib.request.urlopen(self.url)
        html = response.read().decode(self.enc)
        pageNode = BeautifulSoup(html, 'html.parser')
        # Every matching element (here <td class="L">) contains chapter links
        iterms = pageNode.find_all(self.parLabel, {self.parLabelType: self.parLabelValue})
        for item in iterms:
            for tagA in item.select("a"):
                # print("%s: %s" % (tagA.get_text(), tagA.get("href")))
                # Fetch the chapter the link points to, not the index page again;
                # hrefs may be relative, so resolve them against the index URL
                chapterUrl = urljoin(self.url, tagA.get("href"))
                content = self.getXiaoShuoContent(chapterUrl, self.childLabel,
                                                  self.childLabelValue, self.childLabelType, self.enc)
                print(content)

    def getXiaoShuoContent(self, url, childLabel, childLabelValue, childLabelType, enc):
        response = urllib.request.urlopen(url)
        html = response.read().decode(enc)
        pageNode = BeautifulSoup(html, 'html.parser')
        iterms = pageNode.find_all(childLabel, {childLabelType: childLabelValue})
        content = ""
        for item in iterms:
            # Append each match; "content = ...get_text()," would overwrite and build a tuple
            content += item.get_text()
        return content

    def writeTofile(self, fileName, content):
        try:
            # So a format string can be used like this too!
            with open("%s.txt" % fileName, "w", encoding="utf-8") as f:
                f.write(content)
        except OSError as e:
            print("Write error:", e)


a = xiaoShuo("https://www.szzyue.com/dushu/11/11255/", "L", "class", "td",
             "contents", "id", "dd", "gbk")
a.getUrlContent()
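The class does two parsing jobs. First it pulls chapter links out of the index page, and those hrefs may be relative, so each one has to be resolved against the index URL with `urljoin`. Then it pulls the text out of each chapter page, adding every match to the result instead of overwriting the previous one. Both jobs can be tried offline. The sketch below is an illustration under stated assumptions, not the author's code. It swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` so it runs without third-party packages. The class names, function names and sample HTML are all hypothetical, modelled on the selectors passed to the constructor (`td` with `class="L"`, `dd` with `id="contents"`).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ChapterLinkParser(HTMLParser):
    """Collect (title, absolute URL) for every <a> inside a <td class="L"> cell."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.in_cell = False      # currently inside a matching <td>?
        self.current_href = None  # resolved href of the <a> being read
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td" and "L" in (attrs.get("class") or "").split():
            self.in_cell = True
        elif tag == "a" and self.in_cell and attrs.get("href"):
            # Resolve a possibly relative href against the index page URL
            self.current_href = urljoin(self.base_url, attrs["href"])
            self.current_text = []

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            self.links.append(("".join(self.current_text).strip(), self.current_href))
            self.current_href = None
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)


class ChapterTextParser(HTMLParser):
    """Concatenate the text inside <dd id="contents">, turning <br> into newlines."""

    def __init__(self):
        super().__init__()
        self.in_contents = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "dd" and dict(attrs).get("id") == "contents":
            self.in_contents = True
        elif tag == "br" and self.in_contents:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "dd":
            self.in_contents = False

    def handle_data(self, data):
        if self.in_contents:
            self.parts.append(data)  # accumulate; never overwrite earlier text


def extract_chapter_links(index_html, base_url):
    parser = ChapterLinkParser(base_url)
    parser.feed(index_html)
    parser.close()
    return parser.links


def extract_chapter_text(chapter_html):
    parser = ChapterTextParser()
    parser.feed(chapter_html)
    parser.close()
    return "".join(parser.parts)


# Hypothetical sample markup, shaped after the selectors in the original call
SAMPLE_INDEX = """
<table><tr>
<td class="L"><a href="1001.html">Chapter 1</a></td>
<td class="L"><a href="/dushu/11/11255/1002.html">Chapter 2</a></td>
<td class="ad"><a href="ad.html">Ad</a></td>
</tr></table>
"""
SAMPLE_CHAPTER = '<dl><dd id="title">Chapter 1</dd><dd id="contents">First line.<br>Second line.</dd></dl>'

for title, url in extract_chapter_links(SAMPLE_INDEX, "https://www.szzyue.com/dushu/11/11255/"):
    print(title, url)
print(extract_chapter_text(SAMPLE_CHAPTER))
```

Inside the class itself, BeautifulSoup's `find_all` and `select` do the matching. This sketch only isolates the two details that matter: resolving relative links, and appending text instead of reassigning it.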

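The listing leaves two practical details open. Requests go out with urllib's default headers. `writeTofile` is defined but never called, so the chapters are only printed. The minimal sketch below covers both. It assumes the target site may reject urllib's default User-Agent and that pausing between requests is desirable. The helper names, the header value and the one-second delay are hypothetical choices, not taken from the original. The text is saved as UTF-8 because it has already been decoded from the page's `gbk`.

```python
import time
import urllib.request

# Hypothetical User-Agent string; the exact value is an assumption
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; novel-scraper-sketch)"}


def build_request(url, headers=None):
    """Wrap url in a Request that carries custom headers; nothing is sent yet."""
    return urllib.request.Request(url, headers=headers or DEFAULT_HEADERS)


def fetch_html(url, enc="gbk", timeout=10, delay=1.0):
    """Download one page and decode it with the page's encoding."""
    time.sleep(delay)  # pause before each request to keep the load on the server low
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # errors="replace" substitutes undecodable bytes instead of raising UnicodeDecodeError
        return response.read().decode(enc, errors="replace")


def format_chapters(chapters):
    """Join (title, text) pairs into one string, with a blank line after each chapter."""
    return "".join(title + "\n" + text + "\n\n" for title, text in chapters)


def write_chapters(file_name, chapters):
    """Write all chapters to file_name.txt, encoded as UTF-8."""
    with open("%s.txt" % file_name, "w", encoding="utf-8") as f:
        f.write(format_chapters(chapters))


# Usage sketch (needs network access, so it is left commented out):
# html = fetch_html("https://www.szzyue.com/dushu/11/11255/", enc="gbk")
```

One suggestion, not part of the original post, is how to wire this into the class. `getUrlContent` could call `fetch_html` instead of `urlopen(...).read().decode(...)`. It could then collect `(link text, chapter content)` pairs and pass them to `write_chapters` instead of printing each chapter.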