Python Crawler: Douban TOP250 URLs

This article is for learning purposes only. If it infringes any rights, contact me and it will be removed.

Getting the URL of every book on the Douban Top 250 book list

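import csv
import re

import requests
from bs4 import BeautifulSoup
from requests.exceptions import RequestException

url_lt = []  # collected book URLs, in rank order

# Captures the href value (without the surrounding quotes) of the first <a> tag.
HREF_PATTERN = re.compile(r'<a.*?href="(.*?)"', re.S)


def get_one_page(url):
    """Return the HTML of one listing page, or None if the request fails."""
    try:
        # The header name is "User-Agent" (hyphen, not underscore).
        headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36"}
        response = requests.get(url, headers=headers, timeout=5)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None


def get_book_url_list(html):
    """Append the URL of every book on one listing page to url_lt."""
    soup = BeautifulSoup(html, 'lxml')  # requires the lxml package
    for tag in soup.find_all(class_='pl2'):
        match = HREF_PATTERN.search(str(tag))
        if match:
            url_lt.append(match.group(1).strip())


def main(offset):
    url = 'https://book.douban.com/top250?start=' + str(offset)
    html = get_one_page(url)
    if html is None:  # skip pages that could not be fetched
        print('request failed:', url)
        return
    get_book_url_list(html)
    print(len(url_lt))


def write_csv(file, url_list):
    # Mode 'w' so a rerun does not append a second header and duplicate rows.
    with open(file, 'w', encoding='utf-8', newline='') as csvfile:
        fieldnames = ["rank", "book_url"]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for rank, book_url in enumerate(url_list, start=1):
            writer.writerow({"rank": rank, "book_url": book_url})


if __name__ == '__main__':
    # Each page lists 25 books, so start takes the values 0, 25, ..., 225.
    for i in range(10):
        main(i * 25)
    write_csv("douban_TOP250_data.csv", url_lt)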
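The listing is paginated through the `start` query parameter in the URL. As a minimal sketch of how the ten page URLs can be generated, the snippet below assumes 25 books per page and 250 books in total; `PAGE_SIZE`, `TOTAL`, and `build_page_urls` are hypothetical names introduced for illustration, not part of the original script.

```python
from urllib.parse import urlencode

BASE_URL = "https://book.douban.com/top250"
PAGE_SIZE = 25   # assumption: the site shows 25 books per page
TOTAL = 250      # assumption: the list holds 250 books


def build_page_urls(base_url=BASE_URL, total=TOTAL, page_size=PAGE_SIZE):
    """Return one URL per listing page, stepping `start` by page_size."""
    return [
        f"{base_url}?{urlencode({'start': offset})}"
        for offset in range(0, total, page_size)
    ]


if __name__ == "__main__":
    for page_url in build_page_urls():
        print(page_url)
```

Stepping the offset by the page size matters: passing 0, 1, 2, ... as `start` would request ten overlapping windows that each begin one book later, so most books would be collected several times and the later pages never reached.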
