Python Web Scraper Example: Scraping a Quotes Site (quotes.toscrape.com) with BeautifulSoup4

This article walks through a Python scraper that uses BeautifulSoup4 to scrape a quotes site. It is shared here for reference; the details are as follows:

The task: scrape the quotes listed under the site's top 10 tags and store them in MySQL with the fields (quote, author, tags).
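As a quick illustration of the three fields, the sketch below runs BeautifulSoup's CSS selectors over a small hand-written HTML fragment. The fragment is an assumption that only imitates the kind of markup targeted by the selectors used in this article's script (`span.text`, `small.author`, `meta.keywords`); it is not copied from the site, and `extract_records` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hand-written fragment imitating one quote block (an assumption, not the real page)
SAMPLE_HTML = """
<div class="row"><div class="col-md-8">
  <div class="quote">
    <span class="text">“A sample quote.”</span>
    <span>by <small class="author">Sample Author</small></span>
    <div class="tags">
      <meta class="keywords" content="love,life">
      <a class="tag" href="/tag/love/">love</a>
      <a class="tag" href="/tag/life/">life</a>
    </div>
  </div>
</div></div>
"""


def extract_records(html):
    """Return a list of (quote, author, tags) tuples found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for block in soup.select("div.quote"):
        # Strip the curly quotation marks that wrap the quote text
        text = block.select_one("span.text").text.strip("“”")
        author = block.select_one("small.author").text
        # The tags are read from the comma-separated content attribute
        tags = block.select_one("meta.keywords").get("content")
        records.append((text, author, tags))
    return records


print(extract_records(SAMPLE_HTML))
```

Iterating over each quote block keeps the quote, author, and tags of one entry together. The article's script instead selects three parallel lists and pairs them by index, which works as long as every quote on the page has all three elements.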

#!/usr/bin/python3
# -*- coding: utf-8 -*-
# Imported under its own name so the built-in open() is not shadowed
from urllib.request import urlopen

import pymysql
from bs4 import BeautifulSoup

BASE_URL = "http://quotes.toscrape.com"


def find_top_ten(url):
    """Return the relative links (e.g. /tag/love/) of the top ten tags."""
    response = urlopen(url)
    bs = BeautifulSoup(response, 'html.parser')
    tags = bs.select('span.tag-item a')
    top_ten_href = [tag.get('href') for tag in tags]
    # print(top_ten_href)
    return top_ten_href


def insert_into_mysql(records):
    """Insert a list of [content, author, tags] records into the quotes table."""
    con = pymysql.connect(host='localhost', user='root', password='root',
                          database='quotes', charset='utf8', port=3306)
    try:
        cursor = con.cursor()
        sql = "insert into quotes(content,author,tags) values(%s,%s,%s)"
        for record in records:
            cursor.execute(sql, record)
        con.commit()
        cursor.close()
    finally:
        # Close the connection even if an insert fails
        con.close()


# Example tag page: http://quotes.toscrape.com/tag/love/
# We want every quote under a tag, so pagination has to be handled.
# Inspecting the site shows that the paged URLs look like
# http://quotes.toscrape.com/tag/love/page/1/
# To detect a page with no data, look at div.container div.row [1]
def find_link_content(link):
    page = 1
    while True:
        new_link = BASE_URL + link + "page/" + str(page) + "/"
        print(new_link)
        sub_bs = BeautifulSoup(urlopen(new_link), 'html.parser')
        quotes = sub_bs.select('div.row div.col-md-8 span.text')
        # Stop when the page contains no quotes
        if len(quotes) == 0:
            break
        # Quote text, without the surrounding curly quotation marks
        quotes = [quote.text.strip('“”') for quote in quotes]
        # Authors
        authors = [author.text for author in sub_bs.select('small.author')]
        # Tags, read from the content attribute of each meta.keywords element
        tags_list = [tags.get('content') for tags in sub_bs.select('meta.keywords')]
        # print(authors)
        # print(quotes)
        # print(tags_list)
        record_list = []
        for i in range(len(quotes)):
            # Replace ASCII commas with full-width commas before storing
            tags = tags_list[i].replace(',', '，')
            print(tags)
            record_list.append([quotes[i], authors[i], tags])
        insert_into_mysql(record_list)
        page += 1


def main():
    url = BASE_URL + "/"
    parent_link = find_top_ten(url)
    for link in parent_link:
        print(link)
        find_link_content(link)


if __name__ == '__main__':
    main()
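
The script assumes that a `quotes` database containing a `quotes` table already exists, but the article does not show the table definition. The sketch below is one guess at a matching table, using Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server. The column types, the `id` column, and the `save_records` helper are assumptions; only the column names `content`, `author`, and `tags` come from the script's INSERT statement. Note that `sqlite3` uses `?` placeholders where `pymysql` uses `%s`.

```python
import sqlite3

# Hypothetical schema: the three column names follow the article's INSERT
# statement; the types and the id column are assumptions.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS quotes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    content TEXT NOT NULL,
    author TEXT NOT NULL,
    tags TEXT
)
"""


def save_records(con, records):
    """Create the table if needed, insert all rows in one batch, and commit."""
    con.execute(CREATE_TABLE)
    con.executemany(
        "INSERT INTO quotes(content, author, tags) VALUES (?, ?, ?)",
        records,
    )
    con.commit()


con = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
save_records(con, [
    ("A sample quote.", "Sample Author", "love，life"),
    ("Another sample quote.", "Another Author", "books"),
])
rows = con.execute("SELECT content, author, tags FROM quotes ORDER BY id").fetchall()
print(rows)
con.close()
```

With `pymysql`, a comparable batch insert can be written as `cursor.executemany(sql, records)`, keeping the `%s` placeholders from the script.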

I hope this article is helpful for your Python programming.
