Fantasy Westward Journey (梦幻西游) idiom verification in Python: scraping idiom data with Python

2024-05-13 09:55:28

This script can supply the raw data for building an idiom (成语, chengyu) game.

import csv

import requests
from bs4 import BeautifulSoup

# Build the idiom table: idiom, pinyin, definition.
headers = {
    'User-Agent':
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2 Safari/605.1.15'
}

def getIntroduction(url):
    '''Fetch one idiom page and return (idiom, pinyin, definition).

    Returns three empty strings when the page is missing or its layout is unexpected.
    '''
    res = requests.get(url, headers=headers, allow_redirects=False, timeout=10)
    if res.status_code != 200:
        return "", "", ""
    res.encoding = "utf-8"
    # Name the parser explicitly so BeautifulSoup does not guess one and warn.
    soup = BeautifulSoup(res.text, "html.parser")
    title = soup.select('h1')
    con = soup.select('div[class="con"]')
    if not title or not con:
        return "", "", ""
    # The original stripped tags with a chain of str.replace() calls whose tag
    # arguments were lost when the page was extracted; get_text() is used instead.
    chengyu = title[0].get_text(strip=True)
    # Assumption: the first two non-empty text lines inside div.con are the pinyin
    # and the definition, corresponding to the original introList[1] and introList[2].
    introList = [s.strip() for s in con[0].get_text("\n").split("\n") if s.strip()]
    if len(introList) < 2:
        return "", "", ""
    pinyin, intro = introList[0], introList[1]
    print(chengyu, pinyin, intro)
    return chengyu, pinyin, intro

def writeCsv(path, content):
    '''Write the rows to a CSV file.'''
    with open(path, 'w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        writer.writerows(content)
    print("Written to", path)

if __name__ == "__main__":
    maxIndex = 30898  # upper bound on the page ids to try (inclusive)
    idioms = []
    targetNum = 10    # stop once this many idioms have been collected
    curNum = 0
    for index in range(1, maxIndex + 1):
        url = 'https://www.chengyucidian.net/cy/' + str(index) + '.html'
        chengyu, pinyin, intro = getIntroduction(url)
        # Keep four-character idioms only.
        if len(chengyu) == 4:
            idioms.append([chengyu, pinyin, intro])
            curNum += 1
        if curNum >= targetNum:
            break
    writeCsv('成语.csv', idioms)

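The run ends by writing a table file: 成语.csv

The original post showed its contents in a screenshot (image.png). Each row holds an idiom, its pinyin, and its definition.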
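As a rough illustration of how the generated table could feed an idiom game, the sketch below loads 成语.csv and checks whether a string is a known idiom, plus a simple idiom-chain (成语接龙) rule. The function names load_idioms, is_idiom and can_follow are hypothetical and not part of the original script. It assumes each row has the three columns written above.

```python
import csv


def load_idioms(path):
    """Read rows of (idiom, pinyin, definition) into a dict keyed by idiom."""
    table = {}
    with open(path, encoding="utf-8", newline="") as f:
        for row in csv.reader(f):
            # Skip blank or malformed rows instead of failing on them.
            if len(row) >= 3:
                table[row[0]] = (row[1], row[2])
    return table


def is_idiom(word, table):
    """Return True if the word appears in the scraped table."""
    return word in table


def can_follow(prev, nxt, table):
    """Idiom-chain rule: nxt must be a known idiom that starts with
    the last character of prev."""
    return is_idiom(nxt, table) and nxt[0] == prev[-1]
```

A game would call load_idioms('成语.csv') once at start-up and then validate each player's answer with is_idiom or can_follow. With only the 10 rows collected by targetNum = 10, almost every answer would be rejected, so a real game would need a much larger table.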
