Python Digitalization: Scraping VOA and Downloading MP3s

import re
import json, requests, sys
from bs4 import BeautifulSoup
import openpyxl

file1 = r"D:\Python_E\others\trial\VOA\Essay.xlsx"
# Create an Excel workbook to store the data
wb = openpyxl.Workbook()
sheet = wb.active
sheet.title = 'English'
sheet.cell(row=1,column=1).value='Category'
sheet.cell(row=1,column=2).value='Essay'
sheet.cell(row=1,column=3).value='Link'
sheet.cell(row=1,column=4).value='Content'
sheet.cell(row=1,column=5).value='mp3_link'

# Open the web page (checked: it is a static page)
url ='https://www.51voa.com/'
#params={'pageNo': '{}'.format(k),}
response = requests.get(url)
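print(response.status_code)

# Scrape the data and save it
pageSource = response.text  # HTML returned by the server (matches the DevTools Elements view here because the page is static)
soup = BeautifulSoup(pageSource, 'html.parser')  # parse the page with BeautifulSoup
Essays_message = soup.find('div', id="list").find('ul').find_all('li')  # extract the article list items
print(len(Essays_message))
i = 2  # data rows start at row 2; row 1 holds the headers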
## Export the article list
for Essay_message in Essays_message:
    Essay_Category = Essay_message.find_all('a')[0].text     # first link: category
    Essay_Essay = Essay_message.find_all('a')[-1].text       # last link: article title
    Essay_Link = Essay_message.find_all('a')[-1]['href']     # last link: article path on the site
    # Save the data
    sheet.cell(row=i, column=1).value = Essay_Category
    sheet.cell(row=i, column=2).value = Essay_Essay
    sheet.cell(row=i, column=3).value = 'https://www.51voa.com' + Essay_Link
    i = i + 1
wb.save(file1)


def download_music(music_name, music_url):
    """Download the audio."""
    response = requests.get(music_url)
    content = response.content
    save_file(r'D:\Python_E\others\trial\VOA\\' + music_name + '.mp3', content)


def save_file(filename, content):
    """Save the audio."""
    with open(file=filename, mode="wb") as f:
        f.write(content)

## Export the article contents
rows = sheet.max_row
print(rows)

## The version below stops with an error when something goes wrong, so problems are easy to spot
'''
for j in range(2, 4+1):
    url = sheet.cell(row=j, column=3).value
    response = requests.get(url)
    # Scrape the data and save it
    pageSource = response.text  # HTML source of the article page
    soup = BeautifulSoup(pageSource, 'html.parser')  # parse the page with BeautifulSoup
    Essay_contents = soup.find('div', id="Right_Content").find('div', class_="Content").find_all('p')  # article paragraphs
    Essay_content_list = []
    for Essay_content in Essay_contents:
        Essay_content_p = Essay_content.text
        Essay_content_list.append(Essay_content_p)
    Essay_content = "\n".join(Essay_content_list)
    sheet.cell(row=j, column=4).value = Essay_content
    # Extract the MP3 link
    Essay_MP3s = soup.find('a', id="mp3")['href']
    sheet.cell(row=j, column=5).value = Essay_MP3s
    wb.save(file1)
    # Clean up the title so it can be used as the file name
    music_name = (sheet.cell(row=j, column=2).value).strip().replace('.', '').replace('?', '').replace('/', '').replace(' ', '').replace('(面议)', '')
    music_url = sheet.cell(row=j, column=5).value
    download_music(music_name, music_url)
'''
## The version below catches exceptions and keeps going, so problems are harder to spot
#for j in range(2,rows+1):
for j in range(2, 3+1):
    try:
        url = sheet.cell(row=j, column=3).value
        response = requests.get(url)
        # Scrape the data and save it
        pageSource = response.text  # HTML source of the article page
        soup = BeautifulSoup(pageSource, 'html.parser')  # parse the page with BeautifulSoup
        Essay_contents = soup.find('div', id="Right_Content").find('div', class_="Content").find_all('p')  # article paragraphs
        Essay_content_list = []
        for Essay_content in Essay_contents:
            Essay_content_p = Essay_content.text
            Essay_content_list.append(Essay_content_p)
        Essay_content = "\n".join(Essay_content_list)
        sheet.cell(row=j, column=4).value = Essay_content
        # Extract the MP3 link
        Essay_MP3s = soup.find('a', id="mp3")['href']
        sheet.cell(row=j, column=5).value = Essay_MP3s
        wb.save(file1)
        # Clean up the title so it can be used as the file name
        music_name = (sheet.cell(row=j, column=2).value).strip().replace('.', '').replace('?', '').replace('/', '').replace(' ', '_')
        music_url = sheet.cell(row=j, column=5).value
        download_music(music_name, music_url)
    except Exception as e:
        # Print the actual error and row; print(Exception) would only print the class itself
        print(f'row {j} failed: {e!r}')
print('Data retrieval complete, OK')
wb.close()
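The script turns each article title into a file name with a chain of `.replace()` calls, which still lets through other characters Windows rejects in file names (for example `:`, `*`, `"`, `<`, `>`, `|` and `\`). A minimal sketch of the same step using the already-imported `re` module follows; the helper name `safe_filename`, the exact character set and the 150-character cap are assumptions, not part of the original script.

```python
import re

# Characters Windows does not allow in file names, plus control characters
# (assumed set; extend it if the site's titles contain other problem characters)
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title: str, max_len: int = 150) -> str:
    """Hypothetical helper: turn an article title into a usable file name."""
    name = _FORBIDDEN.sub('', title.strip())  # drop forbidden characters
    name = re.sub(r'\s+', '_', name)          # collapse whitespace runs into "_"
    name = name.rstrip('. ')                  # Windows rejects trailing dots and spaces
    return name[:max_len] or 'untitled'       # fall back if nothing is left


print(safe_filename(' How Do You Say "Hello"? / Part 1 '))  # → How_Do_You_Say_Hello_Part_1
```

Unlike the original chain, this keeps dots inside a title (such as "U.S.") and only trims the trailing dots and spaces that Windows does not accept at the end of a name.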

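The script's comments contrast a loop that stops at the first failing page with a `try`/`except` loop that keeps going but makes problems hard to spot. One way to get both behaviors is to catch errors per row while logging the row number and the full traceback, and to return the rows that failed so they can be retried. A minimal sketch with the standard `logging` module; `process_rows`, the `handle_row` callback (standing in for the fetch/parse/download body of the loop) and `fake_handle` are hypothetical names.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("voa")


def process_rows(rows, handle_row):
    """Call handle_row(url) for each (row_number, url) pair.

    Failures are logged with a traceback and collected instead of being
    hidden, so the loop keeps going but nothing is silently lost.
    """
    failed = []
    for row_number, url in rows:
        try:
            handle_row(url)
        except Exception:
            # logging.exception logs at ERROR level and appends the current traceback
            log.exception("row %d failed (%s)", row_number, url)
            failed.append(row_number)
    return failed


def fake_handle(url):
    # Stand-in for the real loop body: simulates soup.find(...) returning None
    if "bad" in url:
        raise AttributeError("'NoneType' object has no attribute 'find'")


failed_rows = process_rows(
    [(2, "https://example.com/ok"), (3, "https://example.com/bad")], fake_handle
)
print("failed rows:", failed_rows)  # → failed rows: [3]
```

With the original workbook, the pairs could come from `[(j, sheet.cell(row=j, column=3).value) for j in range(2, sheet.max_row + 1)]`, and the returned list shows which rows to retry.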

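`download_music` builds the target path by concatenating strings onto a hard-coded raw-string folder path, and `save_file` assumes that folder already exists. A sketch of the same save step using `pathlib` instead; `save_mp3` is a hypothetical replacement for `save_file`, and the folder argument is whatever directory you choose.

```python
from pathlib import Path


def save_mp3(folder, name, content: bytes) -> Path:
    """Write MP3 bytes to <folder>/<name>.mp3 and return the resulting path."""
    target_dir = Path(folder)
    # Create the folder (and any missing parents); no error if it already exists
    target_dir.mkdir(parents=True, exist_ok=True)
    # The "/" operator joins path parts with the platform's separator
    path = target_dir / f"{name}.mp3"
    # Equivalent to open(path, "wb") followed by f.write(content)
    path.write_bytes(content)
    return path
```

Combined with requests, a call might look like `save_mp3(r"D:\Python_E\others\trial\VOA", music_name, requests.get(music_url, timeout=30).content)`; the `timeout` value is an arbitrary choice, but setting one keeps a stalled download from hanging the loop.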