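Python crawler: scraping the lyrics of QQ Music's Peak Chart hot songs, jieba Chinese word segmentation, and a word cloud
First, here is the result: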
1. Fetch the list page. The URL is https://c.y.qq.com/v8/fcg-bin/fcg_v8_toplist_cp.fcg?tpl=3&page=detail&date=2019_02&topid=26&type=top&song_begin=0&song_num=30&g_tk=5381&loginUin=0&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0
The returned JSON looks like this:
2. Fetch the detail page (the lyrics of each song):
```python
# headers: the request headers copied from the browser (cookie, referer, user-agent, ...);
# the full dict is in the crawler code below
jsond = {
    "nobase64": "1",
    "musicid": item['data']['songid'],  # song id taken from the list-page JSON
    "-": "jsonp1",
    "g_tk": "5381",
    "loginUin": "0",
    "hostUin": "0",
    "format": "json",
    "inCharset": "utf8",
    "outCharset": "utf-8",
    "notice": "0",
    "platform": "yqq.json",
    "needNewCode": "0"
}
r = requests.get("https://c.y.qq.com/lyric/fcgi-bin/fcg_query_lyric_yqq.fcg", params=jsond, headers=headers)
```
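The lyrics response JSON looks like this:
3. Save the lyrics to test.txt so they can be read back later.
4. Read the file line by line and build the string to be processed.
5. Segment the text with jieba and generate the word cloud.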
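Steps 1 and 2 mostly come down to building a query string and walking the `songlist` array of the list-page response. The sketch below shows both offline. The helper names `build_url` and `parse_songlist` are hypothetical, and the JSON shape (`songlist` → `data` → `songid` / `songname`) is inferred from the crawler code, not from any official QQ Music API documentation; the endpoints may well have changed since 2019.

```python
from urllib.parse import urlencode

TOPLIST_URL = "https://c.y.qq.com/v8/fcg-bin/fcg_v8_toplist_cp.fcg"


def build_url(base, params):
    """Return base?query; for plain string values this matches what
    requests.get(base, params=params) sends."""
    return base + "?" + urlencode(params)


def parse_songlist(data):
    """Extract (songid, songname) pairs from the list-page JSON.
    Shape assumed from the crawler code; not documented by QQ Music."""
    return [(item["data"]["songid"], item["data"]["songname"])
            for item in data.get("songlist", [])]


if __name__ == "__main__":
    params = {"tpl": "3", "page": "detail", "date": "2019_02", "topid": "26",
              "type": "top", "song_begin": "0", "song_num": "30", "format": "json"}
    print(build_url(TOPLIST_URL, params))

    # Hand-made response fragment, not real API output
    fake = {"songlist": [{"data": {"songid": 1, "songname": "A"}},
                         {"data": {"songid": 2, "songname": "B"}}]}
    print(parse_songlist(fake))  # [(1, 'A'), (2, 'B')]
```

Each `songid` is then passed as the `musicid` parameter of the lyrics request in step 2, which is what the loop in the crawler does.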
The crawler code:
```python
# -*- coding: utf-8 -*-
import json
import re

import requests

# Request headers copied from the browser's DevTools.
# "authority", "method", "path" and "scheme" are HTTP/2 pseudo-headers; requests
# sends them as ordinary headers, and they are not needed.
headers = {
    "authority": "c.y.qq.com",
    "method": "GET",
    "path": "/lyric/fcgi-bin/fcg_query_lyric_yqq.fcg?nobase64=1&musicid=225716644&-=jsonp1&g_tk=5381&loginUin=0&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0",
    "scheme": "https",
    "accept": "application/json, text/javascript, */*; q=0.01",
    # "br" removed: requests only decodes Brotli when the brotli package is installed
    "accept-encoding": "gzip, deflate",
    "accept-language": "zh-CN,zh;q=0.9",
    "cookie": "pgv_pvi=5936793600; pt2gguin=o1952436511; RK=g+4hNa7BQD; ptcz=653047c5b0174eb6b929c242110d08693b9dfcbaa701ddbf37ccc23c3366b94c; pgv_pvid=9049425500; ts_uid=9851761599; o_cookie=1952436511; tvfe_boss_uuid=5e81ff5fb8d5a1ea; yqq_stat=0; pgv_info=ssid=s484511232; ts_refer=ADTAGbaiduald; pgv_si=s21197824; yq_index=0; player_exist=1; qqmusic_fromtag=66; yplayer_open=0; ts_last=y.qq.com/n/yqq/song/002krvKI4Jgvq9.html",
    "origin": "https://y.qq.com",
    "referer": "https://y.qq.com/n/yqq/song/002krvKI4Jgvq9.html",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"
}

# List-page parameters: topid 26 is the hot-song chart, date is the chart period
jsonlist = {
    "tpl": "3",
    "page": "detail",
    "date": "2019_02",
    "topid": "26",
    "type": "top",
    "song_begin": "0",
    "song_num": "100",  # number of songs to fetch
    "g_tk": "5381",
    "loginUin": "0",
    "hostUin": "0",
    "format": "json",
    "inCharset": "utf8",
    "outCharset": "utf-8",
    "notice": "0",
    "platform": "yqq.json",
    "needNewCode": "0"
}
r1 = requests.get("https://c.y.qq.com/v8/fcg-bin/fcg_v8_toplist_cp.fcg", params=jsonlist)
jlist = json.loads(r1.text)

# Runs of Chinese characters and ':'; the ':' is kept so that tag and credit
# lines (e.g. 作词:xxx) can be recognized and skipped below
ch_pat = re.compile(r'[\u4e00-\u9fa5:]+')

with open('test.txt', 'a+') as f:
    for item in jlist['songlist']:
        # print(str(item['data']['songid']) + " " + item['data']['songname'])
        jsond = {
            "nobase64": "1",
            "musicid": item['data']['songid'],
            "-": "jsonp1",
            "g_tk": "5381",
            "loginUin": "0",
            "hostUin": "0",
            "format": "json",
            "inCharset": "utf8",
            "outCharset": "utf-8",
            "notice": "0",
            "platform": "yqq.json",
            "needNewCode": "0"
        }
        r = requests.get("https://c.y.qq.com/lyric/fcgi-bin/fcg_query_lyric_yqq.fcg",
                         params=jsond, headers=headers)
        r.encoding = "utf-8"
        ch_words = ch_pat.findall(r.text)

        # First chunk (searching only the first half) with ':' after its first character
        first = 0
        for i in range(1, len(ch_words) // 2):
            if ch_words[i].find(':') > 0:
                first = i
                break

        # From there, the lyric body is assumed to start at the first run of three
        # chunks without ':' (the slice also avoids an IndexError on short lyrics)
        flag = first
        for i in range(first, len(ch_words) // 2):
            if all(':' not in w for w in ch_words[i:i + 3]):
                flag = i
                break
        # print(ch_words[flag:], "\n", flag)
        # strres = ','.join(ch_words[flag:])

        # Keep the remaining chunks without ':', each followed by a comma
        strquqita = ''
        for w in ch_words[flag:]:
            if ':' not in w:
                strquqita = strquqita + w + ","
        # chuli = r.text.replace(" ", '').replace('[:', '').replace("]\n", '')
        # f.write(codecs.BOM_UTF8)
        f.write(strquqita + "\n")  # one song per line
        print(strquqita)
```
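The lyric-cleaning loop is the least obvious part of the crawler. Reading the code, it keeps only runs of Chinese characters and ':', treats the first chunk that has a ':' after its first character as the start of the credit lines (such as 作词:xxx), skips ahead to the first run of three chunks without ':', and then keeps every remaining chunk that contains no ':'. Both searches only scan the first half of the chunk list, and any non-Chinese lyrics are dropped entirely because the pattern only matches CJK characters. The sketch below restates that logic as a pure function so it can be tried on a made-up string; `extract_lyric_words` is a hypothetical name, the LRC-like sample input is an assumption, and the three-chunk check uses a slice, which also avoids an IndexError on very short inputs.

```python
import re

# Same pattern as the crawler: runs of CJK characters and ':'
CH_PAT = re.compile(r"[\u4e00-\u9fa5:]+")


def extract_lyric_words(text):
    """Hypothetical helper mirroring the filtering loop in the crawler."""
    words = CH_PAT.findall(text)
    half = len(words) // 2  # both searches only look at the first half

    # 1) first chunk (from index 1) with ':' after its first character
    first = 0
    for i in range(1, half):
        if words[i].find(":") > 0:
            first = i
            break

    # 2) first position from there with three consecutive chunks without ':'
    flag = first
    for i in range(first, half):
        if all(":" not in w for w in words[i:i + 3]):
            flag = i
            break

    # 3) keep every remaining chunk without ':', each followed by a comma
    return "".join(w + "," for w in words[flag:] if ":" not in w)


if __name__ == "__main__":
    # Made-up LRC-like input, not a real API response
    sample = "[ti:示例]\n[ar:歌手]\n作词:某人\n作曲:某人\n第一句歌词\n第二句歌词\n第三句歌词"
    print(extract_lyric_words(sample))  # 第一句歌词,第二句歌词,第三句歌词,
```

In the sample, "[ti:示例]" yields the chunk ":示例" (the Latin tag name is not matched), so tag lines are filtered out by the final ':' check even when step 2 finds no clean run in the first half.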
The word-cloud code:
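```python
# -*- coding: utf-8 -*-
import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Read the lyrics saved by the crawler and join them into one string
with open('test.txt', 'r') as f:
    f.readline()  # skip the first line of test.txt
    strchuli = ''
    for line in f:
        strchuli = strchuli + line + "。"

# Precise-mode segmentation (cut_all=False); jieba.cut returns a generator
wordlist = jieba.cut(strchuli, cut_all=False)
# print(len(list(wordlist)))  # note: this would exhaust the generator
word_string = " ".join(wordlist)  # spaces let WordCloud see jieba's tokens as separate words

# A Chinese font is required, otherwise the characters render as boxes.
# Raw string so the backslashes in the Windows path are not treated as escapes.
wordcloud = WordCloud(font_path=r'C:\Windows\Fonts\simkai.ttf', background_color="white",
                      width=1000, height=860, margin=2).generate(word_string)

plt.imshow(wordcloud)
plt.axis("off")
plt.show()
wordcloud.to_file('jieguo.png')
```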
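`WordCloud.generate()` does its own tokenizing: by default it splits on the pattern `\w[\w']+`, so single-character tokens are dropped, and its built-in stop-word list is English. If you want control over which jieba tokens count, one option is to count them yourself and pass the counts to `WordCloud.generate_from_frequencies()`. The sketch below does the counting with the standard library only; `count_tokens` and the stop-word set `CN_STOPWORDS` are illustrative assumptions, not part of the original code.

```python
from collections import Counter

# Illustrative Chinese stop words (an assumption; extend to taste)
CN_STOPWORDS = {"的", "了", "我", "你", "是", "在"}


def count_tokens(tokens, stopwords=CN_STOPWORDS, min_len=2):
    """Count tokens, skipping stop words, tokens shorter than min_len
    and tokens made only of whitespace or punctuation."""
    counts = Counter()
    for tok in tokens:
        tok = tok.strip()
        if len(tok) < min_len or tok in stopwords:
            continue
        if not any(ch.isalnum() for ch in tok):  # e.g. "，" or "。。"
            continue
        counts[tok] += 1
    return counts


# Possible use with the variables of the word-cloud script (not run here):
# freqs = count_tokens(jieba.cut(strchuli, cut_all=False))
# wordcloud = WordCloud(font_path=r'C:\Windows\Fonts\simkai.ttf',
#                       background_color="white").generate_from_frequencies(freqs)
```

`min_len=2` mimics the default tokenizer's behavior of ignoring single characters; set it to 1 if single-character words matter for your lyrics.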