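Python crawler: scraping lyrics from QQ Music's Top Charts (巅峰榜) Hot Songs list, Chinese word segmentation with jieba, and a word-cloud display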

First, the final result: a word cloud built from the lyrics.

1. Fetch the chart (list page) data. The URL is https://c.y.qq.com/v8/fcg-bin/fcg_v8_toplist_cp.fcg?tpl=3&page=detail&date=2019_02&topid=26&type=top&song_begin=0&song_num=30&g_tk=5381&loginUin=0&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0

The response is JSON; the chart's songs are in its songlist array.
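The crawler below reads only songlist and, from each entry's data object, the songid and songname keys. A minimal sketch of that extraction, run on a hand-made payload: only the key names come from the author's code; the function name extract_songs and all sample values are hypothetical placeholders.

```python
def extract_songs(payload):
    """Return (songid, songname) pairs from a toplist response dict.

    Assumes only the keys the original crawler uses:
    payload["songlist"][i]["data"]["songid"] and ["songname"].
    """
    songs = []
    for item in payload.get("songlist", []):
        data = item.get("data", {})
        if "songid" in data:
            songs.append((data["songid"], data.get("songname", "")))
    return songs


# Hypothetical payload shaped like the real response; values are placeholders.
sample = {
    "songlist": [
        {"data": {"songid": 101, "songname": "示例歌曲一"}},
        {"data": {"songid": 102, "songname": "示例歌曲二"}},
    ]
}

print(extract_songs(sample))
```

Against the live endpoint, payload would simply be r1.json() (or json.loads(r1.text), as in the crawler).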

2. Fetch the detail data, i.e. each song's lyrics, with these headers and parameters:

headers = {
    "authority": "c.y.qq.com",
    "method": "GET",
    "path": "/lyric/fcgi-bin/fcg_query_lyric_yqq.fcg?nobase64=1&musicid=225716644&-=jsonp1&g_tk=5381&loginUin=0&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0",
    "scheme": "https",
    "accept": "application/json, text/javascript, */*; q=0.01",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "zh-CN,zh;q=0.9",
    "cookie": "pgv_pvi=5936793600; pt2gguin=o1952436511; RK=g+4hNa7BQD; ptcz=653047c5b0174eb6b929c242110d08693b9dfcbaa701ddbf37ccc23c3366b94c; pgv_pvid=9049425500; ts_uid=9851761599; o_cookie=1952436511; tvfe_boss_uuid=5e81ff5fb8d5a1ea; yqq_stat=0; pgv_info=ssid=s484511232; ts_refer=ADTAGbaiduald; pgv_si=s21197824; yq_index=0; player_exist=1; qqmusic_fromtag=66; yplayer_open=0; ts_last=y.qq.com/n/yqq/song/002krvKI4Jgvq9.html",
    "origin": "https://y.qq.com",
    "referer": "https://y.qq.com/n/yqq/song/002krvKI4Jgvq9.html",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"
}

jsond = {
    "nobase64": "1",
    "musicid": item['data']['songid'],
    "-": "jsonp1",
    "g_tk": "5381",
    "loginUin": "0",
    "hostUin": "0",
    "format": "json",
    "inCharset": "utf8",
    "outCharset": "utf-8",
    "notice": "0",
    "platform": "yqq.json",
    "needNewCode": "0"
}
r = requests.get("https://c.y.qq.com/lyric/fcgi-bin/fcg_query_lyric_yqq.fcg", params=jsond, headers=headers)

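The lyric endpoint also answers in JSON. The crawler below does not parse it; it extracts the Chinese text directly from r.text with a regular expression.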
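An alternative to the regex approach (not the author's method) is to decode the lyric text itself. This sketch assumes, which the original does not show, that the response carries LRC-formatted text in a `lyric` field with characters escaped as HTML numeric entities (the "&#32" in a commented-out line of the crawler hints at this). `html.unescape` undoes the escaping and a regex drops the `[...]` tags; the function name clean_lrc and the sample string are hypothetical.

```python
import html
import re

# LRC tags such as [00:12.34] or [ti:Title]; assumed format of the lyric text.
LRC_TAG = re.compile(r"\[[^\]]*\]")


def clean_lrc(escaped_lyric):
    """Unescape HTML entities, strip LRC tags, return the non-empty lyric lines."""
    text = html.unescape(escaped_lyric)
    lines = []
    for raw in text.splitlines():
        line = LRC_TAG.sub("", raw).strip()
        if line:
            lines.append(line)
    return lines


# Hypothetical escaped lyric: &#91; = "[", &#93; = "]", &#58; = ":", &#10; = newline.
sample = ("&#91;ti&#58;示例&#93;&#10;"
          "&#91;00&#58;01.00&#93;第一句歌词&#10;"
          "&#91;00&#58;05.00&#93;第二句&#32;歌词")
print(clean_lrc(sample))
```

With the real API, the input would be something like r.json()["lyric"], under the assumption above. Credit lines such as 词：/曲： would still come through and need filtering separately.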

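3. Save the lyrics to the file test.txt so they can be read back later.

4. Read the file line by line and build the string to be processed.

5. Segment the text with the jieba library and generate the word cloud.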
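Steps 3 and 4 amount to a round trip through a text file: one comma-joined line of lyric fragments per song, then all lines joined into a single string with "。" between songs, as the word-cloud code does. A minimal sketch; the helper names save_lyrics and build_text are hypothetical, and the explicit UTF-8 encoding is an assumption (the original relies on the platform's default encoding).

```python
import os
import tempfile


def save_lyrics(path, songs):
    """Step 3: append one comma-joined line of lyric fragments per song."""
    with open(path, "a", encoding="utf-8") as f:
        for fragments in songs:
            f.write(",".join(fragments) + "\n")


def build_text(path):
    """Step 4: read the file line by line and join the songs with '。'."""
    with open(path, encoding="utf-8") as f:
        return "。".join(line.strip() for line in f if line.strip())


# Write to a fresh temporary directory so repeated runs do not accumulate lines.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
save_lyrics(path, [["第一句", "第二句"], ["另一首歌"]])
print(build_text(path))
```

Unlike the original, this version leaves no trailing comma on each song and strips the newline before joining; neither difference matters once jieba segments the text.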

The crawler code:

# -*-coding:UTF-8 -*-

import json
import re
import requests

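# Request headers copied from the browser's developer tools. "authority", "method",
# "path" and "scheme" are HTTP/2 pseudo-headers that requests simply sends as ordinary
# headers; "referer" and "origin" make the lyric request look like it comes from y.qq.com.
# "accept-encoding" advertises br, which requests can only decode if a Brotli package is installed.
headers = {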
    "authority": "c.y.qq.com",
    "method": "GET",
    "path": "/lyric/fcgi-bin/fcg_query_lyric_yqq.fcg?nobase64=1&musicid=225716644&-=jsonp1&g_tk=5381&loginUin=0&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0",
    "scheme": "https",
    "accept": "application/json, text/javascript, */*; q=0.01",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "zh-CN,zh;q=0.9",
    "cookie": "pgv_pvi=5936793600; pt2gguin=o1952436511; RK=g+4hNa7BQD; ptcz=653047c5b0174eb6b929c242110d08693b9dfcbaa701ddbf37ccc23c3366b94c; pgv_pvid=9049425500; ts_uid=9851761599; o_cookie=1952436511; tvfe_boss_uuid=5e81ff5fb8d5a1ea; yqq_stat=0; pgv_info=ssid=s484511232; ts_refer=ADTAGbaiduald; pgv_si=s21197824; yq_index=0; player_exist=1; qqmusic_fromtag=66; yplayer_open=0; ts_last=y.qq.com/n/yqq/song/002krvKI4Jgvq9.html",
    "origin": "https://y.qq.com",
    "referer": "https://y.qq.com/n/yqq/song/002krvKI4Jgvq9.html",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"
}

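# Query parameters for the chart endpoint: topid 26 is the hot-songs chart, date selects the
# chart period, and song_begin/song_num appear to page through the list (100 songs here, 30 in the URL above)
jsonlist={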
    "tpl":"3" ,
    "page": "detail",
    "date": "2019_02",
    "topid": "26",
    "type": "top",
    "song_begin": "0",
    "song_num": "100",
    "g_tk": "5381",
    "loginUin": "0",
    "hostUin": "0",
    "format": "json",
    "inCharset": "utf8",
    "outCharset": "utf-8",
    "notice": "0",
    "platform": "yqq.json",
    "needNewCode": "0"
}
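# Fetch the chart and parse the JSON body
r1 = requests.get("https://c.y.qq.com/v8/fcg-bin/fcg_v8_toplist_cp.fcg", params=jsonlist)
jlist = json.loads(r1.text)
# Lyrics of every song are appended to test.txt, one song per line
f = open('test.txt', 'a+')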
for item in jlist['songlist']:
    #print (str(item['data']['songid'])+" "+item['data']['songname'])
    # Parameters for the lyric endpoint; musicid is the song id taken from the chart entry
    jsond = {
        "nobase64": "1",
        "musicid": item['data']['songid'],
        "-": "jsonp1",
        "g_tk": "5381",
        "loginUin": "0",
        "hostUin": "0",
        "format": "json",
        "inCharset": "utf8",
        "outCharset": "utf-8",
        "notice": "0",
        "platform": "yqq.json",
        "needNewCode": "0"
    }
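    # Request this song's lyrics with the browser-like headers defined above
    r = requests.get("https://c.y.qq.com/lyric/fcgi-bin/fcg_query_lyric_yqq.fcg", params=jsond, headers=headers)
    r.encoding = "utf-8"
    # Keep only runs of Chinese characters plus the full-width colon '：',
    # which appears in credit lines such as 词：/曲： at the top of a lyric
    ch_pat = re.compile(r'[\u4e00-\u9fa5：]+')
    ch_words = ch_pat.findall(r.text)

    # Find the first fragment (searching the first half) that contains '：', i.e. where the credits start
    first = 0
    for i in range(1, len(ch_words) // 2):
        if ch_words[i].find('：') > 0:
            first = i
            break
    # The lyric body is taken to start at the first run of three colon-free fragments;
    # the upper bound keeps ch_words[i + 2] in range for very short lists
    flag = first
    for i in range(first, min(len(ch_words) // 2, len(ch_words) - 2)):
        if ch_words[i].find('：') < 0 and ch_words[i + 1].find('：') < 0 and ch_words[i + 2].find('：') < 0:
            flag = i
            break

    #print(ch_words[flag:], "\n", flag)
    #strres = ','.join(ch_words[flag:])
    # Join the remaining colon-free fragments into one comma-separated line per song
    strquqita = ''
    for i in ch_words[flag:]:
        if i.find('：') < 0:
            strquqita = strquqita + i + ","
    #chuli = r.text.replace("&#32",'').replace('[:','').replace("]\n",'')
    #f.write(codecs.BOM_UTF8)
    f.write(strquqita + "\n")
    print(strquqita)
f.close()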
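The two loops in the middle of the crawler are a heuristic for skipping each lyric's header: find the first fragment containing the full-width colon '：' (credit lines such as 词：/曲：), then move forward to the first position where three consecutive fragments contain no colon, and keep everything from there. Pulled out into a function (the name strip_credits is hypothetical) and run on a made-up fragment list, it can be checked without any network access:

```python
def strip_credits(ch_words):
    """Keep the lyric body, dropping title/credit fragments (the author's heuristic).

    As in the original, the colon search starts at index 1, and both searches
    only scan the first half of the list.
    """
    half = len(ch_words) // 2
    # First fragment containing the full-width colon, e.g. a credit like 词：...
    first = 0
    for i in range(1, half):
        if ch_words[i].find("：") > 0:
            first = i
            break
    # First run of three consecutive colon-free fragments marks the lyric body
    flag = first
    for i in range(first, half):
        if all("：" not in w for w in ch_words[i:i + 3]):
            flag = i
            break
    # Drop any later fragment that still contains a colon
    return [w for w in ch_words[flag:] if "：" not in w]


# Hypothetical fragments, shaped like the regex output for one song
fragments = ["歌名", "歌手", "词：某人", "曲：某人",
             "第一句", "第二句", "第三句", "第四句", "第五句", "第六句"]
print(strip_credits(fragments))
```

The slice ch_words[i:i + 3] never raises IndexError, so this version also tolerates very short lists.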
The word-cloud code:

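#-*-coding:UTF-8 -*-
import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Read the lyrics saved by the crawler and join the songs into one string, separated by "。"
strchuli = ''
with open('test.txt', 'r') as f:
    f.readline()  # skip the first line, as in the original
    for i in f:
        strchuli = strchuli + i + "。"

# Segment with jieba in accurate mode and join the words with spaces,
# so that WordCloud treats each jieba word as a separate token
wordlist = jieba.cut(strchuli, cut_all=False)
#print (len(list(wordlist)))
word_string = " ".join(wordlist)

# A Chinese font is required, otherwise the characters render as empty boxes;
# the raw string keeps the Windows backslashes from being read as escape sequences
wordcloud = WordCloud(font_path=r'C:\Windows\Fonts\simkai.ttf', background_color="white", width=1000, height=860, margin=2).generate(word_string)
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
wordcloud.to_file('jieguo.png')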
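Counting the segmented words yourself gives explicit control over what enters the cloud. A common refinement, not part of the original, is to drop one-character tokens (punctuation, particles) and a stopword list, then pass the counts to WordCloud.generate_from_frequencies instead of calling generate on the joined string. The counting step is sketched below; the function name word_frequencies, the token list and the stopword set are hypothetical placeholders.

```python
from collections import Counter


def word_frequencies(tokens, stopwords=frozenset(), min_len=2):
    """Count segmented words, skipping whitespace, short tokens and stopwords."""
    counts = Counter()
    for tok in tokens:
        tok = tok.strip()
        if len(tok) >= min_len and tok not in stopwords:
            counts[tok] += 1
    return counts


# Hypothetical token list, as jieba might return for a short lyric snippet
tokens = ["我", "爱", "你", "，", "爱情", "是", "爱情", "的", "样子", " "]
print(word_frequencies(tokens, stopwords={"样子"}))
```

With jieba and wordcloud installed, it would plug in roughly as freqs = word_frequencies(jieba.lcut(strchuli)) followed by WordCloud(font_path=..., background_color="white").generate_from_frequencies(freqs).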
