Weibo hot-word search and word cloud HTML generation (complete)

from urllib.parse import urlencode
import requests
from pyquery import PyQuery as pq
import time
import os
import csv
import json
import pandas as pd
# ----------------------------------- Remove old output files; create the helper files if they are missing
if os.path.exists('微博热词.csv'):
    os.remove('微博热词.csv')
if os.path.exists('微博热词.txt'):
    os.remove('微博热词.txt')
try:
    f = open("停用词库.txt", 'r')
    f.close()
except IOError:
    f = open("停用词库.txt", 'w')
    f.close()
try:
    f = open("分词词典.txt", 'r', encoding='utf-8')
    f.close()
except IOError:
    f = open("分词词典.txt", 'w', encoding='utf-8')
    f.close()
# ----------------------------------- End of file checks
base_url = 'https://m.weibo.cn/api/container/getIndex?'
headers = {
    'Host': 'm.weibo.cn',
    'Referer': 'https://m.weibo.cn/u/2830678474',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}
class SaveCSV(object):
    def save(self, keyword_list, path, item):
        """
        Save one record to a CSV file.
        :param keyword_list: field names, i.e. the header row
        :param path: path and name of the output file
        :param item: dict to save
        :return:
        """
        try:
            # The first time the file is opened, write the header row
            if not os.path.exists(path):
                with open(path, "w", newline='', encoding='utf-8') as csvfile:  # newline='' avoids blank lines
                    writer = csv.DictWriter(csvfile, fieldnames=keyword_list)
                    writer.writeheader()
            # Then append the record
            with open(path, "a", newline='', encoding='utf-8') as csvfile:  # newline='' is required here too
                writer = csv.DictWriter(csvfile, fieldnames=keyword_list)
                writer.writerow(item)  # write one row
                print("^_^ write success")
        except Exception as e:
            print("write error==>", e)
            # Log the record that failed; append so earlier failures are kept
            with open("error.txt", "a") as f:
                f.write(json.dumps(item) + ",\n")


def get_page(page, title):
    # Request one page of search results. params mirrors the Query String
    # parameters of the request as seen in the browser.
    params = {
        'containerid': '100103type=1&q=' + title,
        'page': page,  # current page number; this is the value that changes when paging
        'type': 'all',
        'queryVal': title,
        'featurecode': '20000320',
        'luicode': '10000011',
        'lfid': '106003type=1',
        'title': title
    }
    url = base_url + urlencode(params)
    print(url)
    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            print(page)
            return response.json()
    except requests.ConnectionError as e:
        print('Error', e.args)


# Parse the JSON returned by the API
def parse_page(payload, label):
    # The parameter is named payload so that it does not shadow the json module
    res = []
    if payload:
        items = payload.get('data').get('cards')
        for i in items:
            if i is None:
                continue
            item = i.get('mblog')
            if item is None:
                continue
            weibo = {}
            weibo['id'] = item.get('id')
            weibo['label'] = label
            # Strip the HTML markup, then remove spaces and line breaks
            weibo['text'] = pq(item.get('text')).text().replace(" ", "").replace("\n", "")
            res.append(weibo)
    return res


if __name__ == '__main__':
    title = input("Enter a search keyword: ")
    path = "微博热词.csv"
    item_list = ['id', 'text', 'label']
    s = SaveCSV()
    for page in range(10, 20):  # loop over pages
        try:
            time.sleep(1)  # pause between requests to avoid being blocked
            payload = get_page(page, title)
            results = parse_page(payload, title)
            if results is None:
                continue
            for result in results:
                if result is None:
                    continue
                print(result)
                s.save(item_list, path, result)
        except TypeError:
            # Raised when a page has no cards to iterate over
            print("Done")
            continue

# Convert the data to txt ------------------------------------
data = pd.read_csv('微博热词.csv', encoding='utf-8')
with open('微博热词.txt', 'a+', encoding='utf-8') as f:
    for line in data.values:
        f.write((str(line[0]) + '\t' + str(line[1]) + '\n'))

# --------------------------------- Word cloud
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author LQ6H
import jieba
from collections import Counter
import pyecharts.options as opts
from pyecharts.charts import WordCloud


def get_text():
    # f = open("text1.txt", encoding="gbk")
    with open("微博热词.txt", encoding="utf-8") as f:
        lines = f.read()
    text = lines.split("\n\n")
    return "".join(text)


def split_word(text):
    jieba.load_userdict("分词词典.txt")
    # word_list = list(jieba.cut_for_search(text))  # search-engine mode: precise mode, then long words are split again
    word_list = list(jieba.cut(text))  # default precise mode
    with open("停用词库.txt") as f:
        meaningless_word = f.read().splitlines()
    result = []
    for i in word_list:
        if i not in meaningless_word:
            result.append(i.replace(" ", ""))
    return result


def word_counter(words):
    words_counter = Counter(words)
    words_list = words_counter.most_common(101)  # top N words
    fc = words_list
    with open('分词排名.txt', 'a+', encoding='utf-8') as f:
        for line in fc:
            f.write((str(line[0]) + '\t' + str(line[1]) + '\n'))
    return words_list


def word_cloud(data):
    (
        WordCloud()
        .add(
            series_name="Hot word analysis",
            data_pair=data,
            word_gap=5,
            word_size_range=[20, 100],  # word size range, e.g. [20, 500]
            shape="",
            rotate_step=90,
            # width=2000,
            # height=1000
            # mask_image="书.jpg"
        )
        .set_global_opts(
            title_opts=opts.TitleOpts(
                title="Hot word analysis",
                title_textstyle_opts=opts.TextStyleOpts(font_size=23),
            ),
            tooltip_opts=opts.TooltipOpts(is_show=True),
        )
        .render("微博热词.html")
    )


def main():
    text = get_text()
    words = split_word(text)
    data = word_counter(words)
    word_cloud(data)


if __name__ == '__main__':
    main()
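To see what get_page actually requests without touching the network, the URL construction can be reproduced on its own. This is a minimal sketch using only the standard library; build_search_url is a hypothetical helper name introduced here for illustration, and the parameter values are copied from the script above.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

BASE_URL = "https://m.weibo.cn/api/container/getIndex?"


def build_search_url(title, page):
    """Hypothetical helper: build the same URL as get_page, without sending it."""
    params = {
        "containerid": "100103type=1&q=" + title,
        "page": page,
        "type": "all",
        "queryVal": title,
        "featurecode": "20000320",
        "luicode": "10000011",
        "lfid": "106003type=1",
        "title": title,
    }
    return BASE_URL + urlencode(params)


url = build_search_url("python", 10)
# Decode the query string again to check what the server would receive
query = parse_qs(urlsplit(url).query)
print(url)
print(query["containerid"][0])
print(query["page"][0])
```

The containerid value itself contains `=` and `&`. urlencode percent-encodes them, so the whole string travels as a single parameter value instead of being split into separate query parameters.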

Final result
Cleaned-up data

Word counts

Word ranking written to txt

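Add custom words to the segmentation dictionary file (分词词典.txt), and add words to the stop-word file (停用词库.txt) to exclude them from the count. Both files take one entry per line.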
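The stop-word step can be tried in isolation. The sketch below reads a stop-word file with one entry per line, drops those words from a list of tokens that has already been segmented, and counts the rest. The file name demo_stopwords.txt, the helper names, and the token list are made up for illustration; in the script the tokens come from jieba.cut.

```python
import os
import tempfile
from collections import Counter


def load_stopwords(path):
    """Read one stop word per line, ignoring blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def count_words(tokens, stopwords, top_n=3):
    """Drop stop words and empty tokens, then return the top_n (word, count) pairs."""
    kept = [t for t in tokens if t.strip() and t not in stopwords]
    return Counter(kept).most_common(top_n)


# Write a throwaway stop-word file (assumed name, for this demo only)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_stopwords.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("的\n了\n")
    stopwords = load_stopwords(path)

# Hand-written tokens standing in for the output of jieba.cut
tokens = ["微博", "的", "热词", "微博", "了", "词云", "微博", "热词"]
ranking = count_words(tokens, stopwords)
print(ranking)
```

The sketch keeps the stop words in a set rather than a list, which makes each membership check cheap when the token list is long. The resulting list of (word, count) pairs has the same shape as the data_pair passed to WordCloud in the script.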
