Python merge Word files: in WordCloud on Python, I want to merge two languages

In WordCloud on Python, I would like to combine two languages (English and Arabic) in one picture, but I was unable to add the Arabic: it is rendered as squares instead of words. When I use the arabic_reshaper library and have it read the CSV file, the Arabic displays correctly, but then the English is rendered as squares.

wordcloud = WordCloud(
    collocations=False,
    width=1600, height=800,
    background_color='white',
    stopwords=stopwords,
    max_words=150,
    random_state=42,
    # font_path='/Users/mac/b.TTF'
).generate(' '.join(df['body_new']))
print(wordcloud)
plt.figure(figsize=(9,8))
fig = plt.figure(1)
plt.imshow(wordcloud)
plt.axis('off')
plt.show()

Solution

I've been struggling with the same problem for a while, and the best way I have found to deal with it is the generate_from_frequencies() function. You also need a proper font for Arabic; 'Shorooq' works fine and is available online for free. Here is a quick fix to your code:

import arabic_reshaper
import matplotlib.pyplot as plt
from bidi.algorithm import get_display
from itertools import islice
from nltk.corpus import stopwords  # needs nltk.download('stopwords') once
from wordcloud import WordCloud

# df is the DataFrame from the question; join every row of the text column
text = " ".join(line for line in df['body_new'])

stop_ar = stopwords.words('arabic')
# add more stop words here like numbers, special characters, etc. It should be customized for your project

# Count how often each non-stop word occurs
top_words = {}
words = text.split()
for w in words:
    if w in stop_ar:
        continue
    if w not in top_words:
        top_words[w] = 1
    else:
        top_words[w] += 1

# Sort the dictionary of the most frequent words
top_words = {k: v for k, v in sorted(top_words.items(), key=lambda item: item[1], reverse=True)}

# select the first 150 most frequent words
def take(n, iterable):
    "Return first n items of the iterable as a list"
    return list(islice(iterable, n))

for_wc = take(150, top_words.items())

# you need to reshape your words to be shown properly and turn the result into a dictionary
dic_data = {}
for t in for_wc:
    r = arabic_reshaper.reshape(t[0])  # connect Arabic letters
    bdt = get_display(r)  # right to left
    dic_data[bdt] = t[1]

# Plot
wc = WordCloud(background_color="white", width=1600, height=800, max_words=400, font_path='fonts/Shoroq.ttf').generate_from_frequencies(dic_data)
plt.figure(figsize=(16,8))
plt.imshow(wc, interpolation='bilinear')
plt.axis("off")
plt.show()
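
Since the question is about mixing English and Arabic, one possible variation is to shape only the words that contain Arabic characters and pass English words through untouched. The sketch below is not part of the original fix: `contains_arabic`, `build_frequencies`, and `demo_shape` are names made up for illustration, the check only covers the basic Arabic Unicode block (U+0600 to U+06FF), and the shaper is passed in as a parameter so the snippet runs with the standard library alone.

```python
from collections import Counter


def contains_arabic(word):
    """True if any character falls in the basic Arabic block (U+0600-U+06FF)."""
    return any("\u0600" <= ch <= "\u06FF" for ch in word)


def build_frequencies(texts, stop_words, shape, n=150):
    """Count words, keep the n most frequent, and shape only the Arabic ones.

    `shape` is any callable str -> str. It is a parameter (an assumption of
    this sketch) so the code runs without third-party packages installed.
    """
    counts = Counter(
        word
        for text in texts
        for word in text.split()
        if word not in stop_words
    )
    frequencies = {}
    for word, count in counts.most_common(n):
        key = shape(word) if contains_arabic(word) else word
        frequencies[key] = count
    return frequencies


def demo_shape(word):
    # Stand-in shaper for demonstration only: it just reverses the string.
    # In the real pipeline this would be
    # get_display(arabic_reshaper.reshape(word)).
    return word[::-1]


freqs = build_frequencies(
    ["hello world hello", "مرحبا hello مرحبا"],
    stop_words={"world"},
    shape=demo_shape,
)
```

The resulting dictionary can go straight into generate_from_frequencies(). Keep in mind that WordCloud takes a single font_path, so the font you choose has to contain glyphs for both scripts, otherwise one of the two languages will still come out as squares.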

Important:

get_display() or reshape() might raise an error. This happens when your text contains an unusual character that these functions cannot handle. Finding it should not be difficult, since only 150 words go into the plot. Find it, add it to your stop words, and rerun the code.
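
If hunting for the offending word by eye gets tedious, a small helper can do it. This is only a sketch: `find_unshapeable` and `demo_shape` are hypothetical names, and the stand-in shaper below rejects an arbitrary character purely to show the mechanics. The real libraries may fail on different input, or not at all.

```python
def find_unshapeable(words, shape):
    """Return (word, error message) pairs for every word that `shape` rejects."""
    failures = []
    for word in words:
        try:
            shape(word)
        except Exception as exc:  # broad on purpose: we only want to locate the word
            failures.append((word, str(exc)))
    return failures


def demo_shape(word):
    # Hypothetical stand-in that rejects one character, for demonstration only.
    if "\ufffd" in word:
        raise ValueError("unsupported character")
    return word


bad = find_unshapeable(["hello", "wor\ufffdld", "ok"], demo_shape)
bad_words = {word for word, _ in bad}
```

With the real libraries you would pass `lambda w: get_display(arabic_reshaper.reshape(w))` as `shape` and the 150 selected words as `words`, then add everything in `bad_words` to your stop words before rerunning.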
