Translating Word Documents with Python

The translation keeps the original text styling; table styling still needs improvement.
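
Style preservation in the script below is coarse: it collects each run's font attributes and applies the most frequent value to the whole translated paragraph. Here is a minimal sketch of that selection step, with plain lists standing in for the attributes of python-docx runs. The helper name `most_common_value` and the sample data are illustrative assumptions, not part of the original script.

```python
def most_common_value(values):
    """Return the value that occurs most often; ties go to the first one seen."""
    if not values:
        # A paragraph without runs has no style information to vote on.
        return None
    return max(values, key=values.count)


# Hypothetical per-run attributes of one paragraph.
bold_flags = [True, None, True, False]
sizes = [152400, 152400, 127000]

print(most_common_value(bold_flags))  # True
print(most_common_value(sizes))       # 152400
```

Because one value wins for the whole paragraph, mixed formatting inside a paragraph (for example a single bold word) is lost in the translated copy.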

import glob
import os
import random
import time
from hashlib import md5

import docx
import requests
from docx.document import Document
from docx.oxml.ns import qn
from docx.oxml.table import CT_Tbl
from docx.oxml.text.paragraph import CT_P
from docx.table import _Cell, Table
from docx.text.paragraph import Paragraph


def iter_block_items(parent):
    """Yield each paragraph and table child within *parent*, in document order.

    Each returned value is an instance of either Table or Paragraph. *parent*
    would most commonly be a reference to a main Document object, but
    also works for a _Cell object, which itself can contain paragraphs and tables.
    """
    if isinstance(parent, Document):
        parent_elm = parent.element.body
    elif isinstance(parent, _Cell):
        parent_elm = parent._tc
    else:
        raise ValueError("something's not right")
    for child in parent_elm.iterchildren():
        if isinstance(child, CT_P):
            yield Paragraph(child, parent)
        elif isinstance(child, CT_Tbl):
            yield Table(child, parent)


def make_md5(s, encoding='utf-8'):
    return md5(s.encode(encoding)).hexdigest()


# Baidu Translate credentials: fill in your own values.
appid = ''
appkey = ''

from_lang = 'en'
to_lang = 'zh'

translate_api = 'http://api.fanyi.baidu.com'
path = '/api/trans/vip/translate'
url = translate_api + path
headers = {'Content-Type': 'application/x-www-form-urlencoded'}


def translate_content(query):
    """Return the translation of *query*, or '' if it is empty or the request fails."""
    if len(query) > 0:
        try:
            salt = random.randint(32768, 65536)
            sign = make_md5(appid + query + str(salt) + appkey)
            payload = {'appid': appid, 'q': query, 'from': from_lang,
                       'to': to_lang, 'salt': salt, 'sign': sign}
            r = requests.post(url, params=payload, headers=headers)
            result = r.json()
            translate_result = ''
            for res in result['trans_result']:
                translate_result += res['dst']
            return translate_result
        except Exception:
            # Network errors and error responses (no 'trans_result' key) end up here.
            return ''
    return ''


def translate_paragraph(paragraph, wordfile_new):
    if not paragraph.text:
        # Keep empty paragraphs so the vertical spacing of the document survives.
        wordfile_new.add_paragraph()
        return

    size_list = []
    bold_list = []
    italic_list = []
    color_list = []
    highlight_color_list = []
    underline_list = []
    strike_list = []
    double_strike_list = []
    subscript_list = []
    superscript_list = []
    for run in paragraph.runs:
        size_list.append(run.font.size)                        # font size
        bold_list.append(run.font.bold)                        # bold
        italic_list.append(run.font.italic)                    # italic
        color_list.append(run.font.color.rgb)                  # font color
        highlight_color_list.append(run.font.highlight_color)  # highlight color
        underline_list.append(run.font.underline)              # underline
        strike_list.append(run.font.strike)                    # strikethrough
        double_strike_list.append(run.font.double_strike)      # double strikethrough
        superscript_list.append(run.font.superscript)          # superscript
        subscript_list.append(run.font.subscript)              # subscript

    p = wordfile_new.add_paragraph()
    p.paragraph_format.alignment = paragraph.paragraph_format.alignment                  # alignment
    p.paragraph_format.left_indent = paragraph.paragraph_format.left_indent              # left indent
    p.paragraph_format.right_indent = paragraph.paragraph_format.right_indent            # right indent
    p.paragraph_format.first_line_indent = paragraph.paragraph_format.first_line_indent  # first-line indent
    p.paragraph_format.line_spacing = paragraph.paragraph_format.line_spacing            # line spacing
    p.paragraph_format.space_before = paragraph.paragraph_format.space_before            # space before
    p.paragraph_format.space_after = paragraph.paragraph_format.space_after              # space after

    result = translate_content(paragraph.text)
    p.add_run(result)
    for run in p.runs:
        run.font.name = '微软雅黑'
        # The East Asian font has to be set separately for Chinese text.
        r = run._element.rPr.rFonts
        r.set(qn('w:eastAsia'), '微软雅黑')
        # Text that lives outside runs (e.g. hyperlinks) leaves the lists empty;
        # max() on an empty list would raise ValueError, so skip styling then.
        if not size_list:
            continue
        # Apply the most frequent value of each attribute to the whole paragraph.
        run.font.size = max(size_list, key=size_list.count)
        run.font.bold = max(bold_list, key=bold_list.count)
        run.font.italic = max(italic_list, key=italic_list.count)
        run.font.color.rgb = max(color_list, key=color_list.count)
        run.font.highlight_color = max(highlight_color_list, key=highlight_color_list.count)
        run.font.underline = max(underline_list, key=underline_list.count)
        run.font.strike = max(strike_list, key=strike_list.count)
        run.font.double_strike = max(double_strike_list, key=double_strike_list.count)
        run.font.subscript = max(subscript_list, key=subscript_list.count)
        run.font.superscript = max(superscript_list, key=superscript_list.count)
    time.sleep(1)  # stay under the API rate limit


def get_table_max_cols(table):
    max_cols = 0
    for row in table.rows:
        temp_cols = len(row.cells)
        if temp_cols > max_cols:
            max_cols = temp_cols
    return max_cols


def translate_table(table, wordfile_new):
    new_table = wordfile_new.add_table(rows=len(table.rows),
                                       cols=get_table_max_cols(table),
                                       style='Light List Accent 1')
    row_index = 0
    for row in table.rows:
        row_data = []
        for cell in row.cells:
            row_data.append(cell.text)
        # Alternative: translate cell by cell (one request per cell):
        #     new_table.rows[row_index].cells[cell_index].text = translate_content(cell.text)
        # Here the whole row goes out in one request, with '|' separating the cells.
        merged_row_data = "|".join(row_data)
        translate_data = translate_content(merged_row_data)
        row_data = translate_data.split('|')
        # zip() stops at the shorter sequence, so surplus fragments cannot
        # run past the last cell and raise IndexError.
        for new_cell, cell_text in zip(new_table.rows[row_index].cells, row_data):
            new_cell.text = cell_text
        row_index += 1
        time.sleep(1)  # stay under the API rate limit


def translate_file(file_dir_path):
    for file in glob.glob(os.path.join(file_dir_path, '*.docx')):
        wordfile = docx.Document(file)
        wordfile_new = docx.Document()

        # Copy the page setup of the first section.
        sec = wordfile.sections[0]
        sec_new = wordfile_new.sections[0]
        sec_new.left_margin = sec.left_margin
        sec_new.right_margin = sec.right_margin
        sec_new.top_margin = sec.top_margin
        sec_new.bottom_margin = sec.bottom_margin
        sec_new.header_distance = sec.header_distance
        sec_new.footer_distance = sec.footer_distance
        sec_new.orientation = sec.orientation
        sec_new.page_height = sec.page_height
        sec_new.page_width = sec.page_width

        for block in iter_block_items(wordfile):
            if isinstance(block, Paragraph):
                translate_paragraph(block, wordfile_new)
            elif isinstance(block, Table):
                translate_table(block, wordfile_new)

        # "report.docx" is saved as "reporttranslated.docx" in the same folder.
        new_name = os.path.basename(file)[:-5] + 'translated.docx'
        wordfile_new.save(os.path.join(file_dir_path, new_name))
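
One reason tables are the weak spot: joining a row with `|` and splitting the translated text assumes the translation service returns every separator unchanged, which is not guaranteed. As a rough sketch, a defensive helper could pad or fold the fragments so they always match the number of cells in the row. The name `split_row` and its behavior are assumptions for illustration, not part of the original script.

```python
def split_row(translated, n_cells, sep='|'):
    """Split translated row text into exactly n_cells fragments."""
    if n_cells <= 0:
        return []
    parts = [part.strip() for part in translated.split(sep)]
    if len(parts) > n_cells:
        # Fold the overflow into the last cell instead of dropping text.
        parts = parts[:n_cells - 1] + [sep.join(parts[n_cells - 1:])]
    # Pad with empty strings when separators were lost in translation.
    parts += [''] * (n_cells - len(parts))
    return parts


print(split_row('a|b|c', 3))    # ['a', 'b', 'c']
print(split_row('a|b', 3))      # ['a', 'b', '']
print(split_row('a|b|c|d', 3))  # ['a', 'b', 'c|d']
```

This keeps every row aligned with the table, though a lost separator still puts text in the wrong cell. Translating each cell separately avoids that, at the cost of one request per cell.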
