


pip install python_docx

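(Note: do not run pip install docx! The docx package does install, but importing it always fails with an error about a missing exceptions module.)

Once it is installed, python-docx can be used to read the text of a Word document.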


from docx import Document

# Path to the Word file (adjust for your machine)
path = "C:\\Users\\Administrator\\Desktop\\word.docx"
document = Document(path)

# Print the text of every paragraph in the document body
for paragraph in document.paragraphs:
    print(paragraph.text)




import os
import docx

# Try to open every .doc file in the current directory by swapping in a .docx extension
for filename in os.listdir(os.getcwd()):
    if filename.endswith('.doc'):
        print(filename[:-4])
        doc = docx.Document(filename[:-4] + ".docx")
        for para in doc.paragraphs:
            print(para.text)

Result: it fails with docx.opc.exceptions.PackageNotFoundError: Package not found. python-docx still cannot read .doc files.
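
The error makes sense once you remember that a .docx file is a ZIP package of XML parts, whereas the older .doc is a binary format, so changing the extension does not convert anything. Below is a minimal sketch, using only the standard library, of checking files before handing them to Document. The helper names looks_like_docx and split_word_files are made up for illustration, and the check assumes the conventional part name word/document.xml that Word and python-docx write.

```python
import os
import zipfile


def looks_like_docx(path):
    """Return True if `path` is a ZIP package containing word/document.xml."""
    if not os.path.isfile(path) or not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as package:
        # Assumption: the main document part uses the conventional name.
        return "word/document.xml" in package.namelist()


def split_word_files(folder):
    """Sort the .doc/.docx files in `folder` into readable and unreadable lists."""
    readable, unreadable = [], []
    for filename in sorted(os.listdir(folder)):
        if not filename.lower().endswith((".doc", ".docx")):
            continue  # not a Word file by name, ignore it
        full_path = os.path.join(folder, filename)
        if looks_like_docx(full_path):
            readable.append(filename)
        else:
            unreadable.append(filename)
    return readable, unreadable


if __name__ == "__main__":
    ok, skipped = split_word_files(os.getcwd())
    print("can be opened with python-docx:", ok)
    print("need converting to .docx first:", skipped)
```

Files that land in the second list have to be converted to a real .docx first, for example by re-saving them from Word, before python-docx can open them.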


# Document also provides methods for adding headings, page breaks, paragraphs, pictures, sections, and more, documented as follows:
 |  add_heading(self, text='', level=1)
 |      Return a heading paragraph newly added to the end of the document,
 |      containing *text* and having its paragraph style determined by
 |      *level*. If *level* is 0, the style is set to `Title`. If *level* is
 |      1 (or omitted), `Heading 1` is used. Otherwise the style is set to
 |      `Heading {level}`. Raises |ValueError| if *level* is outside the
 |      range 0-9.
 |
 |  add_page_break(self)
 |      Return a paragraph newly added to the end of the document and
 |      containing only a page break.
 |
 |  add_paragraph(self, text='', style=None)
 |      Return a paragraph newly added to the end of the document, populated
 |      with *text* and having paragraph style *style*. *text* can contain
 |      tab (``\t``) characters, which are converted to the appropriate XML
 |      form for a tab. *text* can also include newline (``\n``) or carriage
 |      return (``\r``) characters, each of which is converted to a line
 |      break.
 |
 |  add_picture(self, image_path_or_stream, width=None, height=None)
 |      Return a new picture shape added in its own paragraph at the end of
 |      the document. The picture contains the image at
 |      *image_path_or_stream*, scaled based on *width* and *height*. If
 |      neither width nor height is specified, the picture appears at its
 |      native size. If only one is specified, it is used to compute
 |      a scaling factor that is then applied to the unspecified dimension,
 |      preserving the aspect ratio of the image. The native size of the
 |      picture is calculated using the dots-per-inch (dpi) value specified
 |      in the image file, defaulting to 72 dpi if no value is specified, as
 |      is often the case.
 |
 |  add_section(self, start_type=2)
 |      Return a |Section| object representing a new section added at the end
 |      of the document. The optional *start_type* argument must be a member
 |      of the :ref:`WdSectionStart` enumeration, and defaults to
 |      ``WD_SECTION.NEW_PAGE`` if not provided.
 |
 |  add_table(self, rows, cols, style=None)
 |      Add a table having row and column counts of *rows* and *cols*
 |      respectively and table style of *style*. *style* may be a paragraph
 |      style object or a paragraph style name. If *style* is |None|, the
 |      table inherits the default table style of the document.
 |
 |  save(self, path_or_stream)
 |      Save this document to *path_or_stream*, which can be either a path to
 |      a filesystem location (a string) or a file-like object.
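
To see several of these methods working together, here is a small sketch that builds a new document from scratch and saves it. The function name build_demo_document, the sample text, and the output file name demo.docx are placeholders chosen for illustration. It assumes python-docx is installed and that the default template is used, which is what Document() with no arguments loads.

```python
import os
import tempfile


def build_demo_document(path):
    """Create a small .docx at `path` using the Document methods listed above."""
    # Imported inside the function so this file still loads where
    # python-docx is not installed (pip install python-docx).
    from docx import Document

    document = Document()  # no argument: start from the default template

    document.add_heading("Demo report", level=0)    # level 0 uses the Title style
    document.add_heading("First section", level=1)  # level 1 uses Heading 1
    document.add_paragraph("A line of text.\nA second line after a line break.")
    document.add_page_break()

    # A 2 x 2 table: one header row and one data row
    table = document.add_table(rows=2, cols=2)
    table.cell(0, 0).text = "name"
    table.cell(0, 1).text = "value"
    table.cell(1, 0).text = "pages"
    table.cell(1, 1).text = "2"

    document.save(path)
    return path


if __name__ == "__main__":
    out = os.path.join(tempfile.gettempdir(), "demo.docx")
    build_demo_document(out)
    print("saved", out)
```

Opening the saved file with Document(path) and looping over document.paragraphs, as in the first example, should print the heading and paragraph text back out.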



