Reading data from various file types in Python

(Work in progress…)

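Contents

0 The general approach: open
- 0.1 Chinese text is not recognized when reading
- 0.2 Writing, including Chinese text
1. Reading Excel files
- (1) Reading with Python:
2. Reading CSV files
- (1) Reading with Python:
3. Reading txt files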

0 The general approach: open

0.1 Chinese text is not recognized when reading

Key points:

Open the file in binary mode, 'rb'
Decode the bytes as UTF-8
open can be used for both reading and writing.
Syntax of open:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

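The most commonly used parameters are file (the path to the file), mode ('r' to read, 'w' to write after truncating, 'a' to append, combined with 'b' for binary and '+' for read plus write) and encoding (text mode only).

An example follows. Suppose there is a file test.txt whose content includes Chinese text.

Print it with each of read(), readline() and readlines():
1) read(): reads the entire file
In 'rb' mode it returns a bytes object (in text mode it returns a str)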

f = open('new/test.txt','rb')
a = f.read()
print(a)
print(a.decode('utf-8'))

Note: the bytes must be decoded as UTF-8 for the Chinese text to print as expected.

2) readline(): reads a single line
In 'rb' mode it returns a bytes object

f = open('new/test.txt','rb')
a = f.readline()
print(a)
print(a.decode('utf-8'))

As before, the result must be decoded as UTF-8.

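3) readlines(): reads the file into a list of lines
It returns a list in which each line of the file is one element (bytes objects in 'rb' mode, strings in text mode)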

f = open('new/test.txt','rb')
a = f.readlines()
for i in a:
    print(i.decode('utf-8'))
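As an alternative to reading bytes and calling decode by hand, Python 3 can decode the file while reading if it is opened in text mode with an explicit encoding. The following is a minimal sketch of that idea; the temporary file and its two-line content are made up for illustration and merely stand in for new/test.txt.

```python
import os
import tempfile

# Hypothetical sample file standing in for new/test.txt.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('第一行\nsecond line\n')

# Binary mode returns bytes, which have to be decoded by hand.
with open(path, 'rb') as f:
    raw = f.read()
print(type(raw).__name__)  # bytes
decoded = raw.decode('utf-8')

# Text mode with encoding='utf-8' returns str objects directly.
with open(path, 'r', encoding='utf-8') as f:
    first_line = f.readline()  # one line, including its trailing '\n'
    remaining = f.readlines()  # the rest of the file as a list of str
print(type(first_line).__name__)  # str
print(len(remaining))             # 1
```

With this approach print(first_line) shows the Chinese text without any call to decode.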

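0.2 Writing, including Chinese text

1) Writing English text
This example uses a+ mode (append plus read), so new content is added at the end of the file, and \n starts a new line before the text. With r+ mode, writing starts at the beginning of the file and overwrites what is there; w mode empties the file first.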

f = open('new/test.txt','a+')
f.write('\nabcd')
f.close()
f = open('new/test.txt','rb')
w=f.read().decode('utf-8')
print(w)
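The difference between the modes is easiest to see side by side. The sketch below is only an illustration: the helper write_with_mode and the sample content are hypothetical, and a temporary file is used so that new/test.txt is not touched.

```python
import os
import tempfile

def write_with_mode(mode):
    """Create a two-line file, write 'XX' to it using `mode`, and return the final content."""
    path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('line1\nline2')
    with open(path, mode, encoding='utf-8') as f:
        f.write('XX')
    with open(path, 'r', encoding='utf-8') as f:
        return f.read()

print(repr(write_with_mode('a+')))  # 'line1\nline2XX'  (appended at the end)
print(repr(write_with_mode('r+')))  # 'XXne1\nline2'    (overwrites from the start)
print(repr(write_with_mode('w')))   # 'XX'              (file truncated first)
```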

2) Writing Chinese text
Pass encoding='utf-8' when opening the file

f = open('new/test.txt','a+',encoding='utf-8')
f.write('\n今天天气很好')
f.close()
f = open('new/test.txt','rb')
w=f.read().decode('utf-8')
print(w)
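To illustrate why the encoding matters on both the writing and the reading side, here is a small sketch that writes the same Chinese sentence to a temporary file (a stand-in for new/test.txt) and then reads it back twice, once with the matching encoding and once with a deliberately wrong one. The choice of 'ascii' as the wrong encoding is only for demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')

# Write with an explicit encoding so the result does not depend on the platform default.
with open(path, 'a+', encoding='utf-8') as f:
    f.write('\n今天天气很好')

# Reading back with the same encoding round-trips the text.
with open(path, 'r', encoding='utf-8') as f:
    text = f.read()
print(text == '\n今天天气很好')  # True

# Reading with a mismatched encoding raises an error here.
try:
    with open(path, 'r', encoding='ascii') as f:
        f.read()
    mismatch_error = None
except UnicodeDecodeError as exc:
    mismatch_error = exc
print(type(mismatch_error).__name__)  # UnicodeDecodeError
```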

3) Reading from file A and writing to file B
Suppose there is a file named bibtex.bib with the following content:

@article{Akrami:2018mcd,
      author         = "Akrami, Y. and others",
      title          = "{Planck 2018 results. IV. Diffuse component separation}",
      collaboration  = "Planck",
      year           = "2018",
      eprint         = "1807.06208",
      archivePrefix  = "arXiv",
      primaryClass   = "astro-ph.CO",
      SLACcitation   = "%%CITATION = ARXIV:1807.06208;%%"
}

The goal is to extract the 'author', 'title', 'year' and 'eprint' lines into the file test.txt

f = open('demo1/bibtex.bib', 'r')  # open the source file for reading
f_list = f.readlines()      # cache every line in a list
f.close()
save_f = open('demo1/test.txt', 'w')  # open the file that will receive the data
vocab = ['author', 'title', 'year', 'eprint']
# A list comprehension or two nested for loops both work:
# [save_f.write(i) for i in f_list for x in vocab if x in i]
for i in f_list:
    for x in vocab:
        if x in i:
            save_f.write(i)
save_f.close()  # close so the data is flushed before reading it back
# Check the saved file
test_f = open('demo1/test.txt', 'rb')
print(test_f.read().decode('utf-8'))
test_f.close()

The resulting test.txt contains:

      author         = "Akrami, Y. and others",
      title          = "{Planck 2018 results. IV. Diffuse component separation}",
      year           = "2018",
      eprint         = "1807.06208",
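One detail of the nested loops in this section is that a line matching two keywords would be written twice. The sketch below shows one possible variant that avoids this by testing each line once with any(); the function name extract_fields and the shortened sample entry are hypothetical, and temporary files are used instead of the demo1 folder.

```python
import os
import tempfile

def extract_fields(src_path, dst_path, fields):
    """Copy the lines of src_path that mention any of `fields` to dst_path; return the kept lines."""
    with open(src_path, 'r', encoding='utf-8') as src:
        kept = [line for line in src if any(name in line for name in fields)]
    with open(dst_path, 'w', encoding='utf-8') as dst:
        dst.writelines(kept)
    return kept

# Hypothetical input mirroring part of the bibtex.bib entry.
work_dir = tempfile.mkdtemp()
bib_path = os.path.join(work_dir, 'bibtex.bib')
out_path = os.path.join(work_dir, 'test.txt')
with open(bib_path, 'w', encoding='utf-8') as f:
    f.write('@article{Akrami:2018mcd,\n'
            '      author         = "Akrami, Y. and others",\n'
            '      year           = "2018",\n'
            '      eprint         = "1807.06208",\n'
            '      archivePrefix  = "arXiv",\n'
            '}\n')

kept_lines = extract_fields(bib_path, out_path, ['author', 'title', 'year', 'eprint'])
print(len(kept_lines))  # 3
```

The with blocks close both files automatically, so no explicit close() call is needed before reading the output back.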

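One point worth stressing: a list comprehension may be slightly faster than a for loop, but the gain is small. As discussed on Stack Overflow, moving the work into C helps far more; for example, where a list comprehension gives roughly a 15% speedup, C might give around 300%.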

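1. Reading Excel files

Suppose there is a file named student_score.xlsx whose contents need to be read.

(1) Reading with Python:

a) Using the xlrd library. Note that xlrd 2.0 and later reads only .xls files, so reading an .xlsx file this way requires an older xlrd release (or a different library such as openpyxl).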

import numpy as np
import xlrd   # third-party library for reading Excel files

workbook = xlrd.open_workbook('C:/users/lenovo/desktop/student_score.xlsx')  # path to the file
sheet = workbook.sheet_by_name('Sheet1')     # the sheet named 'Sheet1' (the first sheet in the workbook)
data_name = sheet.col_values(0)    # read by column: the first column
# data_name1 = sheet.row_values(0)  # read by row: the first row
data_st_ID = sheet.col_values(1)
data_st_score = sheet.col_values(2)

Each of the three variables is then a Python list holding the values of one column.

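2. Reading CSV files

A CSV file is a plain-text file whose values are separated by commas. For example, the Excel file above can be saved as a CSV file and then read as shown below.

(1) Reading with Python:

Either use a with open('xxx') as block or call open directly.

a) with open('xxx') as f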

with open('C:/users/lenovo/desktop/student_score.csv','r') as f:
    for line in f.readlines():  # read line by line
        print(line)

b) open('xxx')

filename = open('C:/users/lenovo/desktop/student_score.csv','r')
for line in filename:
    print(line)
filename.close()  # without a with block the file must be closed explicitly
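Both variants above yield each row as one raw string. If the individual fields are needed, the standard library's csv module can split them. The sketch below is only an illustration: the column names and rows are assumptions, since the real content of student_score.csv is not shown, and a temporary file is used in its place.

```python
import csv
import os
import tempfile

# Hypothetical CSV standing in for student_score.csv; the columns are assumptions.
csv_path = os.path.join(tempfile.mkdtemp(), 'student_score.csv')
with open(csv_path, 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['name', 'id', 'score'])
    writer.writerow(['Alice', '1001', '90'])
    writer.writerow(['Bob', '1002', '85'])

# csv.reader yields each row as a list of strings.
with open(csv_path, 'r', newline='', encoding='utf-8') as f:
    rows = list(csv.reader(f))
print(rows[0])  # ['name', 'id', 'score']

# csv.DictReader uses the header row as dictionary keys.
with open(csv_path, 'r', newline='', encoding='utf-8') as f:
    records = list(csv.DictReader(f))
scores = [int(record['score']) for record in records]
print(scores)   # [90, 85]
```

Passing newline='' when opening the file is what the csv module documentation recommends, so that line endings inside quoted fields are handled correctly.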

3. Reading txt files

numpy.loadtxt

dataset = np.loadtxt('path')  # 'path' is a placeholder; requires import numpy as np
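numpy.loadtxt expects purely numeric, regularly shaped text by default. The following sketch shows two common cases under made-up data; io.StringIO is used here only as a stand-in for a real file path, and the delimiter and skiprows arguments are the ones that usually need adjusting.

```python
import io
import numpy as np

# Hypothetical whitespace-delimited data; lines starting with '#' are treated as comments.
text = io.StringIO('# x y\n1.0 2.0\n3.0 4.0\n')
dataset = np.loadtxt(text)
print(dataset.shape)  # (2, 2)

# Hypothetical comma-separated data with a header row that must be skipped.
csv_text = io.StringIO('x,y\n1,2\n3,4\n')
csv_data = np.loadtxt(csv_text, delimiter=',', skiprows=1)
print(csv_data[:, 1].tolist())  # [2.0, 4.0]
```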

Using with open

Read everything at once

with open('my_file.txt') as file_object:
    contents = file_object.read()  # read the whole file at once
    print(contents)

Read line by line

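with open('my_file.txt') as f:
    for line in f:      # read line by line
        print(line.strip())  # strip() removes surrounding whitespace, including the trailing \n that would otherwise print an extra blank line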