Import multiple csv files into pandas and concatenate into one DataFrame

Translated from: Import multiple csv files into pandas and concatenate into one DataFrame

I would like to read several csv files from a directory into pandas and concatenate them into one big DataFrame. I have not been able to figure it out, though. Here is what I have so far:

import glob
import pandas as pd

# get data file names
path = r'C:\DRO\DCL_rawdata_files'
filenames = glob.glob(path + "/*.csv")

dfs = []
for filename in filenames:
    dfs.append(pd.read_csv(filename))

# Concatenate all data into one DataFrame
big_frame = pd.concat(dfs, ignore_index=True)

I guess I need some help within the for loop?

Reply #1

Reference: https://stackoom.com/question/1PijC/将多个csv文件导入到pandas中并串联到一个DataFrame中

Reply #2

If all your csv files have the same columns, you can try the code below. I have added header=0 so that the first row of each csv file is used as the column names.

import pandas as pd
import glob

path = r'C:\DRO\DCL_rawdata_files' # use your path
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
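To see what header=0 and ignore_index=True do without touching the filesystem, here is a minimal sketch that reads two small in-memory CSV strings and concatenates them. The sample data and the names csv_a, csv_b, kept and combined are made up for illustration and are not part of the answer above.

```python
import io
import pandas as pd

# Two tiny CSV "files" held in memory; the contents are illustrative only.
csv_a = "id,value\n1,10\n2,20\n"
csv_b = "id,value\n3,30\n"

# header=0 treats the first line of each source as the column names.
frames = [pd.read_csv(io.StringIO(text), header=0) for text in (csv_a, csv_b)]

# Without ignore_index, each frame keeps its own 0-based index (0, 1, 0).
kept = pd.concat(frames, axis=0)

# With ignore_index=True, the result gets a fresh 0..n-1 index.
combined = pd.concat(frames, axis=0, ignore_index=True)

print(kept.index.tolist())
print(combined.index.tolist())
```

If the per-file row labels carry no meaning, ignore_index=True is usually what you want, since duplicate index labels can make later label-based lookups ambiguous.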

Reply #3

Edit: I googled my way into https://stackoverflow.com/a/21232849/186078 . Lately, however, I have found it faster to do the manipulation in numpy and assign the result to a DataFrame once, rather than manipulating the DataFrame itself iteratively, and that seems to work in this solution too.

I sincerely want anyone hitting this page to consider this approach, but I don't want to attach this huge piece of code as a comment and make it less readable.

You can leverage numpy to really speed up the DataFrame concatenation.

import os
import glob
import pandas as pd
import numpy as np

path = "my_dir_full_path"
allFiles = glob.glob(os.path.join(path, "*.csv"))

np_array_list = []
for file_ in allFiles:
    df = pd.read_csv(file_, index_col=None, header=0)
    # DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it
    np_array_list.append(df.to_numpy())

comb_np_array = np.vstack(np_array_list)
big_frame = pd.DataFrame(comb_np_array)

big_frame.columns = ["col1","col2"....]

Timing stats: 时间统计：

total files :192
avg lines per file :8492
--approach 1 without numpy -- 8.248656988143921 seconds ---
total records old :1630571
--approach 2 with numpy -- 2.289292573928833 seconds ---
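The figures above come from the answerer's own files and will differ on other machines, data, and pandas versions. Below is a rough sketch of how one might repeat the comparison on synthetic data; the frame sizes, the column names col1 and col2, and all variable names are assumptions chosen for illustration. It also shows one caveat of the numpy route: a frame with mixed column types is flattened to a single object array, so the rebuilt DataFrame no longer has the original per-column dtypes.

```python
import time
import numpy as np
import pandas as pd

# Synthetic stand-ins for the per-file frames; sizes are arbitrary.
rng = np.random.default_rng(0)
frames = [
    pd.DataFrame(rng.random((1000, 2)), columns=["col1", "col2"])
    for _ in range(50)
]

# Approach 1: let pandas concatenate the frames directly.
start = time.perf_counter()
via_concat = pd.concat(frames, ignore_index=True)
concat_seconds = time.perf_counter() - start

# Approach 2: stack the underlying arrays, then build one DataFrame.
start = time.perf_counter()
stacked = np.vstack([f.to_numpy() for f in frames])
via_numpy = pd.DataFrame(stacked, columns=["col1", "col2"])
numpy_seconds = time.perf_counter() - start

print(f"pd.concat: {concat_seconds:.4f}s, np.vstack: {numpy_seconds:.4f}s")

# Caveat: mixed column types collapse to one object array, so the
# rebuilt frame loses the integer dtype of column "n".
mixed = pd.DataFrame({"n": [1, 2], "s": ["a", "b"]})
rebuilt = pd.DataFrame(mixed.to_numpy(), columns=mixed.columns)
print(mixed["n"].dtype, rebuilt["n"].dtype)
```

A single run like this is noisy; for a fairer comparison, include the pd.read_csv time as well (which often dominates) and repeat each measurement several times.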

Reply #4

An alternative to darindaCoder's answer:

import glob
import os
import pandas as pd

path = r'C:\DRO\DCL_rawdata_files'                     # use your path
all_files = glob.glob(os.path.join(path, "*.csv"))     # advisable to use os.path.join as this makes concatenation OS independent

df_from_each_file = (pd.read_csv(f) for f in all_files)
concatenated_df   = pd.concat(df_from_each_file, ignore_index=True)
# doesn't create a list, nor does it append to one
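The generator approach can be tried end to end with a throwaway directory. The sketch below writes two small sample files first; the file names part1.csv and part2.csv, the column a, and the use of sorted() are assumptions added for illustration. As far as I know, glob.glob does not guarantee any particular order, so sorting is one way to make the row order reproducible.

```python
import glob
import os
import tempfile
import pandas as pd

with tempfile.TemporaryDirectory() as path:  # stand-in for your data directory
    # Write two small sample files; names and contents are made up.
    pd.DataFrame({"a": [1, 2]}).to_csv(os.path.join(path, "part1.csv"), index=False)
    pd.DataFrame({"a": [3]}).to_csv(os.path.join(path, "part2.csv"), index=False)

    # sorted() makes the file order deterministic.
    all_files = sorted(glob.glob(os.path.join(path, "*.csv")))

    # The generator yields one DataFrame at a time; pd.concat consumes it.
    df_from_each_file = (pd.read_csv(f) for f in all_files)
    concatenated_df = pd.concat(df_from_each_file, ignore_index=True)

print(concatenated_df)
```

The generator mainly saves the explicit list and append calls. pd.concat still needs every frame in memory to build the result, so it should not be expected to lower peak memory use by much.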

Reply #5

If the csv files are zipped into one archive, you can use zipfile to read them all and concatenate them as below:

import zipfile
import numpy as np
import pandas as pd

ziptrain = zipfile.ZipFile('yourpath/yourfile.zip')

train = []

for f in range(0, len(ziptrain.namelist())):
    if (f == 0):
        train = pd.read_csv(ziptrain.open(ziptrain.namelist()[f]))
    else:
        my_df = pd.read_csv(ziptrain.open(ziptrain.namelist()[f]))
        train = (pd.DataFrame(np.concatenate((train, my_df), axis=0),
                              columns=list(my_df.columns.values)))
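The same idea can also be written with pd.concat instead of np.concatenate, which keeps each column's dtype and avoids rebuilding the frame on every iteration. This is a sketch of a variation, not the answerer's code: it builds a small zip archive in memory so it runs on its own, and the member names a.csv, b.csv and readme.txt are made up. Filtering on the .csv suffix is an added assumption for archives that also hold other files.

```python
import io
import zipfile
import pandas as pd

# Build a small zip archive in memory so the example is self-contained;
# the member names and contents are illustrative.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.csv", "x,y\n1,2\n")
    archive.writestr("b.csv", "x,y\n3,4\n")
    archive.writestr("readme.txt", "not a csv")

buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    # Keep only the csv members, in archive order.
    csv_names = [name for name in archive.namelist() if name.endswith(".csv")]
    frames = []
    for name in csv_names:
        with archive.open(name) as member:
            frames.append(pd.read_csv(member))

# One concatenation at the end instead of one per file.
train = pd.concat(frames, ignore_index=True)
print(train)
```

For a real archive, replace the in-memory buffer with the path to your zip file, as in zipfile.ZipFile('yourpath/yourfile.zip').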

Reply #6

import glob, os
import pandas as pd

df = pd.concat(map(pd.read_csv, glob.glob(os.path.join('', "my_files*.csv"))))
