bulk insert

http://dev.mysql.com/tech-resources/articles/xml-in-mysql5.1-6.0.html

http://dev.mysql.com/doc/refman/5.1/en/load-data.html

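MySQL's batch insert (a multi-row INSERT, referred to here as BULK INSERT) is about as fast as LOAD DATA, and it is reliable.

The syntax is as follows.

Assume a table test (ID NUMBER, NAME VARCHAR(10)):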

insert into test values(1,'aa'),(2,'bb'),.....(100000,'bb');

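That is, the records are separated by commas and the statement ends with a semicolon. Inserting 100,000 records this way takes less than 10 seconds.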
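As a minimal sketch of the statement shape above, the following Python builds one multi-row INSERT with placeholders and runs it. It uses an in-memory SQLite database from the standard library purely as a stand-in for MySQL, which is an assumption made so the example runs anywhere. The helper names `build_multi_row_insert` and `insert_rows` are hypothetical, the ID column is declared as INTEGER here, and MySQL drivers typically use `%s` rather than `?` as the placeholder.

```python
import sqlite3


def build_multi_row_insert(table, columns, row_count):
    """Return 'INSERT INTO t (a,b) VALUES (?,?),(?,?),...' for row_count rows."""
    one_row = "(" + ",".join("?" for _ in columns) + ")"
    return "INSERT INTO {} ({}) VALUES {}".format(
        table, ",".join(columns), ",".join([one_row] * row_count)
    )


def insert_rows(conn, rows):
    """Insert all rows with a single multi-row INSERT statement."""
    sql = build_multi_row_insert("test", ("id", "name"), len(rows))
    # Flatten [(1, 'aa'), (2, 'bb')] into [1, 'aa', 2, 'bb'] to match the placeholders.
    flat_values = [value for row in rows for value in row]
    conn.execute(sql, flat_values)
    conn.commit()


# Stand-in database (assumption): SQLite instead of MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, name VARCHAR(10))")

# Kept small on purpose: SQLite limits the number of placeholders per statement.
rows = [(i, "bb") for i in range(1, 101)]
insert_rows(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM test").fetchone()[0]
print(count)
```

Placeholders are used instead of pasting values into the SQL text so that quoting and escaping are left to the driver.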

Multiple Inserts with MySQL

JUNE 10, 2005

Andy Jarrett posted this technique for inserting multiple rows (bulk insert) in one SQL statement a few months ago. It's a handy trick:

INSERT INTO x (a,b)
VALUES ('1', 'one'),('2', 'two'),('3', 'three')

I tried that on a couple hundred thousand rows today, and I got an error that my statement was bigger than the Max Allowed Packet size in MySQL (the max_allowed_packet setting). So keep that in mind when using this technique. You can either change the setting, or go about it a different way.
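One "different way" is to split the rows into several smaller multi-row INSERTs so that each statement stays under the packet limit. The sketch below is only an illustration under stated assumptions: rows are (integer, string) pairs like ('1', 'one') above, the size estimate is a rough upper bound rather than MySQL's exact accounting, and the names `estimate_row_bytes` and `batch_rows` are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]


def estimate_row_bytes(row: Row) -> int:
    """Rough upper bound on the bytes one "(id,'name')," tuple adds to a statement."""
    ident, name = row
    # Digits of the id, plus the UTF-8 name doubled (every byte might need an
    # escape character), plus parentheses, quotes and commas.
    return len(str(ident)) + 2 * len(name.encode("utf-8")) + 6


def batch_rows(rows: Iterable[Row], max_bytes: int,
               header_bytes: int = 64) -> Iterator[List[Row]]:
    """Yield lists of rows whose estimated statement size stays within max_bytes.

    header_bytes is an assumed allowance for the "INSERT INTO ... VALUES" prefix.
    """
    batch: List[Row] = []
    size = header_bytes
    for row in rows:
        row_size = estimate_row_bytes(row)
        # Start a new batch when adding this row would exceed the budget.
        if batch and size + row_size > max_bytes:
            yield batch
            batch, size = [], header_bytes
        batch.append(row)
        size += row_size
    if batch:
        yield batch


rows = [(i, "bb") for i in range(1, 1001)]
batches = list(batch_rows(rows, max_bytes=1024))
print(len(batches), "statements instead of one")
```

Each yielded batch can then be sent as its own multi-row INSERT. In practice the budget would be chosen comfortably below the server's actual max_allowed_packet value.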

http://metabetageek.com/2010/02/04/learning-mysql-find-in-set-and-bulk-insert-options/

Bulk INSERTs FTW

A short while ago, I had to research some API for a company I’m consulting for. This API yields very good quality data, but it isn’t convenient to process that data for further research.
The obvious solution was to dump this data into some kind of database, and process it there.
Our first attempt was pickle files. It worked nicely enough, but when the input data was 850 megs, it died horribly with a memory error.

(It should be mentioned that just starting to work with the API costs about 1.2 gigs of RAM.)

Afterwards, we tried sqlite, with similar results. After clearing it of memory errors, the code (sqlite + sqlalchemy + our code) was still not stable, and apart from that, dumping the data took too much time.

We decided that we needed some *real* database engine, and we arranged to get some nice sql-server with plenty of RAM and CPUs. We used the same sqlalchemy code, and for smaller sized inputs (a few megs) it worked very well. However, for our real input, the processing, had it not died in a fiery MemoryError (again!), would have taken more than two weeks to finish.

(In my defense regarding the MemoryError, I’ll add that we added an id cache for records, to try and shorten the timings. We could have avoided this cache and the MemoryError, but the timings would have been worse. Not to mention that most of the memory was taken by the API…)

At this point, we asked for help from someone who knows *a little bit* more about databases than us, and he suggested bulk inserts.

The recipe is simple: dump all your information into a csv file (tabs and newlines as delimiters).
Then do BULK INSERT, and a short while later, you’ll have your information inside.
We implemented the changes, and some tens of millions of records later, we had a database full of interesting stuff.
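The recipe can be sketched in Python roughly as follows. This is not the code we actually ran, just a minimal illustration: the table name `records`, the file name and the helper names are hypothetical, and the `'0x0a'` row terminator is an assumption chosen to match the bare newline written by the dump function.

```python
import csv
import os
import tempfile


def dump_tab_delimited(records, path):
    """Write records as tab-separated fields, one record per line."""
    # newline="" lets the csv module control the line terminator itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(
            handle,
            delimiter="\t",
            lineterminator="\n",
            # QUOTE_NONE: raise an error if a field contains a tab or newline
            # instead of silently adding quotes the bulk loader may not understand.
            quoting=csv.QUOTE_NONE,
        )
        writer.writerows(records)


def bulk_insert_statement(table, path):
    """Build a T-SQL BULK INSERT statement for the dumped file (sketch only)."""
    return (
        "BULK INSERT {} FROM '{}' "
        "WITH (FIELDTERMINATOR = '\\t', ROWTERMINATOR = '0x0a')"
    ).format(table, path)


with tempfile.TemporaryDirectory() as tmp_dir:
    dump_path = os.path.join(tmp_dir, "records.tsv")
    dump_tab_delimited([(1, "aa"), (2, "bb")], dump_path)
    with open(dump_path, encoding="utf-8", newline="") as handle:
        dumped_text = handle.read()
    statement = bulk_insert_statement("records", dump_path)

print(statement)
```

The statement would then be executed against the server through whatever connection is already in use. Note that the path in BULK INSERT is read by the database server, so the file has to be reachable from that machine.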

My suggestion: add FTW as a possible extension for the bulk insert syntax. It won’t do anything, but it will certainly fit.

Tags: bulk-insert, Databases, Python, sql-server, sqlalchemy, sqlite

This entry was posted on Friday, July 17th, 2009 at 7:23 am and is filed under Databases, Programming.

One Response to “Bulk INSERTs FTW”

  1. budowski Says: 
    July 17th, 2009 at 9:04 pm

    In your case (really large CSV files), the BULK INSERT process probably takes a lot of time.
    You could speed up the process by hundreds of percent: instead of using CSV files, you could use a binary dump, where each record and column is saved in a binary format.
    The BULK INSERT command is given a format file (a bcp format file, via the FORMATFILE parameter) that defines what each record looks like.

    I know that in a project I worked on, it dramatically increased processing speed (we processed tens of GBs of information).

    More information on the bcp format files:
    http://msdn.microsoft.com/en-us/library/ms191516.aspx
    http://msdn.microsoft.com/en-us/library/ms191479.aspx

    (Note: a bulk import can also be executed using the bcp command-line utility)

Posted on 2010-10-06 23:35 by lexus

Reposted from: https://www.cnblogs.com/lexus/archive/2010/10/06/1844866.html
