Java – Reading a Large File Efficiently (reposted)
Original article: http://www.baeldung.com/java-read-lines-large-file
1. Overview
This tutorial shows how to read all the lines of a large file in Java efficiently.
This article is part of the “Java – Back to Basic” tutorial here on Baeldung.
2. Reading in Memory
The standard way of reading the lines of a file is in memory. Both Guava and Apache Commons IO provide a quick way to do just that:
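```java
// Guava (com.google.common.io.Files)
Files.readLines(new File(path), Charsets.UTF_8);
```

```java
// Apache Commons IO (org.apache.commons.io.FileUtils)
FileUtils.readLines(new File(path), "UTF-8");
```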
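The JDK offers the same in-memory approach without any third-party library. The following is a sketch, not taken from the original article: it assumes Java 7+ and uses java.nio.file.Files.readAllLines, which, like the Guava and Commons IO calls, returns a List that references every line of the file at once. The class name InMemoryRead and the temporary demo file are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class InMemoryRead {

    // Reads every line of the file into a single List: the whole file ends up on the heap.
    static List<String> readAll(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical small input file, created only for this demonstration
        Path tmp = Files.createTempFile("lines", ".txt");
        Files.write(tmp, Arrays.asList("first", "second", "third"), StandardCharsets.UTF_8);
        try {
            List<String> lines = readAll(tmp);
            // All lines are referenced by the list at the same time
            System.out.println(lines.size() + " lines held in memory");
        } finally {
            Files.delete(tmp);
        }
    }
}
```

With a small file this is harmless; the memory cost grows with the file size, exactly as with the library calls.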
The problem with this approach is that all the lines of the file are kept in memory, which quickly leads to an OutOfMemoryError if the file is large enough.
For example, reading a ~1 GB file:
```java
@Test
public void givenUsingGuava_whenIteratingAFile_thenWorks() throws IOException {
    String path = ...
    Files.readLines(new File(path), Charsets.UTF_8);
}
```
This starts off with a small amount of memory being consumed (~0 MB consumed):
```
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 128 Mb
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 116 Mb
```
However, after the full file has been processed, we end up with (~2 GB consumed):
```
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 2666 Mb
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 490 Mb
```
This means that about 2.1 GB of memory is consumed by the process. The reason is simple: all the lines of the file are now stored in memory.
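The consumed figure is total memory minus free memory: 2666 Mb - 490 Mb = 2176 Mb, or roughly 2.1 GB. The article does not show how its test obtains these numbers; the sketch below assumes they come from java.lang.Runtime, and the MemoryStats class and its helper methods are hypothetical names.

```java
public class MemoryStats {

    private static final long MB = 1024L * 1024L;

    // Heap memory currently committed to the JVM, in MB
    static long totalMb() {
        return Runtime.getRuntime().totalMemory() / MB;
    }

    // Part of the committed heap not currently occupied by objects, in MB
    static long freeMb() {
        return Runtime.getRuntime().freeMemory() / MB;
    }

    // "Consumed" memory as computed in the article: total minus free
    static long usedMb(long totalMb, long freeMb) {
        return totalMb - freeMb;
    }

    public static void main(String[] args) {
        System.out.println("Total Memory: " + totalMb() + " Mb");
        System.out.println("Free Memory: " + freeMb() + " Mb");
        // Figures from the Guava in-memory run above
        System.out.println("Consumed: " + usedMb(2666, 490) + " Mb");
    }
}
```

Note that these values describe the JVM heap at one instant, so they also include garbage that has not yet been collected.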
It should be obvious by this point that keeping the contents of the file in memory will quickly exhaust the available memory, regardless of how much there actually is.
What’s more, we usually don’t need all of the lines in the file in memory at once; instead, we just need to iterate through them, process each one, and throw it away. So this is exactly what we’re going to do: iterate through the lines without holding them in memory.
3. Streaming Through the File
Let’s now look at a solution. We’re going to use a java.util.Scanner to run through the contents of the file and retrieve the lines serially, one by one:
```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Scanner;

FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // process the line here, e.g. System.out.println(line);
    }
    // Scanner suppresses IOExceptions from the underlying stream, so rethrow any it recorded
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}
```
This solution iterates through all the lines in the file, allowing each line to be processed without keeping a reference to it, and therefore without keeping the lines in memory (~150 MB consumed):
```
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 763 Mb
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 605 Mb
```
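Since Java 7, the same Scanner loop can be written with try-with-resources, which closes the Scanner (and with it the underlying FileInputStream) automatically, even when an exception is thrown. This is a sketch rather than the article's code; the LineCounter class and the line-counting "processing" step are hypothetical stand-ins for real per-line work.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Scanner;

public class LineCounter {

    // Streams the file line by line; only the current line is referenced at any time.
    static long countLines(String path) throws IOException {
        long count = 0;
        try (FileInputStream inputStream = new FileInputStream(path);
             Scanner sc = new Scanner(inputStream, "UTF-8")) {
            while (sc.hasNextLine()) {
                String line = sc.nextLine();
                count++; // hypothetical processing: replace with real work on 'line'
            }
            // Scanner swallows IOExceptions from the stream, so surface any it recorded
            if (sc.ioException() != null) {
                throw sc.ioException();
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(args[0]) + " lines");
    }
}
```

Invoked as `java LineCounter /path/to/large-file.txt`, it walks the whole file while holding only one line at a time, just like the explicit finally-based version.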
4. Streaming with Apache Commons IO
The same can be achieved with the Commons IO library, using the custom LineIterator it provides:
```java
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;

LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
    }
} finally {
    LineIterator.closeQuietly(it);
}
```
Since the file is never held entirely in memory, this also results in fairly conservative memory consumption (~190 MB consumed):
```
[main] INFO o.b.java.CoreJavaIoIntegrationTest - Total Memory: 752 Mb
[main] INFO o.b.java.CoreJavaIoIntegrationTest - Free Memory: 564 Mb
```
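For projects that do not depend on Commons IO, the JDK's java.nio.file.Files.lines (Java 8+) offers a comparable, lazily populated stream of lines. This alternative is not covered by the article; the StreamLines class and the "count long lines" task are hypothetical. As with LineIterator, the stream holds an open file handle and must be closed, here via try-with-resources.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class StreamLines {

    // Counts lines longer than minLength; lines are read lazily as the stream is consumed.
    static long countLongLines(Path file, int minLength) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.filter(line -> line.length() > minLength).count();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLongLines(Paths.get(args[0]), 80));
    }
}
```

Because `filter` and `count` consume the stream one element at a time, no intermediate collection of all lines is built.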
5. Conclusion
This quick article showed how to process the lines of a large file iteratively, without exhausting the available memory, which proves quite useful when working with such large files.
The implementation of all these examples and code snippets can be found in my GitHub project; it is an Eclipse-based project, so it should be easy to import and run as is.
Reposted from: https://www.cnblogs.com/davidwang456/p/4766726.html