Original article: http://www.baeldung.com/java-read-lines-large-file

1. Overview

This tutorial will show how to read all the lines from a large file in Java in an efficient manner.

This article is part of the “Java – Back to Basic” tutorial here on Baeldung.

2. Reading In Memory

The standard way of reading the lines of the file is in-memory – both Guava and Apache Commons IO provide a quick way to do just that:

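// Guava
Files.readLines(new File(path), Charsets.UTF_8);

// Apache Commons IO (charset passed explicitly rather than relying on the platform default)
FileUtils.readLines(new File(path), "UTF-8");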

The problem with this approach is that all the lines of the file are kept in memory, which will quickly lead to an OutOfMemoryError if the file is large enough.

For example – reading a ~1Gb file:

@Test
public void givenUsingGuava_whenIteratingAFile_thenWorks() throws IOException {
    String path = ...
    Files.readLines(new File(path), Charsets.UTF_8);
}

This starts off with a small amount of memory being consumed: (~0 Mb consumed)

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 128 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 116 Mb

However, after the full file has been processed, we have at the end: (~2 Gb consumed)

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 2666 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 490 Mb

This means that about 2.1 Gb of memory is consumed by the process. The reason is simple: all the lines of the file are now stored in memory.
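The article does not show how the "Total Memory" and "Free Memory" figures are logged. A minimal sketch, assuming they come from Runtime.totalMemory() and Runtime.freeMemory() (the class and method names below are hypothetical), makes the arithmetic behind the "consumed" numbers explicit:

```java
// Hypothetical helper; the original test's logging code is not shown in the article.
final class MemoryReport {
    private static final long MB = 1024L * 1024L;

    /** Heap currently reserved by the JVM, in megabytes. */
    static long totalMb() {
        return Runtime.getRuntime().totalMemory() / MB;
    }

    /** Unused part of the reserved heap, in megabytes. */
    static long freeMb() {
        return Runtime.getRuntime().freeMemory() / MB;
    }

    /** Consumed memory is the reserved heap minus its free part. */
    static long consumedMb(long totalMb, long freeMb) {
        return totalMb - freeMb;
    }

    public static void main(String[] args) {
        System.out.println("Total Memory: " + totalMb() + " Mb");
        System.out.println("Free Memory: " + freeMb() + " Mb");
        // Figures from the log above: 2666 Mb total, 490 Mb free
        System.out.println("Consumed: " + consumedMb(2666, 490) + " Mb");
    }
}
```

With the figures logged above, 2666 Mb total minus 490 Mb free gives 2176 Mb, which is the "about 2.1 Gb" quoted in the text.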

It should be obvious by this point that keeping the contents of the file in memory will quickly exhaust the available memory, regardless of how much that actually is.

What’s more, we usually don’t need all of the lines in the file in memory at once. Instead, we just need to be able to iterate through each one, do some processing and throw it away. So, this is exactly what we’re going to do: iterate through the lines without holding them in memory.

3. Streaming Through the File

Let’s now look at a solution – we’re going to use a java.util.Scanner to run through the contents of the file and retrieve lines serially, one by one:

FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // System.out.println(line);
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}

This solution iterates through all the lines in the file, allowing each line to be processed without keeping a reference to it, and consequently without keeping the lines in memory (~150 Mb consumed):

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 763 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 605 Mb
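On Java 7 and later, the same Scanner loop can be written with try-with-resources, which closes the scanner and the stream automatically. This is only a sketch of that variant, not the article's own code: the class name ScannerLineCounter and the line-counting step are illustrative assumptions standing in for real per-line processing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Scanner;

// Hypothetical example class; counting lines stands in for real processing.
final class ScannerLineCounter {

    static long countLines(Path path) throws IOException {
        long count = 0;
        // Resources are closed in reverse order: the Scanner first, then the stream.
        try (InputStream in = Files.newInputStream(path);
             Scanner sc = new Scanner(in, "UTF-8")) {
            while (sc.hasNextLine()) {
                sc.nextLine(); // process the line here; no reference to it is kept
                count++;
            }
            // Scanner suppresses I/O errors, so surface them explicitly.
            IOException suppressed = sc.ioException();
            if (suppressed != null) {
                throw suppressed;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ScannerLineCounter <file>");
            return;
        }
        System.out.println(countLines(Paths.get(args[0])));
    }
}
```

Only one line is referenced at a time, so memory use should stay roughly flat as the file grows.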

4. Streaming with Apache Commons IO

The same can be achieved using the Commons IO library as well, by using the custom LineIterator provided by the library:

LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
    }
} finally {
    LineIterator.closeQuietly(it);
}

Since the entire file is never fully loaded in memory, this also results in fairly conservative memory consumption (~150 Mb consumed):

[main] INFO  o.b.java.CoreJavaIoIntegrationTest - Total Memory: 752 Mb
[main] INFO  o.b.java.CoreJavaIoIntegrationTest - Free Memory: 564 Mb
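For comparison, when adding a library is not an option, the same line-by-line pattern can be sketched with the JDK's own BufferedReader. This is a different technique from the Commons IO LineIterator shown above, and the article does not measure it. The class name BufferedLineReader and the character-counting step are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example class; summing line lengths stands in for real processing.
final class BufferedLineReader {

    static long totalCharacters(Path path) throws IOException {
        long total = 0;
        // The reader is closed automatically when the try block exits.
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            // readLine() returns null at end of file; each line is discarded after use.
            while ((line = reader.readLine()) != null) {
                total += line.length();
            }
        }
        return total;
    }
}
```

Unlike Scanner, BufferedReader reports I/O errors by throwing IOException directly, so no separate check is needed after the loop.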

5. Conclusion

This quick article shows how to process the lines of a large file iteratively, without exhausting the available memory, which proves quite useful when working with such large files.

The implementation of all these examples and code snippets can be found in my GitHub project. It is an Eclipse-based project, so it should be easy to import and run as it is.

Reposted from: https://www.cnblogs.com/davidwang456/p/4766726.html
