This article is a translation of: Read a file one line at a time in node.js?

I am trying to read a large file one line at a time. I found a question on Quora that dealt with the subject, but I'm missing some connections to make the whole thing fit together.

var Lazy = require("lazy");

new Lazy(process.stdin)
    .lines
    .forEach(function (line) {
        console.log(line.toString());
    });

process.stdin.resume();

The bit that I'd like to figure out is how I might read one line at a time from a file instead of STDIN as in this sample.

I tried:

fs.open('./VeryBigFile.csv', 'r', '0666', Process);

function Process(err, fd) {
    if (err) throw err;
    // DO lazy read
}

but it's not working. I know that in a pinch I could fall back to using something like PHP, but I would like to figure this out.

I don't think the other answer would work, as the file is much larger than the server I'm running it on has memory for.
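Roughly, what I'm after is something like the following sketch, assuming the lazy module accepts any readable stream in place of process.stdin (untested):

var Lazy = require("lazy"),
    fs   = require("fs");

// Sketch: feed a file stream to Lazy instead of process.stdin.
new Lazy(fs.createReadStream('./VeryBigFile.csv'))
    .lines
    .forEach(function (line) {
        console.log(line.toString());
    });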


#1

Reference: https://stackoom.com/question/PpaP/在node-js中一次读取一行文件


#2

Edit:

Use a transform stream.
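For example, here is a minimal sketch of that idea using the built-in stream.Transform class: it buffers incoming chunks and pushes out complete lines (the file name is only a placeholder):

var fs = require('fs');
var Transform = require('stream').Transform;

// Transform stream that re-chunks its input into whole lines.
var lineSplitter = new Transform({ objectMode: true });
var leftover = '';

lineSplitter._transform = function (chunk, encoding, done) {
    var lines = (leftover + chunk).split('\n');
    leftover = lines.pop();              // keep the partial last line for the next chunk
    for (var i = 0; i < lines.length; i++) {
        this.push(lines[i]);             // emit each complete line downstream
    }
    done();
};

lineSplitter._flush = function (done) {
    if (leftover) this.push(leftover);   // emit the final line (no trailing newline)
    done();
};

fs.createReadStream('./VeryBigFile.csv', { encoding: 'utf8' })
    .pipe(lineSplitter)
    .on('data', function (line) {
        console.log(line);
    });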


With a BufferedReader you can read lines.

// BufferedReader is not part of core Node; it presumably comes from the
// third-party "buffered-reader" npm module.
new BufferedReader("lorem ipsum", { encoding: "utf8" })
    .on("error", function (error) {
        console.log("error: " + error);
    })
    .on("line", function (line) {
        console.log("line: " + line);
    })
    .on("end", function () {
        console.log("EOF");
    })
    .read();

#3

There is a very nice module for reading a file line by line; it's called line-reader.

With it, you simply write:

var lineReader = require('line-reader');

lineReader.eachLine('file.txt', function (line, last) {
    console.log(line);
    // do whatever you want with line...
    if (last) {
        // or check if it's the last one
    }
});

You can even iterate over the file with a "java-style" interface if you need more control:

lineReader.open('file.txt', function (reader) {
    if (reader.hasNextLine()) {
        reader.nextLine(function (line) {
            console.log(line);
        });
    }
});

#4

You can always roll your own line reader. I haven't benchmarked this snippet yet, but it correctly splits the incoming stream of chunks into lines without the trailing '\n'.

var last = "";

process.stdin.on('data', function (chunk) {
    var lines, i;
    lines = (last + chunk).split("\n");
    for (i = 0; i < lines.length - 1; i++) {
        console.log("line: " + lines[i]);
    }
    last = lines[i];
});

process.stdin.on('end', function () {
    console.log("line: " + last);
});

process.stdin.resume();

I came up with this while working on a quick log-parsing script that needed to accumulate data during parsing, and I felt it would be nice to try doing this with js and node instead of perl or bash.

Anyway, I do feel that small nodejs scripts should be self-contained and not rely on third-party modules, so after reading all the answers to this question, each using various modules to handle line parsing, a 13-SLOC native nodejs solution might be of interest.
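Under the same assumptions, the snippet adapts directly to the original question's file case by swapping process.stdin for an fs.createReadStream (the file name is only a placeholder):

var fs = require('fs');
var last = "";

var stream = fs.createReadStream('./VeryBigFile.csv', { encoding: 'utf8' });

stream.on('data', function (chunk) {
    var lines = (last + chunk).split("\n");
    // everything except the final element is a complete line
    for (var i = 0; i < lines.length - 1; i++) {
        console.log("line: " + lines[i]);
    }
    last = lines[lines.length - 1];
});

stream.on('end', function () {
    console.log("line: " + last);
});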


#5

I wanted to tackle this same problem; basically, what in Perl would be:

while (<>) {
    process_line($_);
}

My use case was just a standalone script, not a server, so synchronous was fine. These were my criteria:

  • The minimal synchronous code that could be reused in many projects.
  • No limits on file size or number of lines.
  • No limits on the length of lines.
  • Able to handle full Unicode in UTF-8, including characters beyond the BMP.
  • Able to handle *nix and Windows line endings (old-style Mac not needed for me).
  • Line-ending character(s) to be included in the lines.
  • Able to handle the last line with or without end-of-line characters.
  • Not use any external libraries not included in the node.js distribution.

This is a project for me to get a feel for low-level scripting-type code in node.js and decide how viable it is as a replacement for other scripting languages like Perl.

After a surprising amount of effort and a couple of false starts, this is the code I came up with. It's pretty fast but less trivial than I would've expected: (fork it on GitHub)

var fs            = require('fs'),
    StringDecoder = require('string_decoder').StringDecoder,
    util          = require('util');

function lineByLine(fd) {
    var blob = '';
    var blobStart = 0;
    var blobEnd = 0;

    var decoder = new StringDecoder('utf8');

    var CHUNK_SIZE = 16384;
    var chunk = new Buffer(CHUNK_SIZE);

    var eolPos = -1;
    var lastChunk = false;

    var moreLines = true;
    var readMore = true;

    // each line
    while (moreLines) {
        readMore = true;

        // append more chunks from the file onto the end of our blob of text until we have an EOL or EOF
        while (readMore) {
            // do we have a whole line? (with LF)
            eolPos = blob.indexOf('\n', blobStart);

            if (eolPos !== -1) {
                blobEnd = eolPos;
                readMore = false;

            // do we have the last line? (no LF)
            } else if (lastChunk) {
                blobEnd = blob.length;
                readMore = false;

            // otherwise read more
            } else {
                var bytesRead = fs.readSync(fd, chunk, 0, CHUNK_SIZE, null);

                lastChunk = bytesRead !== CHUNK_SIZE;

                blob += decoder.write(chunk.slice(0, bytesRead));
            }
        }

        if (blobStart < blob.length) {
            processLine(blob.substring(blobStart, blobEnd + 1));

            blobStart = blobEnd + 1;

            if (blobStart >= CHUNK_SIZE) {
                // blobStart is in characters, CHUNK_SIZE is in octets
                var freeable = blobStart / CHUNK_SIZE;

                // keep blob from growing indefinitely, not as deterministic as I'd like
                blob = blob.substring(CHUNK_SIZE);
                blobStart -= CHUNK_SIZE;
                blobEnd -= CHUNK_SIZE;
            }
        } else {
            moreLines = false;
        }
    }
}

It could probably be cleaned up further; it was the result of trial and error.
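A hypothetical way to drive it, assuming a processLine callback of your own (neither the file name nor this processLine appears in the original):

// processLine is whatever per-line handler you need; kept trivial here.
function processLine(line) {
    console.log("line: " + line);
}

var fd = fs.openSync('./VeryBigFile.csv', 'r');
lineByLine(fd);
fs.closeSync(fd);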


#6

I use this:

function emitLines(stream, re) {
    re = re || /\n/;    // split on newlines unless a custom regex is supplied
    var buffer = '';

    stream.on('data', stream_data);
    stream.on('end', stream_end);

    function stream_data(data) {
        buffer += data;
        flush();
    } //stream_data

    function stream_end() {
        if (buffer) stream.emit('line', buffer);
    } //stream_end

    function flush() {
        var match;
        while (match = re.exec(buffer)) {
            var index = match.index + match[0].length;
            stream.emit('line', buffer.substring(0, index));
            buffer = buffer.substring(index);
            re.lastIndex = 0;
        }
    } //flush
} //emitLines

Use this function on a stream and listen to the line events that it will emit.
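For example, a sketch of how it might be wired up to a file stream (the file name is only a placeholder):

var fs = require('fs');

var stream = fs.createReadStream('./VeryBigFile.csv', { encoding: 'utf8' });

emitLines(stream);

stream.on('line', function (line) {
    // note: lines split on '\n' keep the newline character
    console.log("got: " + line);
});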

gr-
