Fast CSV Parsing in JavaScript with Papa Parse

Overview

With so many ways to parse CSV files and data, and with the data in those files often inconsistent, have you ever wished for a simple, efficient package that does the work for you? Meet Papa Parse, a robust JavaScript library that claims to be the fastest in-browser CSV parser. It is your one-stop shop for parsing CSV into JSON!

Highlights

Before getting into the features of Papa Parse, let’s look at how we can include this package in our code:

/* Babel or ES6 modules */
import Papa from 'papaparse';
/* Node.js or RequireJS (CommonJS) */
const Papa = require('papaparse');

The general syntax

For a CSV string:

var parsedOutput = Papa.parse(stringOfCsv[, config])

There are numerous configuration options to choose from. The Papa Parse documentation (https://www.papaparse.com/docs) explains them best.

For a file:

Papa.parse(myFileInput.files[0], {
    complete: function(parsedOutput) {
        console.log(parsedOutput);
    }
});

Parsing a file is asynchronous, so you must provide a callback (complete) to collect the results.

The same applies when fetching the CSV file from a URL. In that case, also set download: true in the config:

Papa.parse(csvUrl, {
    download: true,
    complete: function(parsedOutput) {
        console.log(parsedOutput);
    }
});

The parsed output consists of three parts: the data array, the errors array, and the meta object. The data array holds the parsed CSV rows.

With header: false (the default), each row in data is an array of field values. With header: true, the first row is used as the header, and each following row becomes an object keyed by the column field names.
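The following dependency-free sketch shows the two row shapes side by side. The toHeaderObjects helper is hypothetical and does not reflect how Papa Parse works internally. It takes the array rows you would get with header: false and rebuilds them into the object shape you would get with header: true. It assumes a simple CSV with no quoted fields.

```javascript
// Illustration only: the two row shapes Papa Parse's `data` array can hold.
// This is NOT Papa Parse's implementation, and it ignores quoted fields.

// With header: false, every row (including the header row) is an array.
const rowsAsArrays = [
  ['name', 'age'],
  ['Ada', '36'],
  ['Alan', '41'],
];

// Hypothetical helper: use the first row as column names and turn each
// remaining row into an object, i.e. the shape produced by header: true.
function toHeaderObjects(rows) {
  const [header, ...body] = rows;
  return body.map((row) => {
    const record = {};
    header.forEach((name, i) => {
      record[name] = row[i];
    });
    return record;
  });
}

const rowsAsObjects = toHeaderObjects(rowsAsArrays);
console.log(rowsAsObjects);
// two objects: { name: 'Ada', age: '36' } and { name: 'Alan', age: '41' }
```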

The errors array contains information on any errors encountered while parsing the CSV. The meta object holds metadata about the parse, such as the delimiter, the line break sequence, and the field names.

Auto delimiter detection

There are many situations in which you can't be sure which delimiter a CSV uses. Not to worry! Papa Parse has an auto delimiter detection feature that scans the first few rows of the CSV to work out the delimiter automatically.

You can always check which delimiter was used for parsing in the delimiter field of the result's meta object:

var output = Papa.parse(stringOfCsv); // input: a,b,c,d,e
console.log(output.meta.delimiter);   // delimiter: ,

If you want auto-detection to choose only from a specific set of candidate delimiters, use the delimitersToGuess config option, which takes an array of delimiters. To skip detection altogether, set the delimiter option explicitly. The default value of delimitersToGuess is:

delimitersToGuess : [',', '\t', '|', ';', Papa.RECORD_SEP, Papa.UNIT_SEP]

Here, Papa.RECORD_SEP and Papa.UNIT_SEP are read-only properties that represent the ASCII record separator (code 30) and unit separator (code 31) characters, respectively.
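Below is a rough sketch of how delimiter guessing can work. The guessDelimiter function is a simplified, hypothetical heuristic, not Papa Parse's actual algorithm:

- It ignores quoted fields.
- It accepts a candidate only if that candidate splits every sampled line into the same number of fields, and that number is greater than one.
- Among accepted candidates, it prefers the one that yields the most fields.

The candidate list mirrors the default delimitersToGuess. The characters '\x1e' and '\x1f' stand in for ASCII 30 and 31.

```javascript
// Simplified, hypothetical delimiter guesser. NOT Papa Parse's algorithm:
// quoted fields are ignored and the scoring is deliberately naive.

// Mirrors the default delimitersToGuess list; '\x1e' is ASCII 30 (record
// separator) and '\x1f' is ASCII 31 (unit separator).
const DEFAULT_CANDIDATES = [',', '\t', '|', ';', '\x1e', '\x1f'];

function guessDelimiter(text, candidates = DEFAULT_CANDIDATES, sampleRows = 10) {
  // Look only at the first few non-empty lines.
  const lines = text
    .split(/\r\n|\n|\r/)
    .filter((line) => line.length > 0)
    .slice(0, sampleRows);

  let best = null;
  let bestFieldCount = 1;
  for (const delim of candidates) {
    const counts = lines.map((line) => line.split(delim).length);
    const consistent = counts.length > 0 && counts.every((c) => c === counts[0]);
    // A candidate qualifies if every sampled line has the same number of
    // fields (> 1); among those, the one with the most fields wins, and an
    // earlier candidate wins a tie.
    if (consistent && counts[0] > bestFieldCount) {
      best = delim;
      bestFieldCount = counts[0];
    }
  }
  return best; // null if no candidate qualifies
}

console.log(guessDelimiter('a;b;c\n1;2;3')); // ;
console.log(guessDelimiter('a,b\n1,2'));     // ,
```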

Ability to parse huge file inputs

If the input file is really huge, Papa Parse can stream the input data and deliver the output row by row. This avoids loading the whole file into memory, which might otherwise crash the browser. To enable streaming, provide a step function in the config. Papa Parse calls it with the result for each row.

Papa.parse("http://csvexample.com/enormous.csv", {
    download: true,
    step: function(row, parser) {
        console.log("Row:", row.data);
    },
    complete: function() {
        console.log("All done!");
    }
});

The second argument passed to the step function is parser, an object you can use to abort, pause, or resume the CSV parsing:

parser.abort();
parser.pause();
parser.resume();
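The control flow of step and abort() can be illustrated without the library. parseRowByRow below is a hypothetical, dependency-free stand-in and is not part of Papa Parse. It passes each row to a step callback along with a parser-like handle. Calling abort() stops the loop early, which is the typical reason to use parser.abort() when streaming a huge file. The sketch splits naively on the delimiter and does not handle quotes.

```javascript
// Hypothetical, dependency-free stand-in for the step/abort control flow.
// parseRowByRow is NOT part of Papa Parse, and it splits naively on the
// delimiter (no quote handling).
function parseRowByRow(text, { delimiter = ',', step, complete }) {
  let aborted = false;
  const parser = { abort: () => { aborted = true; } };
  for (const line of text.split('\n')) {
    if (aborted) break; // stop handing out rows once abort() was called
    step({ data: line.split(delimiter) }, parser);
  }
  complete({ aborted });
}

// Example: stop at the first row whose "email" field is empty.
const csv = 'id,email\n1,a@example.com\n2,\n3,c@example.com';
const seenIds = [];
let finishedEarly = null;

parseRowByRow(csv, {
  step(row, parser) {
    seenIds.push(row.data[0]);
    if (row.data[1] === '') parser.abort();
  },
  complete(meta) {
    finishedEarly = meta.aborted;
    console.log('Rows seen:', seenIds, 'aborted:', finishedEarly);
  },
});
// seenIds ends up as ['id', '1', '2']; the row for id 3 is never visited.
```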

Avoid parser.pause() and parser.resume() when parsing in a Web Worker (worker: true). The worker would have to wait for a "continue" signal from the main thread after each chunk, which can make the whole experience sluggish. The Papa Parse FAQ covers this in more detail.

Multithreading option in Papa Parse

A long-running CSV parse on the main (UI) thread can make your web page unresponsive. If that worries you, set the worker config option to true, and Papa Parse will run the parse in a worker thread instead. This may slow parsing down slightly, but your page will stay responsive.

Papa.parse("http://csvexample.com/enormous.csv", {
    download: true,
    worker: true,
    step: function(row) {
        console.log("Row:", row.data);
    },
    complete: function() {
        console.log("All done!");
    }
});

The worker thread is built on the standard Worker interface that browsers provide to JavaScript.

Comments in your CSV?

However bizarre it sounds, CSV files sometimes contain comments. If yours has comments that you don't want parsed, set the comments config option to the string that marks a comment line, for example "#".

Papa.parse("http://csvexample.com/csv.csv", {
    download: true,
    comments: "#", // All lines starting with '#' are treated as comments and ignored by the parser.
    complete: function(parsedOutput) {
        console.log(parsedOutput);
    }
});
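The effect of the comments option is easy to picture: lines that start with the comment string never reach the parser. The dropCommentLines helper below is a hypothetical sketch of that behavior, not Papa Parse's code. It assumes a comment is a whole line that begins with the prefix.

```javascript
// Hypothetical sketch of the effect of `comments: "#"`; NOT Papa Parse's code.
// It assumes a comment is a whole line that starts with the prefix.
function dropCommentLines(text, commentPrefix = '#') {
  return text
    .split('\n')
    .filter((line) => !line.startsWith(commentPrefix))
    .join('\n');
}

const raw = '# generated file\nname,score\n# a note\nAda,9';
console.log(dropCommentLines(raw)); // prints "name,score" and "Ada,9" on two lines
```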

Type Conversion in Papa Parse

By default, all fields are parsed as strings. To preserve numeric and boolean types instead, enable the dynamicTyping option, which converts the data automatically:

Papa.parse("http://csvexample.com/csv.csv", {
    download: true,
    dynamicTyping: true,
    complete: function(parsedOutput) {
        console.log(parsedOutput);
    }
});

When dynamicTyping is true, numeric and boolean data in the string are converted to their respective types, subject to these rules:

- Numeric data must conform to the definition of a decimal literal.
- Numerical values greater than 2⁵³ or less than -2⁵³ are not converted to numbers, to preserve precision.
- European-formatted numbers must have their commas and dots swapped.

Instead of true, dynamicTyping also accepts an object or a function. An object's values should be booleans that indicate whether dynamic typing applies to each column number (or header name, if using headers). A function receives the field number (or header name, if using headers) as its first argument and should return a boolean.
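The rules above can be sketched as a per-field conversion function. convertField is a simplified, hypothetical illustration, not Papa Parse's exact implementation. It does three things:

- It turns true/false (all lowercase or all uppercase) into booleans.
- It converts strings that look like decimal literals into numbers, but only when the value lies strictly between -2⁵³ and 2⁵³.
- It leaves everything else as a string, including the European-style '1,5'.

```javascript
// Simplified, hypothetical version of the per-field conversion that
// dynamicTyping performs. NOT Papa Parse's exact implementation.
const DECIMAL_LITERAL = /^\s*-?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?\s*$/;
const LIMIT = 2 ** 53;

function convertField(value) {
  if (value === 'true' || value === 'TRUE') return true;
  if (value === 'false' || value === 'FALSE') return false;
  if (DECIMAL_LITERAL.test(value)) {
    const n = parseFloat(value);
    // Only convert when the number is strictly inside (-2^53, 2^53);
    // otherwise keep the original string so no precision is lost.
    if (n > -LIMIT && n < LIMIT) return n;
  }
  return value; // anything else stays a string
}

const sample = ['42', '3.14', 'TRUE', 'abc', '1,5', '9007199254740993'];
console.log(sample.map(convertField));
// values: 42, 3.14, true, 'abc', '1,5', '9007199254740993'
```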

Converting JSON to CSV format

Another wonderful feature of Papa Parse is its ability to convert JSON to CSV. So far, we have only used the parse() function; for this direction, Papa Parse provides unparse().

unparse() returns a neatly formatted CSV string. The general syntax is:

Papa.unparse(data[, config])

The data argument can take three forms:

- an array of objects
- an array of arrays
- an object with fields (the header) and data keys

Like parse(), unparse() accepts an optional config with a wide range of options, which are described in the Papa Parse documentation.
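To give a feel for what unparse() handles, here is a minimal JSON-to-CSV sketch. toCsv is hypothetical and deliberately narrower than Papa.unparse():

- It accepts only an array of flat objects.
- It takes the column names from the first object's keys.
- It always uses ',' as the delimiter and '\r\n' as the newline.

It does show the key quoting rule. A value that contains the delimiter, a quote, or a line break is wrapped in quotes, and any embedded quotes are doubled.

```javascript
// Minimal, hypothetical JSON-to-CSV sketch; NOT Papa.unparse() itself.
// Handles only an array of flat objects, uses the first object's keys as
// the header, and always uses ',' and '\r\n'.
function toCsv(records) {
  if (records.length === 0) return '';
  const fields = Object.keys(records[0]);
  // Quote a value if it contains a comma, a quote, or a line break;
  // embedded quotes are escaped by doubling them.
  const escape = (value) => {
    const s = value === null || value === undefined ? '' : String(value);
    return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const lines = [fields.map(escape).join(',')];
  for (const record of records) {
    lines.push(fields.map((f) => escape(record[f])).join(','));
  }
  return lines.join('\r\n');
}

const csvText = toCsv([
  { name: 'Ada', note: 'likes "math"' },
  { name: 'Alan', note: 'a, b' },
]);
console.log(csvText);
// name,note
// Ada,"likes ""math"""
// Alan,"a, b"
```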

Error Handling

The last feature we will discuss in this article is error handling in Papa Parse.

As mentioned at the top of the article, the parsed results consist of three components: data, errors and meta.

Each entry in the errors array is structured as follows:

{
    type: "",     // A generalization of the error
    code: "",     // Standardized error code
    message: "",  // Human-readable details
    row: 0,       // Row index of parsed data where error is
}

One way of extracting the errors:

var results = Papa.parse(csvString);
results.errors.forEach(function(error) {
    console.log(error.type, error.code, error.message, error.row);
});

Encountering errors while parsing does not necessarily mean that parsing the CSV file failed. The results still contain whatever data could be parsed, so check errors alongside data.
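Because errors do not necessarily mean the whole parse failed, a common pattern is to keep the good rows and set the problematic ones aside. The sketch below does this with a hand-written results object shaped like Papa Parse's output. The sample values are made up for illustration. The hypothetical splitByErrors helper assumes that an error's row value is an index into data, and it ignores errors that have no numeric row.

```javascript
// Sketch: set aside rows that have errors, keep the rest.
// `results` is a hand-written stand-in shaped like Papa Parse output
// (data / errors / meta); the values are made up for illustration.
const results = {
  data: [
    { id: '1', name: 'Ada' },
    { id: '2' },
    { id: '3', name: 'Alan' },
  ],
  errors: [
    { type: 'FieldMismatch', code: 'TooFewFields', message: 'Too few fields', row: 1 },
  ],
  meta: { delimiter: ',' },
};

// Hypothetical helper. Assumes each error's `row` is an index into `data`;
// errors without a numeric row are ignored.
function splitByErrors({ data, errors }) {
  const badRows = new Set(
    errors.filter((e) => typeof e.row === 'number').map((e) => e.row)
  );
  return {
    clean: data.filter((_, i) => !badRows.has(i)),
    flagged: data.filter((_, i) => badRows.has(i)),
  };
}

const { clean, flagged } = splitByErrors(results);
console.log(clean.length, flagged.length); // 2 1
```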

A few useful configs for parsing

Some other notable parsing configs, which we will just mention here, are:

newline - The newline sequence
quoteChar - The character used to quote fields
escapeChar - The character used to escape the quote character within a field
preview - If > 0, only that many rows will be parsed
transformHeader - A function to apply on each header. Requires header: true
chunk - A callback, similar to step, that activates streaming but receives the parsed results of a whole chunk at a time instead of a single row
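As an illustration, several of these options can be combined in a single config object. The values below are arbitrary example choices. In real code, the object would be passed as the second argument to Papa.parse(). That call is left out here so the snippet runs without the library installed.

```javascript
// Example config combining several of the options listed above. The values
// are arbitrary illustrative choices. In real code the object would be passed
// as the second argument to Papa.parse(); that call is omitted here so the
// snippet runs without Papa Parse installed.
const parseConfig = {
  header: true,      // transformHeader only applies when header is true
  preview: 100,      // parse at most 100 rows
  quoteChar: '"',    // character used to quote fields
  escapeChar: '"',   // a doubled quote inside a quoted field is a literal quote
  transformHeader: (header) => header.trim().toLowerCase(), // normalize column names
};

console.log(parseConfig.transformHeader('  First Name ')); // first name
```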

And many more :)

Bonus Utility Functions

Below are some React and Angular implementations for using Papa Parse to parse CSV data:

React Hook

import { useState, useEffect } from 'react';
import Papa from 'papaparse';

function useGoogleSheetData(url) {
    const [rows, setRows] = useState([]);
    useEffect(() => {
        Papa.parse(url, {
            download: true,
            header: true,
            complete: function(results) {
                setRows(results.data);
            }
        });
    }, [url]); // re-fetch whenever the URL changes
    return rows;
}

And we would use it as:

const rows = useGoogleSheetData("<my_csv_url>");

Angular Observable

import { Observable, EMPTY } from 'rxjs';
import { catchError } from 'rxjs/operators';
import { parse } from 'papaparse';

useGoogleSheetData = (url: string): Observable<any> => {
    return new Observable((observer) => {
        parse(url, {
            download: true,
            header: true,
            complete: (result) => {
                observer.next(result);
                observer.complete();
            },
            error: (error) => {
                observer.error(error); // error() already terminates the stream
            }
        });
    });
};

Can be used as below:

this.useGoogleSheetData("<my_csv_url>").pipe(
    catchError((error) => {
        console.error(error);
        return EMPTY; // catchError must return an Observable
    })
).subscribe((data) => {
    this.sheetData = data;
});

Conclusion

Looking at the features described above, and the many more Papa Parse has to offer (see the Papa Parse documentation), it is clear that this package is the real deal. It stands out from other CSV parsing packages because it handles huge files and messy, inconsistent data, and because it accepts readable streams as input in Node.js.

Hopefully this has given you a good sense of what Papa Parse is all about and how you can use it in your future projects :-)

Check out the package and some reading material

https://www.npmjs.com/package/papaparse

https://www.papaparse.com/

Video review of the package

Video review of the package with interesting use cases and in-depth exploration of the features coming soon! For more related content, check out Unpackaged Reviews.

Disclosures

The content and evaluation scores mentioned in this article/review are subjective. They reflect the personal opinions of the authors at Unpackaged Reviews, based on everyday usage and research on popular developer forums. They do not represent any company's views and are not influenced by any sponsorship or collaboration.

Translated from: https://codeburst.io/papa-parse-lightning-fast-csv-parsing-experience-5ee41cb5f4cf