1 Reproducing the error

ERROR queue.BoundedInMemoryExecutor: error producing records0]
org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://hdp-yl-1:8020/user/testJoin/test_join27/join/default/1d0f7a5b-fcbc-40aa-994d-ada47e3a3257-0_0-59-5054_20211119171950.parquet

2 Cause and fix

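The error occurs because a column in the data being written has a different data type from the corresponding column in the target table.

The fix is to cast that column in the data being written to the type used by the target table, for example: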

from pyspark.sql.functions import col
write_df2 = write_df2.withColumn("superior_emp_id", col("superior_emp_id").cast("string"))
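When more than one column might be affected, it can help to compare the two schemas before writing rather than casting columns one at a time. The following is a minimal sketch in plain Python, not part of the original fix. The helper name columns_needing_cast and the emp_id column are hypothetical; the schemas use the (name, type) pair form that PySpark's DataFrame.dtypes returns.

```python
def columns_needing_cast(source_dtypes, target_dtypes):
    """Return {column: target_type} for columns whose type differs from the target."""
    target = dict(target_dtypes)
    return {
        name: target[name]
        for name, dtype in source_dtypes
        if name in target and target[name] != dtype
    }


# Hypothetical schemas, in the (name, type) form returned by DataFrame.dtypes.
source = [("emp_id", "string"), ("superior_emp_id", "int")]
target = [("emp_id", "string"), ("superior_emp_id", "string")]

to_cast = columns_needing_cast(source, target)
print(to_cast)  # {'superior_emp_id': 'string'}
```

With real DataFrames, each entry of the result could then be applied with withColumn(name, col(name).cast(dtype)), as in the one-line cast above.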

3 Problems when using union on DataFrames

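Calling union on DataFrames in Spark can also trigger this error. The reason is stated in the API documentation: "Also as standard in SQL, this function resolves columns by position (not by name)". In other words, rows are concatenated according to column position, not column name.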

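So if the two DataFrames list their columns in a different order, values land under the wrong column and are silently converted to another type. In the example below, which prints the schemas of the two input DataFrames and then the first rows of their union, sub_total_trans_cost and sub_total_trans_price are swapped between the two inputs. As a result, sub_total_trans_price, an integer in both inputs, is unexpectedly converted to double in the result: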

temp_df_inc_left schema:
root
 |-- sub_total_trans_cost: double (nullable = true)
 |-- sub_total_trans_price: integer (nullable = true)
 |-- id: integer (nullable = true)
 |-- trans_code: string (nullable = true)
 |-- account_id: string (nullable = true)
 |-- pay_channel_code: string (nullable = true)
 |-- total_trans_price: double (nullable = true)
 |-- total_trans_cost: double (nullable = true)
 |-- create_time: string (nullable = true)
 |-- update_time: string (nullable = true)
temp_df_inc_right:
root
 |-- sub_total_trans_price: integer (nullable = true)
 |-- sub_total_trans_cost: double (nullable = true)
 |-- id: integer (nullable = true)
 |-- trans_code: string (nullable = true)
 |-- account_id: string (nullable = true)
 |-- pay_channel_code: string (nullable = true)
 |-- total_trans_price: double (nullable = true)
 |-- total_trans_cost: double (nullable = true)
 |-- create_time: string (nullable = true)
 |-- update_time: string (nullable = true)
20
+--------------------+---------------------+---+--------------------------------+----------+----------------+-----------------+----------------+-------------------+-------------------+
|sub_total_trans_cost|sub_total_trans_price|id |trans_code                      |account_id|pay_channel_code|total_trans_price|total_trans_cost|create_time        |update_time        |
+--------------------+---------------------+---+--------------------------------+----------+----------------+-----------------+----------------+-------------------+-------------------+
|2565.2              |2905.0               |147|9a4a7a3f424f43018fef4a2ec0188c1e|yun4      |Alipay          |3385.8           |4285.2          |2021-03-20 03:49:34|2021-03-20 03:49:34|
|1414.0              |1614.0               |133|b47b20ac89fe4ccfb9b29005d338ad51|yun8      |Cash            |1781.5           |2307.6          |2021-03-30 12:39:57|2021-03-30 12:39:57|
|2127.2              |2367.0               |138|d8403a2dd23c4f24ab46ed709a995be4|yun1      |Cash            |2620.8           |3557.6          |2021-01-14 20:31:25|2021-01-14 20:31:25|
+--------------------+---------------------+---+--------------------------------+----------+----------------+-----------------+----------------+-------------------+-------------------+
only showing top 3 rows
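The effect of positional resolution can be imitated without Spark. The sketch below is plain Python with illustrative rows, not the actual data behind the output above. The helper union_by_position is hypothetical and only mimics the rule that the result keeps the left-hand column names while right-hand rows are appended in their own order.

```python
def union_by_position(left_cols, left_rows, right_rows):
    """Mimic positional resolution: the result keeps the left column names and
    right rows are appended as-is, whatever their own column names were."""
    return [dict(zip(left_cols, row)) for row in left_rows + right_rows]


left_cols = ["sub_total_trans_cost", "sub_total_trans_price", "id"]
right_cols = ["sub_total_trans_price", "sub_total_trans_cost", "id"]

left_rows = [(2565.2, 2905, 147)]
right_rows = [(1614, 1414.0, 133)]  # ordered as right_cols: price, cost, id

# Positions where the two sides disagree on the column name.
swapped = [l for l, r in zip(left_cols, right_cols) if l != r]
result = union_by_position(left_cols, left_rows, right_rows)

print(swapped)    # ['sub_total_trans_cost', 'sub_total_trans_price']
print(result[1])  # the right-hand price 1614 now sits under sub_total_trans_cost
```

Because each result column now mixes integer and floating-point values, an engine that needs a single type per column has to widen it, which is consistent with the double values shown for sub_total_trans_price in the output above.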

The fix is to use the unionByName method, which performs the union by column name. Its documentation reads as follows (this fix follows a blog post by 冯卡门迪):

   * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
   *
   * This is different from both `UNION ALL` and `UNION DISTINCT` in SQL. To do a SQL-style set
   * union (that does deduplication of elements), use this function followed by a [[distinct]].
   *
   * The difference between this function and [[union]] is that this function
   * resolves columns by name (not by position):
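To make resolution by name concrete, here is a minimal plain-Python sketch of the idea. It is not Spark's implementation; the helper union_by_name and the sample rows are illustrative assumptions. Each right-hand row is reordered to the left-hand column order before it is appended.

```python
def union_by_name(left_cols, left_rows, right_cols, right_rows):
    """Append right_rows to left_rows after reordering each right row to left_cols."""
    if sorted(left_cols) != sorted(right_cols):
        raise ValueError("both sides must have the same column names")
    # Position of each left column inside the right-hand rows.
    positions = [right_cols.index(name) for name in left_cols]
    aligned = [tuple(row[i] for i in positions) for row in right_rows]
    return left_rows + aligned


left_cols = ["sub_total_trans_cost", "sub_total_trans_price", "id"]
right_cols = ["sub_total_trans_price", "sub_total_trans_cost", "id"]

rows = union_by_name(left_cols, [(2565.2, 2905, 147)], right_cols, [(1614, 1414.0, 133)])
print(rows[1])  # (1414.0, 1614, 133)
```

In PySpark, assuming the DataFrames are named as in the printout above, the corresponding call would be temp_df_inc_left.unionByName(temp_df_inc_right). Selecting the right-hand columns in the left-hand order before a plain union, as in temp_df_inc_left.union(temp_df_inc_right.select(temp_df_inc_left.columns)), should have the same effect.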
