ERROR queue.BoundedInMemoryExecutor: error producing records0] org.apache.parquet.io.ParquetDecoding
Contents
- 1 Reproducing the error
- 2 Cause and fix
- 3 The same problem when using union on DataFrames
1 Reproducing the error
ERROR queue.BoundedInMemoryExecutor: error producing records0]
org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://hdp-yl-1:8020/user/testJoin/test_join27/join/default/1d0f7a5b-fcbc-40aa-994d-ada47e3a3257-0_0-59-5054_20211119171950.parquet
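2 Cause and fix
The error occurs because a column in the data being written has a different data type from the same column in the target table.
The fix is to cast that column in the data being written to the target type, as in the following example.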
from pyspark.sql.functions import col
write_df2 = write_df2.withColumn("superior_emp_id", col("superior_emp_id").cast("string"))
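To find which columns need such a cast, one option is to compare the `(name, type)` pairs of the data being written with those of the target table. The sketch below is plain Python and assumes both schemas are available in the shape that PySpark's `DataFrame.dtypes` returns (a list of `(column_name, type_string)` tuples). The function name `find_type_mismatches` and the two sample schemas are hypothetical and only serve as an illustration.

```python
def find_type_mismatches(source_dtypes, target_dtypes):
    """Return {column: (source_type, target_type)} for columns whose types differ.

    Both arguments are lists of (column_name, type_string) tuples, the shape
    returned by DataFrame.dtypes in PySpark. Columns that do not exist in the
    target are ignored in this sketch.
    """
    target = dict(target_dtypes)
    return {
        name: (src_type, target[name])
        for name, src_type in source_dtypes
        if name in target and src_type != target[name]
    }


# Hypothetical schemas: the incoming batch carries superior_emp_id as int,
# while the existing table stores it as string.
incoming = [("emp_id", "string"), ("superior_emp_id", "int")]
existing = [("emp_id", "string"), ("superior_emp_id", "string")]

mismatches = find_type_mismatches(incoming, existing)
for column, (src, dst) in mismatches.items():
    print(f"{column}: {src} -> cast to {dst}")
```

Each reported column can then be cast with `withColumn(name, col(name).cast(target_type))` before writing.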
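3 The same problem when using union on DataFrames
Calling union on DataFrames in Spark can also trigger this error. The reason is given in the documentation of union: "Also as standard in SQL, this function resolves columns by position (not by name)". In other words, rows are concatenated purely by column position, not by column name.
So if the two DataFrames list their columns in a different order, values end up under the wrong column and are silently converted to another type. In the example below (a union of two DataFrames, printing the schema of both inputs and then the first rows of the result), sub_total_trans_price is integer in both inputs, yet it comes out as double in the union result (2905.0, 1614.0, 2367.0), because it is paired by position with the double column sub_total_trans_cost.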
temp_df_inc_left schema:
root
 |-- sub_total_trans_cost: double (nullable = true)
 |-- sub_total_trans_price: integer (nullable = true)
 |-- id: integer (nullable = true)
 |-- trans_code: string (nullable = true)
 |-- account_id: string (nullable = true)
 |-- pay_channel_code: string (nullable = true)
 |-- total_trans_price: double (nullable = true)
 |-- total_trans_cost: double (nullable = true)
 |-- create_time: string (nullable = true)
 |-- update_time: string (nullable = true)
temp_df_inc_right:
root
 |-- sub_total_trans_price: integer (nullable = true)
 |-- sub_total_trans_cost: double (nullable = true)
 |-- id: integer (nullable = true)
 |-- trans_code: string (nullable = true)
 |-- account_id: string (nullable = true)
 |-- pay_channel_code: string (nullable = true)
 |-- total_trans_price: double (nullable = true)
 |-- total_trans_cost: double (nullable = true)
 |-- create_time: string (nullable = true)
 |-- update_time: string (nullable = true)
20
+--------------------+---------------------+---+--------------------------------+----------+----------------+-----------------+----------------+-------------------+-------------------+
|sub_total_trans_cost|sub_total_trans_price|id |trans_code |account_id|pay_channel_code|total_trans_price|total_trans_cost|create_time |update_time |
+--------------------+---------------------+---+--------------------------------+----------+----------------+-----------------+----------------+-------------------+-------------------+
|2565.2 |2905.0 |147|9a4a7a3f424f43018fef4a2ec0188c1e|yun4 |Alipay |3385.8 |4285.2 |2021-03-20 03:49:34|2021-03-20 03:49:34|
|1414.0 |1614.0 |133|b47b20ac89fe4ccfb9b29005d338ad51|yun8 |Cash |1781.5 |2307.6 |2021-03-30 12:39:57|2021-03-30 12:39:57|
|2127.2 |2367.0 |138|d8403a2dd23c4f24ab46ed709a995be4|yun1 |Cash |2620.8 |3557.6 |2021-01-14 20:31:25|2021-01-14 20:31:25|
+--------------------+---------------------+---+--------------------------------+----------+----------------+-----------------+----------------+-------------------+-------------------+
only showing top 3 rows
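A quick way to catch this before calling union is to compare the two column lists position by position. The following is a minimal sketch in plain Python, assuming the lists come from `DataFrame.columns` in PySpark. The helper name `positional_name_mismatches` is made up for this illustration; the column names are the ones from the two schemas printed above.

```python
def positional_name_mismatches(left_columns, right_columns):
    """List (position, left_name, right_name) wherever the names differ.

    zip() stops at the shorter list; a positional union needs both sides
    to have the same number of columns anyway.
    """
    return [
        (i, left, right)
        for i, (left, right) in enumerate(zip(left_columns, right_columns))
        if left != right
    ]


# Column order of temp_df_inc_left and temp_df_inc_right as printed above.
left_columns = [
    "sub_total_trans_cost", "sub_total_trans_price", "id", "trans_code",
    "account_id", "pay_channel_code", "total_trans_price",
    "total_trans_cost", "create_time", "update_time",
]
right_columns = [
    "sub_total_trans_price", "sub_total_trans_cost", "id", "trans_code",
    "account_id", "pay_channel_code", "total_trans_price",
    "total_trans_cost", "create_time", "update_time",
]

swapped = positional_name_mismatches(left_columns, right_columns)
for position, left, right in swapped:
    print(f"position {position}: left has {left}, right has {right}")
```

For these two schemas the check reports positions 0 and 1, which are exactly the two columns that a positional union pairs up incorrectly.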
The fix is to use the unionByName method, which performs the union by column name. Its documentation reads as follows [solution taken from a blog post by 冯卡门迪]:
 * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
 *
 * This is different from both `UNION ALL` and `UNION DISTINCT` in SQL. To do a SQL-style set
 * union (that does deduplication of elements), use this function followed by a [[distinct]].
 *
 * The difference between this function and [[union]] is that this function
 * resolves columns by name (not by position):
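What "resolves columns by name" means can be illustrated without Spark. The sketch below is a plain-Python illustration of the idea, not Spark's implementation: rows are tuples, and the right-hand rows are reordered to the left-hand column order before being appended. The function `union_by_name` is hypothetical, only three of the ten columns are used, and the sample values are borrowed from the result table above purely for illustration (which input each row came from is an assumption).

```python
def union_by_name(left_columns, left_rows, right_columns, right_rows):
    """Append right_rows to left_rows after reordering them to left_columns."""
    if set(left_columns) != set(right_columns):
        raise ValueError("both sides must have the same column names")
    # For each left column, look up where that column sits on the right side.
    right_index = [right_columns.index(name) for name in left_columns]
    realigned = [tuple(row[i] for i in right_index) for row in right_rows]
    return list(left_rows) + realigned


left_columns = ["sub_total_trans_cost", "sub_total_trans_price", "id"]
right_columns = ["sub_total_trans_price", "sub_total_trans_cost", "id"]
left_rows = [(2565.2, 2905, 147)]
right_rows = [(1614, 1414.0, 133)]  # price first, then cost

by_position = left_rows + right_rows  # what a positional union does
by_name = union_by_name(left_columns, left_rows, right_columns, right_rows)

print(by_position)  # price and cost are swapped in the second row
print(by_name)      # every value sits under its own column
```

In PySpark the corresponding call would be `temp_df_inc_left.unionByName(temp_df_inc_right)`. Selecting the right-hand columns in the left-hand order before a plain `union` has the same effect.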