Contents

  • 1 Reproducing the error
  • 2 Cause and fix
  • 3 The same problem when using union on DataFrames

1 Reproducing the error

ERROR queue.BoundedInMemoryExecutor: error producing records0]
org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://hdp-yl-1:8020/user/testJoin/test_join27/join/default/1d0f7a5b-fcbc-40aa-994d-ada47e3a3257-0_0-59-5054_20211119171950.parquet

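2 Cause and fix

The error occurs because a column in the data being written has a different data type from the corresponding column in the target table.

The fix is to cast the column in the data being written to the type the target table uses, as in the example below.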

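from pyspark.sql.functions import col

# Cast the mismatched column to the type used by the target table
write_df2 = write_df2.withColumn("superior_emp_id", col("superior_emp_id").cast("string"))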
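Before casting, it can help to find out which columns actually disagree. The following is a minimal sketch, assuming both schemas are available as lists of (name, type) pairs, which is the shape PySpark's `DataFrame.dtypes` returns. The helper name `find_type_mismatches` and the sample values are hypothetical; in particular, the source type `int` for `superior_emp_id` is an assumption made for illustration.

```python
def find_type_mismatches(source_dtypes, target_dtypes):
    """Return {column: (source_type, target_type)} for columns that exist in
    both schemas but have different types."""
    target = dict(target_dtypes)
    return {
        name: (src_type, target[name])
        for name, src_type in source_dtypes
        if name in target and target[name] != src_type
    }


# Illustrative values only. With PySpark these lists would come from
# write_df2.dtypes and from the dtypes of a DataFrame read from the target table.
source = [("id", "int"), ("superior_emp_id", "int")]
target = [("id", "int"), ("superior_emp_id", "string")]

mismatches = find_type_mismatches(source, target)
print(mismatches)  # {'superior_emp_id': ('int', 'string')}
```

Each column reported this way is a candidate for the `withColumn(...).cast(...)` fix shown above.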

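3 The same problem when using union on DataFrames

Calling union on DataFrames in Spark can trigger the same error. The reason is stated in the documentation of union: "Also as standard in SQL, this function resolves columns by position (not by name)". In other words, the rows are concatenated by column position, not by column name.

So if the two DataFrames list their columns in a different order, columns are silently paired with the wrong counterpart and their types may be converted. In the example below (a union of two DataFrames, printing the schemas of the two inputs and then the unioned result), sub_total_trans_cost and sub_total_trans_price appear in opposite order in the two inputs. As a result, sub_total_trans_price, which is integer in both input schemas, shows double values such as 2905.0 in the result, because the integer/double pair at that position was widened to double: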

temp_df_inc_left schema:
root
 |-- sub_total_trans_cost: double (nullable = true)
 |-- sub_total_trans_price: integer (nullable = true)
 |-- id: integer (nullable = true)
 |-- trans_code: string (nullable = true)
 |-- account_id: string (nullable = true)
 |-- pay_channel_code: string (nullable = true)
 |-- total_trans_price: double (nullable = true)
 |-- total_trans_cost: double (nullable = true)
 |-- create_time: string (nullable = true)
 |-- update_time: string (nullable = true)
temp_df_inc_right:
root
 |-- sub_total_trans_price: integer (nullable = true)
 |-- sub_total_trans_cost: double (nullable = true)
 |-- id: integer (nullable = true)
 |-- trans_code: string (nullable = true)
 |-- account_id: string (nullable = true)
 |-- pay_channel_code: string (nullable = true)
 |-- total_trans_price: double (nullable = true)
 |-- total_trans_cost: double (nullable = true)
 |-- create_time: string (nullable = true)
 |-- update_time: string (nullable = true)
20
+--------------------+---------------------+---+--------------------------------+----------+----------------+-----------------+----------------+-------------------+-------------------+
|sub_total_trans_cost|sub_total_trans_price|id |trans_code                      |account_id|pay_channel_code|total_trans_price|total_trans_cost|create_time        |update_time        |
+--------------------+---------------------+---+--------------------------------+----------+----------------+-----------------+----------------+-------------------+-------------------+
|2565.2              |2905.0               |147|9a4a7a3f424f43018fef4a2ec0188c1e|yun4      |Alipay          |3385.8           |4285.2          |2021-03-20 03:49:34|2021-03-20 03:49:34|
|1414.0              |1614.0               |133|b47b20ac89fe4ccfb9b29005d338ad51|yun8      |Cash            |1781.5           |2307.6          |2021-03-30 12:39:57|2021-03-30 12:39:57|
|2127.2              |2367.0               |138|d8403a2dd23c4f24ab46ed709a995be4|yun1      |Cash            |2620.8           |3557.6          |2021-01-14 20:31:25|2021-01-14 20:31:25|
+--------------------+---------------------+---+--------------------------------+----------+----------------+-----------------+----------------+-------------------+-------------------+
only showing top 3 rows
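A mismatch like this can be caught before the union by comparing the two column lists position by position. The sketch below assumes the lists come from PySpark's `DataFrame.columns`; the helper name `positional_name_mismatches` is hypothetical, and only the first three columns of the example are used to keep it short.

```python
def positional_name_mismatches(left_columns, right_columns):
    """Return (position, left_name, right_name) for every position where the
    two column lists disagree on the name."""
    return [
        (position, left_name, right_name)
        for position, (left_name, right_name) in enumerate(
            zip(left_columns, right_columns)
        )
        if left_name != right_name
    ]


# First three columns of temp_df_inc_left and temp_df_inc_right from the example.
left_cols = ["sub_total_trans_cost", "sub_total_trans_price", "id"]
right_cols = ["sub_total_trans_price", "sub_total_trans_cost", "id"]

for position, left_name, right_name in positional_name_mismatches(left_cols, right_cols):
    print(f"position {position}: {left_name} vs {right_name}")
```

A non-empty result means a plain union would pair columns with different names at those positions.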

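The fix is to use the unionByName method, which unions the two DataFrames by column name. Its documentation reads as follows (this fix follows a blog post by 冯卡门迪):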

   * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
   *
   * This is different from both `UNION ALL` and `UNION DISTINCT` in SQL. To do a SQL-style set
   * union (that does deduplication of elements), use this function followed by a [[distinct]].
   *
   * The difference between this function and [[union]] is that this function
   * resolves columns by name (not by position):
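In PySpark the by-name union might be wrapped as in the sketch below. It is only an illustration under stated assumptions: `union_by_name` is a hypothetical helper, both arguments are assumed to be PySpark DataFrames, and the up-front check on the column sets is an addition made here to give a clearer error message, not part of unionByName itself.

```python
def union_by_name(left_df, right_df):
    """Union two DataFrames by column name instead of by position.

    Assumes both arguments are PySpark DataFrames (anything exposing
    `.columns` and `.unionByName` works).
    """
    left_names, right_names = set(left_df.columns), set(right_df.columns)
    if left_names != right_names:
        # Hypothetical safety check: report the names that appear on one side only.
        differing = sorted(left_names ^ right_names)
        raise ValueError(f"column sets differ: {differing}")
    return left_df.unionByName(right_df)


# Usage with the DataFrames from the example above (requires a Spark session):
# merged_df = union_by_name(temp_df_inc_left, temp_df_inc_right)
# merged_df.printSchema()
```

An equivalent approach that keeps the plain union is to reorder one side first, for example `temp_df_inc_left.union(temp_df_inc_right.select(*temp_df_inc_left.columns))`.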
