Cast

Cast, Spark SQL's forced type conversion, is inserted during the stage where the Logical Plan is turned into the Analyzed Logical Plan.

Validation is driven by each expression's override def inputTypes() method: once childrenResolved is true, the children's data types are checked against inputTypes, and implicit casts are inserted where needed.
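
To see where this happens, compare the parsed logical plan with the analyzed logical plan. The snippet below is a minimal sketch, assuming an existing SparkSession named spark and that datediff declares DateType input types via ImplicitCastInputTypes:

    // Minimal sketch (assumes an existing SparkSession `spark`).
    // The parsed plan still holds the raw string literals; the analyzed plan
    // should show cast(... as date) wrapped around both arguments.
    val df = spark.sql("SELECT datediff('2024-01-02', '2024-01-01')")
    println(df.queryExecution.logical)   // parsed, unresolved plan: no Cast yet
    println(df.queryExecution.analyzed)  // analyzed plan: datediff(cast(... as date), cast(... as date))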

    override protected def coerceTypes(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
      // Skip nodes who's children have not been resolved yet.
      case e if !e.childrenResolved => e
      ...
    }

    /**
     * Casts types according to the expected input types for [[Expression]]s.
     */
    object ImplicitTypeCasts extends TypeCoercionRule {
      ...
      /**
       * Given an expected data type, try to cast the expression and return the cast expression.
       *
       * If the expression already fits the input type, we simply return the expression itself.
       * If the expression has an incompatible type that cannot be implicitly cast, return None.
       */
      def implicitCast(e: Expression, expectedType: AbstractDataType): Option[Expression] = {
        implicitCast(e.dataType, expectedType).map { dt =>
          if (dt == e.dataType) e else Cast(e, dt)
        }
      }

      private def implicitCast(inType: DataType, expectedType: AbstractDataType): Option[DataType] = {
        // Note that ret is nullable to avoid typing a lot of Some(...) in this local scope.
        // We wrap immediately an Option after this.
        @Nullable val ret: DataType = (inType, expectedType) match {
          // If the expected type is already a parent of the input type, no need to cast.
          case _ if expectedType.acceptsType(inType) => inType

          // Cast null type (usually from null literals) into target types
          case (NullType, target) => target.defaultConcreteType

          // If the function accepts any numeric type and the input is a string, we follow the hive
          // convention and cast that input into a double
          case (StringType, NumericType) => NumericType.defaultConcreteType

          // Implicit cast among numeric types. When we reach here, input type is not acceptable.

          // If input is a numeric type but not decimal, and we expect a decimal type,
          // cast the input to decimal.
          case (d: NumericType, DecimalType) => DecimalType.forType(d)
          // For any other numeric types, implicitly cast to each other, e.g. long -> int, int -> long
          case (_: NumericType, target: NumericType) => target

          // Implicit cast between date time types
          case (DateType, TimestampType) => TimestampType
          case (TimestampType, DateType) => DateType

          // Implicit cast from/to string
          case (StringType, DecimalType) => DecimalType.SYSTEM_DEFAULT
          case (StringType, target: NumericType) => target
          case (StringType, DateType) => DateType
          case (StringType, TimestampType) => TimestampType
          case (StringType, BinaryType) => BinaryType
          // Cast any atomic type to string.
          case (any: AtomicType, StringType) if any != StringType => StringType

          // When we reach here, input type is not acceptable for any types in this type collection,
          // try to find the first one we can implicitly cast.
          case (_, TypeCollection(types)) =>
            types.flatMap(implicitCast(inType, _)).headOption.orNull

          // Implicit cast between array types.
          //
          // Compare the nullabilities of the from type and the to type, check whether the cast of
          // the nullability is resolvable by the following rules:
          // 1. If the nullability of the to type is true, the cast is always allowed;
          // 2. If the nullability of the to type is false, and the nullability of the from type is
          //    true, the cast is never allowed;
          // 3. If the nullabilities of both the from type and the to type are false, the cast is
          //    allowed only when Cast.forceNullable(fromType, toType) is false.
          case (ArrayType(fromType, fn), ArrayType(toType: DataType, true)) =>
            implicitCast(fromType, toType).map(ArrayType(_, true)).orNull
          case (ArrayType(fromType, true), ArrayType(toType: DataType, false)) => null
          case (ArrayType(fromType, false), ArrayType(toType: DataType, false))
              if !Cast.forceNullable(fromType, toType) =>
            implicitCast(fromType, toType).map(ArrayType(_, false)).orNull

          // Implicit cast between Map types.
          // Follows the same semantics of implicit casting between two array types.
          // Refer to documentation above. Make sure that both key and values
          // can not be null after the implicit cast operation by calling forceNullable
          // method.
          case (MapType(fromKeyType, fromValueType, fn), MapType(toKeyType, toValueType, tn))
              if !Cast.forceNullable(fromKeyType, toKeyType) && Cast.resolvableNullability(fn, tn) =>
            if (Cast.forceNullable(fromValueType, toValueType) && !tn) {
              null
            } else {
              val newKeyType = implicitCast(fromKeyType, toKeyType).orNull
              val newValueType = implicitCast(fromValueType, toValueType).orNull
              if (newKeyType != null && newValueType != null) {
                MapType(newKeyType, newValueType, tn)
              } else {
                null
              }
            }

          case _ => null
        }
        Option(ret)
      }
      ...
    }
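
A few of these branches can be observed directly from SQL. A hedged sketch, assuming a SparkSession named spark and that the built-ins hour and length declare their input types via ImplicitCastInputTypes:

    // StringType -> TimestampType: the string literal should be cast to timestamp.
    spark.sql("SELECT hour('2009-07-30 12:58:59')").queryExecution.analyzed

    // TypeCollection(StringType, BinaryType) with an int input: the
    // (AtomicType, StringType) branch wins, so 123 is cast to string and
    // length(123) evaluates to 3.
    spark.sql("SELECT length(123)").queryExecution.analyzed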

Expression's inputTypes validation mechanism

    override def checkInputDataTypes(): TypeCheckResult = {
      ExpectsInputTypes.checkInputDataTypes(children, inputTypes)
    }

    object ExpectsInputTypes {
      def checkInputDataTypes(
          inputs: Seq[Expression],
          inputTypes: Seq[AbstractDataType]): TypeCheckResult = {
        val mismatches = inputs.zip(inputTypes).zipWithIndex.collect {
          case ((input, expected), idx) if !expected.acceptsType(input.dataType) =>
            s"argument ${idx + 1} requires ${expected.simpleString} type, " +
              s"however, '${input.sql}' is of ${input.dataType.catalogString} type."
        }

        if (mismatches.isEmpty) {
          TypeCheckResult.TypeCheckSuccess
        } else {
          TypeCheckResult.TypeCheckFailure(mismatches.mkString(" "))
        }
      }
    }
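
When no implicit cast applies, the mismatch string built above surfaces in the resulting AnalysisException. A hedged sketch, assuming a SparkSession named spark; the exact message wording varies across Spark versions:

    // An array cannot be implicitly cast to int or bigint, so analysis fails with roughly:
    //   cannot resolve 'shiftleft(array(1, 2), 1)' due to data type mismatch:
    //   argument 1 requires (int or bigint) type, however, 'array(1, 2)' is of array<int> type.
    spark.sql("SELECT shiftleft(array(1, 2), 1)")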

Here we take ShiftLeft as an example.

    /**
     * Bitwise left shift.
     *
     * @param left the base number to shift.
     * @param right number of bits to left shift.
     */
    @ExpressionDescription(
      usage = "_FUNC_(base, expr) - Bitwise left shift.",
      examples = """
        Examples:
          > SELECT _FUNC_(2, 1);
           4
      """,
      since = "1.5.0")
    case class ShiftLeft(left: Expression, right: Expression)
      extends BinaryExpression with ImplicitCastInputTypes with NullIntolerant {

      override def inputTypes: Seq[AbstractDataType] =
        Seq(TypeCollection(IntegerType, LongType), IntegerType)

      override def dataType: DataType = left.dataType

      protected override def nullSafeEval(input1: Any, input2: Any): Any = {
        input1 match {
          case l: jl.Long => l << input2.asInstanceOf[jl.Integer]
          case i: jl.Integer => i << input2.asInstanceOf[jl.Integer]
        }
      }

      override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
        defineCodeGen(ctx, ev, (left, right) => s"$left << $right")
      }
    }
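
At the SQL level, the ExpressionDescription example above gives shiftleft(2, 1) = 4; because the first argument is implicitly cast, a string literal should produce the same result. A sketch assuming a SparkSession named spark:

    spark.sql("SELECT shiftleft(2, 1)").collect()    // Array([4])
    spark.sql("SELECT shiftleft('2', 1)").collect()  // Array([4]): '2' is cast to int first
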
Usage of zip

zip pairs up the elements of two collections one-to-one, in order; this is how checkInputDataTypes above pairs each argument with its expected type.


scala> val numbers = Seq(0, 1, 2, 3, 4)
numbers: Seq[Int] = List(0, 1, 2, 3, 4)

scala> val series = Seq(10, 11, 12, 13, 14)
series: Seq[Int] = List(10, 11, 12, 13, 14)

scala> numbers zip series
res0: Seq[(Int, Int)] = List((0,10), (1,11), (2,12), (3,13), (4,14))

scala> numbers.zip(series)
res1: Seq[(Int, Int)] = List((0,10), (1,11), (2,12), (3,13), (4,14))

If one collection is longer than the other, the extra elements are dropped.
For example:

scala> val series = Seq(10, 11, 12, 13, 14, 15)
series: Seq[Int] = List(10, 11, 12, 13, 14, 15)

scala> numbers zip series
res2: Seq[(Int, Int)] = List((0,10), (1,11), (2,12), (3,13), (4,14))
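
The same zip + zipWithIndex + collect pattern is what checkInputDataTypes uses to build its mismatch messages. A standalone toy sketch, using plain strings instead of Spark expressions and DataTypes:

    // Toy sketch: pair each actual type with its expected type, keep the index,
    // and collect a message for every pair that does not match.
    val actualTypes = Seq("string", "int")          // stands in for the children's data types
    val expectedTypes = Seq("int or bigint", "int") // stands in for inputTypes

    val mismatches = actualTypes.zip(expectedTypes).zipWithIndex.collect {
      case ((actual, expected), idx) if actual != expected =>
        s"argument ${idx + 1} requires $expected type, however, the input is of $actual type."
    }
    // mismatches == List("argument 1 requires int or bigint type, however, the input is of string type.")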
