Problem Description

Error:(350, 43) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.
    val userDataSet = spark.createDataset(usersForDSRDD)

The code that triggers the error is shown below:

import org.apache.spark
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Encoders, Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

/**
  * FileName: RDD_Movie_Users_Analyzer
  * Author:   hadoop
  * Email:    3165845957@qq.com
  * Date:     19-5-19 4:59 PM
  * Description:
  */
object RDD_Movie_Users_Analyzer {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("RDD_Movie_Users_Analyzer")
    val spark = SparkSession.builder().config(conf).getOrCreate()
    import spark.implicits._
    val sc = spark.sparkContext
    sc.setLogLevel("WARN")
    val path = "file:///home/hadoop/movierecommend/dataset/"
    // user.dat: UserID|Age|Gender|Occupation|Zip-code
    val usersRDD = sc.textFile(path + "user.dat")
    // movies.dat: MovieID::Title::Genres
    val moviesRDD = sc.textFile(path + "movies.dat")
    // ratings.dat: UserID::MovieID::Rating::Timestamp
    val ratingsRDD = sc.textFile(path + "ratings.dat")
    // RDD: (MovieID, Title)
    val movieInfo = moviesRDD.map(_.split("::")).map(x => (x(0), x(1))).cache()
    // RDD: (UserID, MovieID, Rating)
    val ratings = ratingsRDD.map(_.split("::")).map(x => (x(0), x(1), x(2))).cache()
    // (UserID, Gender)
    val usersGender = usersRDD.map(_.split("\\|")).map(x => (x(0), x(2)))
    dataSetOps(usersRDD, ratingsRDD, spark)
    spark.stop()
  }

  /**
    * Movie-review analysis using Dataset
    * @param usersRDD   user info RDD: UserID|Age|Gender|Occupation|Zip-code
    * @param ratingsRDD rating data: UserID::MovieID::Rating::Timestamp
    * @param spark      SparkSession
    */
  def dataSetOps(usersRDD: RDD[String], ratingsRDD: RDD[String], spark: SparkSession): Unit = {
    import spark.implicits._
    // case class User wraps the user data -- note it is defined INSIDE the method
    case class User(UserID: String, Gender: String, Age: String, OccupationID: String, Zip_Code: String)
    // case class Rating wraps the rating data
    case class Rating(UserID: String, MovieID: String, Rating: Double, Timestamp: String)
    // Wrap the user data in User instances
    val usersForDSRDD = usersRDD.map(_.split("\\|"))
      .map(line => User(line(0).trim, line(2).trim, line(1).trim, line(3).trim, line(4).trim))
    // Finally, create the Dataset -- this is the line that fails to compile
    val userDataSet = spark.createDataset(usersForDSRDD)
    userDataSet.show(10)
    // Wrap the rating data in Rating instances
    val ratingsForDSRDD = ratingsRDD.map(_.split("::"))
      .map(line => Rating(line(0).trim, line(1).trim, line(2).trim.toDouble, line(3).trim))
    val ratingsDataSet = spark.createDataset(ratingsForDSRDD)
    // The code below is almost identical to the DataFrame version (just replace DataFrame with Dataset)
    ratingsDataSet.filter("MovieID = 1193")
      .join(userDataSet, "UserID")
      .select("Gender", "Age")
      .groupBy("Gender", "Age")
      .count()
      .orderBy($"Gender".desc, $"Age")
      .show(10)
  }
}

Solution

First make sure the implicit conversions are imported (import spark.implicits._), then move the custom case classes out of the method so they are defined at the top level (here, as members of the object).

import org.apache.spark
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Encoders, Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

/**
  * FileName: RDD_Movie_Users_Analyzer
  * Author:   hadoop
  * Email:    3165845957@qq.com
  * Date:     19-5-19 4:59 PM
  * Description:
  */
object RDD_Movie_Users_Analyzer {

  // case class User wraps the user data -- now defined at the object level
  case class User(UserID: String, Gender: String, Age: String, OccupationID: String, Zip_Code: String)
  // case class Rating wraps the rating data -- now defined at the object level
  case class Rating(UserID: String, MovieID: String, Rating: Double, Timestamp: String)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("RDD_Movie_Users_Analyzer")
    val spark = SparkSession.builder().config(conf).getOrCreate()
    import spark.implicits._
    val sc = spark.sparkContext
    sc.setLogLevel("WARN")
    val path = "file:///home/hadoop/movierecommend/dataset/"
    // user.dat: UserID|Age|Gender|Occupation|Zip-code
    val usersRDD = sc.textFile(path + "user.dat")
    // movies.dat: MovieID::Title::Genres
    val moviesRDD = sc.textFile(path + "movies.dat")
    // ratings.dat: UserID::MovieID::Rating::Timestamp
    val ratingsRDD = sc.textFile(path + "ratings.dat")
    // RDD: (MovieID, Title)
    val movieInfo = moviesRDD.map(_.split("::")).map(x => (x(0), x(1))).cache()
    // RDD: (UserID, MovieID, Rating)
    val ratings = ratingsRDD.map(_.split("::")).map(x => (x(0), x(1), x(2))).cache()
    // (UserID, Gender)
    val usersGender = usersRDD.map(_.split("\\|")).map(x => (x(0), x(2)))
    dataSetOps(usersRDD, ratingsRDD, spark)
    spark.stop()
  }

  /**
    * Movie-review analysis using Dataset
    * @param usersRDD   user info RDD: UserID|Age|Gender|Occupation|Zip-code
    * @param ratingsRDD rating data: UserID::MovieID::Rating::Timestamp
    * @param spark      SparkSession
    */
  def dataSetOps(usersRDD: RDD[String], ratingsRDD: RDD[String], spark: SparkSession): Unit = {
    import spark.implicits._
    // Wrap the user data in User instances
    val usersForDSRDD = usersRDD.map(_.split("\\|"))
      .map(line => User(line(0).trim, line(2).trim, line(1).trim, line(3).trim, line(4).trim))
    // Finally, create the Dataset -- this now compiles
    val userDataSet = spark.createDataset(usersForDSRDD)
    userDataSet.show(10)
    // Wrap the rating data in Rating instances
    val ratingsForDSRDD = ratingsRDD.map(_.split("::"))
      .map(line => Rating(line(0).trim, line(1).trim, line(2).trim.toDouble, line(3).trim))
    val ratingsDataSet = spark.createDataset(ratingsForDSRDD)
    // The code below is almost identical to the DataFrame version (just replace DataFrame with Dataset)
    ratingsDataSet.filter("MovieID = 1193")
      .join(userDataSet, "UserID")
      .select("Gender", "Age")
      .groupBy("Gender", "Age")
      .count()
      .orderBy($"Gender".desc, $"Age")
      .show(10)
  }
}

This resolves the problem.
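The underlying reason, as I understand it: spark.implicits._ derives an implicit Encoder for a Product type (case class) via a TypeTag, and for a case class declared locally inside a method the compiler cannot materialize that implicit at the createDataset call site, so implicit search fails with the "Unable to find encoder" error. A minimal sketch of the working pattern (names and sample data here are illustrative, not from the original project):

```scala
import org.apache.spark.sql.SparkSession

object EncoderDemo {
  // The case class lives at the object level, so spark.implicits._
  // can derive an implicit Encoder[User] where createDataset needs it.
  case class User(UserID: String, Gender: String, Age: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("EncoderDemo").getOrCreate()
    import spark.implicits._
    // Compiles because Encoder[User] is found implicitly.
    val ds = spark.createDataset(Seq(User("1", "F", "18"), User("2", "M", "25")))
    ds.show()
    spark.stop()
  }
}
```

Moving the case class inside main would reproduce the original compile error, which is why the fix above hoists User and Rating to the object level.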

For more on similar problems, see: http://mangocool.com/1477619031890.html
