Fixing the "Unable to find encoder for type stored in a Dataset" error
Problem description
Error:(350, 43) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
    val userDataSet = spark.createDataset(usersForDSRDD)
The code that triggers the error is shown below.
import org.apache.spark
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Encoders, Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

/**
  * FileName: RDD_Movie_Users_Analyzer
  * Author: hadoop
  * Email: 3165845957@qq.com
  * Date: 19-5-19 16:59
  * Description:
  */
object RDD_Movie_Users_Analyzer {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("RDD_Movie_Users_Analyzer")
    val spark = SparkSession.builder().config(conf).getOrCreate()
    import spark.implicits._
    val sc = spark.sparkContext
    sc.setLogLevel("WARN")

    val path = "file:///home/hadoop/movierecommend/dataset/"
    // user.dat:    UserID|age|Gender|Occuption|Zip-code
    val usersRDD = sc.textFile(path + "user.dat")
    // movies.dat:  MovieID::Title::Genres
    val moviesRDD = sc.textFile(path + "movies.dat")
    // ratings.dat: UserID::MovieID::Rating::TimeStamp
    val ratingsRDD = sc.textFile(path + "ratings.dat")

    // RDD: (MovieID, Title)
    val movieInfo = moviesRDD.map(_.split("::")).map(x => (x(0), x(1))).cache()
    // RDD: (UserID, MovieID, Rating)
    val ratings = ratingsRDD.map(_.split("::")).map(x => (x(0), x(1), x(2))).cache()
    // (UserID, Gender)
    val usersGender = usersRDD.map(_.split("\\|")).map(x => (x(0), x(2)))

    dataSetOps(usersRDD, ratingsRDD, spark)
    spark.stop()
  }

  /**
    * Movie-review analysis implemented with Dataset
    * @param usersRDD   user data:   UserID|age|Gender|Occuption|Zip-code
    * @param ratingsRDD rating data: UserID::MovieID::Rating::TimeStamp
    * @param spark      SparkSession
    */
  def dataSetOps(usersRDD: RDD[String], ratingsRDD: RDD[String], spark: SparkSession): Unit = {
    import spark.implicits._

    // Case class wrapping the user data
    case class User(UserID: String, Gender: String, Age: String, OccupationID: String, Zip_Code: String)
    // Case class wrapping the rating data
    case class Rating(UserID: String, MovieID: String, Rating: Double, Timestamp: String)

    // Wrap each line of user data in a User instance
    val usersForDSRDD = usersRDD.map(_.split("\\|"))
      .map(line => User(line(0).trim, line(2).trim, line(1).trim, line(3).trim, line(4).trim))
    // Finally, create the Dataset
    val userDataSet = spark.createDataset(usersForDSRDD)
    userDataSet.show(10)

    // Wrap each line of rating data in a Rating instance
    val ratingsForDSRDD = ratingsRDD.map(_.split("::"))
      .map(line => Rating(line(0).trim, line(1).trim, line(2).trim.toDouble, line(3).trim))
    val ratingsDataSet = spark.createDataset(ratingsForDSRDD)

    // The code below is almost identical to the DataFrame version
    // (just replace DataFrame with Dataset)
    ratingsDataSet.filter(s" MovieID = 1193")
      .join(userDataSet, "UserID")
      .select("Gender", "Age")
      .groupBy("Gender", "Age")
      .count()
      .orderBy($"Gender".desc, $"Age")
      .show(10)
  }
}
Solution
First, make sure the implicit conversions are imported (import spark.implicits._); then move the custom case classes out of the method and make them top-level members of the object. The root cause is that Spark derives an Encoder for a case class through a Scala TypeTag, and no TypeTag is available for a case class defined locally inside a method, so the implicit Encoder cannot be resolved there.
import org.apache.spark
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Encoders, Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

/**
  * FileName: RDD_Movie_Users_Analyzer
  * Author: hadoop
  * Email: 3165845957@qq.com
  * Date: 19-5-19 16:59
  * Description:
  */
object RDD_Movie_Users_Analyzer {

  // Case class wrapping the user data (now a top-level member of the object)
  case class User(UserID: String, Gender: String, Age: String, OccupationID: String, Zip_Code: String)
  // Case class wrapping the rating data (now a top-level member of the object)
  case class Rating(UserID: String, MovieID: String, Rating: Double, Timestamp: String)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("RDD_Movie_Users_Analyzer")
    val spark = SparkSession.builder().config(conf).getOrCreate()
    import spark.implicits._
    val sc = spark.sparkContext
    sc.setLogLevel("WARN")

    val path = "file:///home/hadoop/movierecommend/dataset/"
    // user.dat:    UserID|age|Gender|Occuption|Zip-code
    val usersRDD = sc.textFile(path + "user.dat")
    // movies.dat:  MovieID::Title::Genres
    val moviesRDD = sc.textFile(path + "movies.dat")
    // ratings.dat: UserID::MovieID::Rating::TimeStamp
    val ratingsRDD = sc.textFile(path + "ratings.dat")

    // RDD: (MovieID, Title)
    val movieInfo = moviesRDD.map(_.split("::")).map(x => (x(0), x(1))).cache()
    // RDD: (UserID, MovieID, Rating)
    val ratings = ratingsRDD.map(_.split("::")).map(x => (x(0), x(1), x(2))).cache()
    // (UserID, Gender)
    val usersGender = usersRDD.map(_.split("\\|")).map(x => (x(0), x(2)))

    dataSetOps(usersRDD, ratingsRDD, spark)
    spark.stop()
  }

  /**
    * Movie-review analysis implemented with Dataset
    * @param usersRDD   user data:   UserID|age|Gender|Occuption|Zip-code
    * @param ratingsRDD rating data: UserID::MovieID::Rating::TimeStamp
    * @param spark      SparkSession
    */
  def dataSetOps(usersRDD: RDD[String], ratingsRDD: RDD[String], spark: SparkSession): Unit = {
    import spark.implicits._

    // Wrap each line of user data in a User instance
    val usersForDSRDD = usersRDD.map(_.split("\\|"))
      .map(line => User(line(0).trim, line(2).trim, line(1).trim, line(3).trim, line(4).trim))
    // Finally, create the Dataset
    val userDataSet = spark.createDataset(usersForDSRDD)
    userDataSet.show(10)

    // Wrap each line of rating data in a Rating instance
    val ratingsForDSRDD = ratingsRDD.map(_.split("::"))
      .map(line => Rating(line(0).trim, line(1).trim, line(2).trim.toDouble, line(3).trim))
    val ratingsDataSet = spark.createDataset(ratingsForDSRDD)

    // The code below is almost identical to the DataFrame version
    // (just replace DataFrame with Dataset)
    ratingsDataSet.filter(s" MovieID = 1193")
      .join(userDataSet, "UserID")
      .select("Gender", "Age")
      .groupBy("Gender", "Age")
      .count()
      .orderBy($"Gender".desc, $"Age")
      .show(10)
  }
}
With these changes the problem is resolved.
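As a side note not covered in the original post, once the case class lives at top level you can also supply the Encoder explicitly via Encoders.product instead of relying on import spark.implicits._. The sketch below is a hypothetical stand-alone example (object name EncoderSketch and the sample row are invented for illustration); it still requires the case class to be a top-level member, because Encoders.product needs a TypeTag as well.

import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

object EncoderSketch {

  // Must be top level (or a member of a top-level object) so a TypeTag exists
  case class User(UserID: String, Gender: String, Age: String,
                  OccupationID: String, Zip_Code: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("EncoderSketch").getOrCreate()

    // Provide the Encoder explicitly instead of importing spark.implicits._
    implicit val userEncoder: Encoder[User] = Encoders.product[User]

    val userDataSet = spark.createDataset(Seq(
      User("1", "F", "25", "10", "48067")
    ))
    userDataSet.show()
    spark.stop()
  }
}

This makes the encoder resolution visible at the call site, which can be easier to debug than a missing implicit.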
For more on related issues, see: http://mangocool.com/1477619031890.html