Fixing the "Unable to find encoder for type stored in a Dataset" error
Problem description
Error:(350, 43) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
    val userDataSet = spark.createDataset(usersForDSRDD)
The code that triggers the error is shown below.
import org.apache.spark
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Encoders, Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

/**
  * FileName: RDD_Movie_Users_Analyzer
  * Author: hadoop
  * Email: 3165845957@qq.com
  * Date: 19-5-19 16:59
  * Description:
  */
object RDD_Movie_Users_Analyzer {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("RDD_Movie_Users_Analyzer")
    val spark = SparkSession.builder().config(conf).getOrCreate()
    import spark.implicits._
    val sc = spark.sparkContext
    sc.setLogLevel("WARN")

    val path = "file:///home/hadoop/movierecommend/dataset/"
    // user.dat:    UserID|age|Gender|Occuption|Zip-code
    val usersRDD = sc.textFile(path + "user.dat")
    // movies.dat:  MovieID::Title::Genres
    val moviesRDD = sc.textFile(path + "movies.dat")
    // ratings.dat: UserID::MovieID::Rating::TimeStamp
    val ratingsRDD = sc.textFile(path + "ratings.dat")

    // RDD: (MovieID, Title)
    val movieInfo = moviesRDD.map(_.split("::")).map(x => (x(0), x(1))).cache()
    // RDD: (UserID, MovieID, Rating)
    val ratings = ratingsRDD.map(_.split("::")).map(x => (x(0), x(1), x(2))).cache()
    // (UserID, Gender)
    val usersGender = usersRDD.map(_.split("\\|")).map(x => (x(0), x(2)))

    dataSetOps(usersRDD, ratingsRDD, spark)
    spark.stop()
  }

  /**
    * Movie-review analysis implemented with Dataset
    * @param usersRDD   user data:   UserID|age|Gender|Occuption|Zip-code
    * @param ratingsRDD rating data: UserID::MovieID::Rating::TimeStamp
    * @param spark      SparkSession
    */
  def dataSetOps(usersRDD: RDD[String], ratingsRDD: RDD[String], spark: SparkSession): Unit = {
    import spark.implicits._

    // Case class wrapping the user data
    case class User(UserID: String, Gender: String, Age: String, OccupationID: String, Zip_Code: String)
    // Case class wrapping the rating data
    case class Rating(UserID: String, MovieID: String, Rating: Double, Timestamp: String)

    // Wrap each line of user data in a User instance
    val usersForDSRDD = usersRDD.map(_.split("\\|"))
      .map(line => User(line(0).trim, line(2).trim, line(1).trim, line(3).trim, line(4).trim))
    // Finally, create the Dataset
    val userDataSet = spark.createDataset(usersForDSRDD)
    userDataSet.show(10)

    // Wrap each line of rating data in a Rating instance
    val ratingsForDSRDD = ratingsRDD.map(_.split("::"))
      .map(line => Rating(line(0).trim, line(1).trim, line(2).trim.toDouble, line(3).trim))
    val ratingsDataSet = spark.createDataset(ratingsForDSRDD)

    // The code below is almost identical to the DataFrame version
    // (just replace DataFrame with Dataset)
    ratingsDataSet.filter(s" MovieID = 1193")
      .join(userDataSet, "UserID")
      .select("Gender", "Age")
      .groupBy("Gender", "Age")
      .count()
      .orderBy($"Gender".desc, $"Age")
      .show(10)
  }
}
Solution
First, make sure the implicit conversions are imported (import spark.implicits._); then move the custom case classes out of the method and make them top-level members of the object. The root cause is that Spark derives an Encoder for a case class through a Scala TypeTag, and no TypeTag is available for a case class defined locally inside a method, so the implicit Encoder cannot be resolved there.
import org.apache.spark
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Encoders, Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

/**
  * FileName: RDD_Movie_Users_Analyzer
  * Author: hadoop
  * Email: 3165845957@qq.com
  * Date: 19-5-19 16:59
  * Description:
  */
object RDD_Movie_Users_Analyzer {

  // Case class wrapping the user data (now a top-level member of the object)
  case class User(UserID: String, Gender: String, Age: String, OccupationID: String, Zip_Code: String)
  // Case class wrapping the rating data (now a top-level member of the object)
  case class Rating(UserID: String, MovieID: String, Rating: Double, Timestamp: String)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("RDD_Movie_Users_Analyzer")
    val spark = SparkSession.builder().config(conf).getOrCreate()
    import spark.implicits._
    val sc = spark.sparkContext
    sc.setLogLevel("WARN")

    val path = "file:///home/hadoop/movierecommend/dataset/"
    // user.dat:    UserID|age|Gender|Occuption|Zip-code
    val usersRDD = sc.textFile(path + "user.dat")
    // movies.dat:  MovieID::Title::Genres
    val moviesRDD = sc.textFile(path + "movies.dat")
    // ratings.dat: UserID::MovieID::Rating::TimeStamp
    val ratingsRDD = sc.textFile(path + "ratings.dat")

    // RDD: (MovieID, Title)
    val movieInfo = moviesRDD.map(_.split("::")).map(x => (x(0), x(1))).cache()
    // RDD: (UserID, MovieID, Rating)
    val ratings = ratingsRDD.map(_.split("::")).map(x => (x(0), x(1), x(2))).cache()
    // (UserID, Gender)
    val usersGender = usersRDD.map(_.split("\\|")).map(x => (x(0), x(2)))

    dataSetOps(usersRDD, ratingsRDD, spark)
    spark.stop()
  }

  /**
    * Movie-review analysis implemented with Dataset
    * @param usersRDD   user data:   UserID|age|Gender|Occuption|Zip-code
    * @param ratingsRDD rating data: UserID::MovieID::Rating::TimeStamp
    * @param spark      SparkSession
    */
  def dataSetOps(usersRDD: RDD[String], ratingsRDD: RDD[String], spark: SparkSession): Unit = {
    import spark.implicits._

    // Wrap each line of user data in a User instance
    val usersForDSRDD = usersRDD.map(_.split("\\|"))
      .map(line => User(line(0).trim, line(2).trim, line(1).trim, line(3).trim, line(4).trim))
    // Finally, create the Dataset
    val userDataSet = spark.createDataset(usersForDSRDD)
    userDataSet.show(10)

    // Wrap each line of rating data in a Rating instance
    val ratingsForDSRDD = ratingsRDD.map(_.split("::"))
      .map(line => Rating(line(0).trim, line(1).trim, line(2).trim.toDouble, line(3).trim))
    val ratingsDataSet = spark.createDataset(ratingsForDSRDD)

    // The code below is almost identical to the DataFrame version
    // (just replace DataFrame with Dataset)
    ratingsDataSet.filter(s" MovieID = 1193")
      .join(userDataSet, "UserID")
      .select("Gender", "Age")
      .groupBy("Gender", "Age")
      .count()
      .orderBy($"Gender".desc, $"Age")
      .show(10)
  }
}
With these changes the problem is resolved.
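As a side note not covered in the original post, once the case class lives at top level you can also supply the Encoder explicitly via Encoders.product instead of relying on import spark.implicits._. The sketch below is a hypothetical stand-alone example (object name EncoderSketch and the sample row are invented for illustration); it still requires the case class to be a top-level member, because Encoders.product needs a TypeTag as well.

import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

object EncoderSketch {

  // Must be top level (or a member of a top-level object) so a TypeTag exists
  case class User(UserID: String, Gender: String, Age: String,
                  OccupationID: String, Zip_Code: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("EncoderSketch").getOrCreate()

    // Provide the Encoder explicitly instead of importing spark.implicits._
    implicit val userEncoder: Encoder[User] = Encoders.product[User]

    val userDataSet = spark.createDataset(Seq(
      User("1", "F", "25", "10", "48067")
    ))
    userDataSet.show()
    spark.stop()
  }
}

This makes the encoder resolution visible at the call site, which can be easier to debug than a missing implicit.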
For more on related issues, see: http://mangocool.com/1477619031890.html