1. Business requirement: Spark SQL on HBase. Spark SQL should read two tables directly from HBase and run a join query across them.
2. Diagram

The green line
The green path in the diagram above has already been tested: create the tables directly in Hive and load data into them; the data files are stored on HDFS.
(1) Create the tables

create table mycase(
c_code string,
c_rcode string,
c_region string,
c_cate string,
c_start string,
c_end string,
c_start_m bigint,
c_end_m bigint,
c_name string,
c_mark string)
row format delimited fields terminated by ',' stored as textfile;

load data local inpath '/opt/moudles/spark-2.2.0-bin-hadoop2.7.data/data100/mycase.txt' overwrite into table mycase;

create table p_case(
p_code string,
p_status string,
p_isend int
)
row format delimited fields terminated by ',' stored as textfile;

load data local inpath '/opt/moudles/spark-2.2.0-bin-hadoop2.7.data/data100/p_case.txt' overwrite into table p_case;

create table crime_man(
m_acode string,
m_pcode string)
row format delimited fields terminated by ',' stored as textfile;

load data local inpath '/opt/moudles/spark-2.2.0-bin-hadoop2.7.data/data100/crime_man.txt' overwrite into table crime_man;

create table wb(
w_id bigint,
w_region string,
w_wname string,
w_address string,
w_uname string,
w_code string,
w_start string,
w_end string,
w_start_m bigint,
w_end_m bigint
)
row format delimited fields terminated by ',' stored as textfile;

load data local inpath '/opt/moudles/spark-2.2.0-bin-hadoop2.7.data/data100/wbfile.txt' overwrite into table wb;

create table hotel(
h_id bigint,
h_region string,
h_hname string,
h_address string,
h_uname string,
h_code string,
h_start string,
h_end string,
h_start_m bigint,
h_end_m bigint,
h_homecode string)
row format delimited fields terminated by ',' stored as textfile;

load data local inpath '/opt/moudles/spark-2.2.0-bin-hadoop2.7.data/data100/hotelfile.txt' overwrite into table hotel;

(2) Load the data
mycase.txt

A0,7,杭州市萧山区,杀人案件,2006/06/23 00:00:00,2006/06/23 21:00:00,1150992000000,1151067600000,案件名称0,暂无
A1,0,杭州市其他区,刑事案件,2006/06/25 00:00:00,2006/06/25 09:00:00,1151164800000,1151197200000,案件名称1,暂无
A2,1,杭州市上城区,强奸案件,2006/06/28 00:00:00,2006/06/28 10:00:00,1151424000000,1151460000000,案件名称2,暂无
A3,7,杭州市萧山区,杀人案件,2006/07/02 00:00:00,2006/07/02 01:00:00,1151769600000,1151773200000,案件名称3,暂无
A4,0,杭州市其他区,盗窃案件,2006/07/05 00:00:00,2006/07/05 16:00:00,1152028800000,1152086400000,案件名称4,暂无
A5,5,杭州市西湖区,强奸案件,2006/07/06 00:00:00,2006/07/06 21:00:00,1152115200000,1152190800000,案件名称5,暂无
A6,3,杭州市拱墅区,杀人案件,2006/07/06 00:00:00,2006/07/06 16:00:00,1152115200000,1152172800000,案件名称6,暂无
A7,3,杭州市拱墅区,杀人案件,2006/07/08 00:00:00,2006/07/08 10:00:00,1152288000000,1152324000000,案件名称7,暂无
A8,3,杭州市拱墅区,盗窃案件,2006/07/10 00:00:00,2006/07/10 02:00:00,1152460800000,1152468000000,案件名称8,暂无
A9,4,杭州市江干区,盗窃案件,2006/07/14 00:00:00,2006/07/14 13:00:00,1152806400000,1152853200000,案件名称9,暂无
A10,4,杭州市江干区,强奸案件,2006/07/17 00:00:00,2006/07/17 00:00:00,1153065600000,1153065600000,案件名称10,暂无
A11,1,杭州市上城区,杀人案件,2006/07/21 00:00:00,2006/07/21 21:00:00,1153411200000,1153486800000,案件名称11,暂无
A12,3,杭州市拱墅区,强奸案件,2006/07/21 00:00:00,2006/07/21 16:00:00,1153411200000,1153468800000,案件名称12,暂无
A13,7,杭州市萧山区,杀人案件,2006/07/21 00:00:00,2006/07/21 21:00:00,1153411200000,1153486800000,案件名称13,暂无
A14,4,杭州市江干区,盗窃案件,2006/07/23 00:00:00,2006/07/23 08:00:00,1153584000000,1153612800000,案件名称14,暂无
A15,2,杭州市下城区,盗窃案件,2006/07/26 00:00:00,2006/07/26 01:00:00,1153843200000,1153846800000,案件名称15,暂无
A16,3,杭州市拱墅区,刑事案件,2006/07/28 00:00:00,2006/07/28 10:00:00,1154016000000,1154052000000,案件名称16,暂无
A17,0,杭州市其他区,杀人案件,2006/07/28 00:00:00,2006/07/28 06:00:00,1154016000000,1154037600000,案件名称17,暂无
A18,0,杭州市其他区,刑事案件,2006/08/01 00:00:00,2006/08/01 15:00:00,1154361600000,1154415600000,案件名称18,暂无
A19,4,杭州市江干区,盗窃案件,2006/08/01 00:00:00,2006/08/01 20:00:00,1154361600000,1154433600000,案件名称19,暂无
A20,8,杭州市余杭区,杀人案件,2006/08/04 00:00:00,2006/08/04 06:00:00,1154620800000,1154642400000,案件名称20,暂无

p_case.txt

A0,移送起诉
A1,破案状态
A2,移送起诉
A3,破案状态
A4,移送起诉
A5,破案状态
A6,移送起诉
A7,移送起诉
A8,破案状态
A9,侦查终结
A10,侦查终结
A11,破案状态
A12,侦查终结
A13,破案状态
A14,移送起诉
A15,破案状态
A16,破案状态
A17,侦查终结
A18,移送起诉
A19,破案状态
A20,侦查终结

crime_man.txt

A0,U0
A0,U1
A1,U0
A1,U1
A1,U2
A1,U3
A1,U4
A1,U5
A1,U6
A1,U7
A1,U8
A2,U0
A2,U1
A2,U2
A2,U3
A2,U4
A2,U5
A2,U6
A3,U0
A3,U1
A4,U0
A4,U1
A4,U2
A4,U3
A5,U0
A6,U0
A6,U1
A6,U2
A6,U3
A6,U4
A6,U5
A6,U6
A7,U0
A8,U0
A8,U1
A8,U2
A8,U3
A8,U4
A8,U5
A9,U0
A9,U1
A10,U0
A10,U1
A10,U2
A10,U3
A10,U4
A10,U5
A10,U6
A11,U0
A11,U1
A11,U2
A11,U3
A12,U0
A13,U0
A13,U1
A13,U2
A13,U3
A13,U4
A13,U5
A13,U6
A13,U7
A13,U8
A14,U0
A14,U1
A14,U2
A14,U3
A14,U4
A14,U5
A14,U6
A14,U7
A15,U0
A15,U1
A15,U2
A15,U3
A16,U0
A16,U1
A17,U0
A17,U1
A17,U2
A17,U3
A17,U4
A18,U0
A18,U1
A19,U0
A19,U1
A19,U2
A19,U3
A19,U4
A19,U5
A19,U6
A20,U0
A20,U1
A20,U2
A20,U3
A20,U4
A20,U5
A20,U6
A20,U7
A20,U8

wbfile.txt

0,1,网吧583,杭州市上城区xx670路280号,姓名58,U86,2006/06/23 00:00:00,2006/06/23 19:00:00,1150992000000,1151060400000
1,0,网吧757,杭州市其他区xx570路266号,姓名55,U636,2006/06/23 00:00:00,2006/06/23 19:00:00,1150992000000,1151060400000
2,0,网吧283,杭州市其他区xx332路89号,姓名30,U793,2006/06/24 00:00:00,2006/06/24 19:00:00,1151078400000,1151146800000
3,3,网吧129,杭州市拱墅区xx662路713号,姓名33,U570,2006/06/27 00:00:00,2006/06/27 04:00:00,1151337600000,1151352000000
4,8,网吧434,杭州市余杭区xx975路721号,姓名59,U766,2006/06/29 00:00:00,2006/06/29 18:00:00,1151510400000,1151575200000
5,4,网吧80,杭州市江干区xx959路481号,姓名80,U318,2006/07/01 00:00:00,2006/07/01 18:00:00,1151683200000,1151748000000
6,6,网吧611,杭州市滨江区xx853路84号,姓名18,U220,2006/07/03 00:00:00,2006/07/03 19:00:00,1151856000000,1151924400000
7,1,网吧913,杭州市上城区xx560路157号,姓名56,U5,2006/07/03 00:00:00,2006/07/03 06:00:00,1151856000000,1151877600000
8,7,网吧684,杭州市萧山区xx754路827号,姓名34,U233,2006/07/07 00:00:00,2006/07/07 16:00:00,1152201600000,1152259200000
9,4,网吧545,杭州市江干区xx765路502号,姓名66,U167,2006/07/09 00:00:00,2006/07/09 21:00:00,1152374400000,1152450000000
10,2,网吧661,杭州市下城区xx690路657号,姓名96,U380,2006/07/09 00:00:00,2006/07/09 04:00:00,1152374400000,1152388800000
11,8,网吧928,杭州市余杭区xx61路688号,姓名90,U386,2006/07/12 00:00:00,2006/07/12 23:00:00,1152633600000,1152716400000
12,0,网吧979,杭州市其他区xx618路41号,姓名40,U378,2006/07/13 00:00:00,2006/07/13 09:00:00,1152720000000,1152752400000
13,1,网吧139,杭州市上城区xx666路869号,姓名97,U685,2006/07/13 00:00:00,2006/07/13 07:00:00,1152720000000,1152745200000
14,7,网吧109,杭州市萧山区xx558路485号,姓名32,U884,2006/07/15 00:00:00,2006/07/15 02:00:00,1152892800000,1152900000000
15,3,网吧866,杭州市拱墅区xx738路6号,姓名51,U629,2006/07/18 00:00:00,2006/07/18 09:00:00,1153152000000,1153184400000
16,0,网吧330,杭州市其他区xx251路887号,姓名79,U239,2006/07/22 00:00:00,2006/07/22 17:00:00,1153497600000,1153558800000
17,7,网吧138,杭州市萧山区xx385路448号,姓名57,U690,2006/07/22 00:00:00,2006/07/22 14:00:00,1153497600000,1153548000000
18,0,网吧816,杭州市其他区xx61路99号,姓名62,U137,2006/07/26 00:00:00,2006/07/26 01:00:00,1153843200000,1153846800000
19,5,网吧147,杭州市西湖区xx612路924号,姓名40,U569,2006/07/28 00:00:00,2006/07/28 17:00:00,1154016000000,1154077200000
20,0,网吧509,杭州市其他区xx569路234号,姓名54,U361,2006/07/30 00:00:00,2006/07/30 12:00:00,1154188800000,1154232000000

hotelfile.txt

1,5,宾馆598,杭州市西湖区xx268路894号,姓名38,U225,2006/06/24 00:00:00,2006/06/24 00:19:00,1151078400000,1151079540000,13
2,3,宾馆758,杭州市拱墅区xx480路729号,姓名92,U651,2006/06/25 00:00:00,2006/06/25 00:01:00,1151164800000,1151164860000,227
3,7,宾馆499,杭州市萧山区xx173路827号,姓名18,U329,2006/06/26 00:00:00,2006/06/26 00:04:00,1151251200000,1151251440000,794
4,7,宾馆478,杭州市萧山区xx620路622号,姓名57,U314,2006/06/27 00:00:00,2006/06/27 00:11:00,1151337600000,1151338260000,65
5,3,宾馆692,杭州市拱墅区xx165路624号,姓名15,U399,2006/06/28 00:00:00,2006/06/28 00:07:00,1151424000000,1151424420000,895
6,2,宾馆31,杭州市下城区xx635路833号,姓名60,U606,2006/06/29 00:00:00,2006/06/29 00:07:00,1151510400000,1151510820000,174
7,4,宾馆198,杭州市江干区xx622路536号,姓名71,U158,2006/06/29 00:00:00,2006/06/29 00:00:00,1151510400000,1151510400000,517
8,8,宾馆390,杭州市余杭区xx328路848号,姓名36,U27,2006/06/30 00:00:00,2006/06/30 00:11:00,1151596800000,1151597460000,670
9,4,宾馆398,杭州市江干区xx53路761号,姓名59,U624,2006/06/30 00:00:00,2006/06/30 00:01:00,1151596800000,1151596860000,878
10,0,宾馆1,杭州市其他区xx715路756号,姓名3,U703,2006/07/01 00:00:00,2006/07/01 00:00:00,1151683200000,1151683200000,898
11,4,宾馆53,杭州市江干区xx813路302号,姓名24,U226,2006/07/01 00:00:00,2006/07/01 00:10:00,1151683200000,1151683800000,983
12,8,宾馆718,杭州市余杭区xx911路813号,姓名1,U548,2006/07/01 00:00:00,2006/07/01 00:20:00,1151683200000,1151684400000,575
13,5,宾馆553,杭州市西湖区xx641路69号,姓名33,U265,2006/07/01 00:00:00,2006/07/01 00:06:00,1151683200000,1151683560000,122
14,4,宾馆179,杭州市江干区xx661路224号,姓名34,U262,2006/07/01 00:00:00,2006/07/01 00:17:00,1151683200000,1151684220000,131
15,4,宾馆582,杭州市江干区xx417路704号,姓名19,U813,2006/07/01 00:00:00,2006/07/01 00:23:00,1151683200000,1151684580000,0
16,8,宾馆895,杭州市余杭区xx527路341号,姓名80,U362,2006/07/02 00:00:00,2006/07/02 00:15:00,1151769600000,1151770500000,11
17,1,宾馆6,杭州市上城区xx62路637号,姓名35,U434,2006/07/02 00:00:00,2006/07/02 00:07:00,1151769600000,1151770020000,939
18,0,宾馆889,杭州市其他区xx943路239号,姓名46,U614,2006/07/02 00:00:00,2006/07/02 00:16:00,1151769600000,1151770560000,565
19,6,宾馆322,杭州市滨江区xx430路162号,姓名71,U911,2006/07/02 00:00:00,2006/07/02 00:10:00,1151769600000,1151770200000,542
20,4,宾馆491,杭州市江干区xx529路615号,姓名63,U911,2006/07/03 00:00:00,2006/07/03 00:09:00,1151856000000,1151856540000,385

(3) Start the Hive metastore service

[root@bigdata01 ~]# hive --service metastore

(4) Launch the spark-sql command line

bin/spark-sql --master yarn-client --executor-memory 80g --conf spark.sql.warehouse.dir=hdfs://bigdata01.hzjs.co:8020/user/sparksql --conf spark.driver.maxResultSize=10g

(5) Test SQL statement

For 2017 theft cases (盗窃案件) with region code 3, find people who appeared in both an internet café and a hotel in the case's region, within the time window around the case:

select c_rcode, c_code, c_name, c_region, p_status, h_region, h_hname, h_uname, h_code, w_region, w_wname, w_uname, w_code
from mycase
left join p_case on mycase.c_code = p_case.p_code
left join hotel on mycase.c_rcode = hotel.h_region
left join wb on mycase.c_rcode = wb.w_region
where p_status != '破案状态'
  and c_cate = '盗窃案件'
  and c_rcode = '3'
  and 3200000000 < c_start_m and c_start_m < 1514736000000
  and h_code = w_code
  and (c_start_m - 86400000 * 10) < w_start_m and w_end_m < (c_start_m + 86400000 * 10)
  and (c_start_m - 86400000 * 10) < h_start_m and h_end_m < (c_start_m + 86400000 * 10);

(6) Execution result

Time taken: 25.288 seconds, Fetched 25 row(s)

The blue line
For the blue path, all that is needed is to create an external table in Hive that points to a table in HBase:

create external table test_lcc_person(
rowkey string,
name string,
sex string,
age string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping" = ":key,lcc_liezu:name,lcc_liezu:sex,lcc_liezu:age")
tblproperties ("hbase.table.name" = "test_lcc_person");

The name test_lcc_person must be the same in both places (the Hive table name and the hbase.table.name property). Run this statement in the Hive CLI.
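The mapping above assumes that the HBase table test_lcc_person with the column family lcc_liezu already exists on the HBase side. If it still needs to be created, a minimal sketch using the HBase 1.x client API (with the same ZooKeeper settings as the Spark code later in this post) could look like the following; treat it as an illustration rather than the exact commands used here:

import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory

object EnsureHBaseTable {
  def main(args: Array[String]): Unit = {
    // Point the HBase client at the same ZooKeeper quorum used elsewhere in this post
    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.quorum", "192.168.10.82")
    conf.set("hbase.zookeeper.property.clientPort", "2181")

    val connection = ConnectionFactory.createConnection(conf)
    val admin = connection.getAdmin
    val table = TableName.valueOf("test_lcc_person")

    // Create the table with the lcc_liezu column family only if it does not exist yet
    if (!admin.tableExists(table)) {
      val descriptor = new HTableDescriptor(table)
      descriptor.addFamily(new HColumnDescriptor("lcc_liezu"))
      admin.createTable(descriptor)
    }

    admin.close()
    connection.close()
  }
}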

3. Approach: take the hbaseRDD that was read in, transform it into a DataFrame, register it as a table, then join, then save. Shouldn't that work?

4. Coming from Java, I did not know how to do this yet: the program below only reads a single table and never converts the RDD to a DataFrame.

package com.lcc.spark.hbase.test;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.protobuf.ProtobufUtil;
import org.apache.hadoop.hbase.protobuf.generated.ClientProtos;
import org.apache.hadoop.hbase.util.Base64;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

import scala.Tuple2;

public class SparkOnHbase {

    public static void main(String[] args) throws Exception {
        System.setProperty("hadoop.home.dir", "E:\\02-hadoop\\hadoop-2.7.3\\");
        System.setProperty("HADOOP_USER_NAME", "root");
        // System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer");

        SparkConf conf = new SparkConf();
        conf.setAppName("LG_CALCULATE");
        conf.setMaster("local");
        JavaSparkContext context = new JavaSparkContext(conf);

        // HBase connection settings
        Configuration configuration = HBaseConfiguration.create();
        configuration.set("hbase.zookeeper.property.clientPort", "2181");
        configuration.set("hbase.zookeeper.quorum", "192.168.10.82");
        // configuration.set("hbase.master", "192.168.10.82:60000");

        // Tell TableInputFormat which table to read and which scan to use
        Scan scan = new Scan();
        String tableName = "test_lcc_person";
        configuration.set(TableInputFormat.INPUT_TABLE, tableName);
        ClientProtos.Scan proto = ProtobufUtil.toScan(scan);
        String scanToString = Base64.encodeBytes(proto.toByteArray());
        configuration.set(TableInputFormat.SCAN, scanToString);

        // Read the HBase table as an RDD of (rowkey, Result) pairs
        JavaPairRDD<ImmutableBytesWritable, Result> myRDD = context.newAPIHadoopRDD(
                configuration, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);
        System.out.println(myRDD.count());

        // Print every row: rowkey plus the name/sex/age columns of family lcc_liezu
        myRDD.foreach(new VoidFunction<Tuple2<ImmutableBytesWritable, Result>>() {
            @Override
            public void call(Tuple2<ImmutableBytesWritable, Result> tuple) throws Exception {
                Result result = tuple._2();
                String rowkey = Bytes.toString(result.getRow());
                String name = Bytes.toString(result.getValue(Bytes.toBytes("lcc_liezu"), Bytes.toBytes("name")));
                String sex = Bytes.toString(result.getValue(Bytes.toBytes("lcc_liezu"), Bytes.toBytes("sex")));
                String age = Bytes.toString(result.getValue(Bytes.toBytes("lcc_liezu"), Bytes.toBytes("age")));
                System.out.println(rowkey + "\t" + name + "\t" + sex + "\t" + age);
            }
        });
    }
}

5. Switching to Scala

import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.{TableName, HBaseConfiguration}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkContext, SparkConf}

object Test {
  def main(args: Array[String]): Unit = {
    // Run in local mode to make testing easier
    val sparkConf = new SparkConf().setMaster("local").setAppName("HBaseTest")

    // Create the HBase configuration
    val hBaseConf = HBaseConfiguration.create()
    hBaseConf.set("hbase.zookeeper.property.clientPort", "2181")
    hBaseConf.set("hbase.zookeeper.quorum", "192.168.10.82")
    hBaseConf.set(TableInputFormat.INPUT_TABLE, "test_lcc_person")

    // Create the Spark context and SQL context
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Read the HBase table as an RDD of (rowkey, Result) pairs
    val hbaseRDD = sc.newAPIHadoopRDD(hBaseConf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])

    // Map the rows to tuples, i.e. turn the RDD into a DataFrame with a schema
    val shop = hbaseRDD.map(r => (
      Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezu"), Bytes.toBytes("name"))),
      Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezu"), Bytes.toBytes("sex"))),
      Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezu"), Bytes.toBytes("age")))
    )).toDF("name", "sex", "age")

    shop.registerTempTable("shop")

    // Query the registered table
    val df2 = sqlContext.sql("SELECT * FROM shop")
    println(df2.count())
    df2.collect().foreach(print(_))
    //df2.foreach(println)
  }
}

Output:

[梁川川1,男,12][梁川川2,男,12][梁川川3,男,12][梁川川4,男,12][梁川川5,男,12][梁川川6,男,12][梁川川7,男,17]

This proves the idea is right: the hbaseRDD that is read in can be transformed into a DataFrame and registered as a table.
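As an aside, the blue path above offers an alternative to this hand-written column mapping: since test_lcc_person is already exposed as a Hive external table, a Hive-enabled SparkSession can query it directly. A hedged sketch, assuming the metastore from step (3) is reachable (hive-site.xml on the classpath) and the hive-hbase-handler and HBase client jars are available to Spark:

import org.apache.spark.sql.SparkSession

object HiveOnHBaseTest {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets Spark SQL see the Hive external table that wraps HBase
    val spark = SparkSession.builder()
      .master("local")
      .appName("HiveOnHBaseTest")
      .enableHiveSupport()
      .getOrCreate()

    // test_lcc_person is the HBase-backed external table created in the blue path
    val df = spark.sql("SELECT * FROM test_lcc_person")
    df.show()

    spark.stop()
  }
}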

6. Trying to read two tables

import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.{TableName, HBaseConfiguration}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkContext, SparkConf}

object Test {
  def main(args: Array[String]): Unit = {
    // Run in local mode to make testing easier
    val sparkConf = new SparkConf().setMaster("local").setAppName("HBaseTest")

    // Create the HBase configuration for the first table
    val hBaseConf = HBaseConfiguration.create()
    hBaseConf.set("hbase.zookeeper.property.clientPort", "2181")
    hBaseConf.set("hbase.zookeeper.quorum", "192.168.10.82")
    hBaseConf.set(TableInputFormat.INPUT_TABLE, "test_lcc_person")

    // Create the Spark context and SQL context
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Read the first HBase table
    val hbaseRDD = sc.newAPIHadoopRDD(hBaseConf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])

    // Map the rows to a DataFrame with a schema
    val shop = hbaseRDD.map(r => (
      Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezu"), Bytes.toBytes("name"))),
      Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezu"), Bytes.toBytes("sex"))),
      Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezu"), Bytes.toBytes("age")))
    )).toDF("name", "sex", "age")

    shop.registerTempTable("shop")

    // Query the first table
    val df2 = sqlContext.sql("SELECT * FROM shop")
    println(df2.count())
    df2.collect().foreach(print(_))

    // Create a second HBase configuration for the second table
    val hBaseConf2 = HBaseConfiguration.create()
    hBaseConf2.set("hbase.zookeeper.property.clientPort", "2181")
    hBaseConf2.set("hbase.zookeeper.quorum", "192.168.10.82")
    hBaseConf2.set(TableInputFormat.INPUT_TABLE, "test_lcc_card")

    // Create a second Spark context and SQL context -- this is what fails below
    val sc2 = new SparkContext(sparkConf)
    val sqlContext2 = new SQLContext(sc2)

    // Read the second HBase table
    val hbaseRDD2 = sc2.newAPIHadoopRDD(hBaseConf2, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])

    // Map the rows to a DataFrame with a schema
    val card = hbaseRDD2.map(r => (
      Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezus"), Bytes.toBytes("code"))),
      Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezus"), Bytes.toBytes("money"))),
      Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezus"), Bytes.toBytes("time")))
    )).toDF("code", "money", "time")

    card.registerTempTable("mycard")

    // Query the second table
    val df3 = sqlContext.sql("SELECT * FROM mycard")
    println(df3.count())
    df3.collect().foreach(print(_))
  }
}

But the run fails with an error:

[梁川川1,男,12][梁川川2,男,12][梁川川3,男,12][梁川川4,男,12][梁川川5,男,12][梁川川6,男,12][梁川川7,男,17]

Exception in thread "main" org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:76)
Test$.main(Test.scala:25)
Test.main(Test.scala)
at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$2.apply(SparkContext.scala:2285)

Only one SparkContext can exist in a JVM.
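The error message itself points at one workaround: set spark.driver.allowMultipleContexts to true so that a second SparkContext is tolerated. A minimal sketch of that flag is below, but reusing a single SparkContext, as the improved program in section 7 does, is the cleaner fix:

import org.apache.spark.{SparkConf, SparkContext}

object AllowTwoContexts {
  def main(args: Array[String]): Unit = {
    // Flag suggested by the error message; it only silences the check,
    // it does not make two contexts a good idea
    val sparkConf = new SparkConf()
      .setMaster("local")
      .setAppName("HBaseTest")
      .set("spark.driver.allowMultipleContexts", "true")

    val sc = new SparkContext(sparkConf)
    val sc2 = new SparkContext(sparkConf) // no longer throws once the flag is set

    sc2.stop()
    sc.stop()
  }
}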

7. Improved program

import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.{TableName, HBaseConfiguration}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkContext, SparkConf}

object Test {
  def main(args: Array[String]): Unit = {
    // Run in local mode to make testing easier
    val sparkConf = new SparkConf().setMaster("local").setAppName("HBaseTest")

    // Create the HBase configuration
    val hBaseConf = HBaseConfiguration.create()
    hBaseConf.set("hbase.zookeeper.property.clientPort", "2181")
    hBaseConf.set("hbase.zookeeper.quorum", "192.168.10.82")
    //var con = ConnectionFactory.createConnection(hBaseConf)
    //var table = con.getTable(TableName.valueOf(""))
    hBaseConf.set(TableInputFormat.INPUT_TABLE, "test_lcc_person")

    // Create one Spark context and one SQL context and reuse them for both tables
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Read the first HBase table
    var hbaseRDD = sc.newAPIHadoopRDD(hBaseConf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])

    // Map the rows to a DataFrame with a schema
    val shop = hbaseRDD.map(r => (
      Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezu"), Bytes.toBytes("id"))),
      Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezu"), Bytes.toBytes("name"))),
      Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezu"), Bytes.toBytes("sex"))),
      Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezu"), Bytes.toBytes("age")))
    )).toDF("id", "name", "sex", "age")

    shop.registerTempTable("shop")

    // Query the first table
    val df2 = sqlContext.sql("SELECT * FROM shop")
    println(df2.count())
    df2.collect().foreach(print(_))

    // Point the same configuration at the second table and reuse the same SparkContext
    hBaseConf.set(TableInputFormat.INPUT_TABLE, "test_lcc_card")
    hbaseRDD = sc.newAPIHadoopRDD(hBaseConf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])

    // Map the rows to a DataFrame with a schema
    val card = hbaseRDD.map(r => (
      Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezus"), Bytes.toBytes("ids"))),
      Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezus"), Bytes.toBytes("code"))),
      Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezus"), Bytes.toBytes("money"))),
      Bytes.toString(r._2.getValue(Bytes.toBytes("lcc_liezus"), Bytes.toBytes("time")))
    )).toDF("ids", "code", "money", "time")

    card.registerTempTable("mycard")

    // Query the second table
    val df3 = sqlContext.sql("SELECT * FROM mycard")
    println(df3.count())
    df3.collect().foreach(print(_))

    // Join the two registered tables
    val df4 = sqlContext.sql("SELECT * FROM shop inner join mycard on id=ids")
    println(df4.count())
    df4.collect().foreach(println(_))
  }
}

Test result

[7,梁川川7,男,17,7,7777,7777,2015-10-11]
[3,梁川川3,男,12,3,3333,333,2015-10-11]
[5,梁川川5,男,12,5,55,55,2015-10-11]
[6,梁川川6,男,12,6,6666,6666,2015-10-11]
[1,梁川川1,男,12,1,1111111111,1111111,2015-10-11]
[4,梁川川4,男,12,4,444,444,2015-10-11]
[2,梁川川2,男,12,2,22222,22222,2015-10-11]

The test succeeded.
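The approach in section 3 also called for a save step, which the program above stops short of. A minimal sketch of that last step, continuing from the df4 built above (the HDFS output path is a placeholder, not one used in this post):

// Persist the joined result; the path below is only a placeholder
df4.write
  .mode("overwrite")
  .parquet("hdfs:///tmp/shop_card_joined")

// Or, with Hive support enabled, write it back as a managed table:
// df4.write.mode("overwrite").saveAsTable("shop_card_joined")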
