Common methods

show

scala> emp.show()
+----+------+-----+------+----------+---------+----+------+
|comm|deptno|empno| ename|  hiredate|      job| mgr|   sal|
+----+------+-----+------+----------+---------+----+------+
|null|    20| 7369| SMITH|1980-12-17|    CLERK|7902| 800.0|
| 300|    30| 7499| ALLEN|1981-02-20| SALESMAN|7698|1600.5|
| 500|    30| 7521|  WARD|1981-02-22| SALESMAN|7698|1250.0|
|null|    20| 7566| JONES|1981-04-02|  MANAGER|7839|2975.0|
|1400|    30| 7654|MARTIN|1981-09-28| SALESMAN|7698|1250.0|
|null|    30| 7698| BLAKE|1981-05-01|  MANAGER|7839|2850.0|
|null|    10| 7782| CLARK|1981-06-09|  MANAGER|7839|2450.0|
|null|    20| 7788| SCOTT|1987-04-19|  ANALYST|7566|3000.0|
|null|    10| 7839|  KING|1981-11-17|PRESIDENT|null|5000.0|
|   0|    30| 7844|TURNER|1981-09-08| SALESMAN|7698|1500.0|
|null|    20| 7876| ADAMS|1987-05-23|    CLERK|7788|1100.0|
|null|    30| 7900| JAMES|1981-12-03|    CLERK|7698| 950.0|
|null|    20| 7902|  FORD|1981-12-02|  ANALYST|7566|3000.0|
|null|    10| 7934|MILLER|1982-01-23|    CLERK|7369|1300.0|
+----+------+-----+------+----------+---------+----+------+

scala> emp.show(3)
+----+------+-----+-----+----------+--------+----+------+
|comm|deptno|empno|ename|  hiredate|     job| mgr|   sal|
+----+------+-----+-----+----------+--------+----+------+
|null|    20| 7369|SMITH|1980-12-17|   CLERK|7902| 800.0|
| 300|    30| 7499|ALLEN|1981-02-20|SALESMAN|7698|1600.5|
| 500|    30| 7521| WARD|1981-02-22|SALESMAN|7698|1250.0|
+----+------+-----+-----+----------+--------+----+------+
only showing top 3 rows

collect

scala> emp.collect
res8: Array[org.apache.spark.sql.Row] = Array([null,20,7369,SMITH,1980-12-17,CLERK,7902,800.0], [300,30,7499,ALLEN,1981-02-20,SALESMAN,7698,1600.5], [500,30,7521,WARD,1981-02-22,SALESMAN,7698,1250.0], [null,20,7566,JONES,1981-04-02,MANAGER,7839,2975.0], [1400,30,7654,MARTIN,1981-09-28,SALESMAN,7698,1250.0], [null,30,7698,BLAKE,1981-05-01,MANAGER,7839,2850.0], [null,10,7782,CLARK,1981-06-09,MANAGER,7839,2450.0], [null,20,7788,SCOTT,1987-04-19,ANALYST,7566,3000.0], [null,10,7839,KING,1981-11-17,PRESIDENT,null,5000.0], [0,30,7844,TURNER,1981-09-08,SALESMAN,7698,1500.0], [null,20,7876,ADAMS,1987-05-23,CLERK,7788,1100.0], [null,30,7900,JAMES,1981-12-03,CLERK,7698,950.0], [null,20,7902,FORD,1981-12-02,ANALYST,7566,3000.0], [null,10,7934,MILLER,1982-01-23,CLERK,7369,1300.0])

collectAsList

scala> emp.collectAsList
res9: java.util.List[org.apache.spark.sql.Row] = [[null,20,7369,SMITH,1980-12-17,CLERK,7902,800.0], [300,30,7499,ALLEN,1981-02-20,SALESMAN,7698,1600.5], [500,30,7521,WARD,1981-02-22,SALESMAN,7698,1250.0], [null,20,7566,JONES,1981-04-02,MANAGER,7839,2975.0], [1400,30,7654,MARTIN,1981-09-28,SALESMAN,7698,1250.0], [null,30,7698,BLAKE,1981-05-01,MANAGER,7839,2850.0], [null,10,7782,CLARK,1981-06-09,MANAGER,7839,2450.0], [null,20,7788,SCOTT,1987-04-19,ANALYST,7566,3000.0], [null,10,7839,KING,1981-11-17,PRESIDENT,null,5000.0], [0,30,7844,TURNER,1981-09-08,SALESMAN,7698,1500.0], [null,20,7876,ADAMS,1987-05-23,CLERK,7788,1100.0], [null,30,7900,JAMES,1981-12-03,CLERK,7698,950.0], [null,20,7902,FORD,1981-12-02,ANALYST,7566,3000.0], [null,10,7934,MILLER,1982-01-23,CLERK,7369,1300.0]]

first, head, take

scala> emp.first
res10: org.apache.spark.sql.Row = [null,20,7369,SMITH,1980-12-17,CLERK,7902,800.0]

scala> emp.head
res11: org.apache.spark.sql.Row = [null,20,7369,SMITH,1980-12-17,CLERK,7902,800.0]

scala> emp.head(2)
res12: Array[org.apache.spark.sql.Row] = Array([null,20,7369,SMITH,1980-12-17,CLERK,7902,800.0], [300,30,7499,ALLEN,1981-02-20,SALESMAN,7698,1600.5])

scala> emp.take(2)
res13: Array[org.apache.spark.sql.Row] = Array([null,20,7369,SMITH,1980-12-17,CLERK,7902,800.0], [300,30,7499,ALLEN,1981-02-20,SALESMAN,7698,1600.5])

scala> emp.takeAsList(2)
res14: java.util.List[org.apache.spark.sql.Row] = [[null,20,7369,SMITH,1980-12-17,CLERK,7902,800.0], [300,30,7499,ALLEN,1981-02-20,SALESMAN,7698,1600.5]]

where

scala> emp.where("deptno=10").show
+----+------+-----+------+----------+---------+----+------+
|comm|deptno|empno| ename|  hiredate|      job| mgr|   sal|
+----+------+-----+------+----------+---------+----+------+
|null|    10| 7782| CLARK|1981-06-09|  MANAGER|7839|2450.0|
|null|    10| 7839|  KING|1981-11-17|PRESIDENT|null|5000.0|
|null|    10| 7934|MILLER|1982-01-23|    CLERK|7369|1300.0|
+----+------+-----+------+----------+---------+----+------+

scala> emp.where("deptno=10 and sal>1400").show
+----+------+-----+-----+----------+---------+----+------+
|comm|deptno|empno|ename|  hiredate|      job| mgr|   sal|
+----+------+-----+-----+----------+---------+----+------+
|null|    10| 7782|CLARK|1981-06-09|  MANAGER|7839|2450.0|
|null|    10| 7839| KING|1981-11-17|PRESIDENT|null|5000.0|
+----+------+-----+-----+----------+---------+----+------+

filter

scala> emp.filter("deptno=10 and sal>1400").show
+----+------+-----+-----+----------+---------+----+------+
|comm|deptno|empno|ename|  hiredate|      job| mgr|   sal|
+----+------+-----+-----+----------+---------+----+------+
|null|    10| 7782|CLARK|1981-06-09|  MANAGER|7839|2450.0|
|null|    10| 7839| KING|1981-11-17|PRESIDENT|null|5000.0|
+----+------+-----+-----+----------+---------+----+------+
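
`where` is an alias of `filter`, and both also accept a `Column` expression in place of a SQL string. The following is a minimal sketch, assuming a local SparkSession; the object name `FilterSketch` and the four inline rows are hypothetical stand-ins for the emp data, not part of the original session.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object FilterSketch {
  // Hypothetical stand-in for emp: four columns, four rows.
  def sampleEmp(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (7782, "CLARK", 10, 2450.0),
      (7839, "KING", 10, 5000.0),
      (7934, "MILLER", 10, 1300.0),
      (7369, "SMITH", 20, 800.0)
    ).toDF("empno", "ename", "deptno", "sal")
  }

  // Same predicate as the SQL string "deptno=10 and sal>1400",
  // written with Column operators instead.
  def highPaidInDept10(emp: DataFrame): DataFrame =
    emp.filter(col("deptno") === 10 && col("sal") > 1400)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-sketch")
      .getOrCreate()
    val emp = sampleEmp(spark)
    highPaidInDept10(emp).show()                   // Column-expression form
    emp.where("deptno = 10 and sal > 1400").show() // SQL-string form, same rows
    spark.stop()
  }
}
```

The Column form is checked by the Scala compiler for syntax, while the string form is parsed only when the query is analyzed; which one reads better is largely a matter of taste.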

select

scala> emp.select("empno","ename").show(3)
+-----+-----+
|empno|ename|
+-----+-----+
| 7369|SMITH|
| 7499|ALLEN|
| 7521| WARD|
+-----+-----+
only showing top 3 rows

scala> emp.select(emp("empno"),emp("sal")+1).show(3)
+-----+---------+
|empno|(sal + 1)|
+-----+---------+
| 7369|    801.0|
| 7499|   1601.5|
| 7521|   1251.0|
+-----+---------+
only showing top 3 rows

scala> emp.select(col("sal"),col("sal")+1).show(3)
+------+---------+
|   sal|(sal + 1)|
+------+---------+
| 800.0|    801.0|
|1600.5|   1601.5|
|1250.0|   1251.0|
+------+---------+
only showing top 3 rows

selectExpr

scala> emp.selectExpr("ename","empno as no","round(sal)").show(4)
+-----+----+-------------+
|ename|  no|round(sal, 0)|
+-----+----+-------------+
|SMITH|7369|        800.0|
|ALLEN|7499|       1601.0|
| WARD|7521|       1250.0|
|JONES|7566|       2975.0|
+-----+----+-------------+
only showing top 4 rows

col

scala> emp.select(col("ename")).show
+------+
| ename|
+------+
| SMITH|
| ALLEN|
|  WARD|
| JONES|
|MARTIN|
| BLAKE|
| CLARK|
| SCOTT|
|  KING|
|TURNER|
| ADAMS|
| JAMES|
|  FORD|
|MILLER|
+------+

drop

scala> emp.drop("empno").show(3)
+----+------+-----+----------+--------+----+------+
|comm|deptno|ename|  hiredate|     job| mgr|   sal|
+----+------+-----+----------+--------+----+------+
|null|    20|SMITH|1980-12-17|   CLERK|7902| 800.0|
| 300|    30|ALLEN|1981-02-20|SALESMAN|7698|1600.5|
| 500|    30| WARD|1981-02-22|SALESMAN|7698|1250.0|
+----+------+-----+----------+--------+----+------+
only showing top 3 rows

limit

scala> emp.limit(3)
res26: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [comm: bigint, deptno: bigint ... 6 more fields]

scala> emp.limit(3).show
+----+------+-----+-----+----------+--------+----+------+
|comm|deptno|empno|ename|  hiredate|     job| mgr|   sal|
+----+------+-----+-----+----------+--------+----+------+
|null|    20| 7369|SMITH|1980-12-17|   CLERK|7902| 800.0|
| 300|    30| 7499|ALLEN|1981-02-20|SALESMAN|7698|1600.5|
| 500|    30| 7521| WARD|1981-02-22|SALESMAN|7698|1250.0|
+----+------+-----+-----+----------+--------+----+------+

orderBy & sort

scala> emp.orderBy(-col("sal")).show(3)
+----+------+-----+-----+----------+---------+----+------+
|comm|deptno|empno|ename|  hiredate|      job| mgr|   sal|
+----+------+-----+-----+----------+---------+----+------+
|null|    10| 7839| KING|1981-11-17|PRESIDENT|null|5000.0|
|null|    20| 7788|SCOTT|1987-04-19|  ANALYST|7566|3000.0|
|null|    20| 7902| FORD|1981-12-02|  ANALYST|7566|3000.0|
+----+------+-----+-----+----------+---------+----+------+
only showing top 3 rows

scala> emp.sort(-col("sal")).show(3)
+----+------+-----+-----+----------+---------+----+------+
|comm|deptno|empno|ename|  hiredate|      job| mgr|   sal|
+----+------+-----+-----+----------+---------+----+------+
|null|    10| 7839| KING|1981-11-17|PRESIDENT|null|5000.0|
|null|    20| 7788|SCOTT|1987-04-19|  ANALYST|7566|3000.0|
|null|    20| 7902| FORD|1981-12-02|  ANALYST|7566|3000.0|
+----+------+-----+-----+----------+---------+----+------+
only showing top 3 rows

sortWithinPartitions

scala> emp.repartition(2).sortWithinPartitions("sal").show
+----+------+-----+------+----------+---------+----+------+
|comm|deptno|empno| ename|  hiredate|      job| mgr|   sal|
+----+------+-----+------+----------+---------+----+------+
|null|    20| 7369| SMITH|1980-12-17|    CLERK|7902| 800.0|
|null|    20| 7876| ADAMS|1987-05-23|    CLERK|7788|1100.0|
|1400|    30| 7654|MARTIN|1981-09-28| SALESMAN|7698|1250.0|
| 500|    30| 7521|  WARD|1981-02-22| SALESMAN|7698|1250.0|
|null|    10| 7934|MILLER|1982-01-23|    CLERK|7369|1300.0|
|null|    20| 7566| JONES|1981-04-02|  MANAGER|7839|2975.0|
|null|    20| 7788| SCOTT|1987-04-19|  ANALYST|7566|3000.0|
|null|    30| 7900| JAMES|1981-12-03|    CLERK|7698| 950.0|
|   0|    30| 7844|TURNER|1981-09-08| SALESMAN|7698|1500.0|
| 300|    30| 7499| ALLEN|1981-02-20| SALESMAN|7698|1600.5|
|null|    10| 7782| CLARK|1981-06-09|  MANAGER|7839|2450.0|
|null|    30| 7698| BLAKE|1981-05-01|  MANAGER|7839|2850.0|
|null|    20| 7902|  FORD|1981-12-02|  ANALYST|7566|3000.0|
|null|    10| 7839|  KING|1981-11-17|PRESIDENT|null|5000.0|
+----+------+-----+------+----------+---------+----+------+
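
The output above is not globally sorted: it consists of two ascending runs of `sal`, one per partition. A rough way to make that visible is to tag each row with its partition id. The sketch below assumes a local SparkSession; the object name `SortWithinPartitionsSketch`, the `partition_id` column name, and the six inline rows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.spark_partition_id

object SortWithinPartitionsSketch {
  // Hypothetical stand-in for emp: name and salary only.
  def sampleEmp(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("SMITH", 800.0), ("ALLEN", 1600.5), ("WARD", 1250.0),
      ("JONES", 2975.0), ("KING", 5000.0), ("JAMES", 950.0)
    ).toDF("ename", "sal")
  }

  // Split into two partitions, sort each partition by sal on its own,
  // then record which partition every row ended up in.
  def sortedPerPartition(df: DataFrame): DataFrame =
    df.repartition(2)
      .sortWithinPartitions("sal")
      .withColumn("partition_id", spark_partition_id())

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sort-within-partitions-sketch")
      .getOrCreate()
    sortedPerPartition(sampleEmp(spark)).show()
    spark.stop()
  }
}
```

Which rows land in which partition depends on how `repartition` distributes them, so only the ordering inside each `partition_id` group should be relied on.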

groupBy

scala> emp.groupBy("deptno").count.show
+------+-----+
|deptno|count|
+------+-----+
|    10|    3|
|    30|    6|
|    20|    5|
+------+-----+

cube

scala> emp.cube("deptno","job").sum("sal").show()
+------+---------+--------+
|deptno|      job|sum(sal)|
+------+---------+--------+
|    20|  MANAGER|  2975.0|
|  null|PRESIDENT|  5000.0|
|  null|     null| 29025.5|
|    10|PRESIDENT|  5000.0|
|  null| SALESMAN|  5600.5|
|    30|    CLERK|   950.0|
|    10|     null|  8750.0|
|    20|    CLERK|  1900.0|
|  null|  ANALYST|  6000.0|
|    30| SALESMAN|  5600.5|
|    20|     null| 10875.0|
|    10|    CLERK|  1300.0|
|  null|  MANAGER|  8275.0|
|    30|     null|  9400.5|
|  null|    CLERK|  4150.0|
|    20|  ANALYST|  6000.0|
|    30|  MANAGER|  2850.0|
|    10|  MANAGER|  2450.0|
+------+---------+--------+

rollup

scala> emp.rollup("deptno","job").sum("sal").show()
+------+---------+--------+
|deptno|      job|sum(sal)|
+------+---------+--------+
|    20|  MANAGER|  2975.0|
|  null|     null| 29025.5|
|    10|PRESIDENT|  5000.0|
|    30|    CLERK|   950.0|
|    10|     null|  8750.0|
|    20|    CLERK|  1900.0|
|    30| SALESMAN|  5600.5|
|    20|     null| 10875.0|
|    10|    CLERK|  1300.0|
|    30|     null|  9400.5|
|    20|  ANALYST|  6000.0|
|    30|  MANAGER|  2850.0|
|    10|  MANAGER|  2450.0|
+------+---------+--------+
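
The difference in row counts between the two outputs (18 rows for cube, 13 for rollup) comes from the grouping sets each one aggregates over. `rollup("deptno","job")` aggregates over (deptno, job), (deptno), and the grand total, whereas `cube("deptno","job")` also adds (job) on its own, which accounts for the five extra rows where deptno is null and job is not. The sketch below models only that enumeration with plain Scala collections; it is not Spark code, and the names `GroupingSetsSketch`, `rollupSets`, and `cubeSets` are hypothetical.

```scala
object GroupingSetsSketch {
  // rollup(c1, ..., cn): every prefix of the column list,
  // from the full list down to the empty set (the grand total).
  def rollupSets(cols: List[String]): List[List[String]] =
    cols.inits.toList

  // cube(c1, ..., cn): every subset of the column list, 2^n sets in total.
  def cubeSets(cols: List[String]): List[List[String]] =
    cols.foldRight(List(List.empty[String])) { (c, acc) =>
      acc.map(c :: _) ++ acc
    }

  def main(args: Array[String]): Unit = {
    val cols = List("deptno", "job")
    println(rollupSets(cols)) // List(List(deptno, job), List(deptno), List())
    println(cubeSets(cols))   // List(List(deptno, job), List(deptno), List(job), List())
  }
}
```

In the result tables, a null in a grouping column marks a row that was aggregated without that column.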

max

scala> emp.groupBy("deptno").max("sal").show()
+------+--------+
|deptno|max(sal)|
+------+--------+
|    10|  5000.0|
|    30|  2850.0|
|    20|  3000.0|
+------+--------+

distinct

scala> emp.distinct.show(3)
+----+------+-----+-----+----------+-------+----+------+
|comm|deptno|empno|ename|  hiredate|    job| mgr|   sal|
+----+------+-----+-----+----------+-------+----+------+
|null|    20| 7876|ADAMS|1987-05-23|  CLERK|7788|1100.0|
|null|    20| 7902| FORD|1981-12-02|ANALYST|7566|3000.0|
|null|    20| 7788|SCOTT|1987-04-19|ANALYST|7566|3000.0|
+----+------+-----+-----+----------+-------+----+------+
only showing top 3 rows

dropDuplicates

scala> emp.dropDuplicates("deptno").show
+----+------+-----+-----+----------+--------+----+------+
|comm|deptno|empno|ename|  hiredate|     job| mgr|   sal|
+----+------+-----+-----+----------+--------+----+------+
|null|    10| 7782|CLARK|1981-06-09| MANAGER|7839|2450.0|
| 300|    30| 7499|ALLEN|1981-02-20|SALESMAN|7698|1600.5|
|null|    20| 7369|SMITH|1980-12-17|   CLERK|7902| 800.0|
+----+------+-----+-----+----------+--------+----+------+

union

scala> emp.limit(1).union(emp.limit(1)).show
+----+------+-----+-----+----------+-----+----+-----+
|comm|deptno|empno|ename|  hiredate|  job| mgr|  sal|
+----+------+-----+-----+----------+-----+----+-----+
|null|    20| 7369|SMITH|1980-12-17|CLERK|7902|800.0|
|null|    20| 7369|SMITH|1980-12-17|CLERK|7902|800.0|
+----+------+-----+-----+----------+-----+----+-----+

join

scala> emp.join(emp,"ename").show
+------+----+------+-----+----------+---------+----+------+----+------+-----+----------+---------+----+------+
| ename|comm|deptno|empno|  hiredate|      job| mgr|   sal|comm|deptno|empno|  hiredate|      job| mgr|   sal|
+------+----+------+-----+----------+---------+----+------+----+------+-----+----------+---------+----+------+
| SMITH|null|    20| 7369|1980-12-17|    CLERK|7902| 800.0|null|    20| 7369|1980-12-17|    CLERK|7902| 800.0|
| ALLEN| 300|    30| 7499|1981-02-20| SALESMAN|7698|1600.5| 300|    30| 7499|1981-02-20| SALESMAN|7698|1600.5|
|  WARD| 500|    30| 7521|1981-02-22| SALESMAN|7698|1250.0| 500|    30| 7521|1981-02-22| SALESMAN|7698|1250.0|
| JONES|null|    20| 7566|1981-04-02|  MANAGER|7839|2975.0|null|    20| 7566|1981-04-02|  MANAGER|7839|2975.0|
|MARTIN|1400|    30| 7654|1981-09-28| SALESMAN|7698|1250.0|1400|    30| 7654|1981-09-28| SALESMAN|7698|1250.0|
| BLAKE|null|    30| 7698|1981-05-01|  MANAGER|7839|2850.0|null|    30| 7698|1981-05-01|  MANAGER|7839|2850.0|
| CLARK|null|    10| 7782|1981-06-09|  MANAGER|7839|2450.0|null|    10| 7782|1981-06-09|  MANAGER|7839|2450.0|
| SCOTT|null|    20| 7788|1987-04-19|  ANALYST|7566|3000.0|null|    20|
+------+----+------+-----+----------+---------+----+------+----+------+-----+----------+---------+----+------+

Join on multiple columns. The column names do not have to be wrapped in a Seq; a List or an Array also works.

scala> emp.join(emp,Seq("ename","deptno")).show
+------+------+----+-----+----------+---------+----+------+----+-----+----------+---------+----+------+
| ename|deptno|comm|empno|  hiredate|      job| mgr|   sal|comm|empno|  hiredate|      job| mgr|   sal|
+------+------+----+-----+----------+---------+----+------+----+-----+----------+---------+----+------+
| SMITH|    20|null| 7369|1980-12-17|    CLERK|7902| 800.0|null| 7369|1980-12-17|    CLERK|7902| 800.0|
| ALLEN|    30| 300| 7499|1981-02-20| SALESMAN|7698|1600.5| 300| 7499|1981-02-20| SALESMAN|7698|1600.5|
|  WARD|    30| 500| 7521|1981-02-22| SALESMAN|7698|1250.0| 500| 7521|1981-02-22| SALESMAN|7698|1250.0|
| JONES|    20|null| 7566|1981-04-02|  MANAGER|7839|2975.0|null| 7566|1981-04-02|  MANAGER|7839|2975.0|
|MARTIN|    30|1400| 7654|1981-09-28| SALESMAN|7698|1250.0|1400| 7654|1981-09-28| SALESMAN|7698|1250.0|
| BLAKE|    30|null| 7698|1981-05-01|  MANAGER|7839|2850.0|null|
+------+------+----+-----+----------+---------+----+------+----+-----+----------+---------+----+------+

Inner join

scala> emp.join(emp,Array("ename","deptno"),"inner").show(3)
+-----+------+----+-----+----------+--------+----+------+----+-----+----------+--------+----+------+
|ename|deptno|comm|empno|  hiredate|     job| mgr|   sal|comm|empno|  hiredate|     job| mgr|   sal|
+-----+------+----+-----+----------+--------+----+------+----+-----+----------+--------+----+------+
|SMITH|    20|null| 7369|1980-12-17|   CLERK|7902| 800.0|null| 7369|1980-12-17|   CLERK|7902| 800.0|
|ALLEN|    30| 300| 7499|1981-02-20|SALESMAN|7698|1600.5| 300| 7499|1981-02-20|SALESMAN|7698|1600.5|
| WARD|    30| 500| 7521|1981-02-22|SALESMAN|7698|1250.0| 500| 7521|1981-02-22|SALESMAN|7698|1250.0|
+-----+------+----+-----+----------+--------+----+------+----+-----+----------+--------+----+------+

An alternative form, using a join expression

scala> emp.join(emp,emp("ename")===emp("ename")).show(3)
20/11/24 21:27:48 WARN sql.Column: Constructing trivially true equals predicate, 'ename#79 = ename#79'. Perhaps you need to use aliases.
+----+------+-----+-----+----------+--------+----+------+----+------+-----+-----+----------+--------+----+------+
|comm|deptno|empno|ename|  hiredate|     job| mgr|   sal|comm|deptno|empno|ename|  hiredate|     job| mgr|   sal|
+----+------+-----+-----+----------+--------+----+------+----+------+-----+-----+----------+--------+----+------+
|null|    20| 7369|SMITH|1980-12-17|   CLERK|7902| 800.0|null|    20| 7369|SMITH|1980-12-17|   CLERK|7902| 800.0|
| 300|    30| 7499|ALLEN|1981-02-20|SALESMAN|7698|1600.5| 300|    30| 7499|ALLEN|1981-02-20|SALESMAN|7698|1600.5|
| 500|    30| 7521| WARD|1981-02-22|SALESMAN|7698|1250.0| 500|    30| 7521| WARD|1981-02-22|SALESMAN|7698|1250.0|
+----+------+-----+-----+----------+--------+----+------+----+------+-----+-----+----------+--------+----+------+
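
The warning in the session above points at aliases. When a DataFrame is joined with itself, giving each side its own alias lets the join condition refer to the two `ename` columns unambiguously. A minimal sketch, assuming a local SparkSession; the object name `SelfJoinSketch`, the aliases `l` and `r`, the output column names, and the inline rows are all hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SelfJoinSketch {
  // Hypothetical stand-in for emp: three rows, three columns.
  def sampleEmp(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((7369, "SMITH", 20), (7499, "ALLEN", 30), (7521, "WARD", 30))
      .toDF("empno", "ename", "deptno")
  }

  // Alias each side, then qualify every column with its alias.
  def selfJoinOnEname(emp: DataFrame): DataFrame = {
    val l = emp.as("l")
    val r = emp.as("r")
    l.join(r, col("l.ename") === col("r.ename"))
      .select(
        col("l.ename"),
        col("l.empno").as("left_empno"),
        col("r.empno").as("right_empno"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("self-join-sketch")
      .getOrCreate()
    selfJoinOnEname(sampleEmp(spark)).show()
    spark.stop()
  }
}
```

Because the two sides of the condition are different qualified names, this form should not trigger the "trivially true equals predicate" warning, and the selected columns can be distinguished by name afterwards.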

Intersection

scala> emp.intersect(emp.limit(1)).show()
+----+------+-----+-----+----------+-----+----+-----+
|comm|deptno|empno|ename|  hiredate|  job| mgr|  sal|
+----+------+-----+-----+----------+-----+----+-----+
|null|    20| 7369|SMITH|1980-12-17|CLERK|7902|800.0|
+----+------+-----+-----+----------+-----+----+-----+

Difference: rows present in one DataFrame but not in the other

scala> emp.except(emp.limit(1)).show()
+----+------+-----+------+----------+---------+----+------+
|comm|deptno|empno| ename|  hiredate|      job| mgr|   sal|
+----+------+-----+------+----------+---------+----+------+
|null|    30| 7900| JAMES|1981-12-03|    CLERK|7698| 950.0|
|null|    20| 7902|  FORD|1981-12-02|  ANALYST|7566|3000.0|
|1400|    30| 7654|MARTIN|1981-09-28| SALESMAN|7698|1250.0|
| 300|    30| 7499| ALLEN|1981-02-20| SALESMAN|7698|1600.5|
|null|    10| 7782| CLARK|1981-06-09|  MANAGER|7839|2450.0|
|null|    20| 7788| SCOTT|1987-04-19|  ANALYST|7566|3000.0|
| 500|    30| 7521|  WARD|1981-02-22| SALESMAN|7698|1250.0|
|null|    10| 7934|MILLER|1982-01-23|    CLERK|7369|1300.0|
|null|    10| 7839|  KING|1981-11-17|PRESIDENT|null|5000.0|
|   0|    30| 7844|TURNER|1981-09-08| SALESMAN|7698|1500.0|
|null|    30| 7698| BLAKE|1981-05-01|  MANAGER|7839|2850.0|
|null|    20| 7566| JONES|1981-04-02|  MANAGER|7839|2975.0|
|null|    20| 7876| ADAMS|1987-05-23|    CLERK|7788|1100.0|
+----+------+-----+------+----------+---------+----+------+

Rename a column

scala> emp.withColumnRenamed("deptno",  "no").show(3)
+----+---+-----+-----+----------+--------+----+------+
|comm| no|empno|ename|  hiredate|     job| mgr|   sal|
+----+---+-----+-----+----------+--------+----+------+
|null| 20| 7369|SMITH|1980-12-17|   CLERK|7902| 800.0|
| 300| 30| 7499|ALLEN|1981-02-20|SALESMAN|7698|1600.5|
| 500| 30| 7521| WARD|1981-02-22|SALESMAN|7698|1250.0|
+----+---+-----+-----+----------+--------+----+------+

Add a column

scala> emp.withColumn("salary2",emp("sal")).show(2)
+----+------+-----+-----+----------+--------+----+------+-------+
|comm|deptno|empno|ename|  hiredate|     job| mgr|   sal|salary2|
+----+------+-----+-----+----------+--------+----+------+-------+
|null|    20| 7369|SMITH|1980-12-17|   CLERK|7902| 800.0|  800.0|
| 300|    30| 7499|ALLEN|1981-02-20|SALESMAN|7698|1600.5| 1600.5|
+----+------+-----+-----+----------+--------+----+------+-------+

Two styles of writing queries

DSL style

// Sort by deptno in descending order
df.select("empno","ename","deptno").orderBy($"deptno".desc).show()

SQL style

// Register the DataFrame as a temporary view, then query it with SQL
df.createTempView("emp")
val sql = "select * from emp where deptno<30"
val frame: DataFrame = spark.sql(sql)
frame.show()
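
To compare the two styles side by side, the sketch below expresses the same query both ways and returns a DataFrame from each. It assumes a local SparkSession; the object name `TwoStylesSketch`, the helper names, and the three inline rows are hypothetical. It uses `createOrReplaceTempView` rather than `createTempView` so that it can be re-run in the same session, since `createTempView` fails if the view already exists.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object TwoStylesSketch {
  // Hypothetical three-row stand-in for the emp data.
  def sampleEmp(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((7369, "SMITH", 20), (7499, "ALLEN", 30), (7782, "CLARK", 10))
      .toDF("empno", "ename", "deptno")
  }

  // DSL style: chain DataFrame methods.
  // The $"..." column syntax needs spark.implicits._ in scope.
  def dslStyle(spark: SparkSession, emp: DataFrame): DataFrame = {
    import spark.implicits._
    emp.select("empno", "ename", "deptno")
      .where($"deptno" < 30)
      .orderBy($"deptno".desc)
  }

  // SQL style: register a temporary view, then query it with spark.sql.
  def sqlStyle(spark: SparkSession, emp: DataFrame): DataFrame = {
    emp.createOrReplaceTempView("emp")
    spark.sql(
      "select empno, ename, deptno from emp where deptno < 30 order by deptno desc")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("two-styles-sketch")
      .getOrCreate()
    val emp = sampleEmp(spark)
    dslStyle(spark, emp).show()
    sqlStyle(spark, emp).show()
    spark.stop()
  }
}
```

Both functions describe the same logical query, so the choice between them is mostly about readability and how much compile-time checking of method names is wanted.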
