1 Spark Streaming另类在线实验

2 瞬间理解Spark Streaming本质

Spark中程序最容易出错的是流处理,流处理也是目前spark技术瓶颈之一,所以要做出一个优秀的spark发行版的话,对流处理的优化是必需的。

根据spark历史演进的趋势,spark graphX,机器学习已经发展得非常好。对它进行改进是重要的,单不是最重要的。最最重要的还是流处理,而流处理最为核心的是流处理结合机器学习,图计算的一体化结合使用,真正的实现一个堆栈rum them all .

1 流处理最容易出错

2 流处理结合图计算和机器学习将发挥出巨大的潜力

3 构造出复杂的实时数据处理的应用程序

流处理其实是构建在spark core之上的一个应用程序

代码如下:

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}/*** Created by hadoop on 2016/4/18.* 背景描述 在广告点击计费系统中 我们在线过滤掉 黑名单的点击 进而包含广告商的利益* 只有效的广告点击计费** 1、DT大数据梦工厂微信公众号DT_Spark* 2、IMF晚8点大数据实战YY直播频道号:68917580* 3、新浪微博:http://www.weibo.com/ilovepains*/
object OnlineBlanckListFilter extends App{//val basePath = "hdfs://master:9000/streaming"val conf = new SparkConf().setAppName("SparkStreamingOnHDFS")if(args.length == 0) conf.setMaster("spark://Master:7077")val ssc = new StreamingContext(conf, Seconds(30))val blackList = Array(("hadoop", true) , ("mahout", true), ("spark", false))val backListRDD = ssc.sparkContext.parallelize(blackList)val adsClickStream = ssc.socketTextStream("192.168.74.132", 9000, StorageLevel.MEMORY_AND_DISK_SER_2)val rdd = adsClickStream.map{ads => (ads.split(" ")(1), ads)}val validClicked = rdd.transform(userClickRDD => {val joinedBlackRDD = userClickRDD.leftOuterJoin(backListRDD)joinedBlackRDD.filter(joinedItem => {if(joinedItem._2._2.getOrElse(false)){false}else{true}})})validClicked.map(validClicked => {validClicked._2._1}).print()ssc.start()ssc.awaitTermination()
}

16/05/01 17:00:31 INFO scheduler.DAGScheduler: ResultStage 1 ( start at OnlineBlanckListFilter.scala:40) finished in 4.234 s
16/05/01 17:00:31 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
16/05/01 17:00:32 INFO scheduler.DAGScheduler: Job 0 finished: start at OnlineBlanckListFilter.scala:40, took 81.504046 s
16/05/01 17:00:32 INFO scheduler.ReceiverTracker: Starting 1 receivers
16/05/01 17:00:32 INFO scheduler.ReceiverTracker: ReceiverTracker started
16/05/01 17:00:32 INFO dstream.ForEachDStream: metadataCleanupDelay = -1
16/05/01 17:00:32 INFO dstream.MappedDStream: metadataCleanupDelay = -1
16/05/01 17:00:32 INFO dstream.TransformedDStream: metadataCleanupDelay = -1
16/05/01 17:00:32 INFO dstream.MappedDStream: metadataCleanupDelay = -1
16/05/01 17:00:32 INFO dstream.SocketInputDStream: metadataCleanupDelay = -1
16/05/01 17:00:32 INFO dstream.SocketInputDStream: Slide time = 30000 ms
16/05/01 17:00:32 INFO dstream.SocketInputDStream: Storage level = StorageLevel(false, false, false, false, 1)
16/05/01 17:00:32 INFO dstream.SocketInputDStream: Checkpoint interval = null
16/05/01 17:00:32 INFO dstream.SocketInputDStream: Remember duration = 30000 ms
16/05/01 17:00:32 INFO dstream.SocketInputDStream: Initialized and validated org.apache.spark.streaming.dstream.SocketInputDStream@2f432294
16/05/01 17:00:32 INFO dstream.MappedDStream: Slide time = 30000 ms
16/05/01 17:00:32 INFO dstream.MappedDStream: Storage level = StorageLevel(false, false, false, false, 1)
16/05/01 17:00:32 INFO dstream.MappedDStream: Checkpoint interval = null
16/05/01 17:00:32 INFO dstream.MappedDStream: Remember duration = 30000 ms
16/05/01 17:00:32 INFO dstream.MappedDStream: Initialized and validated org.apache.spark.streaming.dstream.MappedDStream@99b8aa7
16/05/01 17:00:32 INFO dstream.TransformedDStream: Slide time = 30000 ms
16/05/01 17:00:32 INFO dstream.TransformedDStream: Storage level = StorageLevel(false, false, false, false, 1)
16/05/01 17:00:32 INFO dstream.TransformedDStream: Checkpoint interval = null
16/05/01 17:00:32 INFO dstream.TransformedDStream: Remember duration = 30000 ms
16/05/01 17:00:32 INFO dstream.TransformedDStream: Initialized and validated org.apache.spark.streaming.dstream.TransformedDStream@7e9127a
16/05/01 17:00:32 INFO dstream.MappedDStream: Slide time = 30000 ms
16/05/01 17:00:32 INFO dstream.MappedDStream: Storage level = StorageLevel(false, false, false, false, 1)
16/05/01 17:00:32 INFO dstream.MappedDStream: Checkpoint interval = null
16/05/01 17:00:32 INFO dstream.MappedDStream: Remember duration = 30000 ms
16/05/01 17:00:32 INFO dstream.MappedDStream: Initialized and validated org.apache.spark.streaming.dstream.MappedDStream@bc51cce
16/05/01 17:00:32 INFO dstream.ForEachDStream: Slide time = 30000 ms
16/05/01 17:00:32 INFO dstream.ForEachDStream: Storage level = StorageLevel(false, false, false, false, 1)
16/05/01 17:00:32 INFO dstream.ForEachDStream: Checkpoint interval = null
16/05/01 17:00:32 INFO dstream.ForEachDStream: Remember duration = 30000 ms
16/05/01 17:00:32 INFO dstream.ForEachDStream: Initialized and validated org.apache.spark.streaming.dstream.ForEachDStream@3d24c8a0
16/05/01 17:00:32 INFO scheduler.DAGScheduler: Got job 1 (start at OnlineBlanckListFilter.scala:40) with 1 output partitions
16/05/01 17:00:32 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 (start at OnlineBlanckListFilter.scala:40)
16/05/01 17:00:32 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/05/01 17:00:32 INFO scheduler.DAGScheduler: Missing parents: List()
16/05/01 17:00:32 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (Receiver 0 ParallelCollectionRDD[4] at makeRDD at ReceiverTracker.scala:588), which has no missing parents
16/05/01 17:00:32 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 61.2 KB, free 69.7 KB)
16/05/01 17:00:32 INFO scheduler.ReceiverTracker: Receiver 0 started
16/05/01 17:00:32 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 20.5 KB, free 90.1 KB)
16/05/01 17:00:32 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.74.130:48586 (size: 20.5 KB, free: 152.8 MB)
16/05/01 17:00:32 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
16/05/01 17:00:32 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (Receiver 0 ParallelCollectionRDD[4] at makeRDD at ReceiverTracker.scala:588)
16/05/01 17:00:32 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
16/05/01 17:00:33 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 70, Worker2, partition 0,PROCESS_LOCAL, 2294 bytes)
16/05/01 17:00:33 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Worker2:42963 (size: 20.5 KB, free: 169.6 MB)
16/05/01 17:00:33 INFO util.RecurringTimer: Started timer for JobGenerator at time 1462093260000
16/05/01 17:00:33 INFO scheduler.JobGenerator: Started JobGenerator at 1462093260000 ms
16/05/01 17:00:33 INFO scheduler.JobScheduler: Started JobScheduler
16/05/01 17:00:34 INFO streaming.StreamingContext: StreamingContext started
16/05/01 17:00:35 INFO scheduler.ReceiverTracker: Registered receiver for stream 0 from Worker2:51587
16/05/01 17:00:35 INFO storage.BlockManagerInfo: Added input-0-1462093235000 in memory on Worker2:42963 (size: 12.0 B, free: 169.6 MB)
16/05/01 17:00:35 INFO storage.BlockManagerInfo: Added input-0-1462093235000 in memory on Worker1:40373 (size: 12.0 B, free: 169.6 MB)
16/05/01 17:00:35 INFO storage.BlockManagerInfo: Added input-0-1462093235200 in memory on Worker2:42963 (size: 24.0 B, free: 169.6 MB)
16/05/01 17:00:35 INFO storage.BlockManagerInfo: Added input-0-1462093235200 in memory on Worker1:40373 (size: 24.0 B, free: 169.6 MB)
16/05/01 17:01:01 INFO spark.SparkContext: Starting job: collect at OnlineBlanckListFilter.scala:26
16/05/01 17:01:01 INFO scheduler.DAGScheduler: Registering RDD 6 (map at OnlineBlanckListFilter.scala:23)
16/05/01 17:01:01 INFO scheduler.DAGScheduler: Registering RDD 0 (parallelize at OnlineBlanckListFilter.scala:20)
16/05/01 17:01:01 INFO scheduler.DAGScheduler: Got job 2 (collect at OnlineBlanckListFilter.scala:26) with 2 output partitions
16/05/01 17:01:01 INFO scheduler.DAGScheduler: Final stage: ResultStage 5 (collect at OnlineBlanckListFilter.scala:26)
16/05/01 17:01:01 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 3, ShuffleMapStage 4)
16/05/01 17:01:01 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 3, ShuffleMapStage 4)
16/05/01 17:01:01 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 3 (MapPartitionsRDD[6] at map at OnlineBlanckListFilter.scala:23), which has no missing parents
16/05/01 17:01:01 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 2.3 KB, free 92.4 KB)
16/05/01 17:01:01 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 1454.0 B, free 93.8 KB)
16/05/01 17:01:01 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.74.130:48586 (size: 1454.0 B, free: 152.8 MB)
16/05/01 17:01:01 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006
16/05/01 17:01:01 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 3 (MapPartitionsRDD[6] at map at OnlineBlanckListFilter.scala:23)
16/05/01 17:01:01 INFO scheduler.TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
16/05/01 17:01:01 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 4 (ParallelCollectionRDD[0] at parallelize at OnlineBlanckListFilter.scala:20), which has no missing parents
16/05/01 17:01:01 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 71, Worker1, partition 0,NODE_LOCAL, 2058 bytes)
16/05/01 17:01:01 INFO storage.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 1808.0 B, free 95.6 KB)
16/05/01 17:01:01 INFO storage.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 1133.0 B, free 96.7 KB)
16/05/01 17:01:01 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.74.130:48586 (size: 1133.0 B, free: 152.8 MB)
16/05/01 17:01:01 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1006
16/05/01 17:01:01 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 4 (ParallelCollectionRDD[0] at parallelize at OnlineBlanckListFilter.scala:20)
16/05/01 17:01:01 INFO scheduler.TaskSchedulerImpl: Adding task set 4.0 with 2 tasks
16/05/01 17:01:01 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 4.0 (TID 72, Master, partition 0,PROCESS_LOCAL, 2028 bytes)
16/05/01 17:01:01 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on Master:57087 (size: 1133.0 B, free: 169.6 MB)
16/05/01 17:01:01 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on Worker1:40373 (size: 1454.0 B, free: 169.6 MB)
16/05/01 17:01:01 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 3.0 (TID 73, Worker1, partition 1,NODE_LOCAL, 2058 bytes)
16/05/01 17:01:01 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 71) in 171 ms on Worker1 (1/2)
16/05/01 17:01:01 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 4.0 (TID 74, Worker1, partition 1,PROCESS_LOCAL, 2039 bytes)
16/05/01 17:01:01 INFO scheduler.DAGScheduler: ShuffleMapStage 3 (map at OnlineBlanckListFilter.scala:23) finished in 0.192 s
16/05/01 17:01:01 INFO scheduler.DAGScheduler: looking for newly runnable stages
16/05/01 17:01:01 INFO scheduler.DAGScheduler: running: Set(ResultStage 2, ShuffleMapStage 4)
16/05/01 17:01:01 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 5)
16/05/01 17:01:01 INFO scheduler.DAGScheduler: failed: Set()
16/05/01 17:01:01 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 3.0 (TID 73) in 23 ms on Worker1 (2/2)
16/05/01 17:01:01 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool 
16/05/01 17:01:02 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 4.0 (TID 72) in 620 ms on Master (1/2)
16/05/01 17:01:03 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on Worker1:40373 (size: 1133.0 B, free: 169.6 MB)
16/05/01 17:01:03 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 4.0 (TID 74) in 1412 ms on Worker1 (2/2)
16/05/01 17:01:03 INFO scheduler.DAGScheduler: ShuffleMapStage 4 (parallelize at OnlineBlanckListFilter.scala:20) finished in 1.573 s
16/05/01 17:01:03 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool 
16/05/01 17:01:03 INFO scheduler.DAGScheduler: looking for newly runnable stages
16/05/01 17:01:03 INFO scheduler.DAGScheduler: running: Set(ResultStage 2)
16/05/01 17:01:03 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 5)
16/05/01 17:01:03 INFO scheduler.DAGScheduler: failed: Set()
16/05/01 17:01:03 INFO scheduler.DAGScheduler: Submitting ResultStage 5 (MapPartitionsRDD[9] at leftOuterJoin at OnlineBlanckListFilter.scala:25), which has no missing parents
16/05/01 17:01:03 INFO storage.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 3.1 KB, free 99.8 KB)
16/05/01 17:01:03 INFO storage.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 1753.0 B, free 101.5 KB)
16/05/01 17:01:03 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on 192.168.74.130:48586 (size: 1753.0 B, free: 152.8 MB)
16/05/01 17:01:03 INFO spark.SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1006
16/05/01 17:01:03 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 5 (MapPartitionsRDD[9] at leftOuterJoin at OnlineBlanckListFilter.scala:25)
16/05/01 17:01:03 INFO scheduler.TaskSchedulerImpl: Adding task set 5.0 with 2 tasks
16/05/01 17:01:03 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 5.0 (TID 75, Master, partition 0,PROCESS_LOCAL, 2019 bytes)
16/05/01 17:01:03 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 5.0 (TID 76, Worker1, partition 1,PROCESS_LOCAL, 2019 bytes)
16/05/01 17:01:03 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on Master:57087 (size: 1753.0 B, free: 169.6 MB)
16/05/01 17:01:03 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on Worker1:40373 (size: 1753.0 B, free: 169.6 MB)
16/05/01 17:01:03 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to Worker1:40424
16/05/01 17:01:03 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 148 bytes
16/05/01 17:01:03 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 2 to Worker1:40424
16/05/01 17:01:03 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 2 is 165 bytes
16/05/01 17:01:04 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to Master:48964
16/05/01 17:01:04 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 2 to Master:48964
16/05/01 17:01:06 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 5.0 (TID 75) in 3733 ms on Master (1/2)
16/05/01 17:01:07 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 5.0 (TID 76) in 4126 ms on Worker1 (2/2)
16/05/01 17:01:07 INFO scheduler.DAGScheduler: ResultStage 5 (collect at OnlineBlanckListFilter.scala:26) finished in 4.128 s
16/05/01 17:01:07 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool 
16/05/01 17:01:07 INFO scheduler.DAGScheduler: Job 2 finished: collect at OnlineBlanckListFilter.scala:26, took 5.896825 s
(flink,(2056 flink,None))
(uity,(4589 uity,None))
(hdaoop,(4589 hdaoop,None))
16/05/01 17:01:09 INFO scheduler.JobScheduler: Added jobs for time 1462093260000 ms

16/05/01 17:01:09 INFO scheduler.JobScheduler: Starting job streaming job 1462093260000 ms.0 from job set of time 1462093260000 ms

下面是WEBUI展示的内容:

我们看到这里有2次真正的StreamingJOB

我们一个一个看下这些任务都是什么?

JOB 0 :

思考一个问题:这里为啥有这个任务 ,我们写的代码中并有没这些转换?

我们查看Stage0 的详情:

这里启动了50个JOB 我们从源码中找答案:

/*** Get the receivers from the ReceiverInputDStreams, distributes them to the* worker nodes as a parallel collection, and runs them.*/private def launchReceivers(): Unit = {val receivers = receiverInputStreams.map(nis => {val rcvr = nis.getReceiver()rcvr.setReceiverId(nis.id)rcvr})runDummySparkJob()logInfo("Starting " + receivers.length + " receivers")endpoint.send(StartAllReceivers(receivers))}                                                                                                                                                        
 我们关注 <pre name="code" class="java">runDummySparkJob 这个方法:<pre name="code" class="java">/*** Run the dummy Spark job to ensure that all slaves have registered. This avoids all the* receivers to be scheduled on the same node.** TODO Should poll the executor number and wait for executors according to* "spark.scheduler.minRegisteredResourcesRatio" and* "spark.scheduler.maxRegisteredResourcesWaitingTime" rather than running a dummy job.*/private def runDummySparkJob(): Unit = {if (!ssc.sparkContext.isLocal) {ssc.sparkContext.makeRDD(1 to 50, 50).map(x => (x, 1)).reduceByKey(_ + _, 20).collect()}assert(getExecutors.nonEmpty)}
 这里启动了50个JOB,这个注释说明了这个方法是为了避免在同一个节点上启动 receivers 
</pre><pre name="code" class="java"> 我们回到JOB 1:
 <img src="https://img-blog.csdn.net/20160501220204990?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQv/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt="" />
<pre name="code" class="java">  我们继续查看详情:
  <img src="" alt="" />
  我们看到这个任务是在Worker2 上执行的 而我打开Scoket 也是在Worker2 ,任务发生在数据产生的节点!!!
  我们从源码中找到答案:
  <pre name="code" class="java">RegisterReceiver 中的  startReceiver
// Function to start the receiver on the worker nodeval startReceiverFunc: Iterator[Receiver[_]] => Unit =(iterator: Iterator[Receiver[_]]) => {if (!iterator.hasNext) {throw new SparkException("Could not start receiver as object not found.")}if (TaskContext.get().attemptNumber() == 0) {val receiver = iterator.next()assert(iterator.hasNext == false)val supervisor = new ReceiverSupervisorImpl(receiver, SparkEnv.get, serializableHadoopConf.value, checkpointDirOption)supervisor.start()supervisor.awaitTermination()} else {// It's restarted by TaskScheduler, but we want to reschedule it again. So exit it.}}// Create the RDD using the scheduledLocations to run the receiver in a Spark jobval receiverRDD: RDD[Receiver[_]] =if (scheduledLocations.isEmpty) {ssc.sc.makeRDD(Seq(receiver), 1)} else {val preferredLocations = scheduledLocations.map(_.toString).distinctssc.sc.makeRDD(Seq(receiver -> preferredLocations))}receiverRDD.setName(s"Receiver $receiverId")ssc.sparkContext.setJobDescription(s"Streaming job running receiver $receiverId")ssc.sparkContext.setCallSite(Option(ssc.getStartSite()).getOrElse(Utils.getCallSite()))val future = ssc.sparkContext.submitJob[Receiver[_], Unit, Unit](receiverRDD, startReceiverFunc, Seq(0), (_, _) => Unit, ())

这里就是收集数据的节点,一个节点接收数据!

   思考:数据的接收是一个节点上,那计算发生在哪里?
   

第1课:通过案例对SparkStreaming 透彻理解三板斧相关推荐

  1. 通过案例对SparkStreaming透彻理解-3

    2019独角兽企业重金招聘Python工程师标准>>> 本期内容: 解密Spark Streaming Job架构和运行机制 解密Spark Streaming 容错架构和运行机制 ...

  2. 第3课:SparkStreaming 透彻理解三板斧之三:解密SparkStreaming运行机制和架构进阶之Job和容错...

    本期内容: 解密Spark Streaming Job架构和运行机制 解密Spark Streaming容错架构和运行机制 理解SparkStreaming的Job的整个架构和运行机制对于精通Spar ...

  3. 通过案例对 spark streaming 透彻理解三板斧之一: spark streaming 另类实验

    本期内容 : spark streaming另类在线实验 瞬间理解spark streaming本质 一.  我们最开始将从Spark Streaming入手 为何从Spark Streaming切入 ...

  4. 通过案例对 spark streaming 透彻理解三板斧之三:spark streaming运行机制与架构

    本期内容: 1. Spark Streaming Job架构与运行机制 2. Spark Streaming 容错架构与运行机制 事实上时间是不存在的,是由人的感官系统感觉时间的存在而已,是一种虚幻的 ...

  5. 职高计算机教学案例 反思,关于职高数学优质课教学案例的研究与反思

    [摘 要] 随着社会的发展和世界经济的相互融合,我国社会的技能型人才已经供不应求,同时,国家也加强了对职业教育的扶持力度,政策上也向职高倾斜.职高数学是每一个学生必须掌握的基础知识,是其学好专业知识的 ...

  6. Transformer课程 第8课 NER案例模型训练及预测

    Transformer课程 第8课 NER案例模型训练及预测 Train Our Classification Model 现在,我们的输入数据已正确格式化,是时候对BERT模型进行微调了. 4.1. ...

  7. Transformer课程 第8课NER案例代码笔记-IOB标记

    Transformer课程 第8课NER案例代码笔记-IOB标记 NER Tags and IOB Format 训练集和测试集都是包含餐厅相关文本(主要是评论和查询)的单个文件,其中每个单词都有一个 ...

  8. 优秀计算机基础微课案例,大学计算机基础——大学微课实用案例教学

    大学计算机基础--大学微课实用案例教学 语音 编辑 锁定 讨论 上传视频 <大学计算机基础--大学微课实用案例教学> 是清华大学出版社于2006年出版的图书.作者是徐军.李翠梅.杨丽君. ...

  9. Transformer课程 第8课NER案例代码笔记-部署简介

    Transformer课程 第8课NER案例代码笔记 BERT微调器 NER是信息提取的子任务,旨在将非结构化文本中提到的命名实体定位并分类为预定义类别,如人名.组织.位置.医疗代码.时间表达式.数量 ...

  10. 电子商务案例分析php,2020知到《西安邮电大学网课电子商务案例分析》单元测试答案2020高校邦《ThinkPHP框架技术》答案免费...

    2020知到<西安邮电大学网课电子商务案例分析>单元测试答案2020高校邦<ThinkPHP框架技术>答案免费 更多相关问题 [判断题] 动物性原料最适宜剞蓑衣花刀. [问答题 ...

最新文章

  1. 小目标检测的增强算法
  2. mysql5.6.40升级到mysql8.0.11 的步骤
  3. Windows权限设置详解
  4. 【实用】MB52库存报表转网格格式
  5. stk 坐标系_STK中文用户手册.pdf
  6. Java怎么样?学完后前途怎么样?
  7. Activiti 监听器的配置使用
  8. er图-为什么画er图?有哪些规范?
  9. Facebook like 按钮的语言设置
  10. jquery ui 主题_使用jQuery UI主题
  11. 实现QT打开Word文档
  12. WARN o.m.s.m.ClassPathMapperScanner - [warn,44] - No MyBatis mapper was found in ‘[com.ruoyi.**.map
  13. ruby rails + grape + sidekiq 项目实践
  14. 财路网每日原创推送: 新华网:十字路口的区块链
  15. 【算法设计与分析】5个数7次比较排序的算法
  16. [Unity实战]一个简单的unity手写摇杆[入门级][手写demo][开箱可用]
  17. July, 20(R)
  18. Python绘制气象实用地图(附代码和测试数据) !
  19. VBScript的好处
  20. wap ios android,wap 唤起App 的两种方式Schema Universal Link

热门文章

  1. ltm是什么门的缩写_公司简报中的“LTM EBITDA”是什么意思啊?对应的中文是什么?...
  2. html修改鼠标代码,在HTML页面上更改鼠标光标
  3. STM32多通道DMA—ADC采样
  4. 蓝桥杯2021年PYTHON 真题,跳房子
  5. 第二章-FPGA的概要-《FPGA的原理与结构》
  6. java规则计算_亲属计算规则算法--java实现(关键算法摘要)
  7. HTML5、CSS、JS基础
  8. (二)ubuntu下安装Amd RX470驱动
  9. 电信物联卡网络怎么样_中国电信物联网专用卡(中国电信物联网卡怎么样)
  10. 战争英雄、同性恋和计算机科学的奠基人