Spark Streaming 5: InputDStream
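Every input stream saves incoming data into Spark memory as blocks at a fixed time interval. Unlike Spark Core, however, Spark Streaming by default serializes objects before storing them in memory.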
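The idea of "one serialized block per interval" can be illustrated with a small, Spark-free sketch. Everything in it is an assumption made for illustration: `SerializedBlockSketch`, `toBlocks`, and `intervalMs` are hypothetical names, and plain Java serialization stands in for whatever serializer Spark is configured to use. It is not how Spark's block manager is implemented.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical toy model, not Spark's block manager: a "block" here is the
// serialized form of all records that arrived during one interval.
object SerializedBlockSketch {

  // Turn a batch of records into a byte array (the serialized in-memory form).
  def serialize(records: Seq[String]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(records.toList) // List is Serializable
    out.close()
    bytes.toByteArray
  }

  // Read the records back when the block is actually needed.
  def deserialize(block: Array[Byte]): List[String] = {
    val in = new ObjectInputStream(new ByteArrayInputStream(block))
    try in.readObject().asInstanceOf[List[String]] finally in.close()
  }

  // Group (arrivalTimeMs, record) pairs into one serialized block per interval.
  def toBlocks(arrivals: Seq[(Long, String)], intervalMs: Long): Map[Long, Array[Byte]] =
    arrivals
      .groupBy { case (time, _) => time / intervalMs }
      .map { case (slot, recs) => slot -> serialize(recs.map(_._2)) }
}
```

Keeping blocks as bytes costs a deserialization step on read, but it makes the memory footprint of buffered data smaller and more predictable, which matters for a long-running stream.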
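/**
 * This is the abstract base class for all input streams. This class provides methods
 * start() and stop() which is called by Spark Streaming system to start and stop receiving data.
 * Input streams that can generate RDDs from new data by running a service/thread only on
 * the driver node (that is, without running a receiver on worker nodes), can be
 * implemented by directly inheriting this InputDStream. For example,
 * FileInputDStream, a subclass of InputDStream, monitors a HDFS directory from the driver for
 * new files and generates RDDs with the new files. For implementing input streams
 * that requires running a receiver on the worker nodes, use
 * [[org.apache.spark.streaming.dstream.ReceiverInputDStream]] as the parent class.
 *
 * @param ssc_ Streaming context that will execute this input stream
 */
abstract class InputDStream[T: ClassTag] (@transient ssc_ : StreamingContext)
  extends DStream[T](ssc_) {

  private[streaming] var lastValidTime: Time = null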


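/**
 * Abstract class for defining any [[org.apache.spark.streaming.dstream.InputDStream]]
 * that has to start a receiver on worker nodes to receive external data.
 * Specific implementations of NetworkInputDStream must
 * define the getReceiver() function that gets the receiver object of type
 * [[org.apache.spark.streaming.receiver.Receiver]] that will be sent
 * to the workers to receive data.
 * @param ssc_ Streaming context that will execute this input stream
 * @tparam T Class type of the object of this stream
 */
abstract class ReceiverInputDStream[T: ClassTag](@transient ssc_ : StreamingContext)
  extends InputDStream[T](ssc_) {

  /** Keeps all received blocks information */
  private lazy val receivedBlockInfo = new HashMap[Time, Array[ReceivedBlockInfo]]

  /** This is an unique identifier for the network input stream. */
  val id = ssc.getNewReceiverStreamId()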

  /**
   * Gets the receiver object that will be sent to the worker nodes
   * to receive data. This method needs to defined by any specific implementation
   * of a NetworkInputDStream.
   */
  def getReceiver(): Receiver[T]

  /** Ask ReceiverInputTracker for received data blocks and generates RDDs with them. */
  override def compute(validTime: Time): Option[RDD[T]] = {
    // If this is called for any time before the start time of the context,
    // then this returns an empty RDD. This may happen when recovering from a
    // master failure
    if (validTime >= graph.startTime) {
      val blockInfo = ssc.scheduler.receiverTracker.getReceivedBlockInfo(id)
      receivedBlockInfo(validTime) = blockInfo
      val blockIds = blockInfo.map(_.blockId.asInstanceOf[BlockId])
      Some(new BlockRDD[T](ssc.sc, blockIds))
    } else {
      Some(new BlockRDD[T](ssc.sc, Array[BlockId]()))
    }
  }
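The control flow of compute() can be tried out without a Spark cluster using a toy model. This is only a sketch under stated assumptions: every name ending in `Sketch` is hypothetical and is not a Spark API, times are plain `Long` values instead of `Time`, and the tracker is assumed to hand over each reported block exactly once. The returned vector of block ids stands in for the `BlockRDD` that the real method builds.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's block id type (not a Spark API).
final case class BlockIdSketch(name: String)

// Hypothetical stand-in for the receiver tracker that lives on the driver.
class ReceiverTrackerSketch {
  private val pending = mutable.Map.empty[Int, Vector[BlockIdSketch]]

  // Called when a receiver reports a newly stored block for a stream.
  def addBlock(streamId: Int, block: BlockIdSketch): Unit =
    pending(streamId) = pending.getOrElse(streamId, Vector.empty) :+ block

  // Assumption: blocks are handed over once and then forgotten by the tracker.
  def getReceivedBlocks(streamId: Int): Vector[BlockIdSketch] =
    pending.remove(streamId).getOrElse(Vector.empty)
}

// Hypothetical stand-in for a receiver-based input stream.
class ReceiverInputSketch(id: Int, startTime: Long, tracker: ReceiverTrackerSketch) {
  // Remembers which blocks were assigned to which batch time.
  private val receivedBlockInfo = mutable.HashMap.empty[Long, Vector[BlockIdSketch]]

  // Mirrors the branch in compute(): batches before the start time are empty.
  def compute(validTime: Long): Vector[BlockIdSketch] =
    if (validTime >= startTime) {
      val blocks = tracker.getReceivedBlocks(id)
      receivedBlockInfo(validTime) = blocks
      blocks
    } else {
      Vector.empty
    }

  // Look up the blocks recorded for a given batch time, if any.
  def blocksAt(time: Long): Option[Vector[BlockIdSketch]] = receivedBlockInfo.get(time)
}
```

In this model a batch requested before the start time does not drain the tracker, so the blocks stay available for the first valid batch, and each later batch only sees blocks reported since the previous one.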

