package dstream
Various implementations of DStream's.
Type Members
- class ConstantInputDStream[T] extends InputDStream[T]
An input stream that always returns the same RDD on each time step. Useful for testing.
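As a rough illustration of the testing use case, the sketch below feeds a fixed RDD into a small pipeline. The object name, the `tokenize` helper, the sample data and the `local[2]` master are assumptions made for this example only.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.ConstantInputDStream

object ConstantStreamSketch {
  // Hypothetical helper: the transformation we want to exercise.
  def tokenize(line: String): Seq[String] =
    line.split("\\s+").filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("ConstantStreamSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // The same RDD is returned for every batch interval.
    val fixture = ssc.sparkContext.parallelize(Seq("to be", "or not to be"))
    val lines = new ConstantInputDStream(ssc, fixture)

    // Each batch sees identical input, so each batch prints identical counts.
    lines.flatMap(tokenize).countByValue().print()

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000) // let a few batches run, then stop
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```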
- abstract class DStream[T] extends Serializable with Logging
A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous sequence of RDDs (of the same type) representing a continuous stream of data (see org.apache.spark.rdd.RDD in the Spark core documentation for more details on RDDs). DStreams can either be created from live data (such as data from TCP sockets, Kafka, etc.) using an org.apache.spark.streaming.StreamingContext, or generated by transforming existing DStreams using operations such as map, window and reduceByKeyAndWindow. While a Spark Streaming program is running, each DStream periodically generates an RDD, either from live data or by transforming the RDD generated by a parent DStream.
This class contains the basic operations available on all DStreams, such as map, filter and window. In addition, org.apache.spark.streaming.dstream.PairDStreamFunctions contains operations available only on DStreams of key-value pairs, such as groupByKeyAndWindow and join. These operations are automatically available on any DStream of pairs (e.g., DStream[(Int, Int)]) through implicit conversions.
A DStream internally is characterized by a few basic properties:
- A list of other DStreams that the DStream depends on
- A time interval at which the DStream generates an RDD
- A function that is used to generate an RDD after each time interval
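The following is a minimal sketch of both ways of obtaining a DStream: from live data and by transforming an existing DStream. The host, port, object name and the `isError` predicate are placeholders chosen for illustration. The two `println` calls read the parent list and the generation interval through the public `dependencies` and `slideDuration` members.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object DStreamBasicsSketch {
  // Hypothetical predicate used by the filter step.
  def isError(line: String): Boolean = line.contains("ERROR")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("DStreamBasicsSketch")
    val ssc = new StreamingContext(conf, Seconds(1)) // batch interval: 1 second

    // A DStream created from live data (host and port are placeholders).
    val lines: DStream[String] = ssc.socketTextStream("localhost", 9999)

    // DStreams generated by transforming existing DStreams.
    val errors: DStream[String] = lines.filter(isError).map(_.trim)
    val recentErrors: DStream[String] = errors.window(Seconds(30), Seconds(10))

    // Two of the basic properties, as exposed on the public API.
    println(s"number of parents: ${recentErrors.dependencies.size}")
    println(s"generation interval: ${recentErrors.slideDuration}")

    // An output operation; without one, nothing is computed.
    recentErrors.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```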
- abstract class InputDStream[T] extends DStream[T]
This is the abstract base class for all input streams. This class provides the methods start() and stop(), which are called by the Spark Streaming system to start and stop receiving data, respectively. Input streams that can generate RDDs from new data by running a service/thread only on the driver node (that is, without running a receiver on worker nodes) can be implemented by directly inheriting this InputDStream. For example, FileInputDStream, a subclass of InputDStream, monitors an HDFS directory from the driver for new files and generates RDDs with the new files. To implement input streams that require running a receiver on the worker nodes, use org.apache.spark.streaming.dstream.ReceiverInputDStream as the parent class.
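A driver-only input stream can be sketched as a direct subclass that implements start(), stop() and compute(). Everything named `TickInputDStream` below, including the `recordsFor` helper and the synthetic data it produces, is hypothetical and only meant to show the shape of such a subclass.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}
import org.apache.spark.streaming.dstream.InputDStream

object TickInputDStream {
  // Hypothetical helper: the synthetic records emitted for one batch.
  def recordsFor(batchMillis: Long, n: Int): Seq[Long] =
    (0 until n).map(i => batchMillis + i)
}

// Hypothetical driver-only input stream: no receiver, one small RDD per batch.
class TickInputDStream(ctx: StreamingContext, recordsPerBatch: Int)
    extends InputDStream[Long](ctx) {

  override def start(): Unit = {} // no service or thread is needed here
  override def stop(): Unit = {}

  // Called on the driver to produce the RDD for the given batch time.
  override def compute(validTime: Time): Option[RDD[Long]] = {
    val records = TickInputDStream.recordsFor(validTime.milliseconds, recordsPerBatch)
    Some(context.sparkContext.parallelize(records))
  }
}

object TickInputDStreamDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("TickInputDStreamDemo")
    val ssc = new StreamingContext(conf, Seconds(1))
    new TickInputDStream(ssc, recordsPerBatch = 3).print()
    ssc.start()
    ssc.awaitTerminationOrTimeout(5000) // let a few batches run, then stop
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```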
- sealed abstract class MapWithStateDStream[KeyType, ValueType, StateType, MappedType] extends DStream[MappedType]
:: Experimental :: DStream representing the stream of data generated by the mapWithState operation on a pair DStream. Additionally, it also gives access to the stream of state snapshots, that is, the state data of all keys after a batch has updated them.
- KeyType
Class of the key
- ValueType
Class of the value
- StateType
Class of the state data
- MappedType
Class of the mapped data
- Annotations
- @Experimental()
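The four type parameters can be seen together in a running word count, sketched below under a few assumptions: the socket source, the checkpoint path and the `nextTotal` helper are placeholders, and the mapping function uses the three-argument form accepted by `StateSpec.function`.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}
import org.apache.spark.streaming.dstream.{DStream, MapWithStateDStream}

object MapWithStateSketch {
  // Hypothetical helper: new running total from the previous state and the new value.
  def nextTotal(previous: Option[Int], added: Option[Int]): Int =
    previous.getOrElse(0) + added.getOrElse(0)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("MapWithStateSketch")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("/tmp/map-with-state-sketch") // placeholder; mapWithState needs checkpointing

    val pairs: DStream[(String, Int)] =
      ssc.socketTextStream("localhost", 9999).flatMap(_.split(" ")).map(w => (w, 1))

    // (key, new value, state) => mapped record
    val mappingFunc = (word: String, one: Option[Int], state: State[Int]) => {
      val total = nextTotal(state.getOption(), one)
      state.update(total)
      (word, total)
    }

    // KeyType = String, ValueType = Int, StateType = Int, MappedType = (String, Int)
    val mapped: MapWithStateDStream[String, Int, Int, (String, Int)] =
      pairs.mapWithState(StateSpec.function(mappingFunc))

    mapped.print()                  // mapped records produced in this batch
    mapped.stateSnapshots().print() // (key, state) pairs for all keys

    ssc.start()
    ssc.awaitTermination()
  }
}
```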
- class PairDStreamFunctions[K, V] extends Serializable
Extra functions available on DStream of (key, value) pairs through an implicit conversion.
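A short sketch of the implicit conversion in use follows. The "user,amount" line format, the ports and the `toPair` helper are invented for the example. It also assumes a Spark version in which the conversion is found without an extra import; older releases may need `import org.apache.spark.streaming.StreamingContext._`.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object PairOpsSketch {
  // Hypothetical parser for lines of the form "user,amount".
  def toPair(line: String): (String, Int) = {
    val Array(user, amount) = line.split(",", 2)
    (user.trim, amount.trim.toInt)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("PairOpsSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Two pair DStreams (hosts and ports are placeholders).
    val purchases: DStream[(String, Int)] = ssc.socketTextStream("localhost", 9999).map(toPair)
    val refunds: DStream[(String, Int)] = ssc.socketTextStream("localhost", 9998).map(toPair)

    // These methods come from PairDStreamFunctions via the implicit conversion.
    val perBatch = purchases.reduceByKey(_ + _)
    val perWindow =
      purchases.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))
    val joined: DStream[(String, (Int, Int))] = perBatch.join(refunds.reduceByKey(_ + _))

    perWindow.print()
    joined.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```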
- abstract class ReceiverInputDStream[T] extends InputDStream[T]
Abstract class for defining any org.apache.spark.streaming.dstream.InputDStream that has to start a receiver on worker nodes to receive external data. Specific implementations of ReceiverInputDStream must define the getReceiver function, which returns the receiver object of type org.apache.spark.streaming.receiver.Receiver that will be sent to the workers to receive data.
- T
Class type of the object of this stream
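One possible shape for a concrete implementation is sketched below. `CounterReceiver` and `CounterInputDStream` are hypothetical names, and the receiver simply stores an increasing counter rather than reading from a real external source.

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.receiver.Receiver

// Hypothetical receiver: stores an increasing counter every `periodMs` milliseconds.
class CounterReceiver(periodMs: Long)
    extends Receiver[Long](StorageLevel.MEMORY_ONLY) {

  override def onStart(): Unit = {
    // onStart should return quickly, so the loop runs on its own thread.
    val worker = new Thread("counter-receiver") {
      override def run(): Unit = {
        var n = 0L
        while (!isStopped()) {
          store(n) // hand one record to Spark Streaming
          n += 1
          Thread.sleep(periodMs)
        }
      }
    }
    worker.setDaemon(true)
    worker.start()
  }

  // Nothing to release; the loop above exits once isStopped() returns true.
  override def onStop(): Unit = {}
}

// Hypothetical input stream whose data arrives through the receiver above.
class CounterInputDStream(ctx: StreamingContext, periodMs: Long)
    extends ReceiverInputDStream[Long](ctx) {

  // The returned receiver is what gets sent to a worker to receive data.
  override def getReceiver(): Receiver[Long] = new CounterReceiver(periodMs)
}
```

In application code, passing the receiver to `ssc.receiverStream(new CounterReceiver(100))` is often a simpler route to an equivalent stream than defining a subclass.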