
package dstream

Various implementations of DStreams.

See also

org.apache.spark.streaming.dstream.DStream

Linear Supertypes
AnyRef, Any

Type Members

  1. class ConstantInputDStream[T] extends InputDStream[T]

    An input stream that always returns the same RDD on each time step. Useful for testing.
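
To show why a constant input is convenient in tests, here is a minimal plain-Scala sketch. It is not Spark's API: ConstantStream and evensDoubled are hypothetical names, and a batch is modelled as a Seq instead of an RDD. Because every batch time yields the same input, the transformation under test must produce the same output at every step, which makes assertions straightforward.

```scala
// Plain-Scala model of a constant input stream; NOT Spark's ConstantInputDStream.
// A batch is modelled as an immutable Seq instead of an RDD.
final class ConstantStream[T](batch: Seq[T]) {
  // Every batch time yields exactly the same data.
  def compute(batchTimeMs: Long): Option[Seq[T]] = Some(batch)
}

// Hypothetical transformation under test: keep even numbers and double them.
def evensDoubled(in: Seq[Int]): Seq[Int] = in.filter(_ % 2 == 0).map(_ * 2)

val input = new ConstantStream(Seq(1, 2, 3, 4))
// The input never changes, so every batch must produce the same output.
val outputs = Seq(0L, 1000L, 2000L).map(t => input.compute(t).map(evensDoubled))
outputs.foreach(println)
```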

  2. abstract class DStream[T] extends Serializable with Logging

    A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous sequence of RDDs (of the same type) representing a continuous stream of data (see org.apache.spark.rdd.RDD in the Spark core documentation for more details on RDDs). DStreams can either be created from live data (such as data from TCP sockets, Kafka, etc.) using a org.apache.spark.streaming.StreamingContext, or generated by transforming existing DStreams using operations such as map, window and reduceByKeyAndWindow. While a Spark Streaming program is running, each DStream periodically generates an RDD, either from live data or by transforming the RDD generated by a parent DStream.
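
The window and reduceByKeyAndWindow operations can be pictured with a small plain-Scala model. This is a sketch under simplifying assumptions, not Spark's implementation: each inner Seq stands in for the RDD of one batch, and the window length and slide are counted in batches, whereas Spark expresses them as Durations that must be multiples of the batch interval. The helpers windows and reduceByKeyAndWindow below are hypothetical stand-ins.

```scala
// Plain-Scala model of windowed operations on a discretized stream; NOT Spark's API.
// Each inner Seq stands in for the RDD produced in one batch interval.
type Batch[T] = Seq[T]

// Emit one window every `slide` batches, each covering (up to) the last `length`
// batches; early windows are shorter because the stream has not filled them yet.
def windows[T](stream: Seq[Batch[T]], length: Int, slide: Int): Seq[Batch[T]] =
  (slide - 1 until stream.size by slide).map { end =>
    stream.slice(math.max(0, end - length + 1), end + 1).flatten
  }

// Hypothetical stand-in for reduceByKeyAndWindow: reduce the values per key
// across every record that falls inside each window.
def reduceByKeyAndWindow[K, V](stream: Seq[Batch[(K, V)]], reduce: (V, V) => V,
                               length: Int, slide: Int): Seq[Map[K, V]] =
  windows(stream, length, slide).map { pairs =>
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(reduce) }
  }

// Word-count pairs arriving over four batches.
val words: Seq[Batch[(String, Int)]] = Seq(
  Seq("a" -> 1, "b" -> 1),
  Seq("a" -> 1),
  Seq("b" -> 1, "b" -> 1),
  Seq("a" -> 1)
)
// Window of 3 batches, sliding by 1 batch.
val counts = reduceByKeyAndWindow(words, (x: Int, y: Int) => x + y, length = 3, slide = 1)
counts.foreach(println)
```

The fourth window drops the first batch, so its count for "b" falls from 3 to 2; this is the sliding behavior the operation name refers to.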

    This class contains the basic operations available on all DStreams, such as map, filter and window. In addition, org.apache.spark.streaming.dstream.PairDStreamFunctions contains operations available only on DStreams of key-value pairs, such as groupByKeyAndWindow and join. These operations are automatically available on any DStream of pairs (e.g., DStream[(Int, Int)]) through implicit conversions.

    Internally, a DStream is characterized by a few basic properties:

    • A list of other DStreams that the DStream depends on
    • A time interval at which the DStream generates an RDD
    • A function that is used to generate an RDD after each time interval
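
A rough model of these three properties follows, assuming times in milliseconds and Seq batches in place of RDDs. ToyDStream, TickSource and MappedToyDStream are hypothetical names; their members loosely mirror Spark's dependencies, slideDuration and compute, but this is a sketch, not Spark's class hierarchy.

```scala
// Simplified model of the three DStream properties; NOT Spark's DStream class.
// Times are plain milliseconds, and a batch is a Seq instead of an RDD.
abstract class ToyDStream[T] {
  def dependencies: List[ToyDStream[_]]     // parent streams this one is derived from
  def slideDurationMs: Long                 // interval at which a batch is generated
  def compute(timeMs: Long): Option[Seq[T]] // produces the batch for a given time

  def map[U](f: T => U): ToyDStream[U] = new MappedToyDStream(this, f)
}

// A source stream: no parents, emits one number per batch interval.
final class TickSource(val slideDurationMs: Long) extends ToyDStream[Long] {
  val dependencies: List[ToyDStream[_]] = Nil
  // Only times that fall on a batch boundary produce data.
  def compute(timeMs: Long): Option[Seq[Long]] =
    if (timeMs % slideDurationMs == 0) Some(Seq(timeMs / slideDurationMs)) else None
}

// A derived stream: one parent, same interval, batch = parent's batch transformed.
final class MappedToyDStream[T, U](parent: ToyDStream[T], f: T => U) extends ToyDStream[U] {
  val dependencies: List[ToyDStream[_]] = List(parent)
  def slideDurationMs: Long = parent.slideDurationMs
  def compute(timeMs: Long): Option[Seq[U]] = parent.compute(timeMs).map(_.map(f))
}

val ticks = new TickSource(1000L)
val doubled = ticks.map(_ * 2)
(0L to 3000L by 1000L).foreach(t => println(s"$t -> ${doubled.compute(t)}"))
```

Note the design choice: a derived stream stores no data of its own; it recomputes its batch from its parent on demand, which is why the dependency list is part of its definition.
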
  3. abstract class InputDStream[T] extends DStream[T]

    This is the abstract base class for all input streams. This class provides the methods start() and stop(), which are called by the Spark Streaming system to start and stop receiving data, respectively. Input streams that can generate RDDs from new data by running a service or thread only on the driver node (that is, without running a receiver on worker nodes) can be implemented by directly extending InputDStream. For example, FileInputDStream, a subclass of InputDStream, monitors an HDFS directory from the driver for new files and generates RDDs from the new files. To implement input streams that require running a receiver on the worker nodes, use org.apache.spark.streaming.dstream.ReceiverInputDStream as the parent class.
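
The driver-only pattern can be sketched in plain Scala, loosely in the spirit of FileInputDStream but not its implementation. ToyInputStream and NewFilesStream are hypothetical names, and the "directory" is an in-memory set of file names supplied by the caller. Each batch contains only the files that appeared since the previous batch, and nothing runs outside the calling (driver) thread.

```scala
import scala.collection.mutable

// Plain-Scala model of a driver-only input stream; NOT Spark's InputDStream API.
abstract class ToyInputStream[T] {
  def start(): Unit                          // begin watching the source
  def stop(): Unit                           // stop watching the source
  def compute(timeMs: Long): Option[Seq[T]]  // batch for the given time
}

// Hypothetical stream that polls a "directory" listing on the driver and turns
// newly appeared file names into a batch.
final class NewFilesStream(listDir: () => Set[String]) extends ToyInputStream[String] {
  private val seen = mutable.Set.empty[String]
  private var running = false

  def start(): Unit = { running = true }
  def stop(): Unit = { running = false }

  // Each batch holds only the files not seen in any earlier batch.
  def compute(timeMs: Long): Option[Seq[String]] =
    if (!running) None
    else {
      val fresh = (listDir() -- seen).toSeq.sorted
      seen ++= fresh
      Some(fresh)
    }
}

// Simulate files landing in the directory between batches.
val dir = mutable.Set("a.log")
val files = new NewFilesStream(() => dir.toSet)
files.start()
val batch1 = files.compute(1000L)
dir += "b.log"
dir += "c.log"
val batch2 = files.compute(2000L)
val batch3 = files.compute(3000L)
files.stop()
println(Seq(batch1, batch2, batch3))
```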

  4. sealed abstract class MapWithStateDStream[KeyType, ValueType, StateType, MappedType] extends DStream[MappedType]

    :: Experimental :: DStream representing the stream of data generated by the mapWithState operation on a pair DStream. It also gives access to the stream of state snapshots, that is, the state data of all keys after a batch has updated them.

    KeyType

    Class of the key

    ValueType

    Class of the value

    StateType

    Class of the state data

    MappedType

    Class of the mapped data

    Annotations
    @Experimental()
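
The semantics of mapWithState and its state snapshots can be modelled with a per-key Map threaded from batch to batch. This is a simplified, hypothetical sketch rather than Spark's API: Spark builds the operation from a StateSpec whose mapping function receives an Option of the value and a State handle, and it also supports timeouts and state removal, all of which are omitted here. The example keeps a running word count, so the mapped stream carries (word, total so far) and each snapshot holds every key's state after that batch.

```scala
// Plain-Scala model of mapWithState semantics; NOT Spark's API. Here the state is
// an immutable Map threaded from batch to batch, with no timeouts or removal.
def mapWithState[K, V, S, M](batches: Seq[Seq[(K, V)]])(
    f: (K, V, Option[S]) => (M, S)): (Seq[Seq[M]], Seq[Map[K, S]]) = {
  var state = Map.empty[K, S]
  val mapped = Seq.newBuilder[Seq[M]]
  val snapshots = Seq.newBuilder[Map[K, S]]
  for (batch <- batches) {
    val out = batch.map { case (k, v) =>
      // f sees the key's current state (None if never seen) and returns
      // the mapped record plus the updated state.
      val (m, s) = f(k, v, state.get(k))
      state = state.updated(k, s)
      m
    }
    mapped += out
    snapshots += state // state of every key seen so far, after this batch
  }
  (mapped.result(), snapshots.result())
}

// Running word count: mapped records are (word, total so far).
val batches = Seq(Seq("a" -> 1, "b" -> 1), Seq("a" -> 1), Seq("c" -> 1, "a" -> 1))
val (mapped, snapshots) =
  mapWithState[String, Int, Int, (String, Int)](batches) { (word, one, total) =>
    val newTotal = total.getOrElse(0) + one
    ((word, newTotal), newTotal)
  }
mapped.foreach(println)
println(snapshots.last)
```
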
  5. class PairDStreamFunctions[K, V] extends Serializable

    Extra functions available on DStreams of (key, value) pairs through an implicit conversion.
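
The implicit-conversion mechanism can be illustrated with a small plain-Scala model. MiniStream and PairFunctions are hypothetical names, not Spark classes; the point is that a pair-only method such as reduceByKey becomes callable only when the element type is a tuple. Placing the implicit class in the companion object puts it in implicit scope, so callers need no import; in recent Spark versions the corresponding conversion (toPairDStreamFunctions) is likewise defined in object DStream.

```scala
// Plain-Scala model of pair-only operations added through an implicit conversion.
// MiniStream and PairFunctions are hypothetical names, NOT Spark classes.
final case class MiniStream[T](batches: Seq[Seq[T]]) {
  // Operation available on every stream, whatever its element type.
  def map[U](f: T => U): MiniStream[U] = MiniStream(batches.map(_.map(f)))
}

object MiniStream {
  // Defined in the companion object, so it is in implicit scope without an import.
  // It only applies to streams whose elements are (key, value) pairs.
  implicit class PairFunctions[K, V](self: MiniStream[(K, V)]) {
    def reduceByKey(f: (V, V) => V): MiniStream[(K, V)] =
      MiniStream(self.batches.map { batch =>
        batch.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }.toSeq
      })
  }
}

val tokens = MiniStream(Seq(Seq("a", "b", "a"), Seq("b")))
// map yields a MiniStream[(String, Int)], so reduceByKey becomes callable.
val tokenCounts = tokens.map(w => (w, 1)).reduceByKey(_ + _)
println(tokenCounts)
```

Calling reduceByKey on tokens directly would not compile, because MiniStream[String] does not match MiniStream[(K, V)].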

  6. abstract class ReceiverInputDStream[T] extends InputDStream[T]

    Abstract class for defining any org.apache.spark.streaming.dstream.InputDStream that has to start a receiver on worker nodes to receive external data. Specific implementations of ReceiverInputDStream must define the getReceiver function, which returns the receiver object of type org.apache.spark.streaming.receiver.Receiver that will be sent to the workers to receive data.

    T

    Class type of the objects in this stream
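
The division of work between the stream and its receiver can be sketched in plain Scala. ToyReceiver, ToyReceiverInputStream, FixedLinesReceiver and LinesStream are hypothetical; they loosely echo the onStart, onStop and store methods of Spark's Receiver, but nothing is shipped to a worker here: the receiver is an ordinary object whose buffer the stream drains once per batch.

```scala
import scala.collection.mutable.ArrayBuffer

// Plain-Scala model of the receiver pattern; NOT Spark's Receiver API.
// A receiver pushes records into its buffer; the stream drains it once per batch.
abstract class ToyReceiver[T] {
  private val buffer = ArrayBuffer.empty[T]
  def onStart(): Unit
  def onStop(): Unit
  // Called by the receiver itself whenever a record arrives.
  protected def store(record: T): Unit = buffer.synchronized { buffer += record }
  // Hands over everything received since the previous call.
  def drain(): Seq[T] = buffer.synchronized {
    val out = buffer.toList
    buffer.clear()
    out
  }
}

abstract class ToyReceiverInputStream[T] {
  // Counterpart of getReceiver: each concrete stream decides which receiver to use.
  def getReceiver(): ToyReceiver[T]

  private lazy val receiver = getReceiver()
  def start(): Unit = receiver.onStart()
  def stop(): Unit = receiver.onStop()
  def compute(timeMs: Long): Option[Seq[T]] = Some(receiver.drain())
}

// Hypothetical receiver that "receives" a fixed list of lines as soon as it starts.
final class FixedLinesReceiver(lines: Seq[String]) extends ToyReceiver[String] {
  def onStart(): Unit = lines.foreach(store)
  def onStop(): Unit = ()
}

final class LinesStream(lines: Seq[String]) extends ToyReceiverInputStream[String] {
  def getReceiver(): ToyReceiver[String] = new FixedLinesReceiver(lines)
}

val linesStream = new LinesStream(Seq("hello", "world"))
linesStream.start()
val first = linesStream.compute(1000L)  // everything stored so far
val second = linesStream.compute(2000L) // nothing new arrived
linesStream.stop()
println(Seq(first, second))
```

In Spark, the receiver object is serialized and run on an executor, so a real Receiver must be serializable and typically starts its own thread in onStart rather than blocking; this sketch does neither.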

Value Members

  1. object DStream extends Serializable
