Packages

  • package root
    Definition Classes
    root
  • package org
    Definition Classes
    root
  • package apache
    Definition Classes
    org
  • package spark

    Core Spark functionality.

    Core Spark functionality. org.apache.spark.SparkContext serves as the main entry point to Spark, while org.apache.spark.rdd.RDD is the data type representing a distributed collection, and provides most parallel operations.

    In addition, org.apache.spark.rdd.PairRDDFunctions contains operations available only on RDDs of key-value pairs, such as groupByKey and join; org.apache.spark.rdd.DoubleRDDFunctions contains operations available only on RDDs of Doubles; and org.apache.spark.rdd.SequenceFileRDDFunctions contains operations available on RDDs that can be saved as SequenceFiles. These operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)] through implicit conversions.

    Java programmers should reference the org.apache.spark.api.java package for Spark programming APIs in Java.

    Classes and methods marked with Experimental are user-facing features which have not been officially adopted by the Spark project. These are subject to change or removal in minor releases.

    Classes and methods marked with Developer API are intended for advanced users want to extend Spark through lower level interfaces. These are subject to changes or removal in minor releases.

    Definition Classes
    apache
  • package broadcast

    Spark's broadcast variables, used to broadcast immutable datasets to all nodes.

    Spark's broadcast variables, used to broadcast immutable datasets to all nodes.

    Definition Classes
    spark
  • Broadcast

abstract class Broadcast[T] extends Serializable with Logging

A broadcast variable. Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks. They can be used, for example, to give every node a copy of a large input dataset in an efficient manner. Spark also attempts to distribute broadcast variables using efficient broadcast algorithms to reduce communication cost.

Broadcast variables are created from a variable v by calling org.apache.spark.SparkContext#broadcast. The broadcast variable is a wrapper around v, and its value can be accessed by calling the value method. The interpreter session below shows this:

scala> val broadcastVar = sc.broadcast(Array(1, 2, 3))
broadcastVar: org.apache.spark.broadcast.Broadcast[Array[Int]] = Broadcast(0)

scala> broadcastVar.value
res0: Array[Int] = Array(1, 2, 3)

After the broadcast variable is created, it should be used instead of the value v in any functions run on the cluster so that v is not shipped to the nodes more than once. In addition, the object v should not be modified after it is broadcast in order to ensure that all nodes get the same value of the broadcast variable (e.g. if the variable is shipped to a new node later).

T

Type of the data contained in the broadcast variable.

Linear Supertypes
Logging, Serializable, AnyRef, Any
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. Broadcast
  2. Logging
  3. Serializable
  4. AnyRef
  5. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. All

Instance Constructors

  1. new Broadcast(id: Long)(implicit arg0: ClassTag[T])

    id

    A unique identifier for the broadcast variable.

Abstract Value Members

  1. abstract def doDestroy(blocking: Boolean): Unit

    Actually destroy all data and metadata related to this broadcast variable.

    Actually destroy all data and metadata related to this broadcast variable. Implementation of Broadcast class must define their own logic to destroy their own state.

    Attributes
    protected
  2. abstract def doUnpersist(blocking: Boolean): Unit

    Actually unpersist the broadcasted value on the executors.

    Actually unpersist the broadcasted value on the executors. Concrete implementations of Broadcast class must define their own logic to unpersist their own data.

    Attributes
    protected
  3. abstract def getValue(): T

    Actually get the broadcasted value.

    Actually get the broadcasted value. Concrete implementations of Broadcast class must define their own way to get the value.

    Attributes
    protected

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def assertValid(): Unit

    Check if this broadcast is valid.

    Check if this broadcast is valid. If not valid, exception is thrown.

    Attributes
    protected
  6. def clone(): AnyRef
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @native() @throws( ... )
  7. def destroy(): Unit

    Destroy all data and metadata related to this broadcast variable.

    Destroy all data and metadata related to this broadcast variable. Use this with caution; once a broadcast variable has been destroyed, it cannot be used again. This method blocks until destroy has completed

  8. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  12. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  13. val id: Long
  14. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean = false): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  15. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  16. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  17. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  18. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  19. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  20. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  21. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  22. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  23. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  24. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  25. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  26. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  27. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  28. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  29. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  30. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  31. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  32. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  33. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  34. def toString(): String
    Definition Classes
    Broadcast → AnyRef → Any
  35. def unpersist(blocking: Boolean): Unit

    Delete cached copies of this broadcast on the executors.

    Delete cached copies of this broadcast on the executors. If the broadcast is used after this is called, it will need to be re-sent to each executor.

    blocking

    Whether to block until unpersisting has completed

  36. def unpersist(): Unit

    Asynchronously delete cached copies of this broadcast on the executors.

    Asynchronously delete cached copies of this broadcast on the executors. If the broadcast is used after this is called, it will need to be re-sent to each executor.

  37. def value: T

    Get the broadcasted value.

  38. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  39. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  40. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @throws( ... )

Inherited from Logging

Inherited from Serializable

Inherited from AnyRef

Inherited from Any

Ungrouped