Packages

  • package root
    Definition Classes
    root
  • package org
    Definition Classes
    root
  • package apache
    Definition Classes
    org
  • package spark

    Core Spark functionality. org.apache.spark.SparkContext serves as the main entry point to Spark, while org.apache.spark.rdd.RDD is the data type representing a distributed collection, and provides most parallel operations.

    In addition, org.apache.spark.rdd.PairRDDFunctions contains operations available only on RDDs of key-value pairs, such as groupByKey and join; org.apache.spark.rdd.DoubleRDDFunctions contains operations available only on RDDs of Doubles; and org.apache.spark.rdd.SequenceFileRDDFunctions contains operations available on RDDs that can be saved as SequenceFiles. These operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)]) through implicit conversions.

    Java programmers should reference the org.apache.spark.api.java package for Spark programming APIs in Java.

    Classes and methods marked with Experimental are user-facing features which have not been officially adopted by the Spark project. These are subject to change or removal in minor releases.

    Classes and methods marked with Developer API are intended for advanced users who want to extend Spark through lower-level interfaces. These are subject to change or removal in minor releases.
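
    As a brief illustration of the implicit conversions mentioned above, here is a minimal sketch (the application name and sample data are made up for this example):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("pairs-example").setMaster("local[*]"))

    // An RDD[(Int, Int)] automatically gains PairRDDFunctions operations
    // such as groupByKey through the implicit conversions described above.
    val pairs = sc.parallelize(Seq((1, 2), (1, 3), (2, 4)))
    val grouped = pairs.groupByKey() // RDD[(Int, Iterable[Int])]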

    Definition Classes
    apache
  • package sql

    Allows the execution of relational queries, including those expressed in SQL using Spark.

    Definition Classes
    spark
  • package api

    Contains API classes that are specific to a single language (i.e. Java).

    Definition Classes
    sql
  • package catalog
    Definition Classes
    sql
  • package expressions
    Definition Classes
    sql
  • package javalang
  • package scalalang
  • Aggregator
  • MutableAggregationBuffer
  • UserDefinedAggregateFunction
  • UserDefinedFunction
  • Window
  • WindowSpec
  • package hive

    Support for running Spark SQL queries using functionality from Apache Hive (does not require an existing Hive installation). Supported Hive features include:

    • Using HiveQL to express queries.
    • Reading metadata from the Hive Metastore using HiveSerDes.
    • Hive UDFs, UDAFs, and UDTFs

    Users who would like access to this functionality should create a HiveContext instead of a SQLContext.
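
    A minimal sketch of that setup, assuming an existing SparkContext named sc (the table name src is made up for this example; in Spark 2.x, SparkSession.builder().enableHiveSupport() supersedes HiveContext):

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)

    // HiveQL queries run against tables registered in the Hive metastore.
    hiveContext.sql("SELECT key, value FROM src LIMIT 10").show()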

    Definition Classes
    sql
  • package jdbc
    Definition Classes
    sql
  • package sources

    A set of APIs for adding data sources to Spark SQL.

    Definition Classes
    sql
  • package streaming
    Definition Classes
    sql
  • package types

    Contains a type system for attributes produced by relations, including complex types like structs, arrays and maps.

    Definition Classes
    sql
  • package util
    Definition Classes
    sql
  • package vectorized
    Definition Classes
    sql

org.apache.spark.sql

expressions

package expressions


Type Members

  1. abstract class Aggregator[-IN, BUF, OUT] extends Serializable

    :: Experimental :: A base class for user-defined aggregations, which can be used in Dataset operations to take all of the elements of a group and reduce them to a single value.

    For example, the following aggregator extracts an Int from a specific class and sums the values (the bufferEncoder and outputEncoder members are required by the Aggregator API so that Spark can serialize the intermediate and final values):

    import org.apache.spark.sql.{Dataset, Encoder, Encoders}

    case class Data(i: Int)

    val customSummer = new Aggregator[Data, Int, Int] {
      def zero: Int = 0
      def reduce(b: Int, a: Data): Int = b + a.i
      def merge(b1: Int, b2: Int): Int = b1 + b2
      def finish(r: Int): Int = r
      // Encoders for the intermediate and final values.
      def bufferEncoder: Encoder[Int] = Encoders.scalaInt
      def outputEncoder: Encoder[Int] = Encoders.scalaInt
    }.toColumn

    val ds: Dataset[Data] = ...
    val aggregated = ds.select(customSummer)

    Based loosely on Aggregator from Algebird: https://github.com/twitter/algebird

    IN

    The input type for the aggregation.

    BUF

    The type of the intermediate value of the reduction.

    OUT

    The type of the final output result.

    Annotations
    @Experimental() @Evolving()
    Since

    1.6.0

  2. abstract class MutableAggregationBuffer extends Row

    A Row representing a mutable aggregation buffer.

    This is not meant to be extended outside of Spark.

    Annotations
    @Stable()
    Since

    1.5.0

  3. abstract class UserDefinedAggregateFunction extends Serializable

    The base class for implementing user-defined aggregate functions (UDAF).
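
    A minimal sketch of a concrete subclass (the class name LongSum and the column layout are made up for this example; note how the MutableAggregationBuffer described above is updated in place):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    // A simple UDAF that sums a single nullable Long column.
    class LongSum extends UserDefinedAggregateFunction {
      // Schema of the input arguments.
      def inputSchema: StructType = StructType(StructField("value", LongType) :: Nil)
      // Schema of the intermediate aggregation buffer.
      def bufferSchema: StructType = StructType(StructField("total", LongType) :: Nil)
      // Type of the final result.
      def dataType: DataType = LongType
      // Same input always produces the same output.
      def deterministic: Boolean = true

      def initialize(buffer: MutableAggregationBuffer): Unit = buffer(0) = 0L

      def update(buffer: MutableAggregationBuffer, input: Row): Unit =
        if (!input.isNullAt(0)) buffer(0) = buffer.getLong(0) + input.getLong(0)

      def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
        buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)

      def evaluate(buffer: Row): Any = buffer.getLong(0)
    }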

    Annotations
    @Stable()
    Since

    1.5.0

  4. sealed trait UserDefinedFunction extends AnyRef

    A user-defined function. To create one, use the udf functions in functions.

    As an example:

    import org.apache.spark.sql.functions.udf

    // Define a UDF that returns true or false based on some numeric score.
    val predict = udf((score: Double) => score > 0.5)

    // Add a prediction column derived from the score column.
    df.select(predict(df("score")))
    Annotations
    @Stable()
    Since

    1.3.0

  5. class Window extends AnyRef

    Utility functions for defining windows in DataFrames.

    // PARTITION BY country ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    Window.partitionBy("country").orderBy("date")
      .rowsBetween(Window.unboundedPreceding, Window.currentRow)
    
    // PARTITION BY country ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
    Window.partitionBy("country").orderBy("date").rowsBetween(-3, 3)
    Annotations
    @Stable()
    Since

    1.4.0

  6. class WindowSpec extends AnyRef

    A window specification that defines the partitioning, ordering, and frame boundaries.

    Use the static methods in Window to create a WindowSpec.
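
    For instance (a minimal sketch; the DataFrame df and its columns are made up for this example):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, rank}

    // Build a WindowSpec, then evaluate a window function over it.
    val byCountry = Window.partitionBy("country").orderBy(col("date"))
    df.withColumn("rank", rank().over(byCountry))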

    Annotations
    @Stable()
    Since

    1.4.0

Value Members

  1. object Window

    Utility functions for defining windows in DataFrames.

    // PARTITION BY country ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    Window.partitionBy("country").orderBy("date")
      .rowsBetween(Window.unboundedPreceding, Window.currentRow)
    
    // PARTITION BY country ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
    Window.partitionBy("country").orderBy("date").rowsBetween(-3, 3)
    Annotations
    @Stable()
    Since

    1.4.0

    Note

    When ordering is not defined, an unbounded window frame (rowFrame, unboundedPreceding, unboundedFollowing) is used by default. When ordering is defined, a growing window frame (rangeFrame, unboundedPreceding, currentRow) is used by default.
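
    For example, the second default above makes the following two specifications equivalent (a small sketch using only the methods shown in the examples):

    // With ordering defined, the default frame is:
    Window.partitionBy("country").orderBy("date")

    // ...which is the same as writing the frame out explicitly:
    Window.partitionBy("country").orderBy("date")
      .rangeBetween(Window.unboundedPreceding, Window.currentRow)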
