Packages

  • package root
    Definition Classes
    root
  • package org
    Definition Classes
    root
  • package apache
    Definition Classes
    org
  • package spark

    Core Spark functionality. org.apache.spark.SparkContext serves as the main entry point to Spark, while org.apache.spark.rdd.RDD is the data type representing a distributed collection, and provides most parallel operations.

    In addition, org.apache.spark.rdd.PairRDDFunctions contains operations available only on RDDs of key-value pairs, such as groupByKey and join; org.apache.spark.rdd.DoubleRDDFunctions contains operations available only on RDDs of Doubles; and org.apache.spark.rdd.SequenceFileRDDFunctions contains operations available on RDDs that can be saved as SequenceFiles. These operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)]) through implicit conversions (see the sketch following this package list).

    Java programmers should reference the org.apache.spark.api.java package for Spark programming APIs in Java.

    Classes and methods marked with Experimental are user-facing features which have not been officially adopted by the Spark project. These are subject to change or removal in minor releases.

    Classes and methods marked with Developer API are intended for advanced users who want to extend Spark through lower-level interfaces. These are subject to change or removal in minor releases.

    Definition Classes
    apache
  • package sql

    Allows the execution of relational queries, including those expressed in SQL using Spark.

    Definition Classes
    spark
  • package hive

    Support for running Spark SQL queries using functionality from Apache Hive (does not require an existing Hive installation). Supported Hive features include:

    • Using HiveQL to express queries.
    • Reading metadata from the Hive Metastore using HiveSerDes.
    • Hive UDFs, UDAFs, and UDTFs

    Users who would like access to this functionality should create a HiveContext instead of a SQLContext (see the sketch following this package list).

    Definition Classes
    sql
  • package execution
    Definition Classes
    hive
  • CreateHiveTableAsSelectCommand
  • HiveFileFormat
  • HiveOptions
  • HiveOutputWriter
  • HiveScriptIOSchema
  • InsertIntoHiveDirCommand
  • InsertIntoHiveTable
  • ScriptTransformationExec
  • package orc
    Definition Classes
    hive
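
As noted in the org.apache.spark package description above, key-value operations such as groupByKey become available on pair RDDs through implicit conversions. A minimal sketch, assuming a local master (the app name and data are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    object PairRddSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("pair-rdd-sketch").setMaster("local[*]"))
        // An RDD[(Int, Int)]: the implicit conversion to PairRDDFunctions
        // makes key-value operations like groupByKey and join available.
        val pairs = sc.parallelize(Seq((1, 2), (1, 3), (2, 4)))
        pairs.groupByKey().collect().foreach(println)
        sc.stop()
      }
    }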
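
To use the Hive support described above, create a HiveContext. A minimal sketch (the table and query are illustrative; note that in Spark 2.x, SparkSession.builder().enableHiveSupport() is the usual entry point instead):

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.hive.HiveContext

    object HiveContextSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local[*]", "hive-sketch")
        // HiveContext layers HiveQL parsing, Hive SerDes, and Hive UDFs
        // on top of the plain SQLContext.
        val hiveCtx = new HiveContext(sc)
        hiveCtx.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
        hiveCtx.sql("SELECT key, value FROM src LIMIT 10").show()
        sc.stop()
      }
    }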

package execution

Type Members

  1. case class CreateHiveTableAsSelectCommand(tableDesc: CatalogTable, query: LogicalPlan, outputColumnNames: Seq[String], mode: SaveMode) extends LogicalPlan with DataWritingCommand with Product with Serializable

    Create the table and insert the query result into it (see the example following this member list).

    tableDesc

    the table description, which may contain serde, storage handler, etc.

    query

    the query whose result will be inserted into the new relation

    mode

    the SaveMode applied when the table already exists

  2. class HiveFileFormat extends FileFormat with DataSourceRegister with Logging

    FileFormat for writing Hive tables.

    TODO: implement the read logic.

  3. class HiveOptions extends Serializable

    Options for the Hive data source. Note that the rule DetermineHiveSerde will extract Hive serde/format information from these options (see the sketch following this member list).

  4. class HiveOutputWriter extends OutputWriter with HiveInspectors
  5. case class HiveScriptIOSchema(inputRowFormat: Seq[(String, String)], outputRowFormat: Seq[(String, String)], inputSerdeClass: Option[String], outputSerdeClass: Option[String], inputSerdeProps: Seq[(String, String)], outputSerdeProps: Seq[(String, String)], recordReaderClass: Option[String], recordWriterClass: Option[String], schemaLess: Boolean) extends HiveInspectors with Product with Serializable

    A wrapper class for the Hive input and output schema properties.

  6. case class InsertIntoHiveDirCommand(isLocal: Boolean, storage: CatalogStorageFormat, query: LogicalPlan, overwrite: Boolean, outputColumnNames: Seq[String]) extends LogicalPlan with SaveAsHiveFile with Product with Serializable

    Command for writing the results of query to a file system (see the example following this member list).

    The syntax for using this command in SQL is:

    INSERT OVERWRITE [LOCAL] DIRECTORY
    path
    [ROW FORMAT row_format]
    [STORED AS file_format]
    SELECT ...

    isLocal

    whether the path specified in storage is a local directory

    storage

    the storage format used to describe how the query result is stored

    query

    the logical plan representing the data to write

    overwrite

    whether to overwrite the existing directory

  7. case class InsertIntoHiveTable(table: CatalogTable, partition: Map[String, Option[String]], query: LogicalPlan, overwrite: Boolean, ifPartitionNotExists: Boolean, outputColumnNames: Seq[String]) extends LogicalPlan with SaveAsHiveFile with Product with Serializable

    Command for writing data out to a Hive table.

    This class is mostly a mess, for legacy reasons (since it evolved in organic ways and had to follow Hive's internal implementations closely, which were themselves a mess too). Please don't blame Reynold for this! He was just moving code around!

    In the future we should converge the write path for Hive with the normal data source write path, as defined in org.apache.spark.sql.execution.datasources.FileFormatWriter.

    table

    the metadata of the table.

    partition

    a map from the partition key to the partition value (optional). If a partition value is None, dynamic partition insert will be performed. As an example, INSERT INTO tbl PARTITION (a=1, b=2) AS ... would have Map('a' -> Some('1'), 'b' -> Some('2')), and INSERT INTO tbl PARTITION (a=1, b) AS ... would have Map('a' -> Some('1'), 'b' -> None).

    query

    the logical plan representing the data to write.

    overwrite

    whether to overwrite the existing table or partitions.

    ifPartitionNotExists

    If true, only write if the partition does not exist. Only valid for static partitions.

  8. case class ScriptTransformationExec(input: Seq[Expression], script: String, output: Seq[Attribute], child: SparkPlan, ioschema: HiveScriptIOSchema) extends SparkPlan with UnaryExecNode with Product with Serializable

    Transforms the input by forking and running the specified script (see the example following this member list).

    input

    the set of expressions that should be passed to the script.

    script

    the command that should be executed.

    output

    the attributes that are produced by the script.
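
As an illustration of CreateHiveTableAsSelectCommand, a CREATE TABLE ... AS SELECT statement is what this command ultimately executes. A minimal sketch, assuming a Hive-enabled session (the table names and query are illustrative):

    import org.apache.spark.sql.SparkSession

    object CtasSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ctas-sketch")
          .master("local[*]")
          .enableHiveSupport() // requires Hive classes on the classpath
          .getOrCreate()
        // A Hive CTAS statement: the table is created, then the query
        // result is inserted into it.
        spark.sql(
          """CREATE TABLE dst STORED AS PARQUET
            |AS SELECT key, value FROM src WHERE key > 10""".stripMargin)
        spark.stop()
      }
    }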
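
A sketch of where HiveOptions come from: when a Hive-format table is created through the data source API, options such as fileFormat are carried here and picked up by the DetermineHiveSerde rule (the table name is illustrative):

    // Assuming the Hive-enabled `spark` session from the sketch above:
    // the fileFormat option selects the table's Hive serde and storage format.
    spark.sql("CREATE TABLE opts_demo (id INT, name STRING) USING hive OPTIONS(fileFormat 'parquet')")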
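
A concrete instance of the INSERT OVERWRITE DIRECTORY syntax handled by InsertIntoHiveDirCommand (a sketch; the path and source table are illustrative):

    // Assuming the Hive-enabled `spark` session from the sketch above.
    // LOCAL makes isLocal = true; ROW FORMAT / STORED AS populate `storage`.
    spark.sql(
      """INSERT OVERWRITE LOCAL DIRECTORY '/tmp/query_out'
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        |STORED AS TEXTFILE
        |SELECT key, value FROM src""".stripMargin)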
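
ScriptTransformationExec backs HiveQL's TRANSFORM clause. A sketch, using /bin/cat as a stand-in for a real transformation script:

    // Assuming the Hive-enabled `spark` session from the sketch above.
    // Input rows (key, value) are piped to the forked script's stdin; its
    // stdout is parsed back into the declared output attributes (k, v).
    spark.sql(
      """SELECT TRANSFORM (key, value)
        |USING '/bin/cat'
        |AS (k STRING, v STRING)
        |FROM src""".stripMargin).show()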

Value Members

  1. object HiveOptions extends Serializable
  2. object HiveScriptIOSchema extends Serializable
