class HadoopMapReduceCommitProtocol extends FileCommitProtocol with Serializable with Logging
A FileCommitProtocol implementation backed by an underlying Hadoop OutputCommitter (from the newer mapreduce API, not the old mapred API).
Unlike Hadoop's OutputCommitter, this implementation is serializable.
Instance Constructors
-
new
HadoopMapReduceCommitProtocol(jobId: String, path: String, dynamicPartitionOverwrite: Boolean = false)
- jobId
the job's or stage's id
- path
the job's output path, or null if the committer acts as a no-op
- dynamicPartitionOverwrite
If true, Spark overwrites partition directories dynamically at runtime. Files are first written under a staging directory that includes the partition path, e.g. /path/to/staging/a=1/b=1/xxx.parquet. When the job is committed, the corresponding partition directories at the destination path, e.g. /path/to/destination/a=1/b=1, are cleaned up first, and the files are then moved from the staging directory to those partition directories under the destination path.
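The staging-then-replace behaviour described for dynamicPartitionOverwrite can be illustrated with a minimal sketch on a local file system. This is not Spark's implementation: the object name DynamicOverwriteSketch, the helper deleteRecursively, and the method commitPartitions are hypothetical, and the sketch assumes staging and destination live on the same file system so that a directory move is a simple rename.

```scala
import java.io.File
import java.nio.file.{Files, Path}

// Hypothetical illustration only; not the code used by HadoopMapReduceCommitProtocol.
object DynamicOverwriteSketch {

  /** Deletes a file or directory tree; does nothing if it does not exist. */
  def deleteRecursively(f: File): Unit = {
    // listFiles() returns null for non-directories, hence the Option wrapper.
    Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
    f.delete()
  }

  /**
   * For each written partition (e.g. "a=1/b=1"), clears the matching
   * directory under `destination`, then moves the staged directory into place.
   * Partitions that were not written in this job are left untouched.
   */
  def commitPartitions(staging: Path, destination: Path, partitions: Set[String]): Unit = {
    partitions.foreach { part =>
      val target = destination.resolve(part)
      deleteRecursively(target.toFile)           // clean up old partition data
      Files.createDirectories(target.getParent)  // make sure the parent exists
      Files.move(staging.resolve(part), target)  // move staged files into place
    }
  }
}
```

Only the partitions that the job actually wrote are replaced, which is the point of the dynamic mode: a partition such as a=2/b=2 that received no new files keeps its existing contents.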
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
abortJob(jobContext: JobContext): Unit
Aborts a job after the writes fail. Must be called on the driver.
Calling this function is a best-effort attempt, because the driver may crash (or be killed) before it can call abort.
- Definition Classes
- HadoopMapReduceCommitProtocol → FileCommitProtocol
-
def
abortTask(taskContext: TaskAttemptContext): Unit
Aborts a task after the writes have failed. Must be called on the executors when running tasks.
Calling this function is a best-effort attempt, because the executor may crash (or be killed) before it can call abort.
- Definition Classes
- HadoopMapReduceCommitProtocol → FileCommitProtocol
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[java.lang]
- Definition Classes
- AnyRef
- Annotations
- @native() @throws( ... )
-
def
commitJob(jobContext: JobContext, taskCommits: Seq[TaskCommitMessage]): Unit
Commits a job after the writes succeed. Must be called on the driver.
- Definition Classes
- HadoopMapReduceCommitProtocol → FileCommitProtocol
-
def
commitTask(taskContext: TaskAttemptContext): TaskCommitMessage
Commits a task after the writes succeed. Must be called on the executors when running tasks.
- Definition Classes
- HadoopMapReduceCommitProtocol → FileCommitProtocol
-
def
deleteWithJob(fs: FileSystem, path: Path, recursive: Boolean): Boolean
Specifies that a file should be deleted with the commit of this job. The default implementation deletes the file immediately.
- Definition Classes
- FileCommitProtocol
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[java.lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean = false): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
log: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logName: String
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
newTaskTempFile(taskContext: TaskAttemptContext, dir: Option[String], ext: String): String
Notifies the commit protocol to add a new file, and gets back the full path that should be used. Must be called on the executors when running tasks.
Note that the returned temp file may have an arbitrary path. The commit protocol only promises that the file will be at the location specified by the arguments after job commit.
A full file path consists of the following parts:
1. the base path
2. some sub-directory within the base path, used to specify partitioning
3. file prefix, usually some unique job id with the task id
4. bucket id
5. source specific file extension, e.g. ".snappy.parquet"
The "dir" parameter specifies part 2, and the "ext" parameter specifies both parts 4 and 5; the rest are left to the commit protocol implementation to decide.
Important: it is the caller's responsibility to add uniquely identifying content to "ext" if a task is going to write out multiple files to the same dir. The file commit protocol only guarantees that files written by different tasks will not conflict.
- Definition Classes
- HadoopMapReduceCommitProtocol → FileCommitProtocol
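As a rough illustration of how the five path parts listed for newTaskTempFile fit together, the sketch below assembles a final file location from them. The object TempFilePathSketch, the method finalPath, the "part-" prefix layout, and the "_00002" bucket-id format are assumptions made for this example; the source only states that the prefix is left to the commit protocol implementation, and the temp file returned by the real method may live at a different path until job commit.

```scala
// Hypothetical illustration of the path layout; names and formats are assumed.
object TempFilePathSketch {

  /**
   * Builds the final location of a file from the parts described above:
   *   1. basePath        - the job's output path
   *   2. dir             - optional partition sub-directory, e.g. "a=1/b=1"
   *   3. file prefix     - derived here from the task id and job id
   *   4 + 5. ext         - bucket id plus source-specific extension
   */
  def finalPath(
      basePath: String,
      dir: Option[String],
      jobId: String,
      taskId: Int,
      ext: String): String = {
    // Part 3 is chosen by the protocol; this particular format is an assumption.
    val fileName = f"part-$taskId%05d-$jobId$ext"
    dir match {
      case Some(d) => s"$basePath/$d/$fileName"
      case None    => s"$basePath/$fileName"
    }
  }
}
```

For instance, TempFilePathSketch.finalPath("/out", Some("a=1/b=1"), "job1", 3, "_00002.snappy.parquet") yields "/out/a=1/b=1/part-00003-job1_00002.snappy.parquet", where the caller supplied the partition directory and the combined bucket id and extension.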
-
def
newTaskTempFileAbsPath(taskContext: TaskAttemptContext, absoluteDir: String, ext: String): String
Similar to newTaskTempFile(), but allows files to be committed to an absolute output location. Depending on the implementation, there may be weaker guarantees around adding files this way.
Important: it is the caller's responsibility to add uniquely identifying content to "ext" if a task is going to write out multiple files to the same dir. The file commit protocol only guarantees that files written by different tasks will not conflict.
- Definition Classes
- HadoopMapReduceCommitProtocol → FileCommitProtocol
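Because the protocol only guarantees that files from different tasks do not conflict, a caller that writes several files from one task into the same directory has to make each "ext" value unique. One possible approach, sketched here with a hypothetical ExtGenerator class and an assumed "-cNNN" counter format, is to prepend a per-task file counter to the extension before passing it to newTaskTempFile() or newTaskTempFileAbsPath().

```scala
// Hypothetical helper; the counter format is an assumption for illustration.
object UniqueExtSketch {

  /** Produces a distinct "ext" string for each file a single task writes. */
  final class ExtGenerator(baseExt: String) {
    private var fileCounter = 0

    /** Returns the next unique extension, e.g. "-c000.parquet". */
    def next(): String = {
      val ext = f"-c$fileCounter%03d$baseExt"
      fileCounter += 1
      ext
    }
  }
}
```

Each task would create its own generator, so uniqueness within the task comes from the counter while uniqueness across tasks is left to the commit protocol.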
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
onTaskCommit(taskCommit: TaskCommitMessage): Unit
Called on the driver after a task commits. This can be used to access task commit messages before the job has finished. These same task commit messages will be passed to commitJob() if the entire job succeeds.
- Definition Classes
- FileCommitProtocol
-
def
setupCommitter(context: TaskAttemptContext): OutputCommitter
- Attributes
- protected
-
def
setupJob(jobContext: JobContext): Unit
Sets up a job. Must be called on the driver before any other methods can be invoked.
- Definition Classes
- HadoopMapReduceCommitProtocol → FileCommitProtocol
-
def
setupTask(taskContext: TaskAttemptContext): Unit
Sets up a task within a job. Must be called before any other task-related methods can be invoked.
- Definition Classes
- HadoopMapReduceCommitProtocol → FileCommitProtocol
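The ordering constraints stated across the setup, commit, and abort methods can be summarised in a small single-threaded sketch. RecordingProtocol is a hypothetical stand-in that only records which method was called; it is not the real class, it takes simplified arguments instead of JobContext and TaskAttemptContext, and it runs "driver" and "executor" steps in one thread purely to show the sequence.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

object CommitLifecycleSketch {

  // Hypothetical stand-in: records call order instead of touching any files.
  final class RecordingProtocol {
    val calls = ArrayBuffer.empty[String]
    def setupJob(): Unit = calls += "setupJob"
    def setupTask(taskId: Int): Unit = calls += s"setupTask($taskId)"
    def commitTask(taskId: Int): String = { calls += s"commitTask($taskId)"; s"task-$taskId" }
    def abortTask(taskId: Int): Unit = calls += s"abortTask($taskId)"
    def onTaskCommit(msg: String): Unit = calls += s"onTaskCommit($msg)"
    def commitJob(msgs: Seq[String]): Unit = calls += s"commitJob(${msgs.size})"
    def abortJob(): Unit = calls += "abortJob"
  }

  /** Runs all tasks in order; returns true if the job committed. */
  def runJob(p: RecordingProtocol, tasks: Seq[Int], writeTask: Int => Unit): Boolean = {
    p.setupJob()                          // driver, before anything else
    try {
      val messages = tasks.map { taskId =>
        p.setupTask(taskId)               // executor, before other task methods
        try {
          writeTask(taskId)               // the task writes its files
          val msg = p.commitTask(taskId)  // executor, after successful writes
          p.onTaskCommit(msg)             // driver, once the task has committed
          msg
        } catch {
          case NonFatal(e) =>
            p.abortTask(taskId)           // best-effort cleanup of the task
            throw e
        }
      }
      p.commitJob(messages)               // driver, with all task commit messages
      true
    } catch {
      case NonFatal(_) =>
        p.abortJob()                      // best-effort cleanup of the job
        false
    }
  }
}
```

In a real job the task steps run on executors and may be retried, and an abort may never happen if the process crashes first, which is why the abort methods are described as best-effort.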
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @throws( ... )