object Correlation
API for correlation functions in MLlib, compatible with DataFrames and Datasets.
The functions in this package generalize the functions in org.apache.spark.sql.Dataset#stat to spark.ml's Vector types.
- Annotations
 - @Since( "2.2.0" ) @Experimental()
 
- Alphabetic
 - By Inheritance
 
- Correlation
 - AnyRef
 - Any
 
- Hide All
 - Show All
 
- Public
 - All
 
Value Members
- 
      
      
      
        
      
    
      
        final 
        def
      
      
        !=(arg0: Any): Boolean
      
      
      
- Definition Classes
 - AnyRef → Any
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        ##(): Int
      
      
      
- Definition Classes
 - AnyRef → Any
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        ==(arg0: Any): Boolean
      
      
      
- Definition Classes
 - AnyRef → Any
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        asInstanceOf[T0]: T0
      
      
      
- Definition Classes
 - Any
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        clone(): AnyRef
      
      
      
- Attributes
 - protected[java.lang]
 - Definition Classes
 - AnyRef
 - Annotations
 - @native() @throws( ... )
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        corr(dataset: Dataset[_], column: String): DataFrame
      
      
      
Compute the Pearson correlation matrix for the input Dataset of Vectors.
Compute the Pearson correlation matrix for the input Dataset of Vectors.
- Annotations
 - @Since( "2.2.0" )
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        corr(dataset: Dataset[_], column: String, method: String): DataFrame
      
      
      
:: Experimental :: Compute the correlation matrix for the input Dataset of Vectors using the specified method.
:: Experimental :: Compute the correlation matrix for the input Dataset of Vectors using the specified method. Methods currently supported:
pearson(default),spearman.- dataset
 A dataset or a dataframe
- column
 The name of the column of vectors for which the correlation coefficient needs to be computed. This must be a column of the dataset, and it must contain Vector objects.
- method
 String specifying the method to use for computing correlation. Supported:
pearson(default),spearman- returns
 A dataframe that contains the correlation matrix of the column of vectors. This dataframe contains a single row and a single column of name '$METHODNAME($COLUMN)'.
- Annotations
 - @Since( "2.2.0" )
 - Exceptions thrown
 if the column is not a valid column in the dataset, or if the content of this column is not of type Vector. Here is how to access the correlation coefficient:
val data: Dataset[Vector] = ... val Row(coeff: Matrix) = Correlation.corr(data, "value").head // coeff now contains the Pearson correlation matrix.
- Note
 For Spearman, a rank correlation, we need to create an RDD[Double] for each column and sort it in order to retrieve the ranks and then join the columns back into an RDD[Vector], which is fairly costly. Cache the input Dataset before calling corr with
method = "spearman"to avoid recomputing the common lineage.
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        eq(arg0: AnyRef): Boolean
      
      
      
- Definition Classes
 - AnyRef
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        equals(arg0: Any): Boolean
      
      
      
- Definition Classes
 - AnyRef → Any
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        finalize(): Unit
      
      
      
- Attributes
 - protected[java.lang]
 - Definition Classes
 - AnyRef
 - Annotations
 - @throws( classOf[java.lang.Throwable] )
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        getClass(): Class[_]
      
      
      
- Definition Classes
 - AnyRef → Any
 - Annotations
 - @native()
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        hashCode(): Int
      
      
      
- Definition Classes
 - AnyRef → Any
 - Annotations
 - @native()
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        isInstanceOf[T0]: Boolean
      
      
      
- Definition Classes
 - Any
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        ne(arg0: AnyRef): Boolean
      
      
      
- Definition Classes
 - AnyRef
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        notify(): Unit
      
      
      
- Definition Classes
 - AnyRef
 - Annotations
 - @native()
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        notifyAll(): Unit
      
      
      
- Definition Classes
 - AnyRef
 - Annotations
 - @native()
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        synchronized[T0](arg0: ⇒ T0): T0
      
      
      
- Definition Classes
 - AnyRef
 
 - 
      
      
      
        
      
    
      
        
        def
      
      
        toString(): String
      
      
      
- Definition Classes
 - AnyRef → Any
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        wait(): Unit
      
      
      
- Definition Classes
 - AnyRef
 - Annotations
 - @throws( ... )
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        wait(arg0: Long, arg1: Int): Unit
      
      
      
- Definition Classes
 - AnyRef
 - Annotations
 - @throws( ... )
 
 - 
      
      
      
        
      
    
      
        final 
        def
      
      
        wait(arg0: Long): Unit
      
      
      
- Definition Classes
 - AnyRef
 - Annotations
 - @native() @throws( ... )