to_labeled_point
elephas.utils.rdd_utils.to_labeled_point(sc, features, labels, categorical=False)
Convert numpy arrays of features and labels into a LabeledPoint RDD for MLlib and ML integration.
:param sc: Spark context
:param features: numpy array with features
:param labels: numpy array with labels
:param categorical: boolean, whether labels are already one-hot encoded or not
:return: LabeledPoint RDD with features and labels
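To illustrate the `categorical` flag, here is a minimal sketch of the label handling that runs without Spark. It assumes that one-hot labels are collapsed to a class index (here with `np.argmax`), since an MLlib LabeledPoint holds a single numeric label. The helper name `labels_for_labeled_points` is hypothetical and not part of elephas.

```python
import numpy as np


def labels_for_labeled_points(labels, categorical=False):
    """Hypothetical helper: return one scalar label per sample."""
    labels = np.asarray(labels)
    if categorical:
        # Assumed behaviour: each one-hot row becomes the index of its hot entry.
        return [float(np.argmax(row)) for row in labels]
    # Labels are already scalar class values; just cast them to float.
    return [float(y) for y in labels]


one_hot = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])
scalar_labels = labels_for_labeled_points(one_hot, categorical=True)
plain_labels = labels_for_labeled_points(np.array([2, 0, 1]))
```

Each scalar label would then be paired with its feature vector to form one LabeledPoint.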
from_labeled_point
elephas.utils.rdd_utils.from_labeled_point(rdd, categorical=False, nb_classes=None)
Convert a LabeledPoint RDD back to a pair of numpy arrays.
:param rdd: LabeledPoint RDD
:param categorical: boolean, whether labels should be one-hot encoded when returned
:param nb_classes: optional int, the number of class labels
:return: pair of numpy arrays, features and labels
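The reverse direction can be sketched in plain numpy, with a list of `(label, features)` tuples standing in for the collected LabeledPoint RDD. The function name `arrays_from_pairs` is hypothetical, and inferring `nb_classes` as the largest label plus one when it is omitted is an assumption of this sketch, not a documented guarantee.

```python
import numpy as np


def arrays_from_pairs(pairs, categorical=False, nb_classes=None):
    """Hypothetical stand-in: build feature and label arrays from (label, features) tuples."""
    features = np.array([x for _, x in pairs], dtype=float)
    labels = np.array([y for y, _ in pairs], dtype=float)
    if categorical:
        if nb_classes is None:
            # Assumption: infer the class count from the largest label seen.
            nb_classes = int(labels.max()) + 1
        one_hot = np.zeros((len(labels), nb_classes))
        # Set the column matching each sample's class index to 1.
        one_hot[np.arange(len(labels)), labels.astype(int)] = 1.0
        labels = one_hot
    return features, labels


pairs = [(0.0, [1.0, 2.0]), (2.0, [3.0, 4.0])]
features, labels = arrays_from_pairs(pairs, categorical=True)
```

Passing `nb_classes` explicitly is the safer choice when a batch may not contain every class, because an inferred count would then be too small.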
encode_label
elephas.utils.rdd_utils.encode_label(label, nb_classes)
One-hot encoding of a single label.
:param label: class label (int, or float with no fractional part)
:param nb_classes: int, total number of classes
:return: one-hot encoded vector
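One-hot encoding of a single label can be sketched as follows. This is an illustrative numpy version under the parameter description above, not necessarily the library's exact code, and the name `one_hot` is hypothetical.

```python
import numpy as np


def one_hot(label, nb_classes):
    """Hypothetical sketch: vector of zeros with a 1 at the label's index."""
    encoded = np.zeros(nb_classes)
    # Labels may arrive as floats such as 2.0, so cast before indexing.
    encoded[int(label)] = 1.0
    return encoded


vec = one_hot(2.0, 4)
```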
lp_to_simple_rdd
elephas.utils.rdd_utils.lp_to_simple_rdd(lp_rdd, categorical=False, nb_classes=None)
Convert a LabeledPoint RDD into an RDD of feature-label pairs.
:param lp_rdd: LabeledPoint RDD of features and labels
:param categorical: boolean, whether labels should be one-hot encoded when returned
:param nb_classes: int, total number of classes
:return: Spark RDD with feature-label pairs
to_simple_rdd
elephas.utils.rdd_utils.to_simple_rdd(sc, features, labels)
Convert numpy arrays of features and labels into an RDD of pairs.
:param sc: Spark context
:param features: numpy array with features
:param labels: numpy array with labels
:return: Spark RDD with feature-label pairs
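The pairing step can be shown without a running Spark cluster. The sketch below only builds the list of feature-label pairs. Distributing it with `sc.parallelize` is how such a list would typically become an RDD, but that this mirrors the library's internals is an assumption.

```python
import numpy as np

features = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
labels = np.array([0, 1, 0])

# Pair each feature row with its label, one tuple per sample.
pairs = list(zip(features, labels))

# With a live SparkContext `sc` (not created here), the list could be
# distributed as: rdd = sc.parallelize(pairs)
```

Both arrays need the same length along the first axis, because `zip` silently stops at the shorter input.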