to_labeled_point

elephas.utils.rdd_utils.to_labeled_point(sc, features, labels, categorical=False)

Convert numpy arrays of features and labels into a LabeledPoint RDD for MLlib and ML integration.

:param sc: Spark context
:param features: numpy array with features
:param labels: numpy array with labels
:param categorical: boolean, whether labels are already one-hot encoded or not
:return: LabeledPoint RDD with features and labels
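The per-row conversion can be pictured with a small local sketch that runs without Spark. `LocalLabeledPoint` and `to_labeled_points_local` are hypothetical names; the namedtuple only stands in for `pyspark.mllib.regression.LabeledPoint`. The sketch assumes that, when `categorical=True`, each one-hot label row is reduced to a class index with `argmax`, because a LabeledPoint carries a single numeric label.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for pyspark.mllib.regression.LabeledPoint,
# used only so this sketch runs without a Spark installation.
LocalLabeledPoint = namedtuple("LocalLabeledPoint", ["label", "features"])


def to_labeled_points_local(features, labels, categorical=False):
    """Build one labeled point per row (a Spark version would parallelize the list)."""
    points = []
    for x, y in zip(features, labels):
        # Assumption: one-hot label rows collapse to their class index via argmax.
        label = float(np.argmax(y)) if categorical else float(y)
        points.append(LocalLabeledPoint(label, np.asarray(x, dtype=float)))
    return points


X = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
y_onehot = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]])
points = to_labeled_points_local(X, y_onehot, categorical=True)
print([p.label for p in points])  # [0.0, 2.0, 1.0]
```

With a live `SparkContext`, building real `LabeledPoint(label, features)` objects and passing the list to `sc.parallelize` would yield an RDD of the kind this function returns.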


from_labeled_point

elephas.utils.rdd_utils.from_labeled_point(rdd, categorical=False, nb_classes=None)

Convert a LabeledPoint RDD back into a pair of numpy arrays (features, labels).

:param rdd: LabeledPoint RDD
:param categorical: boolean, whether labels should be one-hot encoded when returned
:param nb_classes: optional int, indicating the number of class labels
:return: pair of numpy arrays, features and labels
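The following is a minimal local sketch of the conversion, assuming the RDD has already been collected into a list of `(label, features)` tuples. The function name `from_collected_points_local` is hypothetical. It also assumes that labels are cast to integer class indices and that, when `nb_classes` is omitted, the class count is inferred as the largest label plus one. Neither behaviour is stated above, so treat both as assumptions.

```python
import numpy as np


def from_collected_points_local(points, categorical=False, nb_classes=None):
    """Split (label, features) tuples into a feature matrix and a label array."""
    # `points` is a hypothetical stand-in for the collected LabeledPoint RDD.
    features = np.asarray([np.asarray(f, dtype=float) for _, f in points])
    labels = np.asarray([lbl for lbl, _ in points], dtype="int32")
    if categorical:
        # Assumption: infer the class count from the largest label if not given.
        if nb_classes is None:
            nb_classes = int(labels.max()) + 1
        one_hot = np.zeros((len(labels), nb_classes))
        one_hot[np.arange(len(labels)), labels] = 1.0
        labels = one_hot
    return features, labels


points = [(0.0, [0.1, 0.2]), (2.0, [0.3, 0.4]), (1.0, [0.5, 0.6])]
X, Y = from_collected_points_local(points, categorical=True)
print(X.shape, Y.shape)  # (3, 2) (3, 3)
```

Passing `nb_classes` explicitly matters when the collected data happens not to contain the highest class. Without it, the inferred one-hot width would be too narrow.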


encode_label

elephas.utils.rdd_utils.encode_label(label, nb_classes)

One-hot encode a single class label.

:param label: class label (an int, or a float with no fractional part such as 2.0)
:param nb_classes: int, total number of classes
:return: one-hot encoded vector
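A minimal sketch of one-hot encoding a single label, written against numpy only. `encode_label_sketch` is a hypothetical name, not the library function. The whole-number check is an extra guard added in this sketch to reflect the "no fractional part" requirement. The documented function may not perform it.

```python
import numpy as np


def encode_label_sketch(label, nb_classes):
    """Return a length-nb_classes vector with 1.0 at the label's index."""
    value = float(label)
    # Extra check added in this sketch: reject labels such as 1.5.
    if not value.is_integer():
        raise ValueError(f"label {label!r} is not a whole number")
    encoded = np.zeros(nb_classes)
    encoded[int(value)] = 1.0
    return encoded


print(encode_label_sketch(2.0, 4))  # [0. 0. 1. 0.]
```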


lp_to_simple_rdd

elephas.utils.rdd_utils.lp_to_simple_rdd(lp_rdd, categorical=False, nb_classes=None)

Convert a LabeledPoint RDD into an RDD of (features, label) pairs.

:param lp_rdd: LabeledPoint RDD of features and labels
:param categorical: boolean, whether labels should be one-hot encoded when returned
:param nb_classes: int, total number of classes
:return: Spark RDD with feature-label pairs
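The work done per record can be sketched as a plain function that a Spark job would apply with `lp_rdd.map(...)`. `lp_record_to_pair` and the `records` list are hypothetical. The list stands in for LabeledPoint records as `(label, features)` tuples. The sketch assumes that a missing `nb_classes` is inferred as the largest label plus one. On a real RDD, that inference would need a separate pass over the labels, for example `lp_rdd.map(lambda lp: lp.label).max()`. Supplying `nb_classes` up front avoids that extra job.

```python
import numpy as np


def lp_record_to_pair(label, features, categorical=False, nb_classes=None):
    """Map one (label, features) record to a (features, label) pair."""
    x = np.asarray(features, dtype=float)
    if not categorical:
        return x, label
    y = np.zeros(nb_classes)
    y[int(label)] = 1.0  # one-hot target, e.g. for a categorical cross-entropy loss
    return x, y


records = [(0.0, [0.1, 0.2]), (2.0, [0.3, 0.4])]
# Assumption: when nb_classes is omitted, infer it as max label + 1.
nb_classes = int(max(lbl for lbl, _ in records)) + 1
pairs = [lp_record_to_pair(lbl, f, categorical=True, nb_classes=nb_classes)
         for lbl, f in records]
print(pairs[1][1])  # [0. 0. 1.]
```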


to_simple_rdd

elephas.utils.rdd_utils.to_simple_rdd(sc, features, labels)

Convert numpy arrays of features and labels into an RDD of pairs.

:param sc: Spark context
:param features: numpy array with features
:param labels: numpy array with labels
:return: Spark RDD with feature-label pairs
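The pairing step itself is just a row-wise zip. Here is a local sketch, assuming labels are passed through unchanged with no encoding. `to_simple_pairs` is a hypothetical helper. Its length check is an addition of this sketch, not documented behaviour.

```python
import numpy as np


def to_simple_pairs(features, labels):
    """Pair each feature row with its label, as a list of (x, y) tuples."""
    # Extra check added in this sketch: zip would silently truncate otherwise.
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same number of rows")
    return list(zip(features, labels))


X = np.arange(6).reshape(3, 2)
y = np.array([0, 1, 0])
pairs = to_simple_pairs(X, y)
print(len(pairs))  # 3
```

With a live `SparkContext`, `sc.parallelize(to_simple_pairs(X, y))` would distribute these pairs as an RDD.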