Overview

{lightning} lets you generate random deviates from a range of probability distributions on Apache Spark.

{lightning} is a personal learning project. I wanted to learn more about invoking Scala code from R with {sparklyr}, and as an exercise chose to generate random deviates using Spark's RandomRDDs singleton object.
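
As a rough sketch of what such a Scala invocation looks like, the snippet below calls `normalRDD` on Spark's `org.apache.spark.mllib.random.RandomRDDs` object directly through {sparklyr}'s `invoke_static()`. It is not taken from the {lightning} source and only illustrates the general mechanism. It assumes a local Spark installation, and that {sparklyr} coerces the R integers to the `Long` and `Int` arguments the Scala method expects.

```r
library(sparklyr)

# Connect to a local Spark instance (assumes Spark is installed locally).
sc <- spark_connect(master = "local")

# Call the static Scala method
# RandomRDDs.normalRDD(sc, size, numPartitions, seed).
# Scala default arguments are not available through invoke_static(),
# so all four arguments are passed explicitly.
rdd <- invoke_static(
  sc,
  "org.apache.spark.mllib.random.RandomRDDs",
  "normalRDD",
  spark_context(sc), # the underlying SparkContext
  10L,               # size: number of values to generate
  1L,                # numPartitions
  1L                 # seed
)

# `rdd` is a reference to a JVM object; invoke() calls its methods.
n <- invoke(rdd, "count")
values <- invoke(rdd, "collect")

spark_disconnect(sc)
```

Here `n` holds the number of generated values and `values` holds them as an R numeric vector.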

Installation

You can install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("nathaneastwood/lightning")

Usage

{lightning} provides two ways of generating random deviates. First, a Distribution class generates N values at once:

sc <- sparklyr::spark_connect(master = "local")

library(lightning)

norm <- Normal$new(sc = sc, size = 10L, num_partitions = 1L, seed = 1L)
norm$count()
# [1] 10
norm$collect()
#  [1] -0.7364418  1.1537268  0.4631666  1.7794325  0.3503825 -1.2078423
#  [7]  0.1825577 -0.2811541  0.1794811 -1.4066039
norm$first()
# [1] -0.7364418
norm$get_num_partitions()
# [1] 1

Second, a Generator class generates one value at a time:

norm_gen <- NormalGenerator$new(sc = sc)
norm_gen$set_seed(1L)
norm_gen$next_value()
# [1] -1.032273

TODO

  • Support map() over the distributions.
  • Convert the RandomRDDs to a Spark DataFrame.
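
Until a native conversion exists, one possible workaround is to collect the values into R and copy them back to Spark as a DataFrame. The sketch below assumes the `Normal` class and `collect()` method shown under Usage; the table name `normal_values` and the column name `value` are made up for illustration. Because the values pass through the R session, this only suits small samples.

```r
library(sparklyr)
library(lightning)

sc <- spark_connect(master = "local")

norm <- Normal$new(sc = sc, size = 10L, num_partitions = 1L, seed = 1L)

# Pull the deviates into R, then copy them back to Spark as a
# one-column DataFrame. "normal_values" and "value" are hypothetical names.
values <- norm$collect()
normal_sdf <- sdf_copy_to(
  sc,
  data.frame(value = values),
  name = "normal_values",
  overwrite = TRUE
)

# Count the rows on the Spark side.
n_rows <- sdf_nrow(normal_sdf)
```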

Acknowledgments

Thanks to jozefhajnala and yitao-li for their help on this topic.