
Spark SQL can cache tables in an in-memory columnar format; call cache_table() to cache a table. When reading a cached table, Spark SQL scans only the required columns and automatically tunes compression to minimize memory usage and garbage-collection (GC) pressure. Call uncache_table() to remove a single table from memory, or clear_cache() to remove all cached tables from the in-memory cache. Finally, use is_cached() to test whether a table is cached.
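Caching is usually worth it only while a table is queried repeatedly, so the cache and uncache calls tend to come in pairs. The following is a minimal sketch of a scoped pattern. with_cached_table() is a hypothetical helper, not part of this package, and it assumes cache_table() and uncache_table() from this page are available. It caches a table, evaluates some code, and then uncaches the table even if that code fails.

```r
# Hypothetical helper (not part of this package): cache `table` while `code`
# runs, then always uncache it, even if `code` signals an error.
# Assumes cache_table() and uncache_table() documented on this page.
with_cached_table <- function(sc, table, code) {
  cache_table(sc = sc, table = table)
  # Registered before `code` is evaluated, so cleanup also runs on error
  on.exit(uncache_table(sc = sc, table = table), add = TRUE)
  # `code` is a lazily evaluated argument; force() runs it while cached
  force(code)
}
```

Because `code` is a lazy argument, an expression such as a dplyr pipeline over dplyr::tbl(sc, "mtcars") passed to the helper is evaluated only after the table has been cached.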

Usage

cache_table(sc, table)

clear_cache(sc)

is_cached(sc, table)

uncache_table(sc, table)

Arguments

sc

A spark_connection.

table

character(1). The name of the table.
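Since table is a single name, caching several tables means calling cache_table() once per name. The sketch below is a minimal version of that loop. cache_tables() is a hypothetical helper, not part of this package, and it assumes cache_table() returns TRUE or FALSE as described under Value. It returns one logical per table, named by table.

```r
# Hypothetical helper (not part of this package): cache_table() accepts a
# single table name, so call it once per name and collect one logical each.
cache_tables <- function(sc, tables) {
  stopifnot(is.character(tables))
  vapply(
    tables,
    # isTRUE() maps anything other than a single TRUE to FALSE
    function(tbl) isTRUE(cache_table(sc = sc, table = tbl)),
    FUN.VALUE = logical(1)
  )
}
```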

Value

  • cache_table(): TRUE if successful, otherwise FALSE.

  • clear_cache(): NULL, invisibly.

  • is_cached(): A logical(1) vector: TRUE if the table is cached, FALSE otherwise.

  • uncache_table(): NULL, invisibly.
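The return values above let the functions be combined. As one example, a table can be cached only when is_cached() reports that it is not cached yet, which avoids a redundant cache request. The sketch below shows this under the assumption that both functions behave as documented here. ensure_cached() is a hypothetical helper, not part of this package.

```r
# Hypothetical helper (not part of this package): request caching only when
# is_cached() says the table is not cached yet. Returns TRUE if the table is
# already cached or caching succeeds, FALSE otherwise.
ensure_cached <- function(sc, table) {
  if (isTRUE(is_cached(sc = sc, table = table))) {
    return(TRUE)
  }
  isTRUE(cache_table(sc = sc, table = table))
}
```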

Examples

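if (FALSE) {
sc <- sparklyr::spark_connect(master = "local")
# copy_to() registers the data as the Spark table "mtcars" by default
mtcars_spark <- sparklyr::copy_to(dest = sc, df = mtcars)

# By default the table is not cached
is_cached(sc = sc, table = "mtcars")

# We can manually cache the table
cache_table(sc = sc, table = "mtcars")
# And now the table is cached
is_cached(sc = sc, table = "mtcars")

# We can uncache the table
uncache_table(sc = sc, table = "mtcars")
is_cached(sc = sc, table = "mtcars")

# Or remove every cached table at once
cache_table(sc = sc, table = "mtcars")
clear_cache(sc = sc)
is_cached(sc = sc, table = "mtcars")

# Close the connection when done
sparklyr::spark_disconnect(sc)
}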