Unions the records from multiple data sets, returning only the requested columns. The requested columns are assumed to have the same names in every data set.
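For plain data.frame inputs, one rough mental model of this behaviour is: select the same columns from each data set, stack the results with rbind(), and optionally drop duplicate rows. The helper below, union_select_sketch(), is hypothetical. It takes column names as a character vector rather than tidy-select expressions and ignores tbl_spark inputs, so it should not be read as the package's implementation.

```r
# Hypothetical base-R approximation of union_select() for data.frames only.
# Columns are given as a character vector instead of tidy-select expressions.
union_select_sketch <- function(data_list, cols, all = TRUE) {
  # Keep only the requested columns from each data set; drop = FALSE keeps
  # a data.frame even when a single column is selected
  pieces <- lapply(data_list, function(df) df[, cols, drop = FALSE])
  # Stack the pieces row-wise, keeping every record (like SQL UNION ALL)
  out <- do.call(rbind, pieces)
  # When all = FALSE, keep only the first occurrence of each record
  if (!all) out <- unique(out)
  rownames(out) <- NULL
  out
}

a <- data.frame(col1 = c(1:10, 10), col2 = 6)
b <- data.frame(col1 = c(1:5, 5), col2 = 4)
c <- data.frame(col1 = c(0, 1, 1, 2, 3, 5, 8))

nrow(union_select_sketch(list(a, b, c), "col1"))               # 24
nrow(union_select_sketch(list(a, b, c), "col1", all = FALSE))  # 11
```

The row counts agree with the examples at the end of this page: 11 + 6 + 7 = 24 records when duplicates are kept, and 11 distinct values (1 to 10, then 0) when they are removed.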

union_select(.data, ..., .all = TRUE)

Arguments

.data

A list() of data.frames or tbl_sparks.

...

<tidy-select> One or more unquoted expressions separated by commas. Variable names can be used as if they were positions in the data.frame, so expressions like x:y can be used to select a range of variables.

.all

logical(1). Whether to keep duplicate records (TRUE, the default) or remove them (FALSE).
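Because .all = FALSE removes duplicate records, a duplicate is presumably a row in which every selected column matches, as with SQL UNION. The base-R snippet below is an illustration only (d1 and d2 are made-up inputs, not package objects). It shows that de-duplicating whole records can keep more rows than de-duplicating a single column.

```r
# Two made-up data sets that share the columns col1 and col2
d1 <- data.frame(col1 = c(1, 2, 2), col2 = c(6, 6, 6))
d2 <- data.frame(col1 = c(1, 2), col2 = c(4, 6))

# Keep every record, analogous to .all = TRUE
stacked <- rbind(d1, d2)
nrow(stacked)                  # 5

# Drop records whose values match in every column, analogous to .all = FALSE
deduped <- unique(stacked)
nrow(deduped)                  # 3: (1, 6), (2, 6) and (1, 4)

# De-duplicating col1 alone would leave only 2 values
length(unique(stacked$col1))   # 2
```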

Value

A tbl_spark or a data.frame, depending on the class of the inputs in .data.

Examples

a <- data.frame(col1 = c(1:10, 10), col2 = 6)
b <- data.frame(col1 = c(1:5, 5), col2 = 4)
c <- data.frame(col1 = c(0, 1, 1, 2, 3, 5, 8))

# You can union specific columns
union_select(.data = list(a, b, c), "col1")
#>    col1
#> 1     1
#> 2     2
#> 3     3
#> 4     4
#> 5     5
#> 6     6
#> 7     7
#> 8     8
#> 9     9
#> 10   10
#> 11   10
#> 12    1
#> 13    2
#> 14    3
#> 15    4
#> 16    5
#> 17    5
#> 18    0
#> 19    1
#> 20    1
#> 21    2
#> 22    3
#> 23    5
#> 24    8
# And you can remove duplicate records
union_select(.data = list(a, b, c), ends_with("1"), .all = FALSE)
#>    col1
#> 1     1
#> 2     2
#> 3     3
#> 4     4
#> 5     5
#> 6     6
#> 7     7
#> 8     8
#> 9     9
#> 10   10
#> 11    0