This function unions the records from multiple data sets and returns only the requested columns. Each requested column is assumed to have the same name in every data set.
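For local data frames, this behavior can be approximated in base R. The following is a minimal sketch, not the package's implementation. `union_select_sketch()` is a hypothetical name. It takes column names as a character vector instead of `...`, and it assumes every data set contains the requested columns. It keeps the selected columns from each data set and stacks the records in input order, as SQL `UNION ALL` does. With `all = FALSE`, it also drops duplicate records, as SQL `UNION` does.

```r
# Hypothetical base R sketch of union_select() semantics for local data frames.
# `dfs` is a list of data frames; `cols` is a character vector of column names
# (assumption: the real function also accepts select helpers through `...`).
union_select_sketch <- function(dfs, cols, all = TRUE) {
  # Keep only the requested columns from each data set
  parts <- lapply(dfs, function(d) d[, cols, drop = FALSE])
  # Stack the records in input order (like SQL UNION ALL)
  out <- do.call(rbind, parts)
  # Drop duplicate records, keeping first occurrences (like SQL UNION)
  if (!all) out <- unique(out)
  rownames(out) <- NULL
  out
}

a <- data.frame(col1 = c(1:10, 10), col2 = 6)
b <- data.frame(col1 = c(1:5, 5), col2 = 4)
c <- data.frame(col1 = c(0, 1, 1, 2, 3, 5, 8))

nrow(union_select_sketch(list(a, b, c), "col1"))               # 24
nrow(union_select_sketch(list(a, b, c), "col1", all = FALSE))  # 11
```

The row counts match the package examples: 11 + 6 + 7 = 24 records with duplicates kept, and 11 distinct values (1 to 10, plus 0) once duplicates are removed.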

    union_select(.data, ..., .all = TRUE)

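| Argument | Description |
|---|---|
| `.data` | A list of data sets to union, for example `data.frame`s or `tbl_spark`s. |
| `...` | The columns to return, given as column names (e.g. `"col1"`) or select helpers (e.g. `ends_with("1")`). |
| `.all` | If `TRUE` (the default), keep all records, including duplicates; if `FALSE`, remove duplicate records. |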
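
The examples pass the select helper `ends_with("1")` through `...`. Assuming `...` follows dplyr's tidyselect semantics (the table above does not say so explicitly), the helper picks columns by name pattern. This base R sketch shows which columns such a helper would match in each example data set. It uses `endsWith()` from base R as a stand-in for the tidyselect helper.

```r
# Base R stand-in for a name-pattern select helper such as ends_with("1").
# This only illustrates the matching; it is not how union_select() resolves `...`.
a <- data.frame(col1 = c(1:10, 10), col2 = 6)
b <- data.frame(col1 = c(1:5, 5), col2 = 4)
c <- data.frame(col1 = c(0, 1, 1, 2, 3, 5, 8))

# For each data set, keep the column names that end with "1"
matched <- lapply(list(a, b, c), function(d) names(d)[endsWith(names(d), "1")])
matched
```

In every data set, the only match is `col1`. That column is also the only one shared by all three data sets, because `c` has no `col2`.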

Returns a `tbl_spark` or a `data.frame`, depending on the input `.data`.

    a <- data.frame(col1 = c(1:10, 10), col2 = 6)
    b <- data.frame(col1 = c(1:5, 5), col2 = 4)
    c <- data.frame(col1 = c(0, 1, 1, 2, 3, 5, 8))

    # You can union specific columns
    union_select(.data = list(a, b, c), "col1")
    #>    col1
    #> 1     1
    #> 2     2
    #> 3     3
    #> 4     4
    #> 5     5
    #> 6     6
    #> 7     7
    #> 8     8
    #> 9     9
    #> 10   10
    #> 11   10
    #> 12    1
    #> 13    2
    #> 14    3
    #> 15    4
    #> 16    5
    #> 17    5
    #> 18    0
    #> 19    1
    #> 20    1
    #> 21    2
    #> 22    3
    #> 23    5
    #> 24    8

    # And you can remove duplicate records
    union_select(.data = list(a, b, c), ends_with("1"), .all = FALSE)
    #>    col1
    #> 1     1
    #> 2     2
    #> 3     3
    #> 4     4
    #> 5     5
    #> 6     6
    #> 7     7
    #> 8     8
    #> 9     9
    #> 10   10
    #> 11    0