The mutating joins add columns from y to x, matching rows based on the keys:
inner_join(): includes all rows inxandy.left_join(): includes all rows inx.right_join(): includes all rows iny.full_join(): includes all rows inxory.
If a row in x matches multiple rows in y, all the rows in y will be returned once for each matching row in x.
Usage
inner_join(
x,
y,
by = NULL,
suffix = c(".x", ".y"),
...,
na_matches = c("na", "never")
)
left_join(
x,
y,
by = NULL,
suffix = c(".x", ".y"),
...,
keep = FALSE,
na_matches = c("na", "never")
)
right_join(
x,
y,
by = NULL,
suffix = c(".x", ".y"),
...,
keep = FALSE,
na_matches = c("na", "never")
)
full_join(
x,
y,
by = NULL,
suffix = c(".x", ".y"),
...,
keep = FALSE,
na_matches = c("na", "never")
)Arguments
- x, y
The
data.frames to join.- by
A character vector of variables to join by. If
NULL, the default,*_join()will do a natural join, using all variables with common names across the two tables. A message lists the variables so that you can check they're right (to suppress the message, simply explicitly list the variables that you want to join).To join by different variables on x and y use a named vector. For example,
by = c("a" = "b")will matchx.atoy.b.To join by multiple variables, use a vector with length > 1. For example,
by = c("a", "b")will matchx$atoy$aandx$btoy$b. Use a named vector to match different variables inxandy. For example,by = c("a" = "b", "c" = "d")will matchx$atoy$bandx$ctoy$d.To perform a cross-join, generating all combinations of
xandy, useby = character().- suffix
character(2). If there are non-joined duplicate variables inxandy, these suffixes will be added to the output to disambiguate them.- ...
Additional arguments to pass to
merge()- na_matches
Should
NAandNaNvalues match one another?The default,
"na", treats twoNAorNaNvalues as equal, like%in%,match(),merge().Use
"never"to always treat twoNAorNaNvalues as different, like joins for database sources, similarly tomerge(incomparables = FALSE).- keep
logical(1). Should the join keys from bothxandybe preserved in the output? Only applies toleft_join(),right_join(), andfull_join().
Value
A data.frame. The order of the rows and columns of x is preserved as much as possible. The output has the
following properties:
For
inner_join(), a subset ofxrows. Forleft_join(), allxrows. Forright_join(), a subset ofxrows, followed by unmatchedyrows. Forfull_join(), allxrows, followed by unmatchedyrows.For all joins, rows will be duplicated if one or more rows in
xmatches multiple rows iny.Output columns include all
xcolumns and allycolumns. If columns inxandyhave the same name (and aren't included inby),suffixes are added to disambiguate.Output columns included in
byare coerced to common type acrossxandy.Groups are taken from
x.
Examples
# If a row in `x` matches multiple rows in `y`, all the rows in `y` will be
# returned once for each matching row in `x`
df1 <- data.frame(x = 1:3)
df2 <- data.frame(x = c(1, 1, 2), y = c("first", "second", "third"))
df1 %>% left_join(df2)
#> Joining, by = "x"
#> x y
#> 1 1 first
#> 2 1 second
#> 3 2 third
#> 4 3 <NA>
# By default, NAs match other NAs so that there are two
# rows in the output of this join:
df1 <- data.frame(x = c(1, NA), y = 2)
df2 <- data.frame(x = c(1, NA), z = 3)
left_join(df1, df2)
#> Joining, by = "x"
#> x y z
#> 1 1 2 3
#> 2 NA 2 3
# You can optionally request that NAs don't match, giving a
# a result that more closely resembles SQL joins
left_join(df1, df2, na_matches = "never")
#> Joining, by = "x"
#> x y z
#> 1 1 2 3
#> 2 NA 2 NA
