The mutating joins add columns from y
to x
, matching rows based on the keys:
inner_join()
: includes all rows inx
andy
.left_join()
: includes all rows inx
.right_join()
: includes all rows iny
.full_join()
: includes all rows inx
ory
.
If a row in x
matches multiple rows in y
, all the rows in y
will be returned once for each matching row in x
.
Usage
inner_join(
x,
y,
by = NULL,
suffix = c(".x", ".y"),
...,
na_matches = c("na", "never")
)
left_join(
x,
y,
by = NULL,
suffix = c(".x", ".y"),
...,
keep = FALSE,
na_matches = c("na", "never")
)
right_join(
x,
y,
by = NULL,
suffix = c(".x", ".y"),
...,
keep = FALSE,
na_matches = c("na", "never")
)
full_join(
x,
y,
by = NULL,
suffix = c(".x", ".y"),
...,
keep = FALSE,
na_matches = c("na", "never")
)
Arguments
- x, y
The
data.frame
s to join.- by
A character vector of variables to join by. If
NULL
, the default,*_join()
will do a natural join, using all variables with common names across the two tables. A message lists the variables so that you can check they're right (to suppress the message, simply explicitly list the variables that you want to join).To join by different variables on x and y use a named vector. For example,
by = c("a" = "b")
will matchx.a
toy.b
.To join by multiple variables, use a vector with length > 1. For example,
by = c("a", "b")
will matchx$a
toy$a
andx$b
toy$b
. Use a named vector to match different variables inx
andy
. For example,by = c("a" = "b", "c" = "d")
will matchx$a
toy$b
andx$c
toy$d
.To perform a cross-join, generating all combinations of
x
andy
, useby = character()
.- suffix
character(2)
. If there are non-joined duplicate variables inx
andy
, these suffixes will be added to the output to disambiguate them.- ...
Additional arguments to pass to
merge()
- na_matches
Should
NA
andNaN
values match one another?The default,
"na"
, treats twoNA
orNaN
values as equal, like%in%
,match()
,merge()
.Use
"never"
to always treat twoNA
orNaN
values as different, like joins for database sources, similarly tomerge(incomparables = FALSE)
.- keep
logical(1)
. Should the join keys from bothx
andy
be preserved in the output? Only applies toleft_join()
,right_join()
, andfull_join()
.
Value
A data.frame
. The order of the rows and columns of x
is preserved as much as possible. The output has the
following properties:
For
inner_join()
, a subset ofx
rows. Forleft_join()
, allx
rows. Forright_join()
, a subset ofx
rows, followed by unmatchedy
rows. Forfull_join()
, allx
rows, followed by unmatchedy
rows.For all joins, rows will be duplicated if one or more rows in
x
matches multiple rows iny
.Output columns include all
x
columns and ally
columns. If columns inx
andy
have the same name (and aren't included inby
),suffix
es are added to disambiguate.Output columns included in
by
are coerced to common type acrossx
andy
.Groups are taken from
x
.
Examples
# If a row in `x` matches multiple rows in `y`, all the rows in `y` will be
# returned once for each matching row in `x`
df1 <- data.frame(x = 1:3)
df2 <- data.frame(x = c(1, 1, 2), y = c("first", "second", "third"))
df1 %>% left_join(df2)
#> Joining, by = "x"
#> x y
#> 1 1 first
#> 2 1 second
#> 3 2 third
#> 4 3 <NA>
# By default, NAs match other NAs so that there are two
# rows in the output of this join:
df1 <- data.frame(x = c(1, NA), y = 2)
df2 <- data.frame(x = c(1, NA), z = 3)
left_join(df1, df2)
#> Joining, by = "x"
#> x y z
#> 1 1 2 3
#> 2 NA 2 3
# You can optionally request that NAs don't match, giving a
# a result that more closely resembles SQL joins
left_join(df1, df2, na_matches = "never")
#> Joining, by = "x"
#> x y z
#> 1 1 2 3
#> 2 NA 2 NA