Select only distinct/unique rows from a data.frame.

distinct(.data, ..., .keep_all = FALSE)

Arguments

.data

A data.frame.

...

Optional variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved. If omitted, all variables are used.

.keep_all

logical(1). If TRUE, keep all variables in .data. If a combination of ... is not distinct, the values from the first row with that combination are kept.
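The first-row rule for ... and .keep_all can be sketched in base R with duplicated(). This is an illustration under stated assumptions, not the package's implementation: distinct_sketch() is a hypothetical helper, and it only handles plain column names, not computed expressions.

```r
# Hypothetical helper: a base-R sketch of the uniqueness rule, not the package's code.
# `cols` plays the role of `...` (plain column names only); `keep_all` mirrors .keep_all.
distinct_sketch <- function(data, cols = names(data), keep_all = FALSE) {
  # duplicated() flags every row whose combination of `cols` appeared earlier,
  # so negating it keeps the first row of each combination
  keep <- !duplicated(data[, cols, drop = FALSE])
  if (keep_all) {
    data[keep, , drop = FALSE]        # all columns, values from the first matching row
  } else {
    data[keep, cols, drop = FALSE]    # only the columns used to judge uniqueness
  }
}

df <- data.frame(
  x = c(1, 1, 2, 2),
  y = c(10, 20, 30, 30)
)
distinct_sketch(df)                        # rows 1 to 3; row 4 repeats row 3
distinct_sketch(df, "x")                   # column x only, from rows 1 and 3
distinct_sketch(df, "x", keep_all = TRUE)  # x and y, taken from rows 1 and 3
```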

Value

A data.frame with the following properties:

  • Rows are a subset of the input but appear in the same order.

  • Columns are not modified if ... is empty or .keep_all is TRUE. Otherwise, distinct() first calls mutate() to create new columns.

  • Groups are not modified.

  • data.frame attributes are preserved.
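For a computed variable, the column rule above amounts to "create the column, then keep the first row of each value". A minimal base-R sketch, assuming transform() as a stand-in for the package's mutate() step (tmp and res are illustrative names). Because rows are picked with a logical mask over the input, the surviving rows keep their original order and row names.

```r
df <- data.frame(x = c(7, 5, 9, 4), y = c(2, 10, 8, 2))

# Step 1: the mutate()-like step adds the computed column
tmp <- transform(df, diff = abs(x - y))

# Step 2: keep the first row of each diff value; only the new column is returned
res <- tmp[!duplicated(tmp$diff), "diff", drop = FALSE]
res  # diff values 5, 1, 2 from input rows 1, 3 and 4
```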

Examples

# sample() draws at random, so the counts below vary from run to run
df <- data.frame(
  x = sample(10, 100, replace = TRUE),
  y = sample(10, 100, replace = TRUE)
)
nrow(df)
#> [1] 100
nrow(distinct(df))
#> [1] 63
nrow(distinct(df, x, y))
#> [1] 63
distinct(df, x)
#>     x
#> 1   7
#> 2   5
#> 3   9
#> 4   4
#> 5   6
#> 6   8
#> 8   3
#> 13  2
#> 16 10
#> 17  1
distinct(df, y)
#>     y
#> 1   2
#> 2  10
#> 3   8
#> 5   6
#> 7   1
#> 8   5
#> 11  9
#> 15  4
#> 18  7
#> 28  3
# You can choose to keep all other variables as well
distinct(df, x, .keep_all = TRUE)
#>     x  y
#> 1   7  2
#> 2   5 10
#> 3   9  8
#> 4   4  2
#> 5   6  6
#> 6   8  6
#> 8   3  5
#> 13  2  1
#> 16 10  2
#> 17  1  8
distinct(df, y, .keep_all = TRUE)
#>    x  y
#> 1  7  2
#> 2  5 10
#> 3  9  8
#> 5  6  6
#> 7  7  1
#> 8  3  5
#> 11 8  9
#> 15 2  4
#> 18 5  7
#> 28 9  3
# You can also use distinct on computed variables
distinct(df, diff = abs(x - y))
#>    diff
#> 1     5
#> 3     1
#> 4     2
#> 5     0
#> 7     6
#> 9     3
#> 12    7
#> 16    8
#> 23    4
# The same behaviour applies for grouped data frames,
# except that the grouping variables are always included
df <- data.frame(
  g = c(1, 1, 2, 2),
  x = c(1, 1, 2, 1)
) %>% group_by(g)
df %>% distinct(x)
#>   g x
#> 1 1 1
#> 3 2 2
#> 4 2 1
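The grouped example can be mimicked in base R to show why g appears in the result: the grouping columns are added to the columns used to judge uniqueness. This is a sketch only; group_cols is a hypothetical stand-in for the grouping variables, and the result is a plain data.frame without any grouping metadata.

```r
df <- data.frame(
  g = c(1, 1, 2, 2),
  x = c(1, 1, 2, 1)
)
group_cols <- "g"                # hypothetical: the grouping variables
cols <- union(group_cols, "x")   # grouping columns are always included
res <- df[!duplicated(df[cols]), cols, drop = FALSE]
res  # rows 1, 3 and 4, as in the grouped example above
```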