What is the select() function in R?

Overview

The select() function picks specific variables (columns) from a data frame or tibble. Columns can be selected by name or with helper functions that match column names, such as contains(), matches(), starts_with(), and ends_with().

Note: The num_range(), matches(), contains(), starts_with(), and ends_with() functions are selection helpers that are available through the dplyr package (re-exported from tidyselect) and are used inside the select() function to match column names.
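
As a rough illustration of these helpers, the sketch below applies each one to a small dataset. The df data frame and its column names (x1, x2, x3, y) are made up for this example and do not come from the text above.

```r
library(dplyr, warn.conflicts = FALSE)

# starts_with(): columns whose names begin with "Sepal"
sepal_cols <- iris %>% select(starts_with("Sepal"))

# contains(): columns whose names contain the literal string "Length"
length_cols <- iris %>% select(contains("Length"))

# matches(): columns whose names match a regular expression
petal_cols <- iris %>% select(matches("^Petal\\."))

# num_range(): columns made of a prefix plus a number in a given range
# (df is a hypothetical data frame used only for this illustration)
df <- data.frame(x1 = 1:2, x2 = 3:4, x3 = 5:6, y = 7:8)
range_cols <- df %>% select(num_range("x", 1:2))

names(sepal_cols)
names(length_cols)
names(petal_cols)
names(range_cols)
```

Note that these helpers ignore case by default, so starts_with("sepal") would match the same columns here.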

Syntax


select(.data, ...)

Parameter values

The select() function takes the following argument values:

.data: This can be a data frame, a tibble, or a lazy data frame (for example, one backed by dbplyr or dtplyr).

...: One or more unquoted expressions separated by commas. These can be variable names or expressions such as x:y, which selects the range of columns from x to y.
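
To make the two arguments concrete, here is a minimal sketch that passes the built-in iris dataset as .data and varies what goes into the ... argument. The result names (by_name, by_range, combined) are chosen only for this illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Several unquoted column names separated by commas;
# the output follows the order in which the columns are listed
by_name <- select(iris, Species, Sepal.Length)

# A range of adjacent columns written as x:y
by_range <- select(iris, Sepal.Length:Petal.Length)

# A range and a single name combined in one call
combined <- select(iris, Sepal.Length:Sepal.Width, Species)

names(by_name)
names(by_range)
names(combined)
```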

Return value

This function returns an object of the same type as .data that contains only the selected columns. The rows are not affected.
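
One way to check the return type is to compare the class of the result for a base data frame and for a tibble. This is only a quick sketch using the iris and starwars datasets. The names df_result and tbl_result are illustrative.

```r
library(dplyr, warn.conflicts = FALSE)

# iris is a base data frame, so the result is a data frame,
# even though only one column is selected
df_result <- select(iris, Species)

# starwars is a tibble, so the result is a tibble
tbl_result <- select(starwars, name, height)

class(df_result)
class(tbl_result)

# The number of rows is unchanged in both cases
nrow(df_result) == nrow(iris)
nrow(tbl_result) == nrow(starwars)
```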

Example

Here are four use cases of the select() function in R:

# load dplyr library
library(dplyr, warn.conflicts = FALSE, quietly = TRUE)
# select only the height column from the starwars dataset
starwars %>% select(height)
  • Line 2: We load the dplyr package with the warn.conflicts argument set to FALSE, which suppresses the messages about name conflicts with objects from other attached packages.
  • Line 4: We invoke the select() function to pick the height column from the starwars dataset.
# load dplyr library
library(dplyr, warn.conflicts = FALSE, quietly = TRUE)
# select the columns from name to skin_color
starwars %>% select(name:skin_color)
  • Line 4: We select all the columns from name through skin_color, inclusive, from the starwars dataset.
# load dplyr library
library(dplyr, warn.conflicts = FALSE, quietly = TRUE)
# select every column except those in the range name:mass
starwars %>% select(!(name:mass))
  • Line 4: select(!(name:mass)) selects every column of the starwars dataset except those in the range from name to mass.
# load dplyr library
library(dplyr, warn.conflicts = FALSE, quietly = TRUE)
# select the columns whose names do not end with "Width"
iris %>% select(!ends_with("Width"))
  • Line 4: select(!ends_with("Width")) selects only the columns of the iris dataset whose names don't end with "Width".
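
To see what the ! operator does in the last two use cases, the following sketch compares the column names before and after the selection. The variable names are illustrative only.

```r
library(dplyr, warn.conflicts = FALSE)

# Drop the range name:mass from starwars
without_range <- starwars %>% select(!(name:mass))

# Columns that were removed by the negated range
setdiff(names(starwars), names(without_range))

# Drop the columns of iris whose names end with "Width"
without_width <- iris %>% select(!ends_with("Width"))

names(without_width)
```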

Note: %>% is the forward pipe operator from the magrittr package, which dplyr makes available. It allows command chaining by passing the result of the expression on its left as the first argument of the function on its right.
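
As a small sketch of what the pipe does, the two calls below should produce the same result, because the piped form passes starwars as the first argument (.data) of select(). The names piped and direct are used only for this illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Piped form: starwars is forwarded into select() as .data
piped <- starwars %>% select(name, height)

# Direct form: starwars is passed explicitly as the first argument
direct <- select(starwars, name, height)

# Compare the two results
all.equal(piped, direct)
```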
